NEW 46 features on the public roadmap — see what's next

Stop firefighting pipelines.
Let an AI agent fix them.

DevopsAgent watches your Jenkins, GitHub Actions, ArgoCD, Kubernetes and cloud stack — diagnoses failures, decides what to do, executes safe fixes, and learns from every incident. Open source. Self-hosted. Provider-agnostic LLMs.


Self-hosted

MIT licensed

Multi-LLM (Ollama, OpenAI, Anthropic, Gemini)

Docker-native

~ devopsagent run --tail

# 2025-11-15 14:02:11  jenkins-job: deploy-prod  #482
[detect]   pipeline failed: build step exited 137 (OOMKilled)
[plan]     similar incident #475 resolved by raising heap → confidence 0.83
[decide]   threshold 0.75 met → AUTO_REMEDIATE
[execute]  patched Jenkinsfile: -Xmx 2g → 4g
[verify]   re-run #483 ... SUCCESS in 4m 12s
[learn]    stored embedding + outcome in memory (sqlite)

How it works

A four-step loop that runs on every failure, plus notifications and an audit trail.

DevopsAgent doesn't replace your CI/CD — it sits beside it, watches outcomes, and acts only when it's confident enough to.

Detect

Pipeline failures, OOMKills, deployment rollbacks, security findings and synthetic alerts are all normalised into structured failure contexts.

Decide

A safety layer scores each suggested fix against a confidence threshold and per-agent policy before anything mutates production.

Act

Specialised executors retry Jenkins jobs, restart pods, replay GitHub Actions, re-deploy Argo apps, run Ansible playbooks or open a ticket.

Learn

Every outcome is embedded and stored, so the next similar failure is matched semantically — faster, cheaper, more confident.

Notify

Slack, Teams, Email or PagerDuty get a clean, explainable timeline of what happened, why, and what changed.

Audit

Every decision, prompt, response and side-effect is logged — fully replayable and reviewable from the bundled web UI.
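What might a structured failure context look like? Here is a minimal Python sketch, not DevopsAgent's actual schema: the `FailureContext` fields and the `classify` and `build_context` helpers are hypothetical names chosen for illustration. It parses the failure line from the demo at the top of the page and leans on one well-known convention: an exit code above 128 usually means the process was killed by a signal (137 = 128 + 9, SIGKILL).

```python
import re
from dataclasses import dataclass
from typing import Optional


# Hypothetical schema for illustration; the real failure context may differ.
@dataclass(frozen=True)
class FailureContext:
    source: str               # e.g. "jenkins"
    job: str                  # e.g. "deploy-prod"
    run_id: int               # e.g. 482
    exit_code: Optional[int]  # None when the message carries no exit code
    failure_class: str        # coarse label used to match past incidents
    raw_message: str


_EXIT_CODE = re.compile(r"exited (\d+)")


def classify(exit_code: Optional[int], message: str) -> str:
    """Map an exit code and log message to a coarse failure class."""
    if "OOMKilled" in message:
        return "oom_killed"
    if exit_code is not None and exit_code > 128:
        # Shell convention: 128 + signal number (137 = 128 + 9, SIGKILL).
        return "killed_by_signal"
    if exit_code:
        return "nonzero_exit"
    return "unknown"


def build_context(source: str, job: str, run_id: int, message: str) -> FailureContext:
    """Extract the exit code (if any) and wrap everything in one record."""
    match = _EXIT_CODE.search(message)
    exit_code = int(match.group(1)) if match else None
    return FailureContext(
        source, job, run_id, exit_code, classify(exit_code, message), message
    )


ctx = build_context(
    "jenkins", "deploy-prod", 482,
    "pipeline failed: build step exited 137 (OOMKilled)",
)
print(ctx.failure_class, ctx.exit_code)  # oom_killed 137
```

Keeping the record frozen means every later stage sees exactly what was detected, which also makes the audit trail easier to replay.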

Safety first

Three outcomes, one threshold.

Every plan lands in one of three buckets — driven by per-agent confidence thresholds, allow/deny lists, and rate limits. No silent magic.

Auto-remediate

High confidence, safe action

The plan's score meets the threshold and its action belongs to an allowed action class. The agent executes, verifies, then notifies. Zero human latency.

confidence ≥ 0.75 · action ∈ allowlist

Suggest only

Medium confidence, human approves

Score is meaningful but not high enough — the agent posts a one-click suggestion to your inbox with full reasoning attached.

0.45 ≤ confidence < 0.75

Escalate

Low confidence or risky surface

Unknown failure class, sensitive scope (prod secrets, IAM, DB migrations) or a tripped rate limit: straight to on-call with a context bundle.

confidence < 0.45 · or scope = sensitive
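The three buckets above boil down to a few comparisons. The sketch below is an illustration under stated assumptions, not the shipped decision engine: `Plan`, `Policy`, `Outcome` and `decide` are hypothetical names, the allowlist entries are made up, and sending a high-confidence plan whose action is not on the allowlist to "suggest only" is our guess, since the page does not say where that case lands.

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    AUTO_REMEDIATE = "auto_remediate"
    SUGGEST_ONLY = "suggest_only"
    ESCALATE = "escalate"


@dataclass(frozen=True)
class Policy:
    # Thresholds taken from the page; allowlist entries are illustrative.
    auto_threshold: float = 0.75
    suggest_threshold: float = 0.45
    allowlist: frozenset = frozenset({"retry_job", "restart_pod", "patch_jenkinsfile"})


@dataclass(frozen=True)
class Plan:
    action: str
    confidence: float
    sensitive_scope: bool = False  # prod secrets, IAM, DB migrations
    rate_limited: bool = False     # the agent's rate limit has tripped


def decide(plan: Plan, policy: Policy = Policy()) -> Outcome:
    """Route a plan into one of the three buckets."""
    # Risky surface always wins, regardless of confidence.
    if plan.sensitive_scope or plan.rate_limited:
        return Outcome.ESCALATE
    if plan.confidence < policy.suggest_threshold:
        return Outcome.ESCALATE
    if plan.confidence >= policy.auto_threshold and plan.action in policy.allowlist:
        return Outcome.AUTO_REMEDIATE
    # Assumption: confident but non-allowlisted actions need a human click.
    return Outcome.SUGGEST_ONLY


print(decide(Plan("patch_jenkinsfile", 0.83)))  # Outcome.AUTO_REMEDIATE
```

Checking the sensitive-scope and rate-limit flags first means no confidence score, however high, can talk the agent into touching a protected surface.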

Features

Everything an autonomous SRE needs.

A modular agent framework with pluggable LLMs, structured memory, and executors for the tools your team already uses.

🧠

Multi-provider LLM

Ollama, OpenAI, Anthropic, Gemini and any OpenAI-compatible endpoint — one config switch, no vendor lock-in.

🔌

Plugin agents

Drop a Python file in agents/plugins/ and the registry auto-loads it. Build your own integration in minutes.
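Curious how a drop-in registry can work? One common approach, sketched here with the standard library's `importlib`, is to import every `.py` file in a directory and keep the modules that declare themselves as agents. This is not the project's actual loader: `load_plugins` and the `AGENT_NAME` marker are hypothetical conventions invented for this example.

```python
import importlib.util
from pathlib import Path
from types import ModuleType


def load_plugins(plugin_dir: Path) -> dict[str, ModuleType]:
    """Import every public .py file in plugin_dir and index it by agent name."""
    plugins: dict[str, ModuleType] = {}
    for path in sorted(plugin_dir.glob("*.py")):
        if path.name.startswith("_"):
            continue  # skip __init__.py and private helpers
        spec = importlib.util.spec_from_file_location(f"plugins.{path.stem}", path)
        if spec is None or spec.loader is None:
            continue
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)  # runs the plugin's top-level code
        # Hypothetical convention: a plugin opts in by defining AGENT_NAME.
        name = getattr(module, "AGENT_NAME", None)
        if name is not None:
            plugins[name] = module
    return plugins
```

Because `exec_module` runs whatever is in the file, a loader like this should only ever point at a directory you control.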

📚

Semantic memory

SQLite + embeddings recall similar past failures so the agent gets faster, cheaper, and more confident over time.
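To give a feel for SQLite-backed recall, here is a small self-contained sketch. It makes loud assumptions: the `Memory` class and table layout are hypothetical, and `toy_embed` is a hashed bag-of-words stand-in so the example runs without a model. A real deployment would call an embedding model from your configured LLM provider instead.

```python
import hashlib
import json
import math
import sqlite3

DIM = 256  # size of the toy embedding space


def toy_embed(text: str) -> list[float]:
    """Stand-in embedder: hashed bag of words, scaled to unit length."""
    vec = [0.0] * DIM
    for token in text.lower().split():
        bucket = int(hashlib.sha256(token.encode()).hexdigest(), 16) % DIM
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def cosine(a: list[float], b: list[float]) -> float:
    """Dot product; equals cosine similarity because inputs are unit length."""
    return sum(x * y for x, y in zip(a, b))


class Memory:
    def __init__(self, path: str = ":memory:") -> None:
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS incidents ("
            "id INTEGER PRIMARY KEY, summary TEXT, fix TEXT, "
            "succeeded INTEGER, embedding TEXT)"
        )

    def store(self, summary: str, fix: str, succeeded: bool) -> None:
        self.db.execute(
            "INSERT INTO incidents (summary, fix, succeeded, embedding) "
            "VALUES (?, ?, ?, ?)",
            (summary, fix, int(succeeded), json.dumps(toy_embed(summary))),
        )
        self.db.commit()

    def recall(self, summary: str, top_k: int = 3) -> list[tuple[float, str, str]]:
        """Return (score, summary, fix) for the most similar successful fixes."""
        query = toy_embed(summary)
        rows = self.db.execute(
            "SELECT summary, fix, embedding FROM incidents "
            "WHERE succeeded = 1 ORDER BY id"
        ).fetchall()
        scored = [(cosine(query, json.loads(emb)), s, f) for s, f, emb in rows]
        scored.sort(key=lambda item: item[0], reverse=True)
        return scored[:top_k]
```

A brute-force scan like this is fine for a few thousand incidents; beyond that, a dedicated vector index is the usual next step.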

🛡️

Decision engine

Per-agent confidence thresholds, allow/deny lists and rate limits gate every action before anything mutates real infra.
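Rate limits are the least glamorous gate and the easiest to picture. The sliding-window limiter below is one plausible way to cap how many actions an agent may take in a given period; `SlidingWindowLimiter` and its parameters are hypothetical, and the page does not say which algorithm DevopsAgent actually uses.

```python
import time
from collections import deque
from typing import Callable


class SlidingWindowLimiter:
    """Allow at most max_actions within any window_seconds-long window."""

    def __init__(
        self,
        max_actions: int,
        window_seconds: float,
        clock: Callable[[], float] = time.monotonic,  # injectable for tests
    ) -> None:
        self.max_actions = max_actions
        self.window_seconds = window_seconds
        self._clock = clock
        self._events: deque = deque()  # timestamps of allowed actions

    def allow(self) -> bool:
        """Record and permit the action, or refuse if the window is full."""
        now = self._clock()
        # Forget actions that have aged out of the window.
        while self._events and now - self._events[0] >= self.window_seconds:
            self._events.popleft()
        if len(self._events) >= self.max_actions:
            return False
        self._events.append(now)
        return True


limiter = SlidingWindowLimiter(max_actions=2, window_seconds=60.0)
print([limiter.allow() for _ in range(3)])  # [True, True, False]
```

A refused action would then be treated as a tripped rate limit and escalated rather than silently dropped.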

⚙️

Real executors

Jenkins retry, GitHub Actions replay, Kubernetes pod restart, Docker restart, Ansible, Terraform, Git revert and more.

🖥️

Web UI included

PHP dashboard with run history, metrics, costs, ROI, webhook log, super-admin and pipeline timeline — out of the box.

📨

Notifications

Slack, Teams, Email and PagerDuty channels for suggestions, auto-fixes and escalations — with full reasoning attached.

📊

Metrics & ROI

Prometheus-ready exporter, anomaly detection and a built-in ROI calculator so you can prove value to your CFO.

🔍

Explainable audit

Every prompt, response, decision and side-effect is logged in a replayable timeline. SOC-friendly by design.

Architecture

Three layers. No magic.

Sources feed structured failure contexts into a pluggable agent framework that emits decisions through a safety layer to specialised executors.

┌──────────────────────────────────────────────────────────────────────┐
│   SOURCES                                                            │
│   Jenkins  ·  GitHub Actions  ·  ArgoCD  ·  Kubernetes  ·  Cloud     │
│   Webhooks ·  Log files       ·  XLS scan ·  Synthetic alerts        │
└────────────────────────────┬─────────────────────────────────────────┘
                             ▼
┌──────────────────────────────────────────────────────────────────────┐
│   AGENT FRAMEWORK                                                    │
│   ┌─────────────┐   ┌─────────────┐   ┌──────────────┐               │
│   │  Extractor  │ → │   Decision  │ → │   Executor   │               │
│   │  (LLM call) │   │  (thresh.)  │   │  (per tool)  │               │
│   └─────────────┘   └─────────────┘   └──────────────┘               │
│       ▲                    │                  │                      │
│       │                    ▼                  ▼                      │
│   ┌─────────────────────────────────────────────────┐                │
│   │  Memory · Audit · Metrics · Notifications       │                │
│   │  (SQLite + embeddings + Prometheus + Slack)     │                │
│   └─────────────────────────────────────────────────┘                │
└──────────────────────────────────────────────────────────────────────┘
                             │
                             ▼
                  Web UI  ·  CLI  ·  Grafana

Integrations

Plays nicely with the tools you already run.

Production-ready executors today, more shipping on the roadmap below.

Jenkins

GitHub Actions

ArgoCD

Kubernetes

Docker

Ansible

Terraform

Git

Slack

Microsoft Teams

PagerDuty

Jira

ServiceNow

Prometheus

Grafana

Ollama

OpenAI

Anthropic

Gemini

GitLab CI · soon

CircleCI · soon

AWS Bedrock · soon

Get started

One container. Two minutes.

Run the published image from GHCR, point it at your LLM provider, and tail your first remediation.

~ docker run

# 1. pull the published image
docker pull your-registry/your-image:latest

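# 2. run with your preferred LLM provider
#    (on Linux, add --add-host=host.docker.internal:host-gateway so the
#    container can reach an Ollama server running on the host)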
docker run -d --name devopsagent \
  -p 8080:8080 \
  -e OLLAMA_HOST=http://host.docker.internal:11434 \
  -v devopsagent-data:/app/store \
  your-registry/your-image:latest

# 3. open the web UI (`open` is the macOS command; on Linux use xdg-open)
open http://localhost:8080

# or run a one-shot CLI loop
docker exec -it devopsagent python run_agents.py --once

Roadmap

46 features. Public backlog.

Every item is also tracked in docs/req1.md with implementation notes.

Multi-step reasoning loops

Plan → act → observe → re-plan with budgeted tool calls.

Confidence calibration

Per-agent reliability scores that feed back into the decision engine.

Adversarial validator

Second LLM critiques every plan before it leaves the safety layer.

Tool use graphs

Replayable, auditable trees of which tool fired, when, and why.

Cost-aware routing

Cheap models for triage, large models only when scope > threshold.

GitLab CI agent

Pipeline failures, MR comments, runner restarts.

CircleCI agent

Job retries, orb diagnostics, workflow rewrites.

Azure DevOps agent

Pipelines, boards, repos — one integration.

AWS Bedrock provider

Claude & Llama via Bedrock with IAM-scoped credentials.

GCP Vertex provider

Gemini Pro via Vertex with service-account auth.

OpenTelemetry source

Traces & spans as a first-class failure signal.

Datadog / New Relic

Alert ingestion + auto-acknowledge after remediation.

Vector store backend

Optional pgvector / Qdrant for teams beyond SQLite.

Per-team memory scopes

Isolated recall so prod doesn't poison dev advice.

Outcome feedback loop

Up/down-vote past fixes; rerank embeddings by success rate.

Runbook auto-extraction

Mine successful remediations into reusable runbooks.

Modern React UI

Optional SPA frontend alongside the bundled PHP dashboard.

Inline plan diff

See exactly what file/config the agent wants to change, side-by-side.

Approval inbox

One-click approve/deny for "suggest only" decisions.

Live timeline view

Stream every step of an in-flight remediation in real time.

Dark mode

A proper one — system-aware, not just inverted colors.

Mobile-first ops view

Approve fixes from your phone while on call.

Keyboard shortcuts

Linear-grade navigation across the dashboard.

Bulk operations

Approve / dismiss many suggestions in one action.

Saved filters

Per-user views over runs, suggestions, and audits.

SSO / OIDC

Okta, Azure AD, Google Workspace.

RBAC

Per-resource roles: viewer, approver, operator, admin.

Organisations

Multi-team tenant isolation with per-tenant config.

Audit export

CSV / JSON / SIEM-ready streaming exports.

Secrets vault

HashiCorp Vault & AWS Secrets Manager backends.

Per-team budgets

Hard caps on LLM spend, auto-downgrade when nearing limits.

Cost forecasting

Project monthly LLM cost from current usage trend.

Prompt cache

Deterministic caching across identical extractor calls.

Token usage drill-down

Per-agent, per-provider, per-feature breakdown.

Policy-as-code

OPA-style guardrails over actions, scopes & spend.

Weekly digest email

Auto-summary of incidents, fixes & trends.

SLA dashboards

MTTR, MTBF, automation rate per service.

Executive PDF

One-page monthly board-ready report.

Shadow mode

Run agents alongside your team without executing anything, then score their plans against what humans actually did.

Replay harness

Re-run historical incidents against new model versions.

Eval suite

CI-style benchmarks for every agent change.

Explainability API

Programmatic access to every decision's reasoning chain.

Voice-to-incident

"Hey agent, what blew up?" — voice query the audit log.

Auto-PR fixes

Open a real GitHub PR with the proposed change & tests.

Marketplace

Share and install community-built agent plugins in one click.

Multi-agent swarm

Multiple specialised agents debate a fix before it ships.
