NEW 46 features on the public roadmap — see what's next

Stop firefighting pipelines.
Let an AI agent fix them.

DevopsAgent watches your Jenkins, GitHub Actions, ArgoCD, Kubernetes and cloud stack — diagnoses failures, decides what to do, executes safe fixes, and learns from every incident. Open source. Self-hosted. Provider-agnostic LLMs.

Self-hosted
MIT licensed
Multi-LLM (Ollama, OpenAI, Anthropic, Gemini)
Docker-native
~ devopsagent run --tail
# 2025-11-15 14:02:11  jenkins-job: deploy-prod  #482
[detect]   pipeline failed: build step exited 137 (OOMKilled)
[plan]     similar incident #475 resolved by raising heap → confidence 0.83
[decide]   threshold 0.75 met → AUTO_REMEDIATE
[execute]  patched Jenkinsfile: -Xmx 2g → 4g
[verify]   re-run #483 ... SUCCESS in 4m 12s
[learn]    stored embedding + outcome in memory (sqlite)

A six-step loop that runs on every failure.

DevopsAgent doesn't replace your CI/CD — it sits beside it, watches outcomes, and acts only when it's confident enough to.

1

Detect

Pipeline failures, OOMKills, deployment rollbacks, security findings and synthetic alerts all flow through structured failure contexts.

2

Decide

A safety layer scores each suggested fix against a confidence threshold and per-agent policy before anything mutates production.

3

Act

Specialised executors retry Jenkins jobs, restart pods, replay GitHub Actions, re-deploy Argo apps, run Ansible playbooks or open a ticket.

4

Learn

Every outcome is embedded and stored, so the next similar failure is matched semantically — faster, cheaper, more confident.

5

Notify

Slack, Teams, Email or PagerDuty get a clean, explainable timeline of what happened, why, and what changed.

6

Audit

Every decision, prompt, response and side-effect is logged — fully replayable and reviewable from the bundled web UI.
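
The Learn step above can be sketched roughly as follows. This is a minimal illustration, assuming embeddings are stored as JSON text in a SQLite table and compared by cosine similarity. The `incidents` table, the `remember` and `recall` helpers and the toy three-dimensional vectors are all hypothetical; they are not DevopsAgent's actual schema or API.

```python
import json
import math
import sqlite3


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def open_memory(path=":memory:"):
    """Create the (hypothetical) incidents table if it does not exist."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS incidents ("
        "id INTEGER PRIMARY KEY, summary TEXT, fix TEXT, "
        "succeeded INTEGER, embedding TEXT)"
    )
    return db


def remember(db, summary, fix, succeeded, embedding):
    """Store an outcome together with its embedding (serialised as JSON)."""
    db.execute(
        "INSERT INTO incidents (summary, fix, succeeded, embedding) "
        "VALUES (?, ?, ?, ?)",
        (summary, fix, int(succeeded), json.dumps(embedding)),
    )
    db.commit()


def recall(db, query_embedding, top_k=1):
    """Return the top_k most similar past incidents as (score, summary, fix)."""
    rows = db.execute("SELECT summary, fix, embedding FROM incidents").fetchall()
    scored = [
        (cosine(query_embedding, json.loads(emb)), summary, fix)
        for summary, fix, emb in rows
    ]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:top_k]


# Toy vectors stand in for real embeddings from an embedding model.
db = open_memory()
remember(db, "build exited 137 (OOMKilled)", "raise heap to 4g", True, [0.9, 0.1, 0.0])
remember(db, "image pull backoff", "refresh registry credentials", True, [0.0, 0.2, 0.9])
best = recall(db, [0.8, 0.2, 0.1])[0]
print(best[1], "->", best[2])
```

A linear scan like this is only reasonable for small stores; the roadmap's optional vector store backend would be the natural replacement once the table grows.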

Three outcomes, two thresholds.

Every plan lands in one of three buckets — driven by per-agent confidence thresholds, allow/deny lists, and rate limits. No silent magic.

Auto-remediate

High confidence, safe action

Plan score ≥ threshold and matches an allowed action class. The agent executes, verifies, then notifies. Zero human latency.

confidence ≥ 0.75 · action ∈ allowlist
Suggest only

Medium confidence, human approves

Score is meaningful but not high enough — the agent posts a one-click suggestion to your inbox with full reasoning attached.

0.45 ≤ confidence < 0.75
Escalate

Low confidence or risky surface

Unknown failure class, sensitive scope (prod secrets, IAM, DB migrations) or rate-limit tripped — straight to on-call with context bundle.

confidence < 0.45 · or scope = sensitive
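
The three buckets above can be expressed as a small routing function. This is a sketch only, using the 0.75 and 0.45 values shown on this page as defaults. The `Plan` and `Policy` types, the example allowlist entries and the `decide` function are hypothetical names. The handling of a high-confidence plan whose action is not on the allowlist (it falls back to "suggest only" here) is an assumption, since the page does not specify it.

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    AUTO_REMEDIATE = "auto_remediate"
    SUGGEST_ONLY = "suggest_only"
    ESCALATE = "escalate"


@dataclass(frozen=True)
class Policy:
    auto_threshold: float = 0.75      # confidence >= this may auto-remediate
    suggest_threshold: float = 0.45   # below this, always escalate
    allowlist: frozenset = frozenset({"retry_job", "restart_pod"})  # illustrative


@dataclass(frozen=True)
class Plan:
    action: str
    confidence: float
    sensitive_scope: bool = False   # e.g. prod secrets, IAM, DB migrations
    rate_limited: bool = False      # a rate limit has tripped for this agent


def decide(plan, policy=Policy()):
    """Map a scored plan to one of the three outcomes."""
    # Sensitive scope or a tripped rate limit escalates regardless of score.
    if plan.sensitive_scope or plan.rate_limited:
        return Outcome.ESCALATE
    if plan.confidence < policy.suggest_threshold:
        return Outcome.ESCALATE
    if plan.confidence >= policy.auto_threshold and plan.action in policy.allowlist:
        return Outcome.AUTO_REMEDIATE
    # Assumption: confident but not allowlisted falls back to a suggestion.
    return Outcome.SUGGEST_ONLY


print(decide(Plan("retry_job", 0.83)).name)
print(decide(Plan("retry_job", 0.60)).name)
print(decide(Plan("retry_job", 0.90, sensitive_scope=True)).name)
```

Checking the escalation conditions first keeps the "risky surface" rule independent of the score, so a confident plan can never bypass it.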

Everything an autonomous SRE needs.

A modular agent framework with pluggable LLMs, structured memory, and executors for the tools your team already uses.

🧠

Multi-provider LLM

Ollama, OpenAI, Anthropic, Gemini and any OpenAI-compatible endpoint — one config switch, no vendor lock-in.

🔌

Plugin agents

Drop a Python file in agents/plugins/ and the registry auto-loads it. Build your own integration in minutes.

📚

Semantic memory

SQLite + embeddings recall similar past failures so the agent gets faster, cheaper, and more confident over time.

🛡️

Decision engine

Per-agent confidence thresholds, allow/deny lists and rate limits gate every action before anything mutates real infra.

⚙️

Real executors

Jenkins retry, GitHub Actions replay, Kubernetes pod restart, Docker restart, Ansible, Terraform, Git revert and more.

🖥️

Web UI included

PHP dashboard with run history, metrics, costs, ROI, webhook log, super-admin and pipeline timeline — out of the box.

📨

Notifications

Slack, Teams, Email and PagerDuty channels for suggestions, auto-fixes and escalations — with full reasoning attached.

📊

Metrics & ROI

Prometheus-ready exporter, anomaly detection and a built-in ROI calculator so you can prove value to your CFO.

🔍

Explainable audit

Every prompt, response, decision and side-effect is logged in a replayable timeline. SOC-friendly by design.
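
As an illustration of the plugin idea described above, the following sketch shows one way a registry could auto-load Python files from a directory. It assumes a convention where each plugin file exposes a `register(registry)` function; that convention, the `load_plugins` helper and the `echo_agent.py` demo file are hypothetical and may differ from how agents/plugins/ is actually wired up.

```python
import importlib.util
import tempfile
from pathlib import Path


def load_plugins(plugin_dir):
    """Import every *.py file in plugin_dir and collect what it registers."""
    registry = {}
    for path in sorted(Path(plugin_dir).glob("*.py")):
        if path.name.startswith("_"):
            continue  # skip __init__.py and private helpers
        spec = importlib.util.spec_from_file_location(path.stem, path)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        # Hypothetical convention: a plugin exposes register(registry).
        register = getattr(module, "register", None)
        if callable(register):
            register(registry)
    return registry


# Demo: write a throwaway plugin into a temporary directory and load it.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "echo_agent.py").write_text(
        "def handle(failure):\n"
        "    return 'seen: ' + failure\n"
        "\n"
        "def register(registry):\n"
        "    registry['echo'] = handle\n"
    )
    agents = load_plugins(tmp)
    print(agents["echo"]("deploy-prod #482 failed"))
```

Files without a `register` function are imported but contribute nothing, which keeps helper modules in the same directory harmless.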

Three layers. No magic.

Sources feed structured failure contexts into a pluggable agent framework that emits decisions through a safety layer to specialised executors.

┌──────────────────────────────────────────────────────────────────────┐
│   SOURCES                                                            │
│   Jenkins  ·  GitHub Actions  ·  ArgoCD  ·  Kubernetes  ·  Cloud     │
│   Webhooks ·  Log files       ·  XLS scan ·  Synthetic alerts        │
└────────────────────────────┬─────────────────────────────────────────┘
                             ▼
┌──────────────────────────────────────────────────────────────────────┐
│   AGENT FRAMEWORK                                                    │
│   ┌─────────────┐   ┌─────────────┐   ┌──────────────┐               │
│   │  Extractor  │ → │   Decision  │ → │   Executor   │               │
│   │  (LLM call) │   │  (thresh.)  │   │  (per tool)  │               │
│   └─────────────┘   └─────────────┘   └──────────────┘               │
│       ▲                    │                  │                      │
│       │                    ▼                  ▼                      │
│   ┌─────────────────────────────────────────────────┐                │
│   │  Memory · Audit · Metrics · Notifications       │                │
│   │  (SQLite + embeddings + Prometheus + Slack)     │                │
│   └─────────────────────────────────────────────────┘                │
└──────────────────────────────────────────────────────────────────────┘
                             │
                             ▼
                  Web UI  ·  CLI  ·  Grafana
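
The Extractor → Decision → Executor flow in the diagram can be sketched in a few lines. This is an illustration under stated assumptions: the `FailureContext`, `Plan` and `Pipeline` types are hypothetical, the extractor is a fixed rule standing in for the LLM call, and the decision step is collapsed to a single threshold (execute or escalate) for brevity.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class FailureContext:
    source: str        # e.g. "jenkins"
    job: str
    log_excerpt: str


@dataclass
class Plan:
    action: str
    confidence: float
    reason: str


def stub_extractor(ctx: FailureContext) -> Plan:
    """Stand-in for the LLM call: a fixed rule instead of a model."""
    if "OOMKilled" in ctx.log_excerpt:
        return Plan("raise_heap", 0.83, "exit 137 indicates memory pressure")
    return Plan("open_ticket", 0.20, "unrecognised failure")


@dataclass
class Pipeline:
    extractor: Callable[[FailureContext], Plan]
    executors: Dict[str, Callable[[FailureContext], str]]
    threshold: float = 0.75
    audit: List[str] = field(default_factory=list)

    def handle(self, ctx: FailureContext) -> str:
        plan = self.extractor(ctx)
        self.audit.append(f"plan {plan.action} ({plan.confidence:.2f}): {plan.reason}")
        executor = self.executors.get(plan.action)
        # Escalate when confidence is too low or no executor is registered.
        if plan.confidence < self.threshold or executor is None:
            self.audit.append("decision: escalate")
            return "escalated"
        result = executor(ctx)
        self.audit.append(f"executed {plan.action}: {result}")
        return result


pipeline = Pipeline(stub_extractor, {"raise_heap": lambda ctx: f"patched {ctx.job}"})
oom = FailureContext("jenkins", "deploy-prod", "build step exited 137 (OOMKilled)")
print(pipeline.handle(oom))
print("\n".join(pipeline.audit))
```

Every branch appends to `audit` before returning, mirroring the idea that each decision and side effect should leave a reviewable trace.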

Plays nicely with the tools you already run.

Production-ready executors today, more shipping on the roadmap below.

Jenkins
GitHub Actions
ArgoCD
Kubernetes
Docker
Ansible
Terraform
Git
Slack
Microsoft Teams
PagerDuty
Jira
ServiceNow
Prometheus
Grafana
Ollama
OpenAI
Anthropic
Gemini
GitLab CI · soon
CircleCI · soon
AWS Bedrock · soon

One container. Two minutes.

Run the published image from GHCR, point it at your LLM provider, and tail your first remediation.

~ docker run
# 1. pull the published image
docker pull your-registry/your-image:latest

# 2. run with your preferred LLM provider
#    (on Linux, also pass --add-host=host.docker.internal:host-gateway)
docker run -d --name devopsagent \
  -p 8080:8080 \
  -e OLLAMA_HOST=http://host.docker.internal:11434 \
  -v devopsagent-data:/app/store \
  your-registry/your-image:latest

# 3. open the web UI (macOS; use xdg-open on Linux)
open http://localhost:8080

# or run a one-shot CLI loop
docker exec -it devopsagent python run_agents.py --once

46 features. Public backlog.

Every item is also tracked in docs/req1.md with implementation notes.

01
Multi-step reasoning loops
Plan → act → observe → re-plan with budgeted tool calls.
02
Confidence calibration
Per-agent reliability scores that feed back into the decision engine.
03
Adversarial validator
Second LLM critiques every plan before it leaves the safety layer.
04
Tool use graphs
Replayable, auditable trees of which tool fired, when, and why.
05
Cost-aware routing
Cheap models for triage, large models only when scope > threshold.
06
GitLab CI agent
Pipeline failures, MR comments, runner restarts.
07
CircleCI agent
Job retries, orb diagnostics, workflow rewrites.
08
Azure DevOps agent
Pipelines, boards, repos — one integration.
09
AWS Bedrock provider
Claude & Llama via Bedrock with IAM-scoped credentials.
10
GCP Vertex provider
Gemini Pro via Vertex with service-account auth.
11
OpenTelemetry source
Traces & spans as a first-class failure signal.
12
Datadog / New Relic
Alert ingestion + auto-acknowledge after remediation.
13
Vector store backend
Optional pgvector / Qdrant for teams beyond SQLite.
14
Per-team memory scopes
Isolated recall so prod doesn't poison dev advice.
15
Outcome feedback loop
Up/down-vote past fixes; rerank embeddings by success rate.
16
Runbook auto-extraction
Mine successful remediations into reusable runbooks.
17
Modern React UI
Optional SPA frontend alongside the bundled PHP dashboard.
18
Inline plan diff
See exactly what file/config the agent wants to change, side-by-side.
19
Approval inbox
One-click approve/deny for "suggest only" decisions.
20
Live timeline view
Stream every step of an in-flight remediation in real time.
21
Dark mode
A proper one — system-aware, not just inverted colors.
22
Mobile-first ops view
Approve fixes from your phone, on-call.
23
Keyboard shortcuts
Linear-grade navigation across the dashboard.
24
Bulk operations
Approve / dismiss many suggestions in one action.
25
Saved filters
Per-user views over runs, suggestions, and audits.
26
SSO / OIDC
Okta, Azure AD, Google Workspace.
27
RBAC
Per-resource roles: viewer, approver, operator, admin.
28
Organisations
Multi-team tenant isolation with per-tenant config.
29
Audit export
CSV / JSON / SIEM-ready streaming exports.
30
Secrets vault
HashiCorp Vault & AWS Secrets Manager backends.
31
Per-team budgets
Hard caps on LLM spend, auto-downgrade when nearing limits.
32
Cost forecasting
Project monthly LLM cost from current usage trend.
33
Prompt cache
Deterministic caching across identical extractor calls.
34
Token usage drill-down
Per-agent, per-provider, per-feature breakdown.
35
Policy-as-code
OPA-style guardrails over actions, scopes & spend.
36
Weekly digest email
Auto-summary of incidents, fixes & trends.
37
SLA dashboards
MTTR, MTBF, automation rate per service.
38
Executive PDF
One-page monthly board-ready report.
39
Shadow mode
Run agents in parallel without executing, then score their plans against what humans actually did.
40
Replay harness
Re-run historical incidents against new model versions.
41
Eval suite
CI-style benchmarks for every agent change.
42
Explainability API
Programmatic access to every decision's reasoning chain.
43
Voice-to-incident
"Hey agent, what blew up?" — voice query the audit log.
44
Auto-PR fixes
Open a real GitHub PR with the proposed change & tests.
45
Marketplace
Share and install community-built agent plugins in one click.
46
Multi-agent swarm
Multiple specialised agents debate a fix before it ships.
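
To make the shadow mode item (39) more concrete, here is one way agent plans could be scored against human decisions. Shadow mode is a roadmap item, so this is purely a sketch of the idea: the `ShadowRecord` type, the action names and the simple exact-match agreement metric are all assumptions, not a description of a shipped feature.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ShadowRecord:
    incident_id: str
    agent_action: str   # what the agent would have done (never executed)
    human_action: str   # what the on-call engineer actually did


def agreement_rate(records):
    """Fraction of incidents where the agent's plan matched the human fix."""
    if not records:
        return 0.0
    matches = sum(1 for r in records if r.agent_action == r.human_action)
    return matches / len(records)


history = [
    ShadowRecord("inc-1", "retry_job", "retry_job"),
    ShadowRecord("inc-2", "restart_pod", "restart_pod"),
    ShadowRecord("inc-3", "restart_pod", "rollback_deploy"),
    ShadowRecord("inc-4", "retry_job", "retry_job"),
]
print(f"agreement: {agreement_rate(history):.0%}")
```

A team could track a number like this per action class and only add an action to the allowlist once agreement stays high over a meaningful sample.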

Your pipelines, finally on autopilot.

Open source, MIT licensed, runs in one container. Start with shadow mode and let the agent earn its trust.