Deploy Doctor

AI copilot that diagnoses broken CI/CD pipelines and ships one-click fix PRs in seconds.

Engineering teams routinely lose hours each week to CI/CD failures. Developers context-switch away from real work to dig through thousands of lines of log output, work out whether the cause is a flaky test, a dependency bump, a YAML typo, or an infra hiccup, and then hand-craft a fix. The feedback loop between red build and green build is slow, repetitive, and deeply hated.

Deploy Doctor ingests GitHub Actions webhook events (or simulated ones in demo mode) through serverless endpoints and pipes failed runs into an AI diagnosis engine. The engine correlates the raw logs with the commit diff and a repo-level memory of past failures to classify the root cause, generates a concrete patch (unified diff) plus a human-readable explanation, and lets the developer one-click apply it as a pull request. A live dashboard, failure analytics, and a Slack-style notification feed close the loop.
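The repo-level memory depends on recognising that two failed runs are "the same" failure even when timestamps, durations, and commit hashes differ. A minimal sketch of one way to fingerprint a failure is shown here. The function names normalizeLogLine and fingerprint, the normalisation rules, and the 16-character digest length are illustrative assumptions, not the project's actual code.

```typescript
import { createHash } from "node:crypto";

// Hypothetical helper: mask the parts of a log line that vary between runs.
function normalizeLogLine(line: string): string {
  return line
    .toLowerCase()
    .replace(/\d{4}-\d{2}-\d{2}t\d{2}:\d{2}:\d{2}(\.\d+)?z?/gi, "<ts>") // ISO timestamps
    .replace(/\b(?=[a-f]*\d)[0-9a-f]{7,40}\b/g, "<hash>") // hex ids containing a digit
    .replace(/\d+(\.\d+)?\s?(ms|s)\b/g, "<dur>") // durations such as "1234 ms"
    .replace(/\d+/g, "<n>") // any remaining numbers
    .trim();
}

// Hypothetical helper: stable id for a failure, derived from its category
// and its normalised error lines.
function fingerprint(category: string, errorLines: string[]): string {
  const normalized = errorLines
    .map(normalizeLogLine)
    .filter((line) => line.length > 0);
  return createHash("sha256")
    .update(category + "\n" + normalized.join("\n"))
    .digest("hex")
    .slice(0, 16);
}
```

Two runs of the same flaky test then map to the same key, which is what a "Seen this before?" lookup would need; a different error message yields a different key.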

Key features

  • Live Pipeline Feed — Server-Sent Events stream every run in real time, with failed runs pulsing red.
  • AI Root Cause Diagnosis — classifies failures into flaky_test, dependency, config_yaml, infra_timeout, lint_type, or build_error, with a step-by-step reasoning trace and highlighted log lines.
  • One-Click Fix PR — synthesizes a unified-diff patch, opens a simulated PR, and replays the pipeline with a progress animation that turns the build green.
  • Repo Memory (RAG-lite) — every failure is fingerprinted. Recurring flakes boost confidence and surface prior fixes in a "Seen this before?" panel.
  • Slack-style Notifications — a collapsible #deploy-doctor channel drawer with deep-linked buttons.
  • Demo Mode / Inject Failure — 7 realistic scenarios (flaky Jest, NPM ERESOLVE, invalid YAML, Docker OOM, ESLint, TypeScript, Python import) that drive the whole red-to-green pipeline end-to-end.
  • GitHub webhook endpoint — real POST /api/webhooks/github with HMAC-SHA256 signature verification.
  • Confidence-gated auto-merge — per-category policy + confidence threshold slider in Settings.
  • Analytics — 14-day failures-by-category area chart, flakiest-tests bar, MTTF on-vs-off trend line, and recent-fixes table.

Tech stack

Next.js 14 (App Router) · TypeScript · Tailwind CSS · shadcn-style Radix primitives · Framer Motion · Recharts · Lucide Icons · SQLite (better-sqlite3) · Zod · Server-Sent Events · Mock LLM engine (rule-based + templated reasoning, OpenAI-compatible interface stub) · Docker.
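To illustrate what a rule-based mock engine can look like, here is a minimal sketch that maps log text to the six failure categories named under Key features. The regular expressions, the rule order, and the classify function are assumptions made for illustration; the real rules in the project are not shown in this document.

```typescript
type FailureCategory =
  | "flaky_test"
  | "dependency"
  | "config_yaml"
  | "infra_timeout"
  | "lint_type"
  | "build_error";

interface Rule {
  category: FailureCategory;
  pattern: RegExp;
}

// Hypothetical patterns, checked in order; the first rule with a hit wins.
const RULES: Rule[] = [
  { category: "dependency", pattern: /ERESOLVE|ModuleNotFoundError|Cannot find module/i },
  { category: "config_yaml", pattern: /Invalid workflow file|YAMLException/i },
  { category: "infra_timeout", pattern: /out of memory|\bOOM\b|ETIMEDOUT|exit code 137/i },
  { category: "lint_type", pattern: /eslint|error TS\d+/i },
  { category: "flaky_test", pattern: /Async callback was not invoked|exceeded timeout/i },
];

interface Classification {
  category: FailureCategory;
  matchedLines: string[]; // lines a UI could highlight
}

function classify(log: string): Classification {
  const lines = log.split(/\r?\n/);
  for (const rule of RULES) {
    const matchedLines = lines.filter((line) => rule.pattern.test(line));
    if (matchedLines.length > 0) return { category: rule.category, matchedLines };
  }
  // Nothing matched: fall back to the generic bucket.
  return { category: "build_error", matchedLines: [] };
}
```

Because the function only takes a string and returns a plain object, it could later be swapped for a call to an OpenAI-compatible model without changing its callers.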

Seed data is generated on first boot, so the app looks fully populated from the moment you open it. No API keys are required — the whole demo works offline.

Run it locally

npm install
npm run dev
# open http://localhost:3000

Seeded repos, runs, diagnoses, fixes, and Slack messages are created automatically in a local SQLite database at .data/deploy-doctor.sqlite.

Run it in Docker

docker build -t app .
docker run -p 3000:3000 app
# open http://localhost:3000

The image uses Next.js's standalone output, so it is small and boots quickly. To persist the SQLite file across container restarts, mount a volume at /app/.data:

docker run -p 3000:3000 -v dd-data:/app/.data app

Try the demo in 30 seconds

  1. Open /dashboard.
  2. Click Inject Failure (top right) and pick any scenario (e.g. Flaky Jest test or Docker build OOM).
  3. Watch the feed: a new run appears → flips to failed → gets a diagnosis → produces a fix PR.
  4. Click the run to open its detail page, switch to Fix PR, and hit Simulate Re-run — the red build turns green.
  5. Visit /analytics to see hours-saved and failures-by-category update live.

Architecture

app/                     # Next.js App Router pages
  page.tsx               # Landing
  (app)/                 # Authenticated shell (dashboard, analytics, …)
  api/                   # Serverless routes: /api/runs, /api/diagnose,
                         # /api/fixes, /api/webhooks/github, /api/events, …
components/              # shadcn-style UI + feature components
lib/
  db.ts                  # SQLite schema + helpers
  scenarios.ts           # 7 realistic failure scenarios (logs, patches,
                         # reasoning traces)
  engine.ts              # Diagnosis + fix engine, auto-merge policy
  events.ts              # In-memory SSE broadcast bus
  seed.ts                # Deterministic-enough seed data
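The auto-merge policy handled by engine.ts combines a per-category switch with a confidence threshold. A minimal sketch of such a gate follows. The AutoMergeSettings shape, the field names, and the 0 to 1 confidence scale are assumptions for illustration and may not match the project's schema.

```typescript
type FailureCategory =
  | "flaky_test"
  | "dependency"
  | "config_yaml"
  | "infra_timeout"
  | "lint_type"
  | "build_error";

// Hypothetical settings shape: one toggle per category plus a global threshold.
interface AutoMergeSettings {
  threshold: number; // assumed to be on a 0..1 scale
  enabledCategories: Record<FailureCategory, boolean>;
}

interface Diagnosis {
  category: FailureCategory;
  confidence: number; // assumed to be on a 0..1 scale
}

// A fix is auto-merged only when its category is allowed AND the diagnosis
// confidence meets the threshold; otherwise it waits for a human.
function shouldAutoMerge(diagnosis: Diagnosis, settings: AutoMergeSettings): boolean {
  return (
    settings.enabledCategories[diagnosis.category] === true &&
    diagnosis.confidence >= settings.threshold
  );
}
```

Keeping the decision in one pure function makes it easy to unit-test and to show the user why a given fix was or was not merged automatically.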

API routes (summary)

Method    Path                       Purpose
GET       /api/repos                 Repos + health
GET       /api/runs                  Recent runs
GET       /api/runs/:id              Run + logs + diagnosis + fix + similar
POST      /api/runs/simulate         Inject a scenario failure
POST      /api/diagnose              Run diagnosis (idempotent)
POST      /api/fixes                 Generate fix patch
POST      /api/fixes/:id/open-pr     Open a simulated PR
POST      /api/fixes/:id/rerun       Replay pipeline, return green run
GET       /api/events                SSE stream
GET       /api/analytics             Aggregated metrics
GET       /api/notifications         Slack-style feed
POST      /api/webhooks/github       Real workflow_run webhook receiver
GET/PUT   /api/settings              Auto-merge / threshold / slack / reset
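The /api/events stream relies on an in-memory broadcast bus that fans each update out to every connected client as a Server-Sent Events frame. Below is a minimal sketch of that idea. The EventBus class, the formatSseFrame helper, and the "run.updated" event name are hypothetical; only the frame layout (an "event:" line, a "data:" line, and a terminating blank line) comes from the SSE wire format itself.

```typescript
type Listener = (frame: string) => void;

// Serialise one event in SSE wire format. JSON.stringify without an indent
// argument never emits raw newlines, so the payload fits on one data line.
function formatSseFrame(event: string, data: unknown): string {
  return `event: ${event}\ndata: ${JSON.stringify(data)}\n\n`;
}

// Hypothetical in-memory bus: each open SSE connection subscribes a listener
// that writes frames to its response stream.
class EventBus {
  private listeners = new Set<Listener>();

  // Returns an unsubscribe function to call when the client disconnects.
  subscribe(listener: Listener): () => void {
    this.listeners.add(listener);
    return () => {
      this.listeners.delete(listener);
    };
  }

  publish(event: string, data: unknown): void {
    const frame = formatSseFrame(event, data);
    this.listeners.forEach((listener) => listener(frame));
  }
}
```

An in-memory bus like this only reaches clients connected to the same server process, which suits a single-container demo but would need a shared channel if the app were scaled to several instances.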

Credit

Built for the hackathon by Aryan Choudhary (aryancta@gmail.com).

Built With

  • better-sqlite3
  • docker
  • framer-motion
  • lucide-react
  • next.js-14
  • radix-ui
  • recharts
  • server-sent-events
  • tailwindcss
  • typescript
  • zod