copilot-ci-doctor 🩺

From Red CI to Green PR — with Evidence, Confidence, and Safe Fixes

🎬 40-Second Demo (End-to-End)

https://github.com/manojmallick/copilot-ci-doctor/blob/main/demo_1.5x.gif

One command → failing CI → Copilot reasoning → safe fixes → green CI → PR

copilot-ci-doctor is a CLI tool that diagnoses GitHub Actions CI failures using GitHub Copilot CLI as its core reasoning engine.

Instead of leaving you to dig through logs manually, Copilot CLI:

analyzes CI evidence,
explains why a pipeline failed in plain English,
proposes minimal, safe patch diffs with confidence scores,
iteratively fixes CI failures until the pipeline is green,
and opens a Pull Request against main with the fix — link included.

This is not log summarization.
It's evidence-based reasoning for CI failures.

✨ Why this exists

CI failures are one of the biggest productivity drains in software development:

logs are noisy,
root causes are unclear,
fixes often rely on trial and error.

copilot-ci-doctor turns CI failures into a guided flow:

failure → evidence → reasoning → safe fix → green CI → PR

📦 Install

# Run directly (no install needed)
npx copilot-ci-doctor demo

# Or install globally
npm install -g copilot-ci-doctor
copilot-ci-doctor analyze

🚀 What it does

Given a failed GitHub Actions run, the tool:

📦 Collects an Evidence Bundle with 5 tagged items:
- E1 repo info, E2 failed run metadata, E3 failed jobs & steps, E4 log excerpt (last 200 lines, redacted), E5 workflow YAML
🧠 Uses GitHub Copilot CLI to generate:
- ranked root-cause hypotheses (with confidence scores)
- plain-English explanations
- minimal, git-apply-compatible patch diffs
🛡️ Applies fixes safely:
- shows a diff preview
- asks for confirmation
- applies on a new ci-fix/* branch
- never touches secrets or main
🔁 Watch mode — iterates automatically:
- analyze → explain → fix → push → wait for CI result
- loops until CI passes or confidence drops below 80%
- single Copilot call per iteration (an estimated ~0.33x the tokens of the 3-call approach)
- prints a final scoreboard with token usage estimates
- opens a PR against main when CI is green

Without GitHub Copilot CLI, this tool does not work: all reasoning and patch generation come directly from Copilot.

🎬 Quick Demo (Recommended)

The fastest way to see the full flow is the judge-mode demo:

npx copilot-ci-doctor demo

This will:

1. Create a demo repo with an intentionally broken GitHub Actions workflow
2. Push and trigger a failing CI run ❌
3. Start the watch loop, which automatically iterates:
   - Analyze the failure
   - Explain the root cause in plain English
   - Propose a safe fix with confidence score
   - Apply, commit, and push the fix
   - Wait for CI to re-run
4. Repeat until CI is green ✅
5. Print a final scoreboard

Example output:

━━━ Iteration 1 ━━━
  Analyze → npm ci requires package-lock.json [95%]
  Fix → Replace 'npm ci' with 'npm install' [95%] → pushed → CI still failing

━━━ Iteration 2 ━━━
  Analyze → Missing test.js file [95%]
  Fix → Create test.js [95%] → pushed → CI passing!

─── Scoreboard ───
  Iterations: 2
  Total time: 126.3s

  Token Usage (estimated):
    Input:  ~2,358 tokens
    Output: ~1,084 tokens
    Total:  ~3,442 tokens
    Savings: ~51% vs 3-call mode

  CI before: ✖ FAILED → after fix: ✓ PASSING
🎉 CI is fixed!

  🔗 Pull Request: https://github.com/your-org/your-repo/pull/1

⏱ For Judges (60 seconds)

Prerequisites: gh auth login + GitHub Copilot subscription enabled.

What demo does: creates a throwaway demo repo → pushes broken CI → enters fix loop → opens a PR when green. It does not touch any of your existing repos.

Where files go: all cache, patches, and debug output live under .copilot-ci-doctor/cache/ inside the demo repo — this directory is .gitignored and never committed.

🧪 Commands

`analyze` — Diagnose the failure

copilot-ci-doctor analyze

Collects evidence from the latest failed GitHub Actions run and generates ranked root-cause hypotheses with confidence scores.

`explain` — Plain-English explanation

copilot-ci-doctor explain

Explains the CI failure in plain English, including why it might pass locally and what likely changed. Reuses the cached evidence bundle from analyze.

`fix` — Generate and apply a patch

# Interactive: shows diff, asks for confirmation
copilot-ci-doctor fix

# Auto-confirm (for scripting)
copilot-ci-doctor fix --yes

# Full auto-fix mode: iterates until CI is green
copilot-ci-doctor fix --auto

Generates a minimal patch diff, previews it, and applies it on a new ci-fix/* branch.

--auto mode delegates directly to the watch command — it runs the same iterative loop: analyze → explain → fix → push → wait for CI → repeat until passing or confidence drops below 80%.

`watch` — Monitor and auto-fix

copilot-ci-doctor watch

Continuously monitors the CI pipeline using single-call mode (1 Copilot call per iteration instead of 3). When a failure is detected:

1. Collects the evidence bundle from the failed run
2. Makes one combined Copilot call → returns hypotheses + explanation + patch
3. Normalizes patch hunk headers and applies with git apply
4. Commits and pushes on a ci-fix/<timestamp> branch
5. Polls for a new CI run (10s interval, 180s timeout)
6. If CI still fails, loops back to step 1

Stops when:

CI passes ✅ → automatically creates a PR against main with fix details (description, confidence, root cause, explanation, iterations, token usage, and elapsed time)
Fix confidence drops below 80%
Max 5 iterations reached
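
The three stop conditions above can be read as a small decision function. The sketch below is illustrative only: `nextAction`, its argument names, and the returned labels are hypothetical, and only the 80% threshold and the 5-iteration cap come from this README.

```javascript
// Hypothetical decision helper for the watch loop's stop conditions.
const MIN_CONFIDENCE = 80; // stop when fix confidence drops below this
const MAX_ITERATIONS = 5;  // hard cap on fix attempts

function nextAction({ ciPassed, confidence, iteration }) {
  if (ciPassed) return "open-pr";                          // CI is green
  if (confidence < MIN_CONFIDENCE) return "stop-low-confidence";
  if (iteration >= MAX_ITERATIONS) return "stop-max-iterations";
  return "continue";                                       // try another fix
}

console.log(nextAction({ ciPassed: false, confidence: 95, iteration: 1 }));
```

Checking `ciPassed` first means a green pipeline always ends in a PR, even if it went green on the final allowed iteration.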

Token usage is estimated at ~4 chars/token and displayed after each Copilot call.
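
As a rough illustration of the ~4 chars/token heuristic, the estimate might be computed like this. `estimateTokens` is a hypothetical helper name, and rounding up is an assumption; the README only states the ratio.

```javascript
// Rough token estimate at ~4 characters per token.
const CHARS_PER_TOKEN = 4;

function estimateTokens(text) {
  // Rounding up is an assumption; the tool may round differently.
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

const prompt = "x".repeat(4800); // stand-in for a rendered prompt
const reply = "y".repeat(2400);  // stand-in for Copilot's reply
const usage = {
  input: estimateTokens(prompt),
  output: estimateTokens(reply),
};
usage.total = usage.input + usage.output;
console.log(usage);
```

Because this counts characters rather than running a real tokenizer, the figures are approximations, which is why the scoreboard labels them as estimated.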

`retry` — Re-run failed workflow

copilot-ci-doctor retry

Re-runs the most recent failed GitHub Actions workflow run.

`demo` — End-to-end demonstration

copilot-ci-doctor demo

Creates a broken demo repo, pushes to trigger CI, then hands off to watch for fully automated diagnosis and repair.

🧠 How GitHub Copilot CLI is used (core requirement)

GitHub Copilot CLI is the reasoning engine of this project.

All reasoning is invoked via:

gh copilot -p "<rendered prompt>" -s --no-custom-instructions

-p sends the full prompt (evidence bundle + instructions) non-interactively
-s produces silent, scriptable output
--no-custom-instructions keeps behavior consistent across machines and repos by skipping custom instructions (model output can still vary between calls)
Each call has a 180-second timeout
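
To make the flag list concrete, here is one way the invocation could be assembled before handing it to a process runner. `buildCopilotInvocation` and the returned object shape are hypothetical; only the flags and the 180-second timeout are taken from the text above.

```javascript
// Hypothetical helper: assemble the documented gh copilot invocation.
const COPILOT_TIMEOUT_MS = 180 * 1000; // 180-second timeout per call

function buildCopilotInvocation(renderedPrompt) {
  return {
    file: "gh",
    // An argument array avoids shell-quoting problems with long prompts.
    args: ["copilot", "-p", renderedPrompt, "-s", "--no-custom-instructions"],
    options: { timeout: COPILOT_TIMEOUT_MS },
  };
}

const call = buildCopilotInvocation("<rendered prompt>");
console.log(call.file, call.args.join(" "));
```

An object like this maps directly onto runners that take a file, an argument array, and an options object with a millisecond `timeout`, such as execa or Node's `child_process.execFile`.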

It is used to:

analyze CI evidence and propose ranked hypotheses
explain failures in plain English (including why CI fails but local passes)
generate minimal patch diffs with confidence and risk levels

Four prompt modes are supported:

| Mode | Prompt file | Description |
|---|---|---|
| hypotheses | prompts/hypotheses.txt | Ranked root-cause hypotheses |
| explain | prompts/explain.txt | Plain-English explanation |
| patch | prompts/patch.txt | Minimal unified diff patch |
| combined | prompts/combined.txt | All three in a single call (watch mode) |

To keep this reliable, every Copilot response is:

constrained by a strict JSON contract
validated against the CI_DOCTOR_RESPONSE_V1 schema (modes: hypotheses, explain, patch, combined)
required to reference evidence IDs (E1, E2, …)
parsed from raw output with JSON fence extraction and fallback
saved (redacted) to .copilot-ci-doctor/cache/copilot-raw-output-debug.txt if parsing fails

This keeps Copilot's answers grounded in the collected evidence rather than free-form guessing.

🧱 How it works (internals)

Evidence Bundle
Collects 5 evidence items via gh CLI:
- E1 — repo info (gh repo view): owner/name, default branch, URL
- E2 — failed run metadata: run ID, workflow, branch, event, conclusion
- E3 — failed jobs & steps: filtered from gh run view --json jobs
- E4 — log excerpt: last 200 lines from gh run view --log-failed (falls back to --log), redacted
- E5 — workflow YAML: matched by workflow name from .github/workflows/
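
A minimal sketch of how the five items might be assembled, assuming the raw data has already been fetched with the gh CLI. `tailLines`, `buildEvidenceBundle`, and the input field names are hypothetical; only the E1 to E5 tags and the 200-line log tail come from the list above.

```javascript
// Keep only the last `maxLines` lines of a log (E4 uses the last 200).
function tailLines(text, maxLines = 200) {
  return text.split(/\r?\n/).slice(-maxLines).join("\n");
}

// Assemble the tagged bundle. `redact` is passed in so this sketch stays
// independent of any particular redaction implementation.
function buildEvidenceBundle({ repo, run, failedJobs, log, workflowYaml, redact }) {
  return {
    E1: repo,                   // repo info
    E2: run,                    // failed run metadata
    E3: failedJobs,             // failed jobs & steps
    E4: redact(tailLines(log)), // log excerpt, trimmed then redacted
    E5: workflowYaml,           // workflow YAML
  };
}

const log = Array.from({ length: 500 }, (_, i) => "line " + (i + 1)).join("\n");
const bundle = buildEvidenceBundle({
  repo: { name: "demo" }, run: { id: 1 }, failedJobs: [], log,
  workflowYaml: "name: CI", redact: (s) => s,
});
console.log(bundle.E4.split("\n").length);
```

Trimming before redacting keeps the redaction pass small, since it only scans the lines that will actually be sent.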
Copilot CLI
Evidence is rendered into prompt templates ({{EVIDENCE_BUNDLE}} placeholder) and sent via gh copilot -p "<prompt>" -s --no-custom-instructions with a 180s timeout.
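
The placeholder substitution can be sketched as follows. `renderPrompt` is a hypothetical name, and serializing the bundle as pretty-printed JSON is an assumption; the README only specifies the {{EVIDENCE_BUNDLE}} placeholder.

```javascript
// Hypothetical template renderer for the {{EVIDENCE_BUNDLE}} placeholder.
function renderPrompt(template, evidenceBundle) {
  // Assumption: the bundle is embedded as pretty-printed JSON.
  const evidenceText = JSON.stringify(evidenceBundle, null, 2);
  // split/join is used instead of String.replace so that "$" sequences
  // appearing in log text are never treated as replacement patterns.
  return template.split("{{EVIDENCE_BUNDLE}}").join(evidenceText);
}

const template = "Evidence:\n{{EVIDENCE_BUNDLE}}\nReply with JSON only.";
console.log(renderPrompt(template, { E4: "npm ERR! cost $& more" }));
```

Log excerpts routinely contain characters such as `$`, so a substitution that treats the evidence as opaque text is the safer default.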
JSON Extraction
Raw Copilot output is parsed by extracting JSON from a markdown code fence tagged json or, failing that, by finding the first {…} block. On failure, the redacted raw output is saved for debugging.
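
One way to implement this two-stage extraction is sketched below. `extractJson` is a hypothetical name, and the fallback here simply takes the span from the first `{` to the last `}`, which is a simplification of "first {…} block".

```javascript
// Three backticks, built at runtime so this snippet is easy to embed.
const FENCE = "`".repeat(3);

function extractJson(raw) {
  const candidates = [];

  // 1. Prefer the body of the first fenced block (skipping the fence line,
  //    which may carry a "json" language tag).
  const open = raw.indexOf(FENCE);
  if (open !== -1) {
    const bodyStart = raw.indexOf("\n", open) + 1;
    const close = raw.indexOf(FENCE, bodyStart);
    if (bodyStart > 0 && close !== -1) candidates.push(raw.slice(bodyStart, close));
  }

  // 2. Fallback: the span from the first "{" to the last "}".
  const start = raw.indexOf("{");
  const end = raw.lastIndexOf("}");
  if (start !== -1 && end > start) candidates.push(raw.slice(start, end + 1));

  for (const text of candidates) {
    try {
      return JSON.parse(text);
    } catch {
      // Not valid JSON; try the next candidate.
    }
  }
  return null; // caller would save the redacted raw output for debugging
}

console.log(extractJson('Sure! {"mode":"explain"} Hope that helps.'));
```

Returning `null` instead of throwing lets the caller decide how to handle the failure, for example by writing the debug file mentioned above.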
Contract Validation
Every Copilot response is validated for:
- version must equal CI_DOCTOR_RESPONSE_V1
- mode must match one of: hypotheses, explain, patch, combined
- mode-specific checks: non-empty hypotheses array, confidence 0–100, valid patch string, evidence refs
- combined mode validates all three sub-schemas
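
The checks above could look roughly like this. The field names (`hypotheses`, `confidence`, `evidence`, `explanation`, `patch`) are assumptions made for illustration; the actual CI_DOCTOR_RESPONSE_V1 schema lives in src/copilot/contract.js and may differ.

```javascript
const VERSION = "CI_DOCTOR_RESPONSE_V1";
const MODES = ["hypotheses", "explain", "patch", "combined"];

// Returns a list of error strings; an empty list means the response passed.
function validateResponse(res) {
  if (!res || typeof res !== "object") return ["response is not an object"];
  const errors = [];
  if (res.version !== VERSION) errors.push("version must equal " + VERSION);
  if (!MODES.includes(res.mode)) errors.push("unknown mode: " + res.mode);

  // Combined mode must satisfy all three sub-schemas.
  const needs = (part) => res.mode === part || res.mode === "combined";

  if (needs("hypotheses")) {
    if (!Array.isArray(res.hypotheses) || res.hypotheses.length === 0) {
      errors.push("hypotheses must be a non-empty array");
    } else {
      for (const h of res.hypotheses) {
        const c = h.confidence;
        if (typeof c !== "number" || c < 0 || c > 100) {
          errors.push("confidence must be a number from 0 to 100");
        }
        const refs = h.evidence;
        const ok = Array.isArray(refs) && refs.length > 0 &&
          refs.every((id) => /^E\d+$/.test(id));
        if (!ok) errors.push("each hypothesis must cite evidence IDs (E1, E2, ...)");
      }
    }
  }
  if (needs("explain") && (typeof res.explanation !== "string" || !res.explanation.trim())) {
    errors.push("explanation must be a non-empty string");
  }
  if (needs("patch") && (typeof res.patch !== "string" || !res.patch.includes("@@"))) {
    errors.push("patch must be a unified diff string");
  }
  return errors;
}

console.log(validateResponse({ version: VERSION, mode: "explain", explanation: "" }));
```

Collecting every error rather than stopping at the first makes a rejected response easier to debug, since one bad reply often violates several rules at once.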
Patch Normalization
LLMs frequently emit incorrect @@ hunk line counts. Before applying, the tool recounts old/new line counts in each hunk and rewrites the @@ headers to produce a valid unified diff.
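
A simplified sketch of the recounting step follows. `normalizeHunkHeaders` is a hypothetical name and this is not the tool's actual implementation; it assumes every context line keeps its leading space, which LLM output does not always guarantee.

```javascript
const HUNK_HEADER = /^@@ -(\d+)(?:,\d+)? \+(\d+)(?:,\d+)? @@(.*)$/;

function normalizeHunkHeaders(patch) {
  const lines = patch.split("\n");
  const out = [];
  let i = 0;
  while (i < lines.length) {
    const header = lines[i].match(HUNK_HEADER);
    if (!header) {
      out.push(lines[i]); // file headers and anything outside a hunk pass through
      i += 1;
      continue;
    }
    let oldCount = 0;
    let newCount = 0;
    let j = i + 1;
    // A hunk body is made of " ", "+", "-" and "\" lines.
    while (j < lines.length && /^[ +\-\\]/.test(lines[j])) {
      const next = lines[j + 1] || "";
      // A "--- " line followed by "+++ " starts the next file, not a removal.
      if (lines[j].startsWith("--- ") && next.startsWith("+++ ")) break;
      const marker = lines[j][0];
      if (marker === " " || marker === "-") oldCount += 1; // present in old file
      if (marker === " " || marker === "+") newCount += 1; // present in new file
      j += 1;
    }
    // Keep the start lines and trailing section text; rewrite only the counts.
    out.push("@@ -" + header[1] + "," + oldCount + " +" + header[2] + "," + newCount + " @@" + header[3]);
    out.push(...lines.slice(i + 1, j));
    i = j;
  }
  return out.join("\n");
}

const broken = [
  "--- a/.github/workflows/ci.yml",
  "+++ b/.github/workflows/ci.yml",
  "@@ -10,7 +10,9 @@ steps:",
  "       - uses: actions/checkout@v4",
  "-      - run: npm ci",
  "+      - run: npm install",
  "       - run: npm test",
].join("\n");
console.log(normalizeHunkHeaders(broken).split("\n")[2]); // prints "@@ -10,3 +10,3 @@ steps:"
```

Only the counts are rewritten; the starting line numbers are left as Copilot produced them, so git apply --check remains the final judge of whether the patch fits.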
Safe Apply
Patches are checked with git apply --check, previewed (truncated at 400 lines), applied on a new ci-fix/<timestamp> branch, and committed with a message prefixed "CI Doctor:".
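
For illustration, the branch name and commit message could be generated as below. The YYYYMMDD-HHMMSS layout matches the format given under Safety guarantees; the helper names and the use of UTC are assumptions, and the tool may well use local time.

```javascript
// Hypothetical helpers for the ci-fix/<timestamp> branch and commit message.
function fixBranchName(date = new Date()) {
  const pad = (n) => String(n).padStart(2, "0");
  // Assumption: UTC is used so the name does not depend on the machine's timezone.
  const ymd = String(date.getUTCFullYear()) + pad(date.getUTCMonth() + 1) + pad(date.getUTCDate());
  const hms = pad(date.getUTCHours()) + pad(date.getUTCMinutes()) + pad(date.getUTCSeconds());
  return "ci-fix/" + ymd + "-" + hms;
}

function fixCommitMessage(summary) {
  return "CI Doctor: " + summary;
}

console.log(fixBranchName(new Date(Date.UTC(2025, 0, 5, 9, 3, 7)))); // prints "ci-fix/20250105-090307"
console.log(fixCommitMessage("Replace 'npm ci' with 'npm install'"));
```

A second-resolution timestamp gives each fix attempt its own branch without needing to query existing branch names first.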
Watch Loop
After applying a fix, the tool pushes, waits for a new CI run (10s poll interval, 180s timeout), and re-analyzes if still failing. Stops when:
- CI passes → creates a PR against main
- Fix confidence drops below 80%
- Max 5 iterations reached
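
The wait for a new CI run can be sketched as a polling helper with the documented 10s interval and 180s timeout. `waitForNewRun` and the `checkRun` callback are hypothetical; in the real tool the check would presumably shell out to the gh CLI. The clock and sleep functions are injectable so the logic can be exercised without real delays.

```javascript
// Hypothetical polling helper: resolves to a run object, or null on timeout.
async function waitForNewRun(checkRun, {
  intervalMs = 10_000,  // 10s poll interval
  timeoutMs = 180_000,  // 180s overall timeout
  sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms)),
  now = () => Date.now(),
} = {}) {
  const deadline = now() + timeoutMs;
  while (true) {
    const run = await checkRun(); // expected to resolve to a run or null
    if (run) return run;
    // Give up instead of sleeping past the deadline.
    if (now() + intervalMs > deadline) return null;
    await sleep(intervalMs);
  }
}

// Usage with a fake clock: the run appears on the third check.
let clock = 0;
let checks = 0;
waitForNewRun(async () => (++checks >= 3 ? { id: 42 } : null), {
  sleep: async (ms) => { clock += ms; },
  now: () => clock,
}).then((run) => console.log(run, "after", clock, "ms (simulated)"));
```

With the defaults this performs at most 19 checks (at 0s, 10s, ..., 180s) before returning `null`, at which point the caller can report that no new run was detected.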

⚡ Token Savings Strategy

| Approach | Calls/iter | Evidence sends | Est. tokens/iter |
|---|---|---|---|
| 3-call mode (analyze + explain + fix) | 3 | 3× | ~5,400 |
| Single-call mode (combined) | 1 | 1× | ~1,800 |
| Savings | | | ~67% |

One prompt, one response — combined.txt asks Copilot to return hypotheses + explanation + patch in a single JSON reply. The evidence bundle (~1,200 tokens) is sent once instead of three times.
Response splitting — the combined response is cached as latest-hypotheses.json, latest-explain.json, latest-patch.json so standalone commands still work.
Measured: a real 2-iteration demo used ~3,442 total tokens (~51% savings vs 3-call mode). Per-iteration estimates are shown in the scoreboard.
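
The ~67% figure follows directly from the two per-iteration estimates in the table. A small worked calculation, where `savingsPercent` is a hypothetical helper:

```javascript
// Percentage saved relative to a baseline token count.
function savingsPercent(baselineTokens, actualTokens) {
  return Math.round((1 - actualTokens / baselineTokens) * 100);
}

const threeCallEstimate = 5400;  // est. tokens/iter, 3-call mode
const singleCallEstimate = 1800; // est. tokens/iter, single-call mode

console.log(savingsPercent(threeCallEstimate, singleCallEstimate) + "%"); // prints "67%"
console.log((singleCallEstimate / threeCallEstimate).toFixed(2) + "x");   // prints "0.33x"
```

The measured ~51% is lower than this estimate. One plausible reason is that a combined reply carries all three outputs, so its response is longer than any single-purpose reply.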

🛡️ Safety guarantees

Secret redaction — 11 pattern types are stripped before display or Copilot input: GitHub PATs (ghp_), OAuth tokens (gho_), fine-grained PATs (github_pat_), app installation tokens (ghs_), refresh tokens (ghr_), Bearer tokens, generic token/password values, private keys (RSA/DSA/EC/OPENSSH), AWS access key IDs (AKIA), OpenAI API keys (sk-), and npm tokens (npm_)
Fixes always require confirmation (unless explicitly overridden with --yes or --auto)
Low-confidence (<60%) or HIGH-risk patches are never auto-applied
Watch/auto mode stops if fix confidence drops below 80%
All changes go on a new ci-fix/<YYYYMMDD-HHMMSS> branch — main is never modified directly
When CI passes, a PR is created against main for review before merging
git apply --check runs before any patch is applied
Patch normalization — hunk @@ headers are recounted before apply (LLMs often emit wrong line counts)
Diffs are truncated at 400 lines in terminal preview (full patch saved to cache)
.gitignore excludes node_modules/, .copilot-ci-doctor/, .ci-doctor/, *.patch, and copilot-raw-output-debug.txt
No force-pushes — watch and demo only push to the repo you explicitly run them in (demo creates its own throwaway repo) and always use regular git push
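
To illustrate the secret redaction listed above, here is a minimal sketch covering four of the eleven pattern types. The regular expressions, the `[REDACTED]` placeholder, and the function name are assumptions for illustration; the actual patterns live in src/evidence/redact.js and may be stricter or broader.

```javascript
// Illustrative subset of redaction patterns (assumed, not the tool's own).
const REDACTION_PATTERNS = [
  { name: "GitHub PAT", regex: /ghp_[A-Za-z0-9]{20,}/g },
  { name: "npm token", regex: /npm_[A-Za-z0-9]{20,}/g },
  { name: "AWS access key ID", regex: /AKIA[0-9A-Z]{16}/g },
  { name: "Bearer token", regex: /Bearer\s+[A-Za-z0-9._~+\/=-]+/g },
];

function redact(text) {
  // Apply every pattern in turn; order does not matter for this subset.
  return REDACTION_PATTERNS.reduce(
    (acc, { regex }) => acc.replace(regex, "[REDACTED]"),
    text
  );
}

const sample = [
  "Authorization: Bearer abc.def-123",
  "GITHUB_TOKEN=ghp_" + "a".repeat(36),
  "npm ERR! code E401",
].join("\n");
console.log(redact(sample));
```

Lines without secrets, such as the npm error in the sample, pass through unchanged, so the log stays useful as evidence after redaction.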

📁 Architecture

copilot-ci-doctor/
├── package.json
├── package-lock.json
├── .gitignore
├── LICENSE
├── README.md
├── .github/
│   └── workflows/
│       └── publish-npm.yml     ← npm publish on version tag push
├── src/
│   ├── cli.js                  ← CLI entrypoint (commander-based)
│   ├── copilot/
│   │   ├── index.js            ← askCopilot() — prompt → invoke → extract JSON → validate
│   │   ├── invoke.js           ← gh copilot -p … -s wrapper (180s timeout)
│   │   └── contract.js         ← CI_DOCTOR_RESPONSE_V1 schema validation
│   ├── evidence/
│   │   ├── bundle.js           ← evidence bundle builder (E1–E5)
│   │   ├── githubActions.js    ← fetch runs, jobs, logs via gh CLI
│   │   ├── githubRepo.js       ← repo metadata via gh repo view
│   │   └── redact.js           ← 11-pattern secret redaction
│   ├── prompts/
│   │   └── loadPrompt.js       ← template loader ({{EVIDENCE_BUNDLE}} injection)
│   ├── utils/
│   │   ├── paths.js            ← path, cache dir & file helpers
│   │   └── print.js            ← chalk formatting (diffs, evidence, confidence)
│   └── commands/
│       ├── analyze.js           ← collect evidence + hypotheses
│       ├── explain.js           ← plain-English explanation
│       ├── fix.js               ← generate + normalize + apply patch
│       ├── watch.js             ← iterative auto-fix loop + scoreboard + PR
│       ├── retry.js             ← re-run failed workflow (gh run rerun --failed)
│       └── demo.js              ← create broken repo → watch → auto-fix
└── prompts/
    ├── hypotheses.txt          ← standalone analyze prompt
    ├── explain.txt             ← standalone explain prompt
    ├── patch.txt               ← standalone fix prompt
    └── combined.txt            ← single-call prompt (watch mode)

🧰 Prerequisites

Node.js ≥ 18
GitHub CLI (gh): https://cli.github.com (authenticate with `gh auth login`)
GitHub Copilot subscription (required for gh copilot CLI extension)
GitHub Copilot CLI extension installed and working: run `gh extension install github/gh-copilot`, then `gh copilot --help` to verify it works
Permissions — your gh account must be able to create repos and open PRs (demo creates a throwaway repo). If org restrictions prevent this, run watch inside an existing repo instead.

Dependencies

| Package | Purpose |
|---|---|
| `commander` ^12.1.0 | CLI argument parsing & sub-commands |
| `chalk` ^5.3.0 | Terminal colors & formatting |
| `execa` ^9.5.2 | Shell command execution (`gh`, `git`) |

📤 Publishing (maintainers)

This package is published to npm automatically via GitHub Actions when a version tag is pushed.

# 1. Bump the version in package.json
npm version patch   # or minor / major

# 2. Push the commit and the tag
git push origin main --follow-tags

The workflow at .github/workflows/publish-npm.yml will:

verify the tag matches package.json version
install dependencies (npm ci)
run a sanity check (node src/cli.js --help)
pack and inspect the package contents
verify npm auth token is set
publish to npm with provenance (--provenance --access public)

⚠️ Never publish manually with npm publish — always use the tag-based workflow.

📄 License

MIT

Built With

javascript