Inspiration

Modern debugging is fragmented. When something breaks, developers jump between logs, terminals, stack traces, monitoring dashboards, and source files. Even with powerful tools, incident resolution is often manual, repetitive, and time consuming. So we asked ourselves:

What if Cline didn’t just write code? What if it could investigate problems the way an engineer would?

We decided to extend Cline beyond code generation and turn it into a context-aware incident responder: one that understands project structure, executes commands, inspects files, and reasons across signals. Instead of only reacting to prompts, we wanted Cline to actively assist in diagnosing and resolving issues.

What it does

We built a custom workflow and skill layer on top of Cline. The resulting app:

  1. Listens for PagerDuty alerts
    When an incident is triggered, PagerDuty sends a webhook to the app. The app validates the signature, ensures the service is configured, and creates an incident record.

  2. Notifies Slack and tracks progress
    It posts an “Investigating…” message in the service’s Slack channel and updates that same message as each phase runs (fetching logs → diagnosing → generating fix → creating PR → done).

  3. Fetches error logs
    It pulls logs (and stack traces) from a configurable source: mock (demo), Datadog, or CloudWatch, in a time window around the alert.

  4. Diagnoses with Cline CLI
    It clones the service’s GitHub repo, builds a structured prompt (incident details + logs + stack traces), and runs Cline CLI in non-interactive mode. Cline uses Read/Grep/Glob to analyze the repo and outputs: root cause, affected files, proposed code changes (diffs), risk level, rollback plan, and confidence.

  5. Applies the fix (when safe)
    If confidence is above a threshold and there are proposed changes, it creates a fix branch, runs a second Cline pass with Edit/Write tools to apply the changes, then commits and pushes.

  6. Opens a draft PR
    It opens a draft pull request on GitHub with the diagnosis, affected files, diffs, risk, and rollback plan. Humans must review before merge.

  7. Summarizes in Slack
    It updates the original Slack message with the full summary (root cause, files, proposed fix, confidence, risk) and buttons: View Draft PR, Approve Fix (marks PR ready for review), Reject Fix (closes PR), View in PagerDuty.

So: PagerDuty → webhook → queue → logs + Cline diagnosis → optional automated fix branch + draft PR → Slack summary and human approval or rejection.

This transforms Cline into a debugging workflow engine rather than just a coding assistant.
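
The configurable log source from step 3 can be pictured as a small interface with one implementation per provider. The sketch below is illustrative only: `LogEntry`, `LogSource`, `alertWindow`, and `MockLogSource` are hypothetical names, and the 15-minute padding around the alert is an assumed default, not the project's actual setting.

```typescript
// Hypothetical shape of a normalized log line; field names are assumptions.
interface LogEntry {
  timestamp: Date;
  level: "info" | "warn" | "error";
  message: string;
  stackTrace?: string;
}

// Each provider (mock, Datadog, CloudWatch) would implement this interface.
interface LogSource {
  fetchLogs(service: string, from: Date, to: Date): Promise<LogEntry[]>;
}

// Compute a symmetric time window around the alert (padding is an assumed default).
function alertWindow(alertAt: Date, paddingMinutes = 15): { from: Date; to: Date } {
  const padMs = paddingMinutes * 60_000;
  return {
    from: new Date(alertAt.getTime() - padMs),
    to: new Date(alertAt.getTime() + padMs),
  };
}

// Demo source that serves canned entries, filtered to the requested window.
class MockLogSource implements LogSource {
  private readonly entries: LogEntry[];

  constructor(entries: LogEntry[]) {
    this.entries = entries;
  }

  async fetchLogs(_service: string, from: Date, to: Date): Promise<LogEntry[]> {
    return this.entries.filter(
      (e) => e.timestamp.getTime() >= from.getTime() && e.timestamp.getTime() <= to.getTime(),
    );
  }
}
```

Keeping the rest of the pipeline dependent only on the interface is what would let a demo run against mock data and a real deployment swap in a provider-backed source without touching the diagnosis step.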

How we built it

Cline Incident Responder acts as an automated incident investigation and fix pipeline triggered by PagerDuty events.

  1. Incident Trigger
    When PagerDuty sends a webhook (POST /webhooks/pagerduty):

    • The signature is verified.
    • Duplicate events are ignored (idempotency).
    • The service configuration is looked up.
    • An incident record is created in Postgres.
    • A job is enqueued in BullMQ (Redis).
  2. Job Processing Pipeline
    A background worker processes each incident through a structured lifecycle:

    • Status Update → Moves the incident from RECEIVED to FETCHING_LOGS.
    • Slack Notification → Posts “Investigating…” and stores the message thread.
    • Log Collection → Fetches logs (Datadog / CloudWatch / mock), normalizes into a structured format.
    • Diagnosis Phase (Read-Only Cline Pass)
      • Clones or updates the repository.
      • Builds a structured markdown prompt including: incident details, logs and stack traces, and repo context.
      • Invokes Cline CLI with restricted tools (Read, Grep, Glob).
      • Parses structured output into: root cause, affected files, proposed changes, risk assessment, and confidence score.
    • Fix Phase (Conditional Second Pass)
      If confidence ≥ threshold:
      • Creates a new fix branch.
      • Invokes Cline with edit permissions (Edit, Write).
      • Applies changes and commits.
      • Pushes branch to GitHub.
    • Draft PR Creation
      • Creates a draft Pull Request via Octokit.
      • Updates the incident record with PR link/number.
    • Slack Update
      • Updates the original Slack message with: Diagnosis summary, PR link, Action buttons (Approve / Reject)
      • Marks incident as COMPLETED (or FAILED).
  3. Human in the Loop Controls

    • All PRs are draft-only.
    • Slack buttons allow:
      • Approve Fix → Mark PR ready for review.
      • Reject Fix → Close PR.
    • A confidence threshold gate prevents unsafe changes.
  4. System Components

    • Fastify API → Webhooks, REST routes, health checks, dashboard.
    • BullMQ (Redis) → Job queue with retries, backoff, rate limiting.
    • Postgres (Drizzle ORM) → Incidents and service configuration storage.
    • Slack (Bolt, Socket Mode) → Notifications and approvals.
    • GitHub (Octokit) → Branching and PR creation.
    • Pluggable Log Sources → Datadog, CloudWatch, or mock implementation.
    • React Dashboard → Monitor incidents, manage services, view PRs.
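
The two checks at the top of the Incident Trigger step (signature verification and duplicate suppression) could look roughly like the sketch below. It assumes the webhook signature is a hex HMAC-SHA256 of the raw request body, delivered as one or more comma-separated `v1=<hex>` values in a signature header; confirm the exact header name and format against PagerDuty's webhook documentation. `signBody`, `isValidSignature`, and `isDuplicate` are hypothetical helpers, and the in-memory `Set` stands in for whatever persistent store the app actually uses.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Compute the hex HMAC-SHA256 of the raw request body.
function signBody(rawBody: string, secret: string): string {
  return createHmac("sha256", secret).update(rawBody, "utf8").digest("hex");
}

// Accept the request if any "v1=<hex>" value in the header matches our digest.
function isValidSignature(rawBody: string, header: string, secret: string): boolean {
  const expected = Buffer.from(signBody(rawBody, secret), "hex");
  return header
    .split(",")
    .map((part) => part.trim())
    .filter((part) => part.startsWith("v1="))
    .some((part) => {
      const candidate = Buffer.from(part.slice(3), "hex");
      // timingSafeEqual throws on length mismatch, so compare lengths first.
      return candidate.length === expected.length && timingSafeEqual(candidate, expected);
    });
}

// In-memory idempotency check (assumption: a real deployment would more likely
// use a unique constraint in Postgres or a key in Redis).
const seenEventIds = new Set<string>();

function isDuplicate(eventId: string): boolean {
  if (seenEventIds.has(eventId)) return true;
  seenEventIds.add(eventId);
  return false;
}
```

The signature must be computed over the raw bytes of the request, not over a re-serialized JSON object, since any change in whitespace or key order would change the digest.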

In short:
PagerDuty triggers → logs are analyzed → Cline diagnoses → optional auto-fix branch → draft PR created → Slack for human approval, all orchestrated through a controlled, safe, and structured pipeline.
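
One way to picture the phase boundaries is as a linear state machine in which any failing phase short-circuits to FAILED. In the sketch below, RECEIVED, FETCHING_LOGS, COMPLETED, and FAILED come from the description above, while DIAGNOSING, GENERATING_FIX, and CREATING_PR are assumed names modeled on the Slack progress phases. `nextStatus` and `runPipeline` are hypothetical, and the handlers are synchronous only to keep the example short; a real worker would await each phase.

```typescript
type Status =
  | "RECEIVED"
  | "FETCHING_LOGS"
  | "DIAGNOSING" // assumed name
  | "GENERATING_FIX" // assumed name
  | "CREATING_PR" // assumed name
  | "COMPLETED"
  | "FAILED";

const HAPPY_PATH: Status[] = [
  "RECEIVED",
  "FETCHING_LOGS",
  "DIAGNOSING",
  "GENERATING_FIX",
  "CREATING_PR",
  "COMPLETED",
];

// Return the status that follows `current`; terminal states return themselves.
function nextStatus(current: Status): Status {
  const i = HAPPY_PATH.indexOf(current);
  if (i === -1 || i === HAPPY_PATH.length - 1) return current;
  return HAPPY_PATH[i + 1] ?? current;
}

type PhaseHandler = () => void;

// Run the handler registered for each status being entered and record the
// statuses visited. A throwing handler ends the run in FAILED.
function runPipeline(handlers: Partial<Record<Status, PhaseHandler>>): Status[] {
  const visited: Status[] = ["RECEIVED"];
  let status: Status = "RECEIVED";
  while (status !== "COMPLETED") {
    const next = nextStatus(status);
    try {
      handlers[next]?.();
    } catch {
      visited.push("FAILED");
      return visited;
    }
    status = next;
    visited.push(status);
  }
  return visited;
}
```

Recording every transition in one place is also what would make the Slack progress updates straightforward: each status change maps to one edit of the original message.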

Challenges we ran into

  • Orchestrating Cline from outside: Cline is built for interactive use. The app had to drive it non-interactively, restrict tools per phase (no edits during diagnosis), and get structured output without an API. For the structured output we relied on strict prompt templates and regex-based parsing, which proved challenging to get right.

  • Safety and control: We didn’t want blind automation; we wanted responsible automation. To keep automated fixes in check, we introduced several safeguards:

    1. A confidence threshold: if Cline isn’t confident enough, it won’t create a PR.
    2. All fixes are opened as draft PRs, never ready-to-merge by default.
    3. A human approval step in Slack: engineers can explicitly approve or reject the fix.
    4. Every PR includes a risk assessment and rollback plan, so reviewers understand impact before merging.
    5. Looking ahead, we plan to add a dry run or sandbox mode to simulate changes before they ever touch a real branch.
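
The confidence gate in the first safeguard can be expressed as a small pure function, which keeps the decision easy to test and audit. This is a sketch under assumptions: `Diagnosis`, `FixDecision`, and `decideFix` are hypothetical names, confidence is assumed to be a number between 0 and 1, and the 0.8 default threshold is a placeholder, not the project's configured value.

```typescript
// Hypothetical subset of the parsed diagnosis that the gate needs.
interface Diagnosis {
  rootCause: string;
  confidence: number; // assumed to be in the range 0..1
  proposedChanges: string[]; // one diff per entry
}

type FixDecision = { apply: true } | { apply: false; reason: string };

// Decide whether the automated fix phase should run at all.
function decideFix(diagnosis: Diagnosis, threshold = 0.8): FixDecision {
  if (!Number.isFinite(diagnosis.confidence)) {
    return { apply: false, reason: "confidence is missing or not a number" };
  }
  if (diagnosis.proposedChanges.length === 0) {
    return { apply: false, reason: "no proposed changes" };
  }
  if (diagnosis.confidence < threshold) {
    return {
      apply: false,
      reason: `confidence ${diagnosis.confidence} is below threshold ${threshold}`,
    };
  }
  return { apply: true };
}
```

Returning a reason alongside the refusal means the Slack summary can explain why no fix branch was created, instead of silently skipping the step.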

The goal isn’t to replace engineers; it’s to give them a powerful assistant while keeping humans firmly in control.
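
To make the regex-based parsing from the first challenge concrete, here is a minimal sketch. The output template it expects (`ROOT_CAUSE:`, `CONFIDENCE:`, `RISK:`, and an `AFFECTED_FILES:` header followed by `- path` lines) is invented for illustration; the actual prompt template and field names used by the project are not shown in this write-up. `field` and `parseDiagnosis` are hypothetical helpers.

```typescript
interface ParsedDiagnosis {
  rootCause: string;
  confidence: number;
  risk: string;
  affectedFiles: string[];
}

// Extract a single-line "NAME: value" field, or null if absent or empty.
function field(output: string, name: string): string | null {
  const match = new RegExp(`^${name}:[ \\t]*(.+)$`, "m").exec(output);
  const value = match?.[1]?.trim();
  return value ? value : null;
}

// Parse the model output; return null when a required field is missing so the
// caller can mark the incident as failed instead of acting on partial data.
function parseDiagnosis(rawOutput: string): ParsedDiagnosis | null {
  const output = rawOutput.replace(/\r\n/g, "\n"); // normalize line endings
  const rootCause = field(output, "ROOT_CAUSE");
  const risk = field(output, "RISK");
  const confidence = Number(field(output, "CONFIDENCE") ?? NaN);
  if (rootCause === null || risk === null || !Number.isFinite(confidence)) {
    return null;
  }
  // Collect the "- path" lines that directly follow the AFFECTED_FILES header.
  const block = /^AFFECTED_FILES:[ \t]*\n((?:[ \t]*-[ \t]+.+\n?)*)/m.exec(output);
  const affectedFiles = (block?.[1] ?? "")
    .split("\n")
    .map((line) => line.replace(/^[ \t]*-[ \t]+/, "").trim())
    .filter((line) => line.length > 0);
  return { rootCause, confidence, risk, affectedFiles };
}
```

Failing closed on a missing or malformed field is the main design choice here: a parse error becomes a visible pipeline failure, never a fix applied on the basis of a half-read diagnosis.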

Accomplishments that we're proud of

  • A fully automated loop: a single pipeline runs from PagerDuty trigger to draft PR and Slack summary, with clear phase boundaries and status tracking.

  • Treating Cline CLI as a callable “diagnosis and fix” service with constrained tools and structured prompts, rather than a one-off chat tool.

  • Human in the loop: AI automation is great, but it does not replace the assurance that comes from human review. Draft PRs plus “Approve Fix” / “Reject Fix” in Slack give teams a single place to approve or reject automated fixes without touching GitHub until they choose.

What we learned

  • Debugging is cross-context reasoning, not just code understanding.
  • AI becomes significantly more powerful when embedded directly into workflows.
  • Structured prompting and workflow design matter more than raw model capability.
  • Giving Cline execution + inspection capabilities unlocks entirely new use cases.
  • Customization is where the real power of Cline lies: tailoring it to specific workflows creates transformative productivity gains.

What's next for Cline Incident Responder

  • Right now, we support a subset of log providers. We’d love to fully complete the CloudWatch integration and expand into other ecosystems (like Grafana, Loki and Elastic). The goal is simple: no matter where your logs live, Cline should be able to reason over them.

  • Today, we run Cline as a subprocess through its CLI. If Cline introduces an official API or a headless mode in the future, we’d migrate to that immediately. It would give us tighter control, better observability, and cleaner error handling, which would make the whole system more robust.

  • Incident response is just the beginning. The same structured workflow could power performance optimization, security audits, dependency upgrades, and CI/CD troubleshooting. We see this as the foundation for a broader AI operations teammate, not just an incident fixer.
