Inspiration

Every engineer knows the 3 AM nightmare: your phone buzzes, production is down, users are screaming, and you're staring at a wall of logs trying to figure out what broke. A widely cited industry estimate puts the cost of downtime at about $5,600 per minute, yet most teams still rely on manual, reactive processes to detect and fix issues. We asked ourselves: what if your infrastructure could heal itself? What if the moment a deployment fails, an AI agent already knows the root cause, has drafted the fix, and is waiting for you to click one button? That's what inspired AIA - Autonomous Incident Agent.

What it does

AIA is a fully autonomous incident response platform. The moment something breaks in production, AIA takes over:

  • Detects production failures in real-time via GitHub and Vercel deployment webhooks
  • Diagnoses root causes automatically using AI log analysis (powered by the You.com RAG API)
  • Auto-creates a GitHub PR - creates a branch, applies the fix patch, and opens a Pull Request that is ready to merge
  • Generates fixes instantly - step-by-step manual guides, AI prompts, and patch diffs
  • One-click fix workflows via Kilo VS Code extension deep links
  • Cline CLI pipeline - uses Cline as programmable infrastructure to apply fixes autonomously
  • Professional PDF reports generated via Foxit Document Generation + PDF Services APIs
  • Voice-controlled incident queries using Deepgram speech-to-text
  • Visual incident boards on Miro for team collaboration
  • Real-time dashboard with incident timeline, analytics, and live status updates
  • AI Chat at /chat - ask questions about incidents in natural language
  • Analytics timeline - incident trends, MTTR metrics, resolution rates
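
As a rough illustration of the detection step above, the sketch below turns a deployment webhook into an incident record. The payload types are simplified assumptions based on GitHub's `deployment_status` event and Vercel's `deployment.error` event, and the `Incident` shape and function names are hypothetical, not AIA's actual schema.

```typescript
// Simplified, assumed payload shapes; check the GitHub and Vercel webhook
// docs for the full schemas before relying on these fields.
type GitHubDeploymentStatusEvent = {
  deployment_status: { state: string };
  repository: { full_name: string };
};

type VercelDeploymentEvent = {
  type: string; // e.g. "deployment.error"
  payload: { deployment: { id: string; url: string } };
};

// Hypothetical incident record, not AIA's real schema.
type Incident = {
  source: "github" | "vercel";
  subject: string; // repository or deployment the failure belongs to
  reason: string;
  detectedAt: string; // ISO 8601 timestamp
};

// Returns an incident for failed GitHub deployments, or null otherwise.
function fromGitHub(event: GitHubDeploymentStatusEvent, now: Date): Incident | null {
  const state = event.deployment_status.state;
  if (state !== "failure" && state !== "error") return null;
  return {
    source: "github",
    subject: event.repository.full_name,
    reason: `deployment_status: ${state}`,
    detectedAt: now.toISOString(),
  };
}

// Returns an incident for Vercel deployment errors, or null otherwise.
function fromVercel(event: VercelDeploymentEvent, now: Date): Incident | null {
  if (event.type !== "deployment.error") return null;
  return {
    source: "vercel",
    subject: event.payload.deployment.url,
    reason: event.type,
    detectedAt: now.toISOString(),
  };
}
```

Passing the clock in as `now` keeps both functions pure, so the same webhook body always maps to the same incident in tests.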

How we built it

AIA is built as a monorepo with 13 independent microservices, each with a single responsibility, all communicating over HTTP.

The frontend is a Next.js dashboard with multiple specialized views: real-time incident feeds, an analytics page, an AI chat interface, a timeline view, and the incidents page where all the fix actions live.

The backend is a fleet of lightweight Bun-powered services: one that listens for incoming webhooks and creates incidents, one that manages all incident state in a PostgreSQL database, one that runs AI-powered root cause analysis, one that handles GitHub operations (including Auto-PR creation), one that monitors runtime errors, one that sends notifications, and dedicated services for Deepgram voice processing and Miro board creation.

Sponsor Integrations:

  • You.com - RAG API powers the AI Chat and root cause analysis
  • Kilo - VS Code deep links + /api/kilo/prompt for one-click fix workflows
  • Cline - Custom hooks (/api/cline/hook) + CLI pipeline (/api/cline/pipeline) for autonomous fix execution
  • Foxit - /api/foxit/report/[id] uses Document Generation API to create PDFs, then PDF Services API to watermark them
  • Deepgram - Live speech-to-text for voice-controlled incident queries
  • Miro - Automatic sticky-note incident boards for team war rooms
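
To make the Miro integration more concrete, here is a minimal sketch that lays incidents out as a grid of sticky notes on an existing board. It only builds request descriptors and does not send them. The endpoint path and body fields follow our understanding of Miro's REST API v2 sticky-note endpoint and should be verified against the Miro docs; `IncidentNote`, the severity-to-color mapping, and the grid parameters are hypothetical.

```typescript
// Assumed request shape for Miro's REST API v2 "create sticky note" endpoint;
// verify field names and allowed colors against the Miro documentation.
type Severity = "critical" | "warning" | "info";

interface IncidentNote {
  title: string;
  severity: Severity;
}

interface StickyNoteRequest {
  method: "POST";
  url: string;
  body: {
    data: { content: string; shape: "square" };
    style: { fillColor: string };
    position: { x: number; y: number };
  };
}

// Hypothetical mapping from incident severity to a sticky-note color.
const FILL_COLOR: Record<Severity, string> = {
  critical: "red",
  warning: "yellow",
  info: "light_blue",
};

// Builds one request per incident, placing notes left to right in rows
// of `columns`, `spacing` board units apart.
function stickyNoteRequests(
  boardId: string,
  notes: IncidentNote[],
  columns = 4,
  spacing = 250,
): StickyNoteRequest[] {
  const url = `https://api.miro.com/v2/boards/${encodeURIComponent(boardId)}/sticky_notes`;
  return notes.map((note, i) => ({
    method: "POST" as const,
    url,
    body: {
      data: { content: note.title, shape: "square" as const },
      style: { fillColor: FILL_COLOR[note.severity] },
      position: { x: (i % columns) * spacing, y: Math.floor(i / columns) * spacing },
    },
  }));
}
```

Keeping request construction separate from the HTTP call makes the layout logic easy to test without a Miro token.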

The entire system is deployed on Vercel (frontend), Render (backend services), and Neon (PostgreSQL), with Clerk handling authentication.
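
With services spread across several platforms, each one needs to know where the others live. The sketch below shows one possible way to resolve service base URLs from environment variables and fail fast when any are missing. The service names, the `*_SERVICE_URL` naming scheme, and the `/health` path are all assumptions for illustration, not AIA's actual configuration.

```typescript
type Env = Record<string, string | undefined>;

// Hypothetical subset of services; the real system has 13.
const SERVICES = ["webhook", "incident", "analysis", "github"] as const;
type ServiceName = (typeof SERVICES)[number];

// Assumed naming scheme, e.g. "webhook" -> "WEBHOOK_SERVICE_URL".
function envKey(name: ServiceName): string {
  return `${name.toUpperCase()}_SERVICE_URL`;
}

// Reads every service URL from the environment, strips trailing slashes,
// and throws a single error that lists all missing variables.
function resolveServiceUrls(env: Env): Record<ServiceName, string> {
  const missing: string[] = [];
  const urls = {} as Record<ServiceName, string>;
  for (const name of SERVICES) {
    const raw = env[envKey(name)];
    if (!raw) {
      missing.push(envKey(name));
      continue;
    }
    urls[name] = raw.replace(/\/+$/, "");
  }
  if (missing.length > 0) {
    throw new Error(`Missing service URLs: ${missing.join(", ")}`);
  }
  return urls;
}

// Derives the health-check endpoint for each service (assumed "/health" path).
function healthEndpoints(urls: Record<ServiceName, string>): Record<ServiceName, string> {
  const endpoints = {} as Record<ServiceName, string>;
  for (const name of SERVICES) {
    endpoints[name] = `${urls[name]}/health`;
  }
  return endpoints;
}
```

Reporting every missing variable at once, rather than the first one found, saves a redeploy cycle per typo when configuring a new environment.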

Challenges we ran into

  • Microservice orchestration - Getting 13 independent Bun services to communicate reliably across different deployment platforms required careful URL management and health checks
  • Miro sandbox limitations - The Miro sandbox environment doesn't allow creating new boards, so we pivoted to adding sticky notes to an existing board - turning a blocker into a feature
  • Real-time incident streaming - Building a live incident feed with Server-Sent Events across a serverless Next.js deployment required creative workarounds for connection persistence
  • Foxit DOCX template generation - Generating valid DOCX XML in-memory (without a file system) to feed into Foxit's base64 API required deep understanding of the Open XML spec
  • Multi-sponsor integration - Integrating 6 sponsor APIs, each with its own auth pattern, rate limits, and response format, made it hard to keep the codebase clean and maintainable
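
For the Server-Sent Events challenge above, one common way to cope with short-lived serverless connections is to give every event an `id`, so a reconnecting browser can send `Last-Event-ID` and the server can replay what was missed. The sketch below shows the wire format and a replay helper. It is a generic illustration of the SSE protocol, not a description of the workaround AIA actually uses, and the `StoredEvent` type is hypothetical.

```typescript
// Hypothetical stored event; `data` is serialized to JSON unless it is a string.
interface StoredEvent {
  id: string;
  event: string;
  data: string | Record<string, unknown>;
}

// Serializes one event in the SSE wire format: optional "id:" and "event:"
// fields, one "data:" line per line of payload, then a blank line.
function formatSseEvent(e: StoredEvent): string {
  const text = typeof e.data === "string" ? e.data : JSON.stringify(e.data);
  const lines = [`id: ${e.id}`, `event: ${e.event}`];
  for (const line of text.split(/\r\n|\r|\n/)) {
    lines.push(`data: ${line}`);
  }
  return lines.join("\n") + "\n\n";
}

// Comment lines start with ":" and are ignored by clients, which makes them
// usable as keep-alive pings on otherwise idle connections.
function formatHeartbeat(): string {
  return ": keep-alive\n\n";
}

// Returns the events a reconnecting client has not seen yet. If the id is
// unknown (or absent), the full backlog is replayed.
function replayAfter(events: StoredEvent[], lastEventId: string | null): StoredEvent[] {
  if (lastEventId === null) return events;
  let index = -1;
  for (let i = 0; i < events.length; i++) {
    if (events[i].id === lastEventId) index = i;
  }
  return index === -1 ? events : events.slice(index + 1);
}
```

On reconnect, a handler would read the `Last-Event-ID` request header, call `replayAfter`, and write each result through `formatSseEvent` before resuming the live stream.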

Accomplishments that we're proud of

  • Auto-generated Pull Requests - AIA goes beyond "suggestions" and actually writes the code, opening a PR you can merge instantly
  • 6 sponsor APIs integrated and working end-to-end
  • Full autonomous pipeline: webhook → AI analysis → fix generation → one-click resolution
  • Voice-controlled incident queries - ask "what's the latest incident?" and get a spoken response
  • Zero manual intervention from detection to fix suggestion - fully automated
  • Production-deployed on Vercel + Render with real webhook integrations
  • Built the entire system from scratch during the hackathon as a working, deployable product

What we learned

  • AI-powered DevOps is the future - LLMs are surprisingly good at root cause analysis when given structured log data
  • Microservices need contracts - clear API boundaries between services saved us countless hours of debugging
  • Sponsor APIs are powerful building blocks - combining You.com (AI), Kilo (developer workflow), Cline (autonomous execution), Foxit (document generation), Deepgram (voice), and Miro (collaboration) created something far more powerful than any single tool
  • Bun was ready for our production workload - its speed and built-in WebSocket support made our real-time services significantly simpler to build
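
The point about contracts can be illustrated with a small runtime check at a service boundary. The sketch below validates an incoming incident payload before any other code touches it. The `IncidentRecord` fields and status values are hypothetical and stand in for whatever schema the services actually share.

```typescript
// Hypothetical shared contract between services.
type IncidentStatus = "open" | "analyzing" | "fix_ready" | "resolved";

interface IncidentRecord {
  id: string;
  status: IncidentStatus;
  service: string;
  createdAt: string; // ISO 8601 timestamp
}

const STATUSES: readonly IncidentStatus[] = ["open", "analyzing", "fix_ready", "resolved"];

function isPlainObject(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

function isStatus(value: unknown): value is IncidentStatus {
  return typeof value === "string" && STATUSES.indexOf(value as IncidentStatus) !== -1;
}

// Validates an untrusted JSON body and returns a typed record,
// throwing a descriptive error on the first violated rule.
function parseIncident(value: unknown): IncidentRecord {
  if (!isPlainObject(value)) throw new Error("incident must be an object");
  const { id, status, service, createdAt } = value;
  if (typeof id !== "string" || id.length === 0) throw new Error("id must be a non-empty string");
  if (!isStatus(status)) throw new Error("status is not a known incident status");
  if (typeof service !== "string") throw new Error("service must be a string");
  if (typeof createdAt !== "string" || isNaN(Date.parse(createdAt))) {
    throw new Error("createdAt must be a parseable date string");
  }
  return { id, status, service, createdAt };
}
```

Each service can call `parseIncident` on request bodies it receives, so a contract violation surfaces at the boundary rather than deep inside another service's logic.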

What's next for Autonomous Incident Agent (AIA)

  • Resend email alerts - email the on-call engineer instantly with the Foxit PDF report attached
  • More webhook sources - AWS CloudWatch, Datadog, PagerDuty, Sentry integrations
  • Automated regression testing - Cline CLI runs your test suite after applying a fix to validate it before merging
  • Multi-repo support - monitor and fix incidents across an entire organization's GitHub repos
  • Open-source + SaaS - free self-hosted version + managed cloud offering for teams
