Inspiration
The average developer spends 40% of their time reviewing PRs, chasing flaky tests, and triaging tech debt — not building. We watched teams drown in context-switching between GitHub, CI dashboards, coverage tools, and security scanners, and realized the entire workflow could be collapsed into a single AI-native command center where you manage outcomes, not code.
What it does
Agent HQ lets you connect any GitHub repo and instantly get deep, Claude-powered code analysis: every PR is reviewed for bugs, security holes, and missing tests, with plain-English explanations instead of raw linter output. When issues are found, a coordinated swarm of six specialized AI agents (Reviewer, FixGenerator, TestWriter, SecurityAuditor, RefactorAgent, DocWriter) works in parallel to generate fixes, write tests, and push a clean PR, taking you from "problem found" to "PR created" in one click. The dashboard surfaces a real-time health radar, PR risk scoring, coverage visualization, FinOps cost tracking, and a live swarm monitor showing every agent's progress.
How we built it
The backend is Python/FastAPI with WebSocket-driven live updates; the frontend is Next.js 14 with shadcn/ui and Recharts. The Claude API is the core intelligence engine behind both the deep code analysis and the multi-agent swarm, and the GitHub API handles repo ingestion and automated PR creation. We designed 15 Pydantic schemas as frozen interface contracts on day one, built a comprehensive mock data layer that let four engineers develop in parallel for 12+ hours without blocking each other, and used feature flags so every external dependency (Claude API, GitHub, Nemotron, MLflow) degrades gracefully to local fallbacks.
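To make the feature-flag idea concrete, here is a minimal sketch of how a flag-guarded call with a local fallback could look. The names `flag_enabled`, `call_with_fallback`, and `heuristic_risk_score`, the `USE_CLAUDE` environment variable, and the scoring formula are all assumptions made for illustration, not the project's actual code.

```python
import os
from typing import Callable, TypeVar

T = TypeVar("T")


def flag_enabled(name: str) -> bool:
    """Read a boolean feature flag from the environment (assumed naming scheme)."""
    return os.environ.get(name, "").strip().lower() in {"1", "true", "yes", "on"}


def call_with_fallback(flag: str, primary: Callable[[], T], fallback: Callable[[], T]) -> T:
    """Call the external dependency only when its flag is on.

    If the flag is off, or the primary call raises, return the local
    fallback so the caller never sees the outage.
    """
    if not flag_enabled(flag):
        return fallback()
    try:
        return primary()
    except Exception:
        return fallback()


def heuristic_risk_score(lines_changed: int, files_changed: int) -> float:
    """Hypothetical local stand-in for an AI risk score, clamped to 0.0..1.0.

    Larger and wider diffs score higher; the weights are illustrative only.
    """
    return min(1.0, lines_changed / 1000 + files_changed / 50)
```

With a helper like this, a route handler could wrap each external call once and stay unaware of whether the real service or the fallback answered.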
Challenges we ran into
Getting Claude to return reliably parseable JSON for structured code reviews required serious prompt engineering. We built a complete fallback chain (JSON extraction → regex parsing → template defaults) so the system never crashes on malformed AI output. Swarm coordination was the hardest design problem: we had to work out the dependency ordering (FixGenerator needs Reviewer's output first, but TestWriter and SecurityAuditor can run in parallel), implement it with asyncio.gather, and track tokens and costs per agent in real time across concurrent executions.
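The fallback chain described above can be sketched roughly as follows. This is an illustration under assumptions, not the project's implementation: the field names (`severity`, `summary`, `issues`), the regular expressions, and the default template are hypothetical.

```python
import json
import re
from typing import Any, Dict, Optional


def default_review() -> Dict[str, Any]:
    """Stage 3: hypothetical template returned when nothing can be recovered."""
    return {"severity": "unknown", "summary": "Review unavailable", "issues": []}


def extract_json(text: str) -> Optional[Dict[str, Any]]:
    """Stage 1: parse the span from the first '{' to the last '}' as JSON."""
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end <= start:
        return None
    try:
        parsed = json.loads(text[start : end + 1])
    except json.JSONDecodeError:
        return None
    return parsed if isinstance(parsed, dict) else None


def extract_with_regex(text: str) -> Optional[Dict[str, Any]]:
    """Stage 2: pull individual fields out of loosely formatted text."""
    severity = re.search(r"severity\W+(low|medium|high|critical)", text, re.IGNORECASE)
    summary = re.search(r"summary\W+([^\n\"]+)", text, re.IGNORECASE)
    if not severity and not summary:
        return None
    result = default_review()
    if severity:
        result["severity"] = severity.group(1).lower()
    if summary:
        result["summary"] = summary.group(1).strip()
    return result


def parse_review(text: str) -> Dict[str, Any]:
    """Run the chain; the result always contains the template's keys."""
    parsed = extract_json(text)
    if parsed is not None:
        return {**default_review(), **parsed}
    return extract_with_regex(text) or default_review()
```

Because every stage returns either a dict or `None`, the caller gets the same shape regardless of which stage succeeded, which is what keeps malformed output from propagating as an exception.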
Accomplishments that we're proud of
We built a fully functional AI agent swarm where six specialized agents coordinate with dependency-aware parallel execution: the Coordinator reads all issues and plans the work, agents like FixGenerator and TestWriter run simultaneously where possible, and the entire pipeline from "connect a GitHub repo" to "PR created with fixes" works end-to-end in a single session with real-time progress visible in the dashboard. The architecture's graceful degradation is the other thing we're proud of: kill every external API key and the dashboard still loads, PRs still get heuristic risk scores, and plain-English translation falls back to regex templates. We designed for resilience from hour zero, not as an afterthought.
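As a rough illustration of dependency-aware execution with per-agent accounting, the sketch below runs one agent first and then fans out with `asyncio.gather`. The `Swarm` and `Usage` classes, the stage layout, the token counts, and the flat per-token rate are all assumptions; the real Coordinator and model calls are replaced by a stand-in.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Usage:
    """Per-agent token and cost totals (pricing is an assumed flat rate)."""
    tokens: int = 0
    cost_usd: float = 0.0


@dataclass
class Swarm:
    usage: Dict[str, Usage] = field(default_factory=dict)
    finished: List[str] = field(default_factory=list)

    async def run_agent(self, name: str, context: str, tokens: int) -> str:
        """Stand-in for a model call: yield to the event loop, then record usage."""
        await asyncio.sleep(0)  # placeholder for network I/O
        # No await between read and write, so this update is safe on one event loop.
        entry = self.usage.setdefault(name, Usage())
        entry.tokens += tokens
        entry.cost_usd += tokens * 3e-6  # hypothetical price per token
        self.finished.append(name)
        return f"{name}({context})"

    async def run(self, issues: str) -> Dict[str, str]:
        # Stage 1: Reviewer finishes first because FixGenerator consumes its output.
        review = await self.run_agent("Reviewer", issues, tokens=1200)
        # Stage 2: agents with no dependency on each other run concurrently.
        names = ["FixGenerator", "TestWriter", "SecurityAuditor"]
        outputs = await asyncio.gather(
            self.run_agent("FixGenerator", review, tokens=900),
            self.run_agent("TestWriter", issues, tokens=700),
            self.run_agent("SecurityAuditor", issues, tokens=500),
        )
        return {"Reviewer": review, **dict(zip(names, outputs))}
```

Calling `asyncio.run(Swarm().run("issue-1"))` drives both stages; a live monitor could read `usage` and `finished` between stages to report progress.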
What we learned
Schema-first development is everything: freezing all 15 data contracts before writing a single line of implementation let four engineers work independently for 12+ hours without a merge conflict on interfaces. We also learned that the real product differentiator isn't the AI analysis itself (anyone can call Claude). It's the closed loop from "issue found" to "fix applied" in one click, with enough cost tracking and progress visibility that humans actually trust agents to ship code.
What's next for Agent HQ
Multi-repo swarm operations where agents understand cross-service dependencies and coordinate fixes across an entire microservices architecture simultaneously. A learning loop where successful fix patterns get embedded as reusable "skill recipes" — so the hundredth null-check fix costs near-zero tokens and executes instantly. And an open agent marketplace where teams publish and share specialized agents (compliance auditor, migration assistant, performance profiler) that plug directly into the swarm.
Built With
- claude
- databricks
- mlflow
- python
- typescript