Agent HQ

Inspiration

The average developer spends 40% of their time reviewing PRs, chasing flaky tests, and triaging tech debt — not building. We watched teams drown in context-switching between GitHub, CI dashboards, coverage tools, and security scanners, and realized the entire workflow could be collapsed into a single AI-native command center where you manage outcomes, not code.

What it does

Agent HQ lets you connect any GitHub repo and instantly get deep, Claude-powered code analysis: every PR is reviewed for bugs, security holes, and missing tests, with plain-English explanations instead of raw linter output. When issues are found, a coordinated swarm of six specialized AI agents (Reviewer, FixGenerator, TestWriter, SecurityAuditor, RefactorAgent, DocWriter) works in parallel to generate fixes, write tests, and push a clean PR, so it takes one click to go from "problem found" to "PR created." The dashboard surfaces a real-time health radar, PR risk scoring, coverage visualization, FinOps cost tracking, and a live swarm monitor showing every agent's progress.

How we built it

We built a Python/FastAPI backend with WebSocket-driven live updates and a Next.js 14 frontend with shadcn/ui and Recharts. The Claude API is the core intelligence engine powering both deep code analysis and the multi-agent swarm, and the GitHub API handles repo ingestion and automated PR creation. We designed 15 Pydantic schemas as frozen interface contracts on day one, built a comprehensive mock data layer that let four engineers develop in parallel for 12+ hours without blocking each other, and used feature flags so every external dependency (Claude API, GitHub, Nemotron, MLflow) degrades gracefully to local fallbacks.
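
As a rough illustration of the feature-flag fallback idea, here is a minimal sketch. It uses a frozen stdlib dataclass as a stand-in for one of the frozen Pydantic schemas, and the names RiskScore, USE_CLAUDE, heuristic_score, and score_pr are hypothetical, chosen for the example rather than taken from the project.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class RiskScore:
    """Hypothetical contract; a stdlib stand-in for a frozen Pydantic schema."""
    pr_number: int
    score: float  # 0.0 (safe) to 1.0 (risky)
    source: str   # "claude" or "heuristic"


def flag_enabled(name: str) -> bool:
    """Read a feature flag from the environment (assumed naming scheme)."""
    return os.environ.get(name, "").lower() in {"1", "true", "yes"}


def heuristic_score(pr_number: int, lines_changed: int) -> RiskScore:
    """Local fallback: larger diffs score as riskier, capped at 1.0."""
    return RiskScore(pr_number, min(lines_changed / 1000, 1.0), "heuristic")


def score_pr(pr_number: int, lines_changed: int, remote_scorer=None) -> RiskScore:
    """Use the remote scorer when its flag is on; otherwise degrade locally."""
    if flag_enabled("USE_CLAUDE") and remote_scorer is not None:
        try:
            return remote_scorer(pr_number, lines_changed)
        except Exception:
            pass  # any remote failure falls through to the local heuristic
    return heuristic_score(pr_number, lines_changed)
```

With the flag unset, or with a remote scorer that raises, score_pr still returns a heuristic RiskScore, which is the behavior the dashboard relies on when API keys are missing.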

Challenges we ran into

Getting Claude to return reliably parseable JSON for structured code reviews required serious prompt engineering. We built a complete fallback chain (JSON extraction → regex parsing → template defaults) so the system never crashes on malformed AI output. Swarm coordination was the hardest design problem: we had to work out dependency ordering (FixGenerator needs Reviewer's output first, but TestWriter and SecurityAuditor can run in parallel) and make it actually work with asyncio.gather while tracking tokens and costs per agent in real time across concurrent executions.
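
The fallback chain could look something like the sketch below. The field names (severity, summary), the regex patterns, and the template default are assumptions made for illustration; the project's real review schema is not shown here.

```python
import json
import re

# Assumed last-resort value; the real template defaults are not shown in this writeup.
TEMPLATE_DEFAULT = {"severity": "unknown", "summary": "Review output could not be parsed."}


def extract_json(text: str):
    """Stage 1: parse the outermost {...} span in the reply, if any."""
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end <= start:
        return None
    try:
        parsed = json.loads(text[start:end + 1])
    except json.JSONDecodeError:
        return None
    return parsed if isinstance(parsed, dict) else None


def regex_parse(text: str):
    """Stage 2: recover 'severity' and 'summary' from loosely formatted text."""
    severity = re.search(r'severity"?\s*[:=]\s*"?(\w+)', text, re.IGNORECASE)
    summary = re.search(r'summary"?\s*[:=]\s*"?([^"\n]+)', text, re.IGNORECASE)
    if not (severity and summary):
        return None
    return {"severity": severity.group(1).lower(), "summary": summary.group(1).strip()}


def parse_review(text: str) -> dict:
    """Try each stage in order; the template default guarantees a result."""
    for stage in (extract_json, regex_parse):
        result = stage(text)
        if result is not None:
            return result
    return dict(TEMPLATE_DEFAULT)
```

Because the last stage always returns a dictionary, callers never have to handle a parse exception; at worst they render the template default.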

Accomplishments that we're proud of

We built a fully functional AI agent swarm where six specialized agents coordinate with dependency-aware parallel execution. The Coordinator reads all issues and plans the work, agents like FixGenerator and TestWriter run simultaneously where possible, and the entire pipeline from "connect a GitHub repo" to "PR created with fixes" works end-to-end in a single session with real-time progress visible in the dashboard. The architecture's graceful degradation is the other thing we're proud of: kill every external API key and the dashboard still loads, PRs still get heuristic risk scores, and the plain-English translation of terminal output falls back to regex templates. We designed for resilience from hour zero, not as an afterthought.
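
One way to get dependency-aware parallel execution with asyncio.gather is sketched below. The dependency graph is an assumption (only FixGenerator waits on Reviewer), call_agent is a stand-in for a real Claude API call, and the token counts are placeholders; none of this is the project's actual orchestrator.

```python
import asyncio

# Assumed dependency graph: only FixGenerator waits on Reviewer.
DEPENDS_ON = {
    "Reviewer": [],
    "TestWriter": [],
    "SecurityAuditor": [],
    "FixGenerator": ["Reviewer"],
}


async def call_agent(name: str, upstream: list[str]) -> tuple[str, int]:
    """Stand-in for a model call; returns (output, tokens_used)."""
    await asyncio.sleep(0.01)  # simulate network latency
    return f"{name} output", 100 + 50 * len(upstream)


async def run_swarm(deps: dict[str, list[str]]) -> tuple[list[str], dict[str, int]]:
    tasks: dict[str, asyncio.Task] = {}
    finished: list[str] = []     # completion order, e.g. for a live monitor
    tokens: dict[str, int] = {}  # per-agent token usage

    async def run(name: str) -> str:
        # Wait until every upstream agent has produced its output.
        upstream = await asyncio.gather(*(tasks[d] for d in deps[name]))
        output, used = await call_agent(name, list(upstream))
        tokens[name] = used
        finished.append(name)
        return output

    # Create every task up front; independent agents then run concurrently.
    for name in deps:
        tasks[name] = asyncio.create_task(run(name))
    await asyncio.gather(*tasks.values())
    return finished, tokens
```

Running asyncio.run(run_swarm(DEPENDS_ON)) starts Reviewer, TestWriter, and SecurityAuditor together and holds FixGenerator until Reviewer finishes. The sketch assumes the graph has no cycles; a cyclic graph would deadlock, so a real orchestrator would want to validate it first.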

What we learned

Schema-first development is everything: freezing all 15 data contracts before writing a single line of implementation let four engineers work independently without a merge conflict on interfaces. We also learned that the real product differentiator isn't the AI analysis itself (anyone can call Claude). It's the closed loop from "issue found" to "fix applied" in one click, with cost tracking and progress visibility that humans trust enough to let agents ship code.

What's next for Agent HQ

Multi-repo swarm operations where agents understand cross-service dependencies and coordinate fixes across an entire microservices architecture simultaneously. A learning loop where successful fix patterns get embedded as reusable "skill recipes" — so the hundredth null-check fix costs near-zero tokens and executes instantly. And an open agent marketplace where teams publish and share specialized agents (compliance auditor, migration assistant, performance profiler) that plug directly into the swarm.

Built With

claude
databricks
mlflow
python
typescript

Submitted to

2026 Startup Week Buildathon

Created by

Built a translation layer with Nemotron API that turns raw terminal output into clear, plain-English updates. Added telemetry with MLflow and Databricks to track performance, cost, and agent health. Integrated GitHub to analyze PR risk, test coverage, and repo health, and created a recommendation engine that turns insights directly into new agent tasks.

Shachaf Rispler
Designed the end-to-end system architecture and defined all data contracts that serve as the single source of truth across backend and frontend. Built the core orchestrator for headless agent spawning and task lifecycle management, the WebSocket infrastructure for real-time updates, and the complete mock data layer that enabled the team to develop in parallel without blocking. Led the v2 architectural pivot replacing the Nia MCP dependency with Claude API, designed the Swarm Orchestrator with dependency-aware parallel execution, built the repository manager for on-demand GitHub analysis, created the integration test suite, and managed sprint execution across all work streams.

Ayush Verma
Integrated Nia MCP for repository-wide indexing with a Claude API fallback to maintain continuous architectural awareness. Built a TF-IDF–based Skill Synthesis engine to reuse successful multi-agent workflows and a PDF-powered Knowledge Base for injecting business context into prompts. Implemented guardrails including real-time file monitoring, automated lint/security orchestration, and a 3-Strike escalation system for self-correcting agent execution.

Madhav Tibrewal
Built the complete Next.js 14 frontend (App Router, Tailwind, shadcn/ui) including real-time WebSocket-driven activity streams, swarm task monitoring, safety approval flows, budget controls, and interactive dashboards (PR Radar, Coverage Map, Repo Health, FinOps), along with a custom useWebSocket hook and typed useAPI layer for reliable frontend–backend communication.

Arya Rohit Shidore