CodeWhisper Forge 🚀
Outcome-first MCP for developers — predicts fixes that actually work, and can ship them as a PR.
Inspiration
Most AI dev tools stop at “here are some links.” In a hackathon, that’s noise. We wanted signal: a system that tells you which fix is most likely to work—and then implements it.
Insight: Accepted Stack Overflow answers + closed GitHub issues are weak labels we can mine to estimate solution success rates. Pair that with an auto-PR flow and you’ve got outcomes, not bookmarks.
What it does
🎯 Issue Resolution Predictor (IRP): Mines similar SO/GitHub cases, clusters solutions (e.g., Use AbortController, Limit concurrency, Disable keep-alive), and ranks them by observed success rate, with evidence links. “Use AbortController — 82% success (18/22) · 3 sources”

🤖 Auto-Fix Generator (AFG): Scans your repo for real call sites (e.g., fetch()), drafts full-file patches (Gemini), and can open a PR (dry-run by default). “Found 7 uses of fetch. Proposed safe timeouts + cleanup. PR ready.” The proposed pattern is sketched after this list.

🧠 Smart Context Pack: Pulls repo signals (relevant issues, fresh commits, related trending repos) plus top SO threads and compresses them into a token-budgeted context for the LLM.

⚡ Minimal UI (Streamlit): One form → AI answer, prediction table, repo context, SO links, and a downloadable ZIP of patches.
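For concreteness, the “safe timeouts + cleanup” pattern above looks roughly like this. It is a hand-written sketch of what AFG drafts, not actual generated output; the helper name and the 10-second default are ours:

```typescript
// Sketch of the "safe timeout + cleanup" pattern AFG proposes for fetch() call sites.
// fetchWithTimeout and the 10s default are illustrative, not generated output.
async function fetchWithTimeout(
  url: string,
  init: RequestInit = {},
  timeoutMs = 10_000,
): Promise<Response> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    // Forward the caller's options but wire in our abort signal.
    return await fetch(url, { ...init, signal: controller.signal });
  } finally {
    clearTimeout(timer); // cleanup: never leave a dangling timer
  }
}
```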
How we built it
MCP Server (Node/Express):
- Intent analysis: Gemini 2.5 Flash → HF Qwen2.5-Coder-7B-Instruct → regex fallback.
- Fetchers: GitHub REST (issues/commits/search/code), Stack Exchange API.
- IRP engine: Label solutions (Gemini or heuristic), aggregate success via accepted answers + closed issues, rank by success_rate, and show evidence (aggregation sketched after this list).
- AFG engine: GitHub code search → fetch contents → LLM generates full-file patches → optional branch + auto-PR (write path sketched after this list).
- Safety: Dry-run by default, minimal diff, links to evidence, explicit user opt-in to push.
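A minimal sketch of the IRP aggregation step, assuming cases have already been clustered and weak-labeled; the LabeledCase/Prediction types are simplified stand-ins for our real schema:

```typescript
// Simplified IRP aggregation: group weak-labeled cases by solution cluster,
// estimate a success rate per cluster, and rank clusters by that rate.
interface LabeledCase {
  solution: string; // cluster label, e.g. "Use AbortController"
  solved: boolean;  // weak label: accepted SO answer or closed GitHub issue
  url: string;      // evidence link
}

interface Prediction {
  solution: string;
  successRate: number; // solved / trials
  trials: number;
  evidence: string[];
}

function rankSolutions(cases: LabeledCase[]): Prediction[] {
  const byCluster = new Map<string, LabeledCase[]>();
  for (const c of cases) {
    const group = byCluster.get(c.solution) ?? [];
    group.push(c);
    byCluster.set(c.solution, group);
  }
  return [...byCluster.entries()]
    .map(([solution, group]) => ({
      solution,
      trials: group.length,
      successRate: group.filter((g) => g.solved).length / group.length,
      evidence: group.map((g) => g.url),
    }))
    .sort((a, b) => b.successRate - a.successRate || b.trials - a.trials);
}
```

With 18 of 22 cases solved, rankSolutions reports successRate ≈ 0.82, which is exactly the “82% success (18/22)” badge shown in the UI.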
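And a sketch of the AFG write path, assuming @octokit/rest and patches to files that already exist; Patch and openFixPr are illustrative names, and dry-run stays the default:

```typescript
import { Octokit } from "@octokit/rest";

// Sketch of the AFG write path. Nothing is pushed unless the caller
// explicitly opts out of dry-run.
interface Patch { path: string; newContent: string; }

async function openFixPr(
  octokit: Octokit,
  owner: string,
  repo: string,
  patches: Patch[],
  opts: { dryRun: boolean } = { dryRun: true },
): Promise<string | null> {
  if (opts.dryRun) {
    console.log(`[dry-run] would patch ${patches.length} file(s) in ${owner}/${repo}`);
    return null; // no branch, no commits, no PR
  }
  const { data: repoInfo } = await octokit.rest.repos.get({ owner, repo });
  const base = repoInfo.default_branch;
  const { data: baseRef } = await octokit.rest.git.getRef({ owner, repo, ref: `heads/${base}` });
  const head = `codewhisper/fix-${Date.now()}`;
  await octokit.rest.git.createRef({ owner, repo, ref: `refs/heads/${head}`, sha: baseRef.object.sha });
  for (const p of patches) {
    // Updating an existing file requires its current blob SHA.
    const { data: current } = await octokit.rest.repos.getContent({ owner, repo, path: p.path, ref: head });
    if (Array.isArray(current)) continue; // path is a directory; skip
    await octokit.rest.repos.createOrUpdateFileContents({
      owner, repo, path: p.path, branch: head, sha: current.sha,
      message: `fix: apply proposed patch to ${p.path}`,
      content: Buffer.from(p.newContent, "utf8").toString("base64"),
    });
  }
  const { data: pr } = await octokit.rest.pulls.create({
    owner, repo, base, head,
    title: "CodeWhisper Forge: proposed fix",
    body: "Full-file patches with evidence links. Review before merging.",
  });
  return pr.html_url;
}
```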
Frontend (Streamlit):
Calls /mcp, renders the AI response, predictions, GitHub context, SO links, and the Auto-Fix plan/PR.

Pragmatics: Caching, timeouts, graceful degradation when API keys are missing, strict token budgets.
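The “strict token budgets” line hides a simple mechanism: pack ranked, deduped snippets greedily until the budget runs out. A minimal sketch, assuming a rough 4-characters-per-token estimate instead of a real tokenizer:

```typescript
// Greedy context packer: take snippets in ranked order, skip exact
// duplicates, and stop once the token budget is exhausted.
// ~4 chars/token is a rough heuristic, not a real tokenizer.
function packContext(snippets: string[], tokenBudget: number): string {
  const seen = new Set<string>();
  const parts: string[] = [];
  let used = 0;
  for (const s of snippets) {           // snippets arrive pre-ranked
    if (seen.has(s)) continue;          // dedupe exact repeats
    const cost = Math.ceil(s.length / 4);
    if (used + cost > tokenBudget) break;
    seen.add(s);
    parts.push(s);
    used += cost;
  }
  return parts.join("\n---\n");
}
```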
Challenges we ran into
- Weak labels ≠ causality: Accepted answers + closed issues are proxies. We mitigate via confidence badges, evidence links, and user override.
- GitHub search drift: Generic queries return noisy repos; we added language scoping and repo-aware terms.
- LLM patch reliability: We request full-file outputs, enforce idempotency, and keep dry-run as the default.
- Rate limits: Implemented caching + conservative parallelism (a minimal limiter sketch follows this list); authenticated GitHub requests raise the limits.
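The conservative parallelism is a small promise pool; a minimal sketch (mapLimit is an illustrative name, not a library call):

```typescript
// Minimal promise pool: run tasks with at most `limit` in flight,
// so bursts of GitHub/Stack Exchange calls stay under rate limits.
async function mapLimit<T, R>(
  items: T[],
  limit: number,
  fn: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;
  async function worker(): Promise<void> {
    while (next < items.length) {
      const i = next++;               // safe: JS is single-threaded
      results[i] = await fn(items[i]);
    }
  }
  const workers = Array.from({ length: Math.min(limit, items.length) }, worker);
  await Promise.all(workers);
  return results;
}
```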
Accomplishments that we're proud of
- Turned RAG into decisioning: not just “what people tried,” but which fix worked and how often.
- The last mile: one click from advice to ready PR.
- A clean, demoable flow: query → predictions → patch preview → (optional) PR link. Judges can follow along in seconds.
What we learned
- Outcome-driven UX beats generic chat: devs care about probability of success and time to merge.
- LLMs thrive with tight, structured context (ranked, deduped, token-capped) rather than raw dump.
- Guardrails matter: dry-run first, show diffs, require explicit consent to write.
What’s next for CodeWhisper Forge
- Private sources: Notion/Confluence/Linear connectors for internal fixes & decisions.
- Deeper telemetry: Learn from your repo history to personalize success priors.
- Chains of tools → plans: Multi-step fixes (tests, perf dashboards, canary toggles).
- Marketplace of fix recipes: Community-curated solution labels mapped to common failure modes.
- IDE plugin: Inline predictions + one-click patch in VS Code/JetBrains.
Why this wins: We don’t just fetch context—we predict, prioritize, and ship. It’s a wow moment: “Here’s the fix, and here’s your PR.”
