CodeWhisper Forge 🚀

Outcome-first MCP for developers — predicts fixes that actually work, and can ship them as a PR.


Inspiration

Most AI dev tools stop at “here are some links.” In a hackathon, that’s noise. We wanted signal: a system that tells you which fix is most likely to work—and then implements it.
Insight: Accepted Stack Overflow answers + closed GitHub issues are weak labels we can mine to estimate solution success rates. Pair that with an auto-PR flow and you’ve got outcomes, not bookmarks.


What it does

  • 🎯 Issue Resolution Predictor (IRP):
    Mines similar SO/GitHub cases, clusters solutions (e.g., Use AbortController, Limit concurrency, Disable keep-alive), and ranks them by observed success rate with evidence links.

“Use AbortController · 82% success (18/22) · 3 sources”

  • 🤖 Auto-Fix Generator (AFG):
    Scans your repo for real call sites (e.g., fetch()), drafts full-file patches (Gemini), and can open a PR (dry-run by default).

    “Found 7 uses of fetch. Proposed safe timeouts + cleanup. PR ready.”

  • 🧠 Smart Context Pack:
    Pulls repo signals (relevant issues, fresh commits, related trending repos) + top SO threads, and compresses them into a token-budgeted context for the LLM.

  • ⚡ Minimal UI (Streamlit):
    One form → AI answer, prediction table, repo context, SO links, and a downloadable ZIP of patches.
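
The IRP ranking described above can be sketched roughly as follows. Each mined case carries a solution label and a weak outcome signal (an accepted SO answer or a closed GitHub issue); group by label, compute an observed success rate, and rank. All names here are illustrative, not the engine's actual API:

```javascript
// Hypothetical sketch of the IRP ranking step. A "case" is one mined
// SO answer or GitHub issue with a solution label attached.
function rankSolutions(cases) {
  const byLabel = new Map();
  for (const c of cases) {
    const stats = byLabel.get(c.label) ||
      { label: c.label, successes: 0, total: 0, sources: new Set() };
    stats.total += 1;
    // Weak label, not causality: accepted answer or closed issue counts as success.
    if (c.accepted || c.issueClosed) stats.successes += 1;
    stats.sources.add(c.sourceUrl);
    byLabel.set(c.label, stats);
  }
  return [...byLabel.values()]
    .map(s => ({
      label: s.label,
      successRate: s.successes / s.total,
      evidence: `${s.successes}/${s.total} · ${s.sources.size} sources`,
    }))
    .sort((a, b) => b.successRate - a.successRate);
}

// Example: three mined cases for two candidate fixes.
const ranked = rankSolutions([
  { label: "Use AbortController", accepted: true, sourceUrl: "so:1" },
  { label: "Use AbortController", issueClosed: true, sourceUrl: "gh:2" },
  { label: "Disable keep-alive", accepted: false, sourceUrl: "so:3" },
]);
console.log(ranked[0].label); // "Use AbortController"
```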


How we built it

  • MCP Server (Node/Express):

    • Intent analysis: Gemini 2.5 Flash → HF Qwen2.5-Coder-7B-Instruct → regex fallback.
    • Fetchers: GitHub REST (issues/commits/search/code), Stack Exchange API.
    • IRP engine: Label solutions (Gemini or heuristic), aggregate success via accepted answers + closed issues, rank by success_rate and show evidence.
    • AFG engine: GitHub code search → fetch contents → LLM generates full-file patches → optional branch + auto-PR.
    • Safety: Dry-run by default, minimal diff, links to evidence, explicit user opt-in to push.
  • Frontend (Streamlit):
    Calls /mcp, renders AI response, predictions, GitHub context, SO links, and Auto-Fix plan/PR.

  • Pragmatics: Caching, timeouts, graceful degradation when API keys are missing, strict token budgets.
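
The tiered intent-analysis fallback (Gemini → HF → regex) and the graceful degradation when keys are missing can be sketched like this. The analyzer functions are placeholders, not the real API clients:

```javascript
// Illustrative fallback chain: try each analyzer tier in order; any failure
// (missing key, timeout) falls through to the next. The final regex tier
// never throws, so the server always returns some intent.
async function analyzeIntent(query, analyzers) {
  for (const { name, run } of analyzers) {
    try {
      const intent = await run(query);
      if (intent) return { intent, via: name };
    } catch {
      // swallow and fall through to the next tier
    }
  }
  // Last resort: crude regex heuristics (keywords are illustrative).
  const intent = /fix|error|timeout|leak/i.test(query) ? "debug" : "explore";
  return { intent, via: "regex" };
}

// Usage: both LLM tiers "fail" (no API keys), so the regex tier answers.
analyzeIntent("How do I fix fetch timeouts?", [
  { name: "gemini", run: async () => { throw new Error("no GEMINI_API_KEY"); } },
  { name: "hf-qwen", run: async () => { throw new Error("no HF_TOKEN"); } },
]).then(r => console.log(r)); // { intent: "debug", via: "regex" }
```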


Challenges we ran into

  • Weak labels ≠ causality: Accepted answers + closed issues are proxies. We mitigate via confidence badges, evidence links, and user override.
  • GitHub search drift: Generic queries return noisy repos; we added language scoping and repo-aware terms.
  • LLM patch reliability: We request full-file outputs, enforce idempotency, and keep dry-run as the default.
  • Rate limits: Implemented caching + conservative parallelism; authenticated GitHub boosts limits.
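
A minimal sketch of those rate-limit mitigations: an in-memory cache in front of the fetchers, plus a semaphore that caps concurrent API calls. The concurrency cap and cache-key scheme are assumptions for illustration, not the server's actual values:

```javascript
// Wraps an async fetcher with (1) a promise cache, so repeated keys never
// re-hit the API, and (2) a concurrency limit, so at most `maxConcurrent`
// upstream calls are in flight at once.
function createLimitedFetcher(fetchFn, maxConcurrent = 2) {
  const cache = new Map();
  const queue = [];
  let active = 0;

  const runNext = () => {
    if (active >= maxConcurrent || queue.length === 0) return;
    active += 1;
    const { key, resolve, reject } = queue.shift();
    fetchFn(key)
      .then(resolve, reject)
      .finally(() => { active -= 1; runNext(); });
  };

  return function limitedFetch(key) {
    if (cache.has(key)) return cache.get(key); // cached promise: no second call
    const p = new Promise((resolve, reject) => {
      queue.push({ key, resolve, reject });
      runNext();
    });
    cache.set(key, p);
    return p;
  };
}

// Usage: repeated keys hit the cache, so the upstream is called once per key.
let calls = 0;
const get = createLimitedFetcher(async key => { calls += 1; return `data:${key}`; });
Promise.all([get("a"), get("a"), get("b")]).then(results => {
  console.log(results, calls); // [ 'data:a', 'data:a', 'data:b' ] 2
});
```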

Accomplishments that we're proud of

  • Turned RAG into decisioning: not just “what people tried,” but which fix worked and how often.
  • The last mile: one click from advice to ready PR.
  • A clean, demoable flow: query → predictions → patch preview → (optional) PR link. Judges can follow along in seconds.

What we learned

  • Outcome-driven UX beats generic chat: devs care about probability of success and time to merge.
  • LLMs thrive with tight, structured context (ranked, deduped, token-capped) rather than raw dump.
  • Guardrails matter: dry-run first, show diffs, require explicit consent to write.

What’s next for CodeWhisper Forge

  • Private sources: Notion/Confluence/Linear connectors for internal fixes & decisions.
  • Deeper telemetry: Learn from your repo history to personalize success priors.
  • Tool chains → plans: Multi-step fixes (tests, perf dashboards, canary toggles).
  • Marketplace of fix recipes: Community-curated solution labels mapped to common failure modes.
  • IDE plugin: Inline predictions + one-click patch in VS Code/JetBrains.

Why this wins: We don’t just fetch context—we predict, prioritize, and ship. It’s a wow moment: “Here’s the fix, and here’s your PR.”
