Inspiration

Cursor agents repeat the same codebase context fetches: same intent, slightly different wording, same expensive round trip. Every redundant call costs money and slows the loop. We wanted a layer that sits invisibly in between and absorbs the repetition.

What it does

We intercept every lookup_codebase_context call from Cursor at an MCP gateway. For each query we:

  • Ask a semantic cache whether we already have a good answer.
  • On a strong match, return in milliseconds (your “under 5 ms” target on hot paths when the cache and embedder are warm).
  • On near-misses—high embedding similarity but possibly wrong intent—we run a CLōD verifier (yes/no on whether the cached context still answers the new query). That catches “authentication” vs “admin authentication” without blindly serving the wrong snippet.
  • On miss or reject, we forward to Nia, insert the result, and keep the dashboard honest. A Cursor extension subscribes to a WebSocket from the gateway and shows hits, misses, rejects, and savings in real time so the behavior isn’t a black box.

How we built it

TypeScript MCP server in front, Python FastAPI cache behind it, and Clod for verification. A Cursor extension with a live WebSocket dashboard showing hits, misses, and savings in real time.

Challenges we ran into

The verifier threshold tuning was harder than expected for example, a query about "authentication" and "admin authentication" looks similar but need different contexts. Getting responses without killing the hit rate took most of our time.

Accomplishments we're proud of

  • Being able to use new tools like Clod and Nia, caching at two layers simultaneously without either knowing about the other.
  • Observable system: WebSocket + extension makes hit rate and savings something you can see during a demo, not infer from logs alone.
  • Contracts-first hackathon glue: ports, payloads, and statuses shared via CONTRACTS.md so three runtimes could integrate without endless coordination.

What we learnt

  • Treat threshold + verifier tuning as a first-class workstream, not a polish step at the end.
  • Embedding similarity ≠ user intent; the verifier is the difference between “clever cache” and “wrong answers fast.”

What's next for CacheForge

Future developments for CacheForge would be centered on using it in a team environment. Including a shared cache, multi-repo support, and a cost dashboard over time.

Built With

Share this project:

Updates