Inspiration

The idea for Repo-MCP was born out of a common frustration: "Read the code" is easier said than done.

What it does

Repo-MCP transforms static GitHub repositories into interactive, queryable knowledge bases. It ingests codebases—parsing languages like Python, TypeScript, and Markdown—and indexes them into a vector database.

Users can then simply chat with the repository. Ask questions like "Explain the authentication middleware" or "Where is the API rate limit defined?", and Repo-MCP retrieves the exact code snippets and context needed to provide a detailed, accurate answer. It essentially gives you a "conversation" with any codebase.

How we built it

We built Repo-MCP as a modular, full-stack Python application, prioritizing a seamless developer experience ("DevX").

The Stack

  • Frontend: We chose NiceGUI for its ability to build modern, reactive web interfaces purely in Python. It allowed us to move fast without context-switching to JavaScript frameworks.
  • RAG Engine: LlamaIndex powers our core logic, handling the complexity of code chunking and retrieval orchestration.
  • Vector Store: MongoDB Atlas serves as our vector database, allowing us to store high-dimensional embeddings alongside rich file metadata.
  • LLM: We use Gemma 2 9B for its strong reasoning capabilities on code tasks (a sketch of how these pieces fit together follows this list).
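
A minimal sketch of how these pieces fit together, assuming an already-populated Atlas collection (the connection string, database and collection names are placeholders; the llama-index MongoDB integration's constructor arguments vary between versions, and default embedding/LLM settings apply unless configured):

```python
from pymongo import MongoClient
from llama_index.core import VectorStoreIndex
from llama_index.vector_stores.mongodb import MongoDBAtlasVectorSearch
from nicegui import ui

# Point LlamaIndex at the Atlas collection holding the embedded code chunks.
client = MongoClient("mongodb+srv://<user>:<password>@<cluster>.mongodb.net")
store = MongoDBAtlasVectorSearch(client, db_name="repo_mcp", collection_name="chunks")
query_engine = VectorStoreIndex.from_vector_store(store).as_query_engine(similarity_top_k=5)

# A bare-bones NiceGUI front end: one input, one button, one answer pane.
question = ui.input("Ask the repo...").classes("w-full")
answer = ui.markdown()

async def ask() -> None:
    response = await query_engine.aquery(question.value)  # async RAG query
    answer.set_content(str(response))

ui.button("Ask", on_click=ask)
ui.run(title="Repo-MCP")
```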

Architecture

The system operates in two main phases:

  1. Ingestion: We traverse the GitHub file tree, filtering for relevant source files (.py, .ts, .md, etc.). These are parsed and split into chunks (see the sketch after this list).
  2. Retrieval: When a user asks a question, we generate an embedding vector for the query q and compare it against our document vectors d_i.
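
The filtering step in item 1 can be as simple as an extension allow-list plus a deny-list for lock files; a simplified sketch (the extension and exclusion sets here are illustrative, and the production filters are broader, as described under Challenges):

```python
from pathlib import Path

# Illustrative filter sets; the real rules also exclude generated files and assets.
SOURCE_EXTENSIONS = {".py", ".ts", ".md"}
EXCLUDED_NAMES = {"package-lock.json", "poetry.lock", "yarn.lock"}

def iter_source_files(repo_root: str):
    """Yield the files worth indexing from a checked-out repository."""
    for path in sorted(Path(repo_root).rglob("*")):
        if path.suffix in SOURCE_EXTENSIONS and path.name not in EXCLUDED_NAMES:
            yield path
```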

We use Cosine Similarity to find the most relevant code chunks:

$$ \text{similarity}(\mathbf{q}, \mathbf{d}) = \frac{\mathbf{q} \cdot \mathbf{d}}{\|\mathbf{q}\|\, \|\mathbf{d}\|} = \frac{\sum_{i=1}^{n} q_i d_i}{\sqrt{\sum_{i=1}^{n} q_i^2}\, \sqrt{\sum_{i=1}^{n} d_i^2}} $$

These top-k chunks are then fed into the LLM context to generate a precise answer.
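
In code, this retrieval step reduces to a vectorized top-k search; a minimal NumPy sketch, assuming the query and chunk embeddings are already computed:

```python
import numpy as np

def top_k(q: np.ndarray, docs: np.ndarray, k: int = 5) -> np.ndarray:
    """Return indices of the k rows of docs most cosine-similar to q.

    q has shape (dim,); docs has shape (n_chunks, dim).
    """
    sims = (docs @ q) / (np.linalg.norm(docs, axis=1) * np.linalg.norm(q))
    return np.argsort(sims)[::-1][:k]
```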

Challenges we ran into

  • Context Window Limits: Code is verbose. Early on, we struggled with the LLM losing specific details when the retrieved context was too large. We had to refine our chunking strategy to respect function boundaries rather than just splitting by character count (a sketch of this idea follows this list).
  • Latency vs. Accuracy: Generating embeddings for an entire repository takes time. We implemented asynchronous processing jobs using Python's asyncio to keep the UI responsive while heavy ingestion happened in the background.
  • File Filtering: Not all files are useful. We had to build robust filters to exclude generated files, lock files, and assets to prevent "noise" in our vector search results.
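
To illustrate the function-boundary idea for Python files, here is a simplified sketch using the standard-library ast module (not our exact splitter, which also has to cope with TypeScript and Markdown):

```python
import ast

def chunk_by_definitions(source: str) -> list[str]:
    """Split a Python module into one chunk per top-level function or class."""
    lines = source.splitlines()
    chunks = []
    for node in ast.parse(source).body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # end_lineno is available on AST nodes since Python 3.8.
            chunks.append("\n".join(lines[node.lineno - 1 : node.end_lineno]))
    return chunks
```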

Accomplishments that we're proud of

  • Full-Stack Python: We built a complete, modern web application without touching a single line of JavaScript, thanks to NiceGUI.
  • Real-time Ingestion: Successfully implementing an async ingestion pipeline that can handle large repos without freezing the UI (a minimal sketch follows this list).
  • Context-Aware RAG: Fine-tuning our chunking and retrieval strategies to handle code syntax, ensuring the LLM gets meaningful context rather than fragmented lines.
  • Clean UI/UX: Designing a developer-centric interface that feels native and responsive, complete with dark mode and smooth transitions.
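
The async pattern behind the real-time ingestion is simple; a minimal sketch, where embed_and_store is a hypothetical placeholder for the parse-chunk-embed-upsert step:

```python
import asyncio

async def ingest_repo(files: list[str], status: dict) -> None:
    """Index files in the background while the UI event loop stays free."""
    for done, path in enumerate(files, start=1):
        # Push the blocking work onto a worker thread, off the event loop.
        await asyncio.to_thread(embed_and_store, path)
        status["progress"] = done / len(files)  # the UI can poll this value

def embed_and_store(path: str) -> None:
    """Hypothetical placeholder: parse, chunk, embed, and upsert one file."""
    ...
```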

What we learned

Building Repo-MCP taught us that code is a unique challenge for RAG. Unlike prose, code depends heavily on structure and references. A single line of code is often meaningless without its imports or class definition.

We also learned the importance of hybrid search. While vector search captures semantic meaning, sometimes you really do need an exact keyword match for a variable name. Balancing these two approaches is key to a robust developer tool.
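
As a toy illustration of that balance, a hybrid score can blend the cosine similarity with an exact-match signal (the weight alpha here is illustrative, not a tuned value):

```python
def hybrid_score(cosine_sim: float, query_term: str, chunk: str, alpha: float = 0.7) -> float:
    """Blend semantic similarity with an exact keyword hit (e.g. a variable name)."""
    exact_hit = 1.0 if query_term in chunk else 0.0
    return alpha * cosine_sim + (1.0 - alpha) * exact_hit
```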

Ultimately, we learned that the future of coding isn't just about writing code faster—it's about understanding it deeper.

What's next for Repo-MCP

  • IDE Integration: Building a VS Code extension to bring Repo-MCP directly into the editor.
  • Multi-Repo Support: Allowing users to query across multiple microservices simultaneously to understand system-wide interactions.
  • Local LLM Support: Adding support for Ollama to run the entire stack locally for maximum privacy.
  • Smart "Code Walkthroughs": Generating interactive guides that walk new developers through the core logic of a repo step-by-step.

Built With

  • Python
  • NiceGUI
  • LlamaIndex
  • MongoDB Atlas
  • Gemma 2 9B