Inspiration

Understanding large codebases is one of the biggest pain points for developers. Whether it's onboarding to a new project or revisiting an old one, navigating thousands of files without context is slow and frustrating.

We wanted to build something that acts like an AI-powered guide for any repository, a system that not only reads code but truly understands its structure and relationships.

What it does

RepoLens is an AI-powered repository intelligence platform that helps developers explore, understand, and interact with codebases efficiently.

It provides:

  • Semantic code search using vector embeddings

  • Context-aware chat for explaining code

  • AST-based structural parsing for deep code understanding

  • Visualization of file relationships and dependencies

  • Auto-generated documentation from real code context

How we built it

We designed RepoLens as a distributed, full-stack AI system:

  • Frontend: Next.js for interactive UI and visualization

  • Backend: Node.js + Express for orchestration and APIs

  • AI Service: FastAPI for embedding generation and RAG pipeline

  • Parsing Engine: Tree-sitter for AST-based code analysis

  • Vector DB: PostgreSQL with pgvector for semantic search

  • Queue System: BullMQ for scalable background processing

The pipeline:

  1. Repository ingestion

  2. AST parsing and chunking

  3. Embedding generation

  4. Storage in vector database

  5. Retrieval + LLM-based response generation

Challenges we ran into

  • Efficiently parsing large repositories without performance bottlenecks

  • Maintaining context across multiple files during retrieval

  • Reducing noise in vector search results

  • Designing a reliable RAG pipeline for accurate answers

  • Handling async processing with queues and workers

Accomplishments that we're proud of

  • Built a full end-to-end AI system from scratch

  • Achieved meaningful code understanding beyond simple text search

  • Implemented scalable ingestion using message queues

  • Combined AST parsing with embeddings for better accuracy

  • Enabled natural language interaction with codebases

What we learned

  1. How to design and optimize RAG pipelines for real-world systems

  2. Trade-offs between semantic search vs structural parsing

  3. Importance of chunking strategy in embeddings

  4. Building scalable distributed systems with queues

  5. Integrating AI into developer workflows effectively

What's next for RepoLens

What's next for RepoLens

  1. Real-time repository syncing
  2. Smarter code reasoning (multi-file understanding)
  3. Advanced visualizations (call graphs, architecture maps)

Built With

Share this project:

Updates