🧭 NEXUS — Semantic Codebase Cartographer

💡 Inspiration

Every developer has faced it: you inherit a codebase of thousands of lines, no documentation, and a deadline. Reading file by file is like navigating a city without a map. We asked ourselves — what if the map already existed?

That question became NEXUS. Inspired by how GPS transformed navigation, we wanted to build the equivalent for source code: a tool that shows you not just the files but the relationships between them, letting you understand a project in minutes instead of days.


🔨 How We Built It

NEXUS is split into two independent services:

Backend: The Analytical Engine (Python + FastAPI)
We use Python's native ast module to walk the Abstract Syntax Tree of every file in the uploaded repository, extracting classes, functions, call relationships and inheritance chains. All nodes are then embedded and stored in ChromaDB using OpenAI embeddings, powering a full RAG pipeline built with LangChain for contextual AI queries.
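As a rough illustration of the AST walk described above, here is a minimal sketch using only the standard-library ast module. The function name extract_graph, the "file::Class::method" ID scheme and the edge tuple layout are assumptions made for this example, not the actual NEXUS internals.

```python
import ast


def extract_graph(source: str, filename: str):
    """Return (nodes, edges) for one Python source string.

    nodes maps a node ID to its kind ("file", "class", "function").
    edges is a list of (source_id, target, kind) tuples. Call and
    inheritance targets are raw names and are NOT resolved here.
    """
    tree = ast.parse(source, filename=filename)
    nodes = {filename: "file"}
    edges = []

    class Visitor(ast.NodeVisitor):
        def __init__(self):
            # Stack of enclosing node IDs; the file is the outermost scope.
            self.scope = [filename]

        def _enter(self, node, kind):
            node_id = f"{self.scope[-1]}::{node.name}"
            nodes[node_id] = kind
            edges.append((self.scope[-1], node_id, "contains"))
            self.scope.append(node_id)
            self.generic_visit(node)
            self.scope.pop()

        def visit_ClassDef(self, node):
            class_id = f"{self.scope[-1]}::{node.name}"
            for base in node.bases:
                # Only simple names (class Child(Base)) are handled here.
                if isinstance(base, ast.Name):
                    edges.append((class_id, base.id, "inherits"))
            self._enter(node, "class")

        def visit_FunctionDef(self, node):
            self._enter(node, "function")

        visit_AsyncFunctionDef = visit_FunctionDef

        def visit_Call(self, node):
            # Record the called name: foo() or obj.foo().
            if isinstance(node.func, ast.Name):
                edges.append((self.scope[-1], node.func.id, "calls"))
            elif isinstance(node.func, ast.Attribute):
                edges.append((self.scope[-1], node.func.attr, "calls"))
            self.generic_visit(node)

    Visitor().visit(tree)
    return nodes, edges
```

Because call targets come back as unresolved names such as print or helper, a later pass still has to decide which of them correspond to nodes inside the repository.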

Frontend: The Visual Cartographer (Next.js 14 + TypeScript)
The graph is rendered with React Flow, with custom node components color-coded by type: file, class and function. Global state is managed with Zustand, keeping the selected node, chat history and highlighted nodes in sync across the canvas and sidebar.


📚 What We Learned

  • How to traverse and extract meaningful structure from Python ASTs at scale
  • How to design a RAG pipeline that is genuinely contextual, retrieving the semantically closest nodes before generating any response
  • That graph layout algorithms are deceptively hard to get right
  • How to keep two independent microservices cleanly connected through strictly typed Pydantic and TypeScript contracts

🚧 Challenges We Faced

AST edge filtering was the biggest technical hurdle. The parser generates thousands of potential edges, but many of them point to targets outside the repository: standard library calls, third-party imports and so on. We had to validate every edge against the actual node set before rendering.
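The validation step can be sketched in a few lines. This is an illustrative version only: the filter_edges name and the (source, target, kind) tuple shape are assumptions for the example, and the real implementation may resolve names before filtering.

```python
def filter_edges(node_ids, edges):
    """Keep only edges whose source and target are both known nodes.

    node_ids: iterable of node IDs that exist in the repository graph.
    edges: list of (source_id, target_id, kind) tuples.
    """
    known = set(node_ids)  # set membership keeps each check O(1)
    return [
        (src, dst, kind)
        for (src, dst, kind) in edges
        if src in known and dst in known
    ]


# Hypothetical example data: two in-repo edges, two pointing outside.
example_nodes = {"app.py::main", "app.py::load", "utils.py::parse"}
example_edges = [
    ("app.py::main", "app.py::load", "calls"),
    ("app.py::load", "utils.py::parse", "calls"),
    ("app.py::load", "json.loads", "calls"),  # standard library
    ("app.py::main", "print", "calls"),       # builtin
]
kept_edges = filter_edges(example_nodes, example_edges)
```

Here kept_edges contains only the first two edges; the calls to json.loads and print are dropped because no node with those IDs exists in the graph.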

RAG context management was another challenge. Sending entire files to the LLM exceeds token limits quickly. We solved this by relying on ChromaDB similarity search to inject only the most relevant nodes as context, keeping every prompt efficient and on-topic.
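To make the retrieval idea concrete, here is a dependency-free sketch of top-k similarity search followed by prompt assembly. It is a stand-in, not the NEXUS pipeline: the real system uses ChromaDB and OpenAI embeddings, whereas this example uses hand-written toy vectors and plain cosine similarity, and the names top_k_nodes and build_prompt are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_nodes(query_vec, node_vecs, k=2):
    """Return the IDs of the k nodes most similar to the query vector."""
    ranked = sorted(
        node_vecs,
        key=lambda node_id: cosine(query_vec, node_vecs[node_id]),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question, node_ids, snippets):
    """Assemble a prompt that contains only the retrieved snippets."""
    context = "\n\n".join(f"# {nid}\n{snippets[nid]}" for nid in node_ids)
    return f"Context:\n{context}\n\nQuestion: {question}"


# Toy 2-dimensional "embeddings"; real embeddings have far more dimensions.
toy_vecs = {
    "auth.py::login": [1.0, 0.0],
    "auth.py::logout": [0.9, 0.1],
    "plot.py::draw": [0.0, 1.0],
}
toy_snippets = {
    "auth.py::login": "def login(user): ...",
    "auth.py::logout": "def logout(user): ...",
    "plot.py::draw": "def draw(fig): ...",
}
selected = top_k_nodes([1.0, 0.0], toy_vecs, k=2)
prompt = build_prompt("How does login work?", selected, toy_snippets)
```

Only the two closest nodes end up in the prompt, so its size is bounded by k and the snippet length rather than by the size of the repository.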

React Flow performance degraded noticeably on large repositories. We solved this with memoization on both nodes and edges, ensuring the graph only re-renders when the underlying data actually changes.


🚀 What's Next for NEXUS

  • Support for JavaScript and TypeScript via Babel AST
  • Direct GitHub URL ingestion without needing a ZIP file
  • A native JetBrains plugin that runs NEXUS inline inside the IDE
  • Diff mode: visualize the graph before and after a refactoring to understand structural impact before touching production code
