About the Project

Inspiration

In the age of AI-driven development, I noticed a fundamental disconnect: while Large Language Models are incredibly powerful, they often lack a deep, contextual understanding of any specific codebase. Developers and AI agents alike spend too much time manually searching for code, trying to understand dependencies, and piecing together the architecture of large projects. I was inspired to build a tool that treats a codebase not as a collection of static files, but as a living, queryable knowledge base—effectively giving every repository its own dedicated expert.

What it does

CodeVault is a full-stack platform that transforms any code repository into an intelligent, searchable system accessible via a clean REST API. A developer can upload their entire codebase (via Git URL or a .zip file), and CodeVault kicks off a multi-stage background analysis. It parses the structure of every file, identifies key code constructs like functions and classes, and creates semantic vector embeddings to understand the meaning of the code.

Once the analysis is complete, the repository becomes a queryable API endpoint. Developers can perform natural language searches (e.g., "find all user authentication functions") and get back contextually relevant code snippets, making it a powerful tool for code discovery, onboarding, and integration with AI workflows like OpenAI's function calling or Claude's tool use.
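
At its core, that search step comes down to comparing the query's embedding against the stored snippet embeddings. Here is a minimal, hedged sketch of that ranking idea; the names (`Snippet`, `cosineSimilarity`, `rankSnippets`) are illustrative, not CodeVault's actual API:

```typescript
interface Snippet {
  path: string;
  embedding: number[]; // vector from an embedding model
}

// Standard cosine similarity between two equal-length vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the topK snippets most similar to the query embedding.
function rankSnippets(query: number[], snippets: Snippet[], topK = 5): Snippet[] {
  return [...snippets]
    .sort((x, y) =>
      cosineSimilarity(query, y.embedding) - cosineSimilarity(query, x.embedding))
    .slice(0, topK);
}
```

In production this comparison happens inside the database rather than in application code, but the ranking principle is the same.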

How I built it

CodeVault is a full-stack monorepo application built with a modern, scalable architecture.

  • Frontend: A responsive and real-time interface built with React, Vite, and Tailwind CSS. It uses WebSockets for live progress updates and a robust API client to communicate with the backend. The frontend is deployed on Netlify.

  • Backend: A powerful Node.js application using Express.js for the API. The core of the backend is a scalable job processing system built with BullMQ and Redis. This allows the system to handle long-running analysis tasks asynchronously without blocking the user.

  • Analysis Engine:

    • Tree-sitter: For precise, language-aware Abstract Syntax Tree (AST) parsing, which allows me to identify the exact structure and location of functions, classes, and components.
    • OpenAI Embeddings: I used the text-embedding-3-small model to create vector representations of code snippets, capturing their semantic meaning.
  • Database: I use PostgreSQL supercharged with the pgvector extension. This enables lightning-fast and highly relevant similarity searches on the code embeddings, forming the core of the semantic search functionality.

  • Infrastructure: The entire backend—including the API server, Redis queue, and multiple background workers—is containerized and deployed on Railway. This provides a robust, multi-service environment where each component can scale independently.
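
To give a feel for what the AST-parsing stage produces, here is a deliberately naive stand-in: Tree-sitter itself requires native bindings and real grammars, so this sketch uses a simple regex to pull out top-level function declarations with their line numbers. The real parser is far more precise; the types and names here are illustrative only:

```typescript
interface CodeConstruct {
  name: string;
  line: number; // 1-based line number of the declaration
}

// Naive stand-in for the Tree-sitter stage: find top-level
// `function` declarations (optionally `export`/`async`).
function extractFunctions(source: string): CodeConstruct[] {
  const constructs: CodeConstruct[] = [];
  source.split("\n").forEach((text, i) => {
    const match = text.match(/^\s*(?:export\s+)?(?:async\s+)?function\s+(\w+)/);
    if (match) constructs.push({ name: match[1], line: i + 1 });
  });
  return constructs;
}
```

Each extracted construct is then embedded and stored, so search results can point back to an exact file and line.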

Challenges I ran into

Deploying a multi-service, full-stack application from a local Docker environment to a production cloud platform was a marathon of debugging. I faced every classic deployment challenge:

  • Dependency Conflicts: tree-sitter shipped breaking changes between versions, which required careful version pinning and dependency management.
  • Infrastructure Configuration: The initial PostgreSQL instance on Railway didn't have the required pgvector extension, forcing me to re-provision the database.
  • Environment Variable Hell: I spent significant time debugging why the production environment variables weren't being injected, tracing the issue from missing UI settings to stubborn build caches on Netlify.
  • Startup Race Conditions: The application started so quickly that it tried to open network connections before the platform's internal DNS was ready, causing ENOTFOUND errors. I had to make the WebSocket and Redis connections more resilient with retry logic.
  • Procfile vs. Start Command: I fought a long battle with the deployment platform to get it to recognize the Procfile and run all three of the backend processes (web, worker, health-worker), eventually landing on a robust solution using concurrently in the package.json.
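
The concurrently solution from that last bullet looks roughly like this in package.json (the script names and file paths here are assumptions for illustration, not the exact CodeVault configuration):

```json
{
  "scripts": {
    "start": "concurrently \"npm:start:web\" \"npm:start:worker\" \"npm:start:health-worker\"",
    "start:web": "node dist/server.js",
    "start:worker": "node dist/worker.js",
    "start:health-worker": "node dist/health-worker.js"
  }
}
```

A single `npm start` then launches all three processes in one container, which sidesteps the platform's Procfile handling entirely.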

Accomplishments that I am proud of

First and foremost, I built a fully working, deployed, full-stack application that solves a real-world problem. The backend isn't just a simple server; it's a resilient, multi-process architecture with a job queue.

I am also proud of the deep analysis engine. Integrating Tree-sitter for precise AST parsing with OpenAI for semantic meaning provides a two-layered understanding of code that is far more powerful than simple text search. Finally, successfully deploying this complex system across two different platforms (Railway and Netlify) and getting them to communicate flawlessly was a massive accomplishment.

What I learned

This project was a masterclass in the difference between "it works on my machine" and "it works in production." I learned the critical importance of:

  • Orchestrating a Distributed System: Deploying a single app is one challenge; orchestrating a constellation of services (a frontend, an API server, and multiple background workers) across different platforms (Netlify and Railway) is another entirely. I learned to think about the system holistically, debugging complex race conditions and networking issues that only appear when all the pieces are running together. It forced me to manage inter-service communication and to build a cohesive system from independent, moving parts.

  • Robust Startup Logic: Production services don't always start in a predictable order. Applications must be built to be resilient, with retry logic for database and network connections.

  • The Power of Logs: Every problem I solved, I solved because I could pull clear, actionable error messages from the logs.

  • From Impossibility to Reality in 24 Hours: This project was a testament to endurance and focus. I learned that what feels like an impossible timeline is often really a high-intensity sprint that pushes your skills to their limit. It was a tiring stretch of coding and debugging, but deploying a complete, working full-stack application in under a day redefines what you think is achievable.
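
The "robust startup logic" lesson above can be sketched as a small retry-with-backoff helper. This is a hedged, generic sketch, not CodeVault's actual code; `connectWithRetry` and its defaults are illustrative:

```typescript
// Retry a flaky async connection (e.g. Redis or Postgres during startup,
// when internal DNS may not be ready yet) with exponential backoff.
async function connectWithRetry<T>(
  connect: () => Promise<T>,
  maxAttempts = 5,
  baseDelayMs = 200,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await connect();
    } catch (err) {
      lastError = err;
      // Exponential backoff: 200ms, 400ms, 800ms, ...
      const delay = baseDelayMs * 2 ** (attempt - 1);
      await new Promise((resolve) => setTimeout(resolve, delay));
    }
  }
  throw lastError;
}
```

Wrapping every external connection in a helper like this turns hard ENOTFOUND crashes at boot into a short, self-healing wait.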

What's next for CodeVault

The foundation I have built is incredibly powerful, and I am just getting started.

  • Deeper Integrations: I plan to build official client libraries to make it even easier to integrate CodeVault as a tool for OpenAI GPTs and Claude models.
  • Relationship Mapping: The backend already collects data on dependencies between code constructs. The next step is to visualize this as an interactive dependency graph in the UI.
  • Direct GitHub Integration: Allowing users to install a CodeVault GitHub App to analyze private repositories and keep them in sync automatically is my top priority for user features.
  • Advanced Queries: I want to expand the query capabilities to support more complex questions, like "show me all functions that are not covered by a test" or "find all components that use this deprecated function."
