🧠 Inspiration
Our experiences with open-source programming inspired this project. We often wished for a semantic way to search for specific code snippets, files, or features. This RAG tool is a direct response to that challenge: it lets developers query an LLM about their entire code repository directly, so they can find what they need quickly and accurately.
💻 What it does
First, the user inputs a GitHub link, which we parse using LlamaIndex's GithubRepositoryReader. Next, we use Jina AI's code embeddings API to create embeddings for the files we read and store them in an index. Once the index is built, the user can ask questions about the code, and the RAG pipeline responds with answers grounded in the codebase.
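The core retrieval idea can be shown with a minimal, dependency-free sketch. This is not our actual implementation: real embeddings come from Jina AI's API, and here a toy bag-of-words vector stands in for them so the example runs anywhere.

```python
# Illustrative sketch of embedding-based retrieval over code snippets.
# The `embed` function is a toy stand-in for a real code-embedding model.
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy 'embedding': token counts (placeholder for a real embedding API)."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, snippets: list[str], k: int = 1) -> list[str]:
    """Return the k snippets most similar to the query."""
    q = embed(query)
    ranked = sorted(snippets, key=lambda s: cosine(q, embed(s)), reverse=True)
    return ranked[:k]

snippets = [
    "function that parses the github repo files",
    "streamlit code for the frontend sidebar",
]
best = retrieve("which function parses the repo", snippets)
```

In the real tool, the retrieved snippets are passed to the LLM as context rather than returned directly.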
🔨 How we built it
We built the backend on LlamaIndex, using components such as TiDBVectorStore and VectorStoreIndex. The framework handles our overall flow: parsing the repository, generating embeddings, building the index, creating a query engine, and returning a response. For the frontend, we built the interface with Streamlit, which also lets us host the application.
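The backend flow above (parse → embed → index → query → respond) can be sketched in plain Python. Every component here is a stand-in: `read_repository` mimics what GithubRepositoryReader does, `embed` replaces the Jina API, and `Index` replaces TiDBVectorStore plus the query engine, so the shape of the pipeline is clear without any credentials.

```python
# Stand-in pipeline sketch: none of these are the real LlamaIndex classes.
from dataclasses import dataclass

@dataclass
class Document:
    path: str
    text: str

def read_repository(url: str) -> list[Document]:
    # Stand-in for GithubRepositoryReader.load_data()
    return [Document("README.md", "This repo implements semantic code search.")]

def embed(text: str) -> list[float]:
    # Stand-in for the Jina code-embedding API call
    return [float(ord(c) % 7) for c in text[:16]]

class Index:
    """Stand-in for the vector store + query engine pair."""
    def __init__(self, docs: list[Document]):
        self.entries = [(embed(d.text), d) for d in docs]

    def query(self, question: str) -> str:
        # Nearest document by dot product, then a canned answer template
        q = embed(question)
        best = max(self.entries,
                   key=lambda e: sum(x * y for x, y in zip(q, e[0])))
        return f"Based on {best[1].path}: {best[1].text}"

index = Index(read_repository("https://github.com/example/repo"))
answer = index.query("What does this repo do?")
```

In production, the template step is replaced by the LLM generating an answer from the retrieved context.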
🏃🏻 Challenges we ran into
- Parsing repositories
- Defining a scope (what did we want the project to do, and what did we want it to be used for?)
- Coordinating across time zones
📌 Accomplishments that we're proud of
- Frontend design with Streamlit
- Fantastic responses with deepseek-coder
- Seamless repo parsing and vector generation
📖 What we learned
- LlamaIndex's framework
- Vector Embeddings/TiDB
- Ollama
- Streamlit
🚀 What's next for Semantic Search Engine for Code Repositories
- Reducing wait time (possibly with parallel computing via Ray)
- Optimizing embeddings and LLM settings
- User auth and saving queries by repository
- Creating a networked Ollama server
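The first roadmap item above (parallelizing the embedding step to cut wait time) can be sketched without Ray. This hedged example uses the standard library's `concurrent.futures` as a stand-in, since embedding calls are I/O-bound API requests that benefit from concurrency; a Ray version would swap the pool for remote tasks.

```python
# Sketch of parallel embedding; `embed_file` is a placeholder for the real
# (slow, I/O-bound) embedding API call.
from concurrent.futures import ThreadPoolExecutor

def embed_file(text: str) -> list[float]:
    # Placeholder: a real implementation would call the embedding API here
    return [float(len(text))]

def embed_all(files: list[str]) -> list[list[float]]:
    """Embed many files concurrently, preserving input order."""
    with ThreadPoolExecutor(max_workers=8) as pool:
        return list(pool.map(embed_file, files))

vectors = embed_all(["def a(): pass", "def longer(): pass"])
```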