GitHub Graph RAG Agent

🚀 Inspiration

Traditional code search often struggles to retrieve relevant functions across large repositories. Existing tools lack a structural understanding of a repository, which makes it hard to retrieve code based on contextual relationships. Inspired by Graph RAG, I set out to build a code retrieval system that combines file structure, function dependencies, and embeddings to improve query-based retrieval.

🔍 What It Does

The GitHub Graph RAG Agent is a repository-aware AI assistant that:

- Fetches repository contents directly via the GitHub REST API (without cloning).
- Parses directories and extracts functions from various languages (.py, .js, .cpp, .ts, .sol, etc.).
- Stores repository structure and function relationships in Neo4j for graph-based retrieval.
- Generates function descriptions using LangChain & Groq.
- Creates embeddings with MiniLM-L6-v2 for semantic similarity search.
- Processes user queries using LangChain, retrieving the most relevant function based on structure and meaning.
- Visualizes repository structure and code relationships as an interactive graph.

🛠️ How I Built It
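
As an illustration of clone-free fetching, a single call to GitHub's Git Trees endpoint with `recursive=1` returns every path in a repository, which can then be filtered down to source files. The sketch below is a minimal example under stated assumptions, not the project's actual code: the helper names (`tree_url`, `fetch_json`, `source_paths`), the default branch `main`, and the extension set are all hypothetical.

```python
import json
import urllib.request
from pathlib import PurePosixPath
from typing import List, Optional

API_ROOT = "https://api.github.com"
# Assumption: only the extensions named in this write-up; the real set is larger.
SOURCE_EXTENSIONS = {".py", ".js", ".cpp", ".ts", ".sol"}


def tree_url(owner: str, repo: str, ref: str = "main") -> str:
    """Build the Git Trees URL; recursive=1 asks for the full tree in one response."""
    return f"{API_ROOT}/repos/{owner}/{repo}/git/trees/{ref}?recursive=1"


def fetch_json(url: str, token: Optional[str] = None) -> dict:
    """GET a GitHub API URL and decode the JSON body (performs a network call)."""
    headers = {"Accept": "application/vnd.github+json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    request = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


def source_paths(tree: dict, extensions=SOURCE_EXTENSIONS) -> List[str]:
    """Keep only file entries ("blob") whose extension is one we want to parse."""
    return [
        entry["path"]
        for entry in tree.get("tree", [])
        if entry.get("type") == "blob"
        and PurePosixPath(entry["path"]).suffix in extensions
    ]
```

With network access, `source_paths(fetch_json(tree_url("owner", "repo")))` would list the candidate files. For very large repositories the response carries `"truncated": true`, in which case the tree has to be walked one directory at a time instead.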

- Backend: Built using FastAPI to handle requests efficiently.
- Graph Database: Utilized Neo4j to store file-function relationships and enable fast graph traversal.
- LLM Processing: Integrated LangChain to process function descriptions and enhance retrieval.
- Embeddings: Used MiniLM-L6-v2 for function similarity search.
- GitHub API: Implemented REST API-based repo fetching instead of cloning.
- Graph Visualization: Designed an interactive graph to map directories, files, and functions.

🚧 Challenges I Ran Into
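
To make the multi-language extraction problem concrete, one simple approach is a table of regular expressions keyed by file extension. The sketch below is an assumption about how such logic could look, not the project's extractor: the patterns and the `extract_functions` helper are hypothetical and intentionally naive, and they cover only `.py`, `.js`, `.ts`, and `.sol`. C++ declarations are too irregular for a one-line pattern and would likely need a real parser.

```python
import re
from pathlib import PurePosixPath
from typing import Dict, List

# Hypothetical patterns; each one captures the function name in group 1.
FUNCTION_PATTERNS: Dict[str, "re.Pattern[str]"] = {
    ".py": re.compile(r"^\s*(?:async\s+)?def\s+([A-Za-z_]\w*)\s*\(", re.MULTILINE),
    ".js": re.compile(
        r"^\s*(?:export\s+)?(?:async\s+)?function\b\s*\*?\s*([A-Za-z_$][\w$]*)\s*\(",
        re.MULTILINE,
    ),
    ".sol": re.compile(r"^\s*function\s+([A-Za-z_]\w*)\s*\(", re.MULTILINE),
}
# Plain TypeScript function declarations share the JavaScript shape.
FUNCTION_PATTERNS[".ts"] = FUNCTION_PATTERNS[".js"]


def extract_functions(path: str, source: str) -> List[dict]:
    """Return name, file, and 1-based line number for each declaration found."""
    pattern = FUNCTION_PATTERNS.get(PurePosixPath(path).suffix)
    if pattern is None:
        return []  # unsupported language in this sketch
    found = []
    for match in pattern.finditer(source):
        # Count newlines before the name itself, since the leading \s* of the
        # pattern may have consumed blank lines above the declaration.
        line = source.count("\n", 0, match.start(1)) + 1
        found.append({"name": match.group(1), "file": path, "line": line})
    return found
```

Each returned record maps naturally onto a function node attached to its file in the graph. Arrow functions, class methods in JavaScript, and nested or templated declarations are not handled here.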

- Parsing diverse programming languages: Developed custom function extraction logic for different languages.
- Efficiently handling large repositories: Optimized batch processing to minimize API latency.
- Designing an optimal Neo4j schema: Ensured quick traversal and retrieval of function relationships.
- Fine-tuning similarity search: Optimized embeddings for high-accuracy function retrieval.

🏆 Accomplishments That I'm Proud Of

- Successfully automated multi-language function extraction.
- Built an optimized Graph RAG pipeline for structured repository retrieval.
- Removed the need for cloning, making processing faster and more efficient.
- Achieved accurate and fast function retrieval using Neo4j and embeddings.
- Developed a scalable FastAPI-based API for easy integration.

📚 What I Learned

- How to fetch and process repositories via GitHub's REST API.
- The power of Graph RAG for structured knowledge retrieval.
- How to optimize function embeddings and graph traversal for better search accuracy.
- Best practices for building scalable AI-powered FastAPI services.

🔮 What's Next for GitHub Graph RAG Agent?

- Enhance support for more programming languages.
- Improve function similarity scoring with a fine-tuned embedding model.
- Add a frontend UI for interactive graph-based exploration.
- Deploy the API as a service for broader developer adoption.
- Integrate deeper reasoning capabilities for better function explanations.
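
For context on the similarity scoring mentioned above, embedding-based retrieval usually comes down to ranking stored function vectors by their cosine similarity to the query vector. The sketch below assumes cosine similarity as the metric (this write-up does not say which one is used) and uses toy 3-dimensional vectors in place of real MiniLM-L6-v2 embeddings; `cosine_similarity` and `top_matches` are hypothetical names.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_matches(
    query_vector: Sequence[float],
    function_vectors: Dict[str, Sequence[float]],
    k: int = 3,
) -> List[Tuple[str, float]]:
    """Rank stored function embeddings against the query and keep the best k."""
    scored = [
        (name, cosine_similarity(query_vector, vector))
        for name, vector in function_vectors.items()
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]
```

A fine-tuned embedding model would change the vectors fed into `top_matches`, not the ranking step itself, so the two can evolve independently.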
