About Me
I’m Gowtham Kishore, a master’s student in Computer and Information Science.
Inspiration
I had recently read about Agentic RAG but didn’t know where to start. When I saw this hackathon, it looked like a great chance to learn by building, using Agentic Graph RAG together with NetworkX algorithms.
As a student, I often search for GitHub repositories by owner, topic, and language to explore new codebases. But with so many repos out there, it’s easy to get lost and hard to find the right one. A simple Google search doesn’t always help because my queries are more complex, such as finding the most-starred repos in a specific language or repos similar to a given one.
So I thought: why not solve my own problem? I came across the GitHub Repository Metadata (5+ Stars) dataset, which contains metadata for 3.2 million repositories. Now, I’m excited to build something useful!
How We Built It
- Used a Kaggle dataset that is updated quarterly.
- Set up an ArangoDB cluster for storage.
- Built the UI with Gradio.
- Integrated NetworkX for running graph algorithms.
- Used LangChain and LangGraph to build the Agentic RAG, including its memory.
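To give a feel for the graph side of this stack, here is a minimal sketch of how repository metadata might be modeled in NetworkX and queried. The sample records, field names, and helper functions are all hypothetical and only for illustration; they are not the project's actual schema or code.

```python
import networkx as nx

# Hypothetical sample records; the field names are assumptions,
# not the real dataset's schema.
REPOS = [
    {"name": "alice/graphkit", "owner": "alice", "language": "Python",
     "topics": ["graph", "rag"], "stars": 120},
    {"name": "bob/ragtools", "owner": "bob", "language": "Python",
     "topics": ["rag", "llm"], "stars": 300},
    {"name": "carol/fastweb", "owner": "carol", "language": "Go",
     "topics": ["web"], "stars": 80},
]


def build_graph(repos):
    """Link each repo node to its owner, language, and topic nodes."""
    g = nx.Graph()
    for r in repos:
        repo = ("repo", r["name"])  # (kind, value) tuples keep node ids unique
        g.add_node(repo, stars=r["stars"])
        g.add_edge(repo, ("owner", r["owner"]))
        g.add_edge(repo, ("language", r["language"]))
        for topic in r["topics"]:
            g.add_edge(repo, ("topic", topic))
    return g


def top_starred(g, language, limit=5):
    """Return names of the most-starred repos written in `language`."""
    lang = ("language", language)
    if lang not in g:
        return []
    repos = [n for n in g.neighbors(lang) if n[0] == "repo"]
    repos.sort(key=lambda n: g.nodes[n]["stars"], reverse=True)
    return [n[1] for n in repos[:limit]]


def similar_repos(g, name, limit=5):
    """Rank other repos by Jaccard similarity of their shared neighbors."""
    target = ("repo", name)
    others = [n for n in g.nodes if n[0] == "repo" and n != target]
    scored = nx.jaccard_coefficient(g, [(target, o) for o in others])
    ranked = sorted(
        ((score, v[1]) for _, v, score in scored if score > 0), reverse=True
    )
    return [repo_name for _, repo_name in ranked[:limit]]


graph = build_graph(REPOS)
print(top_starred(graph, "Python"))
print(similar_repos(graph, "alice/graphkit"))
```

Treating owners, languages, and topics as nodes rather than attributes means a "similar repos" question becomes a shared-neighbor computation, which is the kind of query a plain keyword search handles poorly.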
Challenges We Ran Into
- Learning Python: Python wasn't my go-to language, so I had to learn a lot while developing the project.
- Understanding New Concepts: Since I was unfamiliar with Agentic RAG and NetworkX, I had to go through several demos and tutorials to understand how they worked.
- Handling Large Data: When pushing NetworkX nodes to ArangoDB, I ran into issues with the _key and _id constraints, which hurt all the more because the dataset was around 3 GB. Storing attributes like repo name and owner name added a lot of overhead. I didn’t realize this right away, so it took me a while to see that I needed to override key and edge generation.
- Solution: After going through the ArangoDB documentation, I found a way to override key generation and edge creation, which improved my computation time significantly.
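The write-up does not show the actual override, so the following is only a sketch of one plausible shape for it: deriving a valid ArangoDB `_key` from the owner and repo name up front. It relies on ArangoDB's documented key rules (at most 254 bytes; letters, digits, and a small set of punctuation, with no `/`). The function names `make_key` and `make_id` and the collection name are hypothetical.

```python
import hashlib
import re

# Anything outside the characters ArangoDB allows in a document _key.
_DISALLOWED = re.compile(r"[^A-Za-z0-9_\-:.@()+,=;$!*'%]")
MAX_KEY_BYTES = 254


def make_key(owner, repo):
    """Build a deterministic, valid _key from an owner and repo name.

    Hypothetical helper: "owner/repo" cannot be used directly because
    "/" is not allowed in a key, so ":" is used as the separator.
    """
    raw = f"{owner}:{repo}"
    key = _DISALLOWED.sub("_", raw)
    if key != raw or len(key) > MAX_KEY_BYTES:
        # Sanitizing or truncating could make two names collide, so
        # append a short hash of the original string to keep keys distinct.
        digest = hashlib.sha1(raw.encode("utf-8")).hexdigest()[:8]
        key = f"{key[:MAX_KEY_BYTES - 9]}-{digest}"
    return key


def make_id(collection, key):
    """An ArangoDB _id is the collection name and the _key joined by '/'."""
    return f"{collection}/{key}"


print(make_id("repos", make_key("alice", "graphkit")))
```

Because the key is computed from data the node already has, edges can reference their endpoints by `_id` without a lookup round trip, which is one way such an override could cut load time.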
What We Learned
- Personal Growth: This was my first hackathon, and building an Agentic RAG pushed me to learn and grow. I’m proud of how I tackled the challenges along the way, especially the ones related to data storage and graph algorithms. I spent a lot of time on this, and it felt incredibly rewarding to make something useful.
- Python: Although Python wasn’t my primary language, this project taught me a lot. I’m now considering making it my go-to language.
- Agentic RAG: Before, I had read about Agentic RAG in articles but didn’t know how to build one. Now I feel confident that I can create one from scratch.
- New Technologies: I got hands-on experience with ArangoDB, NetworkX, cuGraph, the ijson module, Gradio, LangChain, and LangGraph, all of which were new to me.
- Future Exploration: I believe there’s a lot more to explore in agentic applications. I’m excited to dive deeper into Agentic RAG and plan to use ArangoDB for future projects in this space.
What's Next for Personal Octorag
- Add Missing Information: I skipped loading some information into the graph because of its size. I plan to add this data to make the graph more comprehensive and more useful.
- Cron Job for Updates: I aim to set up a cron job that automatically fetches newly created repositories and keeps the dataset up to date.
- Serve Media Content: Another improvement I’m considering is serving different types of media content, such as files, to make the platform more versatile and user-friendly.
- Implement Elasticsearch: I’m also thinking about integrating Elasticsearch with ArangoDB to handle fuzzy queries and improve search for complex queries.
Built With
- arangodb
- gradio
- langchain
- langgraph
- networkx
- python