About the Project
Inspiration
As developers, we've all faced the pain of onboarding into unfamiliar or legacy codebases - slow, confusing, and time-consuming. We asked ourselves: What if a GitHub repo could explain itself?
That's how ungithub was born: an AI-powered codebase insight explorer that acts like a smart mentor embedded in any open-source project.
How We Built It
We broke the project into 6 clear phases:
Planning & Setup
I scaffolded the backend (FastAPI) and, optionally, the frontend (Next.js), then connected to MongoDB Atlas with vector search enabled.
Repo Cloning & Parsing
Using git and the GitHub API, I cloned public repositories and recursively parsed their files, filtering out noise (node_modules, .git, etc.) and storing structured metadata.
Embeddings + Vector Search
Code chunks were vectorized using OpenAI or SentenceTransformers, then stored in MongoDB Atlas with a vector index for similarity search.
Natural Language QA
I built an endpoint that accepts user questions, performs a vector search, and feeds the most relevant chunks to an LLM, which generates answers grounded in those chunks, with file-level citations.
Frontend (optional)
A clean Next.js UI lets users paste a GitHub link, wait for indexing, and ask questions about the repo. Results are displayed in real time.
Polish & Deploy
Finally, I deployed the backend to Render and the frontend to Vercel, added error handling, and optimized for low latency and good UX.
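The parsing and chunking steps above could be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the names `SKIP_DIRS`, `iter_source_files`, `chunk_lines`, and `build_documents`, the 40-line window, and the 5-line overlap are all assumptions chosen for the example.

```python
import os
from typing import Dict, Iterator, List

# Assumed noise list: the write-up names only node_modules and .git.
SKIP_DIRS = {"node_modules", ".git"}


def iter_source_files(root: str) -> Iterator[str]:
    """Yield file paths under root, never descending into noise directories."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Pruning dirnames in place stops os.walk from entering skipped dirs.
        dirnames[:] = sorted(d for d in dirnames if d not in SKIP_DIRS)
        for name in sorted(filenames):
            yield os.path.join(dirpath, name)


def chunk_lines(text: str, size: int = 40, overlap: int = 5) -> List[Dict]:
    """Split text into overlapping line windows with 1-based line ranges."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    lines = text.splitlines()
    chunks = []
    for start in range(0, len(lines), size - overlap):
        window = lines[start:start + size]
        chunks.append({
            "start_line": start + 1,
            "end_line": start + len(window),
            "text": "\n".join(window),
        })
        if start + size >= len(lines):
            break  # the last window already reached the end of the file
    return chunks


def build_documents(root: str) -> List[Dict]:
    """Collect chunk records (hypothetical schema) for every readable text file."""
    docs = []
    for path in iter_source_files(root):
        try:
            with open(path, encoding="utf-8") as fh:
                text = fh.read()
        except (UnicodeDecodeError, OSError):
            continue  # skip binary or unreadable files
        rel = os.path.relpath(path, root)
        for chunk in chunk_lines(text):
            docs.append({"path": rel, **chunk})
    return docs
```

Keeping the relative path and line range on each chunk is what makes file-level citations possible later. Each record would still need an embedding vector attached before being stored.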
What I Learned
- MongoDB Atlas Vector Search is powerful and beginner-friendly for semantic search.
- LangChain + OpenAI/Gemini is a potent combo for contextual code summarization and Q&A.
- Optimizing cold starts, managing large repositories, and getting the chunking logic right were trickier than expected.
- Good UX is critical, even for technical tools, so I kept interactions minimal: paste, wait, ask.
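For readers new to Atlas Vector Search, a query is an aggregation pipeline whose first stage is `$vectorSearch`. The sketch below only builds that pipeline as plain Python data. The index name `code_chunks_index`, the field names `embedding`, `path`, `start_line`, `end_line`, and `text`, and the helper name `build_vector_search_pipeline` are hypothetical and not taken from the project.

```python
from typing import Any, Dict, List, Sequence


def build_vector_search_pipeline(
    query_vector: Sequence[float],
    *,
    index_name: str = "code_chunks_index",  # hypothetical index name
    path: str = "embedding",                # hypothetical vector field
    limit: int = 5,
    num_candidates: int = 100,
) -> List[Dict[str, Any]]:
    """Build an Atlas $vectorSearch aggregation pipeline as plain dicts."""
    # Atlas requires numCandidates to be at least as large as limit.
    if limit > num_candidates:
        raise ValueError("num_candidates must be >= limit")
    return [
        {
            "$vectorSearch": {
                "index": index_name,
                "path": path,
                "queryVector": list(query_vector),
                "numCandidates": num_candidates,
                "limit": limit,
            }
        },
        {
            # Keep only what the answer step needs, plus the similarity score.
            "$project": {
                "_id": 0,
                "path": 1,
                "start_line": 1,
                "end_line": 1,
                "text": 1,
                "score": {"$meta": "vectorSearchScore"},
            }
        },
    ]
```

With pymongo, the result would typically be passed to `collection.aggregate(pipeline)`. The query vector must come from the same embedding model that was used at indexing time.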
Tech Stack
| Layer | Tools |
|---|---|
| Backend | Python, FastAPI |
| Embedding & AI | OpenAI, Gemini, LangChain, SentenceTransformers |
| Vector DB | MongoDB Atlas |
| Frontend | Next.js, Tailwind CSS |
| Hosting | Vercel (frontend), Render/Fly.io (backend) |
| Repo Cloning | GitHub API, git |
Challenges I Faced
- Handling large monorepos efficiently (chunking, memory usage)
- Dealing with cold backend starts due to free-tier hosting
- Creating a generic chunking algorithm that works across multiple languages
- Ensuring relevant vector search results and avoiding false positives
- Avoiding token overflows when passing large context to LLMs
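One common way to approach the last two challenges is to drop low-scoring hits and then pack the remaining chunks greedily under a token budget. The sketch below illustrates that idea only and is not necessarily what ungithub does. The 0.75 score floor, the 3000-token budget, the 4-characters-per-token estimate, and the hit schema are all assumptions.

```python
from typing import Dict, List


def estimate_tokens(text: str) -> int:
    """Rough heuristic (about 4 characters per token), not a real tokenizer."""
    return max(1, len(text) // 4)


def pack_context(hits: List[Dict], max_tokens: int = 3000,
                 min_score: float = 0.75) -> List[Dict]:
    """Select the best-scoring hits that fit within the token budget."""
    selected = []
    used = 0
    for hit in sorted(hits, key=lambda h: h["score"], reverse=True):
        if hit["score"] < min_score:
            break  # every remaining hit scores lower, so stop here
        cost = estimate_tokens(hit["text"])
        if used + cost > max_tokens:
            continue  # too large for the remaining budget; try smaller hits
        selected.append(hit)
        used += cost
    return selected


def format_context(chunks: List[Dict]) -> str:
    """Label each chunk with its file and line range so answers can cite it."""
    return "\n\n".join(
        f"[{c['path']}:{c['start_line']}-{c['end_line']}]\n{c['text']}"
        for c in chunks
    )
```

A real tokenizer for the target model would give tighter budgets than the character heuristic, and the score floor would need tuning per embedding model, since similarity scores are not comparable across models.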
Try It Yourself
Paste any repo at ungithub.vercel.app
Small repos index in 3 to 4 minutes; large ones take 5 to 7.
Once indexed, ask your questions and get answers in seconds!
Built For
AI in Action Hackathon, hosted by Google x MongoDB x GitLab