🧠 About the Project

🚀 Inspiration

As developers, we've all faced the pain of onboarding into unfamiliar or legacy codebases - slow, confusing, and time-consuming. We asked ourselves: What if a GitHub repo could explain itself?

That’s how ungithub was born: an AI-powered codebase insight explorer that acts like a smart mentor embedded in any open-source project.


πŸ—οΈ How We Built It

I broke the project into six clear phases:

  1. Planning & Setup
    I scaffolded the backend (FastAPI) and optionally the frontend (Next.js), and connected to MongoDB Atlas with vector search enabled.

  2. Repo Cloning & Parsing
    Using git and the GitHub API, I cloned public repositories and recursively parsed their files, filtering out noise (node_modules, .git, etc.) and storing structured metadata.

  3. Embeddings + Vector Search
    Code chunks were vectorized using OpenAI or SentenceTransformers, then stored in MongoDB Atlas with a vector index for similarity search.

  4. Natural Language QA
    I built an endpoint that accepts user questions, performs a vector search, and feeds the most relevant chunks to an LLM to generate accurate answers with file-level citations.

  5. Frontend (optional)
    A clean Next.js UI lets users paste a GitHub link, wait for indexing, and ask questions about the repo. Results are displayed in real time.

  6. Polish & Deploy
    Finally, I deployed the backend to Render and the frontend to Vercel, added error handling, and optimized for low latency and a smooth UX.
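
Steps 2 and 3 can be sketched with the standard library alone. This is a minimal illustration, not the project's actual code: the names IGNORED_DIRS, iter_source_files, and chunk_lines, the line-window chunking strategy, and the default sizes are all assumptions made for the example.

```python
import os
from typing import Iterator

# Assumed ignore list; the write-up only names node_modules and .git explicitly.
IGNORED_DIRS = {"node_modules", ".git"}


def iter_source_files(root: str) -> Iterator[str]:
    """Yield file paths under root, skipping ignored directories."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk never descends into ignored directories.
        dirnames[:] = [d for d in dirnames if d not in IGNORED_DIRS]
        for name in sorted(filenames):
            yield os.path.join(dirpath, name)


def chunk_lines(path: str, text: str, max_lines: int = 40, overlap: int = 5) -> list[dict]:
    """Split text into overlapping line windows, keeping file-level metadata."""
    if not 0 <= overlap < max_lines:
        raise ValueError("overlap must be non-negative and smaller than max_lines")
    lines = text.splitlines()
    step = max_lines - overlap
    chunks = []
    for start in range(0, len(lines), step):
        window = lines[start:start + max_lines]
        chunks.append({
            "path": path,
            "start_line": start + 1,            # 1-based, inclusive
            "end_line": start + len(window),    # 1-based, inclusive
            "text": "\n".join(window),
        })
        if start + max_lines >= len(lines):
            break  # the last window already reached the end of the file
    return chunks
```

Each chunk dictionary would then be embedded and stored alongside its vector. Keeping the path and line range with every chunk is what makes file-level citations possible at answer time.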


🧠 What I Learned

  • MongoDB Atlas Vector Search is powerful and beginner-friendly for semantic search.
  • LangChain + OpenAI/Gemini is a potent combo for contextual code summarization and Q&A.
  • Optimizing cold starts, managing large repositories, and designing the chunking logic were all trickier than expected.
  • Good UX is critical, even for technical tools, so I kept interactions minimal: paste, wait, ask.
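
To make the first two points concrete, here is a hedged sketch of a retrieval query and the prompt built from its results. The $vectorSearch stage and the vectorSearchScore metadata are part of Atlas Vector Search; the index name, the field names (embedding, file_path, text), the numCandidates multiplier, and the prompt wording are assumptions for illustration. It uses plain Python rather than LangChain to stay self-contained.

```python
def vector_search_pipeline(query_vector, index_name="code_chunks_index",
                           vector_field="embedding", limit=5):
    """Build an aggregation pipeline for an Atlas Vector Search query."""
    return [
        {
            "$vectorSearch": {
                "index": index_name,         # assumed index name
                "path": vector_field,        # document field holding the embedding
                "queryVector": query_vector,
                "numCandidates": limit * 20, # assumed oversampling factor
                "limit": limit,
            }
        },
        {
            "$project": {
                "_id": 0,
                "file_path": 1,              # assumed metadata field
                "text": 1,                   # assumed chunk text field
                "score": {"$meta": "vectorSearchScore"},
            }
        },
    ]


def build_prompt(question, hits):
    """Number each retrieved chunk so the model can cite files by index."""
    context = "\n\n".join(
        f"[{i}] {hit['file_path']}\n{hit['text']}"
        for i, hit in enumerate(hits, start=1)
    )
    return (
        "Answer the question using only the numbered code excerpts below. "
        "Cite the files you relied on by their bracketed number.\n\n"
        f"{context}\n\nQuestion: {question}"
    )
```

With PyMongo, the pipeline would be passed to collection.aggregate(...). The query vector has to come from the same embedding model that was used at indexing time, otherwise the similarity scores are meaningless.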

🧱 Tech Stack

Layer          | Tools
Backend        | Python, FastAPI
Embedding & AI | OpenAI, Gemini, LangChain, SentenceTransformers
Vector DB      | MongoDB Atlas
Frontend       | Next.js, Tailwind CSS
Hosting        | Vercel (frontend), Render/Fly.io (backend)
Repo Cloning   | GitHub API, git

🧗 Challenges I Faced

  • Handling large monorepos efficiently (chunking, memory usage)
  • Dealing with cold backend starts due to free-tier hosting
  • Creating a generic chunking algorithm that works across multiple languages
  • Keeping vector search results relevant and avoiding false positives
  • Avoiding token overflows when passing large context to LLMs
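
Two of these challenges, false positives and token overflows, can be tackled together when assembling the LLM context. The sketch below is one possible approach, not necessarily what ungithub does: the score threshold, the token budget, and the four-characters-per-token estimate are all assumed values, and a real tokenizer would be more accurate.

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate, assuming about four characters per token."""
    return (len(text) + 3) // 4


def select_context(hits, min_score=0.75, max_tokens=3000):
    """Keep the best-scoring chunks that clear the threshold and fit the budget."""
    selected, used = [], 0
    for hit in sorted(hits, key=lambda h: h["score"], reverse=True):
        if hit["score"] < min_score:
            break  # everything after this is even less relevant
        cost = estimate_tokens(hit["text"])
        if used + cost > max_tokens:
            continue  # too large for the remaining budget; try smaller chunks
        selected.append(hit)
        used += cost
    return selected
```

Skipping an oversized chunk instead of stopping lets smaller but still relevant chunks fill the remaining budget.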

💡 Try it Yourself

🔗 Paste any repo at ungithub.vercel.app
⏳ Small repos index in 3–4 minutes, large ones take 5–7.
🤖 Once indexed, ask your questions and get answers in seconds!


🙌 Built For

AI in Action Hackathon, hosted by Google x MongoDB x GitLab
