Codebase Autopsy Agent

Inspiration

Every developer knows the feeling of being dropped into a massive, unfamiliar codebase. When I was a new Software Engineering Intern, I was thrown into a huge company repository, and it took me weeks to not feel overwhelmed. I spent more time trying to understand the existing workflow and decipher what each piece of code did than actually building new features. The learning curve was steep, and the process was isolating.

I created the Codebase Autopsy Agent because I don't want any developer to get bogged down for weeks just trying to find their footing. I wanted to build the tool I wish I had on day one: an AI partner that has already read the entire codebase and is ready to answer any question, diagnose any bug, and help you become a productive team member from the very beginning.

What it does

The Codebase Autopsy Agent is a multi-talented AI assistant that transforms a static code repository into an interactive, intelligent partner. It allows any developer to:

💻 Ingest Any Public Repository: Simply provide a GitHub URL, and the agent securely clones the repository, breaks down the code, and indexes it into a vector database.

🐞 Diagnose Bugs Instantly: Paste a cryptic error message, and the agent performs a deep search of the codebase to find relevant context, provide a root cause analysis, and generate a concrete code fix.

🤔 Ask High-Level Questions: Go beyond debugging and ask natural language questions like, "How does user authentication work?" or "Show me the database connection logic." The agent provides detailed explanations with supporting code snippets.

🚀 Take Action: After diagnosing a bug, the agent can, with a single click, create a fully-formatted issue in the GitHub repository, completing a true, end-to-end agentic workflow.

How we built it

This project is built on a modern, agentic AI stack, with TiDB Serverless at its core.

Vector Database: I used TiDB Cloud with its vector search capabilities as the long-term memory for the agent. All the code from a repository is chunked, vectorized using OpenAI embeddings, and stored in a single, metadata-rich table in TiDB.

AI Orchestration: I used LangChain to structure the agentic chains. I built two primary chains: one for the multi-step bug diagnosis (retrieve -> analyze -> fix) and another for the "Ask Your Codebase" feature.

Frontend: The entire user interface is built with Streamlit. I chose it for its ability to rapidly create interactive, professional-looking web apps. I added a custom theme, a sidebar layout, and interactive context viewers to create a polished user experience.

External Tools: The agent connects to the GitHub API (via the PyGithub library) to perform its final action of creating an issue.
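
The chunking step described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the `CodeChunk` record, the `chunk_source` function, and the window sizes (40 lines with a 5-line overlap) are all assumptions made for the example. It splits one file into overlapping line windows and attaches the kind of metadata (file path, line range) that a metadata-rich table would store next to each embedding.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CodeChunk:
    """Hypothetical record for one chunk; field names are illustrative."""
    path: str
    start_line: int  # 1-indexed, inclusive
    end_line: int    # 1-indexed, inclusive
    text: str


def chunk_source(path: str, source: str,
                 max_lines: int = 40, overlap: int = 5) -> List[CodeChunk]:
    """Split source text into overlapping windows of at most max_lines lines."""
    if max_lines <= 0 or not 0 <= overlap < max_lines:
        raise ValueError("require max_lines > 0 and 0 <= overlap < max_lines")

    lines = source.splitlines()
    step = max_lines - overlap  # how far each new window advances
    chunks: List[CodeChunk] = []
    start = 0
    while start < len(lines):
        end = min(start + max_lines, len(lines))
        chunks.append(CodeChunk(path, start + 1, end, "\n".join(lines[start:end])))
        if end == len(lines):
            break  # the last window reached the end of the file
        start += step
    return chunks
```

Keeping the path and line range with each chunk is what would let the UI show where a retrieved snippet came from. The overlap is a common way to avoid cutting a function in half at a window boundary, though a syntax-aware splitter would likely do better.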

Challenges we ran into

Building a robust agent in a short time frame came with several challenges:

API Token Limits: During ingestion, I initially tried to embed an entire repository at once, which crashed due to OpenAI's token limits. I solved this by implementing a batch processing system that feeds documents to the API in smaller, manageable chunks.

Git Branch Ambiguity: The initial code was hardcoded to look for the main branch, which failed on older repositories that use master. I overcame this with more resilient try...except logic that attempts to check out main first and, if that fails, automatically tries master.

UI State Management: Streamlit reruns its script on every interaction. I had to use st.session_state carefully to store the results of the AI's analysis so that they could be accessed later when the user decided to create a GitHub issue.
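
The first two fixes above can be sketched in a few lines. This is an illustration under stated assumptions rather than the project's code: `batched`, `embed_in_batches`, `checkout_first_available`, the default batch size of 100, and the `embed_batch` and `checkout` callables are hypothetical names introduced here. The callables stand in for the real embedding client and the real Git checkout so the sketch stays self-contained.

```python
from typing import Callable, Iterable, Iterator, List, Optional, Sequence


def batched(items: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of items, each holding at most size elements."""
    if size <= 0:
        raise ValueError("size must be positive")
    for i in range(0, len(items), size):
        yield items[i:i + size]


def embed_in_batches(
    texts: Sequence[str],
    embed_batch: Callable[[Sequence[str]], List[List[float]]],
    batch_size: int = 100,  # assumed value; tune to the provider's limits
) -> List[List[float]]:
    """Embed texts one batch at a time instead of in a single request."""
    vectors: List[List[float]] = []
    for batch in batched(texts, batch_size):
        vectors.extend(embed_batch(batch))  # one API call per batch
    return vectors


def checkout_first_available(
    checkout: Callable[[str], None],
    candidates: Iterable[str] = ("main", "master"),
) -> str:
    """Try each branch name in order and return the first that checks out."""
    last_error: Optional[Exception] = None
    for branch in candidates:
        try:
            checkout(branch)
            return branch
        except Exception as exc:  # real code would catch the Git library's own error type
            last_error = exc
    raise RuntimeError("none of the candidate branches could be checked out") from last_error
```

With GitPython, `checkout` could be `repo.git.checkout`, and the `except` clause could be narrowed to `git.exc.GitCommandError`. Note that batching by item count only approximates a token limit. Capping each batch by an estimated token total would be a tighter guard.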

Accomplishments that we're proud of

Beyond a Simple RAG: I'm incredibly proud of the "Ask Your Codebase" feature. It elevates the project from a simple debugger into a true codebase comprehension tool, which feels like a significant leap in capability.

A True End-to-End Agent: The agent doesn't just provide information; it takes action. The full workflow—from a user pasting an error to a new issue appearing on GitHub—demonstrates a complete, impactful agentic loop.

A Polished User Experience: I invested time in the UI, adding a custom dark theme, a clean sidebar layout, and interactive expanders that let the user "see the agent's brain" by viewing the retrieved context. This makes the powerful backend a joy to use.

What we learned

Throughout this hackathon, I learned a great deal about the practicalities of building AI agents. I gained a deep appreciation for how well vector databases like TiDB serve as the foundation for RAG applications. I also learned how crucial a well-designed prompt chain is and how small changes can drastically alter the quality of the AI's output. Finally, I learned that a great backend is only as good as the user interface that presents it, and a little effort on UX goes a long way.

What's next for Codebase Autopsy Agent

This project has a ton of potential, and I'm excited about where it could go.

Private Repository Support: Integrate GitHub OAuth to allow users to connect and analyze their private codebases securely.

IDE Integration: Build a VS Code extension that brings the agent's diagnostic and Q&A capabilities directly into the developer's editor.

Proactive Bug Hunting: Train the agent to proactively scan code for "code smells," potential bugs, or deprecated library usage and flag them before they cause problems.

Built With

embeddings
githubapi
gitpython
langchain
openai
pygithub
python
streamlit
tidb

Updates

Sai Donepudi started this project — Sep 14, 2025 11:35 PM EDT
