Sherlock

Inspiration

Most developers know the frustration of watching a seemingly tiny bug spiral into a debugging session that lasts several hours, even with the help of AI tools like Claude. That is why we created Sherlock. Sherlock’s goal is to make every developer’s life easier by investigating and fixing bugs, so that troubleshooting becomes faster and more efficient!

What our product is & how we built it

Sherlock was built as a multi-agent AI system that combines large language models, cloud sandboxing, browser automation, and GitHub integration into a unified debugging platform.

The platform integrates with GitHub for issue tracking and repository access. We clone the repository into a Docker container, which serves as an isolated environment where code can be safely executed and modified. We also use Browserbase for browser-based testing and automation: it reproduces users' bugs and then verifies that they are gone after Sherlock applies a fix. Claude, configured through Band AI, powers the agents and serves as the primary reasoning engine, enabling them to analyze repositories, investigate issues, and generate code fixes.
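A minimal sketch of the sandboxing step, assuming the repository is cloned on the host and then mounted into a throwaway container. The image `python:3.12-slim`, the resource limits, and the helper names `clone_argv`, `sandbox_argv`, and `run` are hypothetical choices for illustration, not Sherlock's actual configuration. The helpers only build argument lists for the real `git` and `docker` CLIs; `run` executes one with `subprocess.run`.

```python
import subprocess
from pathlib import Path

# Hypothetical default image, chosen only for this example.
DEFAULT_IMAGE = "python:3.12-slim"


def clone_argv(repo_url: str, dest: Path) -> list[str]:
    """Shallow-clone the target repository on the host."""
    return ["git", "clone", "--depth", "1", repo_url, str(dest)]


def sandbox_argv(host_dir: Path, command: list[str], image: str = DEFAULT_IMAGE) -> list[str]:
    """Build a `docker run` call that mounts the clone and runs `command` inside it.

    The container is removed on exit (--rm), has no network access
    (--network none), and is capped at 1 GB of memory and one CPU.
    """
    return [
        "docker", "run", "--rm",
        "--network", "none",
        "--memory", "1g",
        "--cpus", "1",
        "-v", f"{host_dir.resolve()}:/workspace",  # mount the clone into the container
        "-w", "/workspace",                         # run the command from the repo root
        image,
        *command,
    ]


def run(argv: list[str], timeout: int = 600) -> subprocess.CompletedProcess:
    """Execute a command and capture its output without raising on a non-zero exit."""
    return subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
```

For example, `run(clone_argv(url, workdir))` followed by `run(sandbox_argv(workdir, ["python", "-m", "unittest"]))` would run a Python project's standard-library tests with no network access. Disabling the network keeps untrusted code contained, but a real setup would also need a step (or a prebuilt image) that installs the project's dependencies.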

To support reliability and scalability, we implemented an agent orchestration framework that manages communication between agents (via Band), tracks debugging progress, and coordinates verification cycles. The system also includes automated pull request generation, allowing validated fixes to be seamlessly returned to developers. By combining AI reasoning, browser automation, and cloud infrastructure, we created an end-to-end platform capable of autonomously resolving software bugs.
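The investigate, fix, and verify cycle described above can be sketched as a simple loop. This is a minimal illustration, not Sherlock's orchestrator: the three agents are passed in as plain Python callables standing in for the Claude-backed agents (the Band AI chatroom that actually carries their messages is not modeled), and `DebugRun`, `orchestrate`, `pull_request_payload`, and the default `max_cycles=3` are hypothetical names and values. The last function only builds the JSON body that GitHub's REST API accepts when creating a pull request (`POST /repos/{owner}/{repo}/pulls` with `title`, `head`, `base`, `body`); sending it with authentication is left out.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class DebugRun:
    """Progress record for one bug, kept by the (hypothetical) orchestrator."""
    issue: str
    events: list[str] = field(default_factory=list)
    patch: Optional[str] = None
    resolved: bool = False


def orchestrate(
    issue: str,
    investigate: Callable[[str], str],
    fix: Callable[[str], str],
    verify: Callable[[str], bool],
    max_cycles: int = 3,
) -> DebugRun:
    """Run investigate -> fix -> verify until verification passes or the budget runs out."""
    state = DebugRun(issue)
    report = investigate(issue)
    state.events.append("investigated")
    for cycle in range(1, max_cycles + 1):
        state.patch = fix(report)
        state.events.append(f"fix attempt {cycle}")
        if verify(state.patch):
            state.events.append("verified")
            state.resolved = True
            break
        # Feed the failure back so the next fix attempt has more context.
        report += f"\nAttempt {cycle} failed verification."
    return state


def pull_request_payload(state: DebugRun, head: str, base: str = "main") -> dict:
    """JSON body for GitHub's REST endpoint POST /repos/{owner}/{repo}/pulls."""
    return {
        "title": f"Fix: {state.issue}",
        "head": head,  # branch containing the verified patch
        "base": base,  # branch the fix should be merged into
        "body": "\n".join(state.events),
    }
```

Each failed verification appends a note to the report, so later fix attempts see what did not work; the `events` list doubles as the progress log and the pull request description.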

Challenges we ran into

Integrating and coordinating Browserbase, Claude, GitHub, and sandbox environments into a unified workflow was complex.
Ironically, building an AI debugging bot required hours of debugging because of the complexity of coordinating multiple AI services.
Implementing a reliable feedback loop for bugs that remained unresolved was difficult, especially deciding whether an unsuccessful fix should return to the investigative agent for additional context or to the fixing agent for further revisions.
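One way to frame the feedback-loop question from the challenges above is as a routing policy. The sketch below is a hypothetical heuristic, not the rule Sherlock settled on: if verification shows the same symptom as the original report, or the fixer has used up its attempts, the diagnosis is questioned and the bug goes back to the investigator; if the symptom changed, the patch had some effect and the fixer iterates. The names `Route`, `VerificationResult`, `route_failure`, and both retry limits are assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    INVESTIGATOR = "investigator"  # diagnosis may be wrong: gather more context
    FIXER = "fixer"                # diagnosis looks right: revise the patch
    GIVE_UP = "give_up"            # budget exhausted: stop retrying


@dataclass
class VerificationResult:
    passed: bool
    symptom: str  # e.g. the error message observed when reproducing the bug


def route_failure(
    original_symptom: str,
    result: VerificationResult,
    fix_attempts: int,
    investigations: int,
    max_fixes_per_diagnosis: int = 2,
    max_investigations: int = 2,
) -> Route:
    """Hypothetical policy for where an unsuccessful fix should go next."""
    if result.passed:
        raise ValueError("route_failure only applies to failed verifications")
    same_symptom = result.symptom.strip() == original_symptom.strip()
    fixer_exhausted = fix_attempts >= max_fixes_per_diagnosis
    if same_symptom or fixer_exhausted:
        # Nothing changed, or repeated patches did not help: revisit the diagnosis.
        if investigations >= max_investigations:
            return Route.GIVE_UP
        return Route.INVESTIGATOR
    # The symptom changed, so the patch had an effect; let the fixer keep iterating.
    return Route.FIXER
```

Comparing symptoms by exact string is deliberately crude; a real policy would likely normalize stack traces or compare the browser reproduction's outcome instead.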

Accomplishments that we're proud of

Identifying a widespread problem developers face and designing a comprehensive solution for it
Getting Browserbase, Redis, Claude, Docker, and Band AI to work together in one pipeline
Pulling an all-nighter and chugging 10 Red Bulls across the team
Connecting with amazing sponsors and peers
Fixing countless bugs over several hours

What we learned

How to establish communication between two agents running as separate services via an AI chatroom (Band AI)
How to connect services such as Browserbase, Redis, Claude, Docker, and Band AI