💡 Inspiration

The idea for CodeArbiter was born out of a mix of personal perfectionism and professional necessity. As a developer with a self-diagnosed case of "Code OCD," I’ve always been obsessed with ensuring my code is perfectly formatted, properly structured, and architecturally sound.

But the real "Aha!" moment came from my experience working at a fast-paced startup. As our team grew, the PR queue became a nightmare. Our Team Lead was stuck in a "Manual Review Loop"—spending hours every day pointing out the same basic architectural flaws instead of focusing on high-level strategy.

I realized we didn't just need a linter; we needed a Digital Team Lead. I thought: "What if I could build an agent that reads our project standards and enforces them automatically before a human ever has to look at the code?" That’s how CodeArbiter was born.

🛠️ How We Built It

CodeArbiter was built as a full-stack agentic system:

  • The Brain: We chose Gemini 2.5 Flash for its reasoning speed and its large context window (about one million input tokens). This lets us feed the agent entire style guides alongside complex PR diffs without losing coherence.
  • The Engine: A FastAPI backend orchestrates the agentic workflow. It fetches pull request data from the GitHub API and runs each diff through our specialized prompt-engineering pipeline.
  • The Interface: A Next.js dashboard with a Neo-brutalist design. We focused heavily on the "Agentic Feel," using a holographic scanning animation to represent the AI's "thought process" while it audits the code.
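To illustrate how a style guide and a PR diff might be combined into one request, here is a minimal sketch. The function names, the prompt wording, and the JSON findings format are hypothetical assumptions, not CodeArbiter's actual pipeline. The Gemini call itself is left out, so the sketch only covers building the prompt and parsing a reply.

```python
import json
from dataclasses import dataclass

# Hypothetical finding categories, mirroring the Readiness Score table.
CATEGORIES = {"security", "architecture", "style"}


@dataclass
class Finding:
    category: str
    file: str
    message: str


def build_review_prompt(style_guide: str, diff: str) -> str:
    """Combine project standards and a PR diff into one prompt (hypothetical format)."""
    return (
        "You are a strict code reviewer. Enforce the project standards below.\n\n"
        "## Project standards\n"
        f"{style_guide.strip()}\n\n"
        "## Pull request diff\n"
        f"{diff.strip()}\n\n"
        "Reply with a JSON array of objects with the keys "
        '"category" (one of security, architecture, style), "file", and "message".'
    )


def parse_findings(model_reply: str) -> list[Finding]:
    """Parse the model's JSON reply, dropping entries with unknown categories."""
    findings = []
    for item in json.loads(model_reply):
        category = str(item.get("category", "")).lower()
        if category in CATEGORIES:
            findings.append(Finding(category, item.get("file", ""), item.get("message", "")))
    return findings
```

Asking for structured JSON rather than free text keeps the reply machine-readable, so the same findings could drive both the dashboard and a numeric score.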

📊 The Readiness Score

We developed a Readiness Algorithm to quantify the quality of a Pull Request instantly. Instead of a simple pass/fail verdict, CodeArbiter scores every PR from 0 to 100.

How it Works

Every PR starts at a perfect 100. Points are deducted according to the severity of the issues found, and the score never drops below 0:

| Category | Impact | Deduction |
|---|---|---|
| 🛡️ Security Issues | High | -40 points |
| 🏗️ Architectural Flaws | Medium | -25 points |
| 🎨 Style & Readability | Minor | -10 points |

The final score provides an immediate "at-a-glance" look at whether the code is production-ready.
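The scoring rule can be sketched in a few lines of Python. This is a minimal sketch under two assumptions the text does not spell out: every individual issue incurs its category's deduction (so two style issues cost 20 points), and the result is clamped at 0. The category keys and the function name are hypothetical.

```python
# Per-issue deductions from the Readiness Score table.
DEDUCTIONS = {"security": 40, "architecture": 25, "style": 10}


def readiness_score(issue_categories: list[str]) -> int:
    """Start at 100, subtract one deduction per issue, and clamp at 0.

    Assumes each issue is tagged with one of the DEDUCTIONS keys;
    an unknown category raises KeyError rather than being silently ignored.
    """
    penalty = sum(DEDUCTIONS[category] for category in issue_categories)
    return max(0, 100 - penalty)
```

For example, a PR with one architectural flaw and two style issues would score 100 - 25 - 20 = 55, while three security issues would bottom out at 0.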

🚧 Challenges We Faced

  • Diff Density: Large PRs contain "noise" that can distract an LLM from the changes that matter. We iterated on the agent's logic so it skips irrelevant files (such as lockfiles and assets) and focuses on high-impact logic changes.
  • Architectural Pivot: We initially streamed results over WebSockets but found the connections unstable in high-latency environments. We refactored the analysis pipeline to use plain HTTP requests, which proved far more reliable while still giving the interface a "live" feel.
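The file-skipping step could look something like the sketch below, which filters changed paths (for example, the `filename` field returned by GitHub's "list pull request files" endpoint) against glob patterns. The ignore list and the function names are hypothetical assumptions, not CodeArbiter's actual rules.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath

# Hypothetical ignore list: lockfiles, static assets, and build artifacts.
IGNORED_PATTERNS = [
    "package-lock.json", "yarn.lock", "pnpm-lock.yaml", "poetry.lock", "Cargo.lock",
    "*.png", "*.jpg", "*.jpeg", "*.gif", "*.svg", "*.ico", "*.woff", "*.woff2",
    "*.min.js", "*.map",
]


def is_reviewable(path: str) -> bool:
    """Return False when the file's base name matches any ignored pattern."""
    name = PurePosixPath(path).name
    return not any(fnmatch(name, pattern) for pattern in IGNORED_PATTERNS)


def filter_changed_files(paths: list[str]) -> list[str]:
    """Keep only the changed files worth sending to the model, preserving order."""
    return [path for path in paths if is_reviewable(path)]
```

Matching on the base name keeps the patterns short, at the cost of not distinguishing, say, a vendored `dist/` folder from source; a path-based rule would be needed for that.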

🧠 What We Learned

This project was a masterclass in Agentic UX. We learned that for developers to trust an AI reviewer, the UI needs to be transparent. By showing the agent's "initialization" and "scanning" phases, we turned a black-box process into a collaborative experience.

Gemini 2.5 Flash also proved its efficiency: it processes thousands of tokens of documentation in seconds, which lets CodeArbiter return detailed review feedback in less time than it takes to grab a cup of coffee.

🚀 What's Next for CodeArbiter

We plan to implement "Auto-Fix" capabilities. Soon, CodeArbiter won't just flag issues; it will automatically open a suggested "Fix-it" branch for the developer. Our goal is to make the jump from Arbiter to Assistant.

Built With

  • fastapi
  • gemini-2.5-flash
  • github-api
  • next.js-14
  • nextjs
  • prisma
  • python
  • sqlite
  • tailwind-css