Inspiration
Open-source and enterprise teams are shipping faster than ever, but pull request review quality is still inconsistent. A lot depends on who is online, who has security context, and who can explain technical changes to non-technical stakeholders.
We were inspired by the growing push for responsible AI in software governance, especially conversations around transparent and practical AI workflows in community ecosystems like Drupal. Instead of building another coding assistant, we wanted to build an AI system that behaves like a real review committee: architecture, security, and product-impact perspectives working together.
That is how Code Jury: PR Auditor was born.
What it does
Code Jury: PR Auditor takes a public GitHub pull request URL and runs a multi-agent review pipeline.
It produces:
- Architect assessment: structural integrity grade and standards-oriented recommendations
- Security assessment: risk status and vulnerability flags with evidence
- Manager summary: plain-English business impact, risk level, and release readiness
- Glass Box thought stream: transparent, readable logs of the agent workflow
The result is faster, more consistent, and more explainable PR review governance.
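To make the four outputs concrete, here is one way the combined result could be modeled. This is a minimal sketch, not the project's actual schema: every class and field name below (`AuditReport`, `grade`, `risk_status`, `release_ready`, and so on) is an assumption chosen to mirror the list above.

```python
from dataclasses import dataclass, field, asdict
import json


@dataclass
class ArchitectAssessment:
    grade: str  # structural integrity grade (hypothetical scale, e.g. "A" to "F")
    recommendations: list[str] = field(default_factory=list)


@dataclass
class SecurityFlag:
    issue: str     # what was flagged
    evidence: str  # where in the diff it was seen


@dataclass
class SecurityAssessment:
    risk_status: str
    flags: list[SecurityFlag] = field(default_factory=list)


@dataclass
class ManagerSummary:
    business_impact: str  # plain-English description
    risk_level: str
    release_ready: bool


@dataclass
class AuditReport:
    architect: ArchitectAssessment
    security: SecurityAssessment
    manager: ManagerSummary
    # Glass Box thought stream: one readable log line per workflow step.
    thought_stream: list[str] = field(default_factory=list)

    def to_json(self) -> str:
        # asdict() recurses into nested dataclasses and lists of dataclasses.
        return json.dumps(asdict(self), indent=2)
```

Keeping each role's output in its own type means a frontend could render one card per role without parsing free text.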
How we built it
We built a full-stack web app with:
- Backend: FastAPI + PyGithub + LangGraph + Gemini Flash model family
- Frontend: Next.js + TypeScript + Tailwind CSS
Flow:
- User submits a GitHub PR URL.
- Backend fetches PR metadata and the diff through the GitHub API.
- LangGraph orchestrates three specialized agents in sequence: Agent A (Architect), Agent B (Security), and Agent C (Manager).
- Backend returns structured outputs plus runtime logs.
- Frontend renders Mission Control UI, visual result cards, legend, and thought stream.
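The backend part of this flow can be sketched in plain Python. The sketch below is a simplified stand-in and does not use LangGraph, PyGithub, or a real model: the agent functions return placeholder strings, and all names (`parse_pr_url`, `ReviewState`, `run_pipeline`) are hypothetical. It only illustrates the shape of the idea: parse the URL, pass a shared state through three agents in order, and collect logs along the way.

```python
import re
from typing import Callable, TypedDict

# Matches URLs of the form https://github.com/<owner>/<repo>/pull/<number>
PR_URL = re.compile(r"^https://github\.com/([^/]+)/([^/]+)/pull/(\d+)/?$")


class ReviewState(TypedDict, total=False):
    diff: str
    architect: str
    security: str
    manager: str
    logs: list[str]


def parse_pr_url(url: str) -> tuple[str, str, int]:
    """Split a GitHub PR URL into (owner, repo, pr_number)."""
    match = PR_URL.match(url.strip())
    if match is None:
        raise ValueError(f"not a GitHub pull request URL: {url!r}")
    owner, repo, number = match.groups()
    return owner, repo, int(number)


def architect_agent(state: ReviewState) -> dict:
    # Placeholder: a real agent would send state["diff"] to a model.
    return {"architect": f"reviewed {len(state['diff'].splitlines())} diff lines"}


def security_agent(state: ReviewState) -> dict:
    return {"security": "no flags (placeholder)"}


def manager_agent(state: ReviewState) -> dict:
    # Runs last so it can summarize the two earlier findings.
    return {"manager": f"architect: {state['architect']}; security: {state['security']}"}


PIPELINE: list[tuple[str, Callable[[ReviewState], dict]]] = [
    ("architect", architect_agent),
    ("security", security_agent),
    ("manager", manager_agent),
]


def run_pipeline(diff: str) -> ReviewState:
    """Run the agents in order, merging each result into the shared state."""
    state: ReviewState = {"diff": diff, "logs": []}
    for name, agent in PIPELINE:
        state["logs"].append(f"{name}: started")
        state.update(agent(state))
        state["logs"].append(f"{name}: finished")
    return state
```

Because the Manager step reads the Architect and Security results from the shared state, ordering matters, which is why the agents run in sequence rather than in parallel.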
We also added model failover and graceful fallback behavior for reliability under quota/rate-limit conditions.
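A minimal sketch of what such failover might look like, under stated assumptions: `QuotaError` is a hypothetical stand-in for whatever exception the provider SDK raises on quota or rate-limit errors, `call_model` is an injected function rather than a real client, and the model names in the usage example are made up.

```python
from typing import Callable, Sequence


class QuotaError(Exception):
    """Hypothetical stand-in for a provider quota or rate-limit error."""


def call_with_failover(
    prompt: str,
    models: Sequence[str],
    call_model: Callable[[str, str], str],
    fallback: str,
) -> tuple[str, list[str]]:
    """Try each model in order; return the fallback text if all of them fail.

    Returns the response text plus a log of what happened, so the
    failover path stays visible instead of being silently swallowed.
    """
    log: list[str] = []
    for model in models:
        try:
            text = call_model(model, prompt)
        except QuotaError as exc:
            log.append(f"{model}: unavailable ({exc}); trying next model")
            continue
        log.append(f"{model}: ok")
        return text, log
    log.append("all models unavailable; returning fallback")
    return fallback, log


def fake_call(model: str, prompt: str) -> str:
    # Simulated provider: the first alias is always rate limited.
    if model == "model-a":
        raise QuotaError("rate limited")
    return f"{model} says hi"


text, log = call_with_failover("review this", ["model-a", "model-b"], fake_call, "n/a")
```

In this example `model-a` fails, `model-b` answers, and the log records both steps. Returning the log alongside the text lets the same lines feed a thought stream like the one described earlier.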
Challenges we ran into
- Model availability and quota behavior varied by model alias.
- Early versions could fail outright when the provider rate-limited requests.
- We needed to avoid black-box AI behavior and improve trust.
- UI had to stay readable on mobile while showing dense technical output.
Accomplishments that we're proud of
- Built a true multi-agent PR auditing flow, not a single prompt wrapper.
- Added transparent Glass Box logs to make AI behavior understandable.
- Designed clear role-specific outputs for developers and non-technical stakeholders.
- Implemented reliability improvements (model failover + fallback) for demo stability.
- Delivered a polished Mission Control interface with responsive UX and interpretation legend.
- Kept the system lightweight and hackathon-clean with no external database.
What we learned
- Agent specialization improves consistency and clarity over monolithic prompting.
- Trust is a product feature: transparency and explainability matter as much as raw model quality.
- Real-world AI systems need operational safeguards (fallbacks, retries, clear error states).
- Product framing for judges/stakeholders is critical: governance value is the differentiator.
What's next for Code Jury: PR Auditor
- Add support for GitLab, Bitbucket, and Azure DevOps PRs
- Add organization policy packs (security-sensitive, fintech, healthcare, OSS maintainer mode)
- Add persistent run history and quality trend analytics
- Add webhook-triggered automatic audits on PR events
- Add integration with static analyzers and dependency scanners
- Add collaboration surfaces (Slack/Teams notifications, inline PR comments)
- Evolve toward enterprise-grade compliance reporting and audit export
Code Jury: PR Auditor is our first step toward a practical AI governance layer for modern software delivery.