Code Jury: PR Auditor


Inspiration

Open-source and enterprise teams are shipping faster than ever, but pull request review quality is still inconsistent. A lot depends on who is online, who has security context, and who can explain technical changes to non-technical stakeholders.

We were inspired by the growing push for responsible AI in software governance, especially conversations around transparent and practical AI workflows in community ecosystems like Drupal. Instead of building another coding assistant, we wanted to build an AI system that behaves like a real review committee: architecture, security, and product-impact perspectives working together.

That is how Code Jury: PR Auditor was born.

What it does

Code Jury: PR Auditor takes a public GitHub pull request URL and runs a multi-agent review pipeline.

It produces:

Architect assessment: structural integrity grade and standards-oriented recommendations
Security assessment: risk status and vulnerability flags with evidence
Manager summary: plain-English business impact, risk level, and release readiness
Glass Box thought stream: transparent, readable logs of the agent workflow

The result is faster, more consistent, and more explainable PR review governance.
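To make the four outputs concrete, here is a minimal sketch of how such a report could be modeled. The class names, field names, and example values are illustrative assumptions, not the project's actual schema.

```python
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class ArchitectAssessment:
    grade: str  # structural integrity grade (the letter scale is an assumption)
    recommendations: List[str] = field(default_factory=list)


@dataclass
class SecurityFlag:
    title: str
    evidence: str  # file reference or snippet that backs the flag


@dataclass
class SecurityAssessment:
    risk_status: str  # hypothetical values: "pass", "warn", "fail"
    flags: List[SecurityFlag] = field(default_factory=list)


@dataclass
class ManagerSummary:
    business_impact: str  # plain-English explanation for stakeholders
    risk_level: str
    release_ready: bool


@dataclass
class AuditReport:
    architect: ArchitectAssessment
    security: SecurityAssessment
    manager: ManagerSummary
    thought_stream: List[str] = field(default_factory=list)  # Glass Box log lines

    def to_dict(self) -> dict:
        # asdict recurses into nested dataclasses and lists, giving JSON-ready data.
        return asdict(self)


report = AuditReport(
    architect=ArchitectAssessment("B", ["Split the handler into smaller functions"]),
    security=SecurityAssessment("warn", [SecurityFlag("Unvalidated input", "api/views.py")]),
    manager=ManagerSummary("Low customer impact", "medium", True),
    thought_stream=["[architect] started", "[architect] finished"],
)
```

Calling `report.to_dict()` yields plain dictionaries and lists, which is the kind of structure a JSON API response could carry to the frontend.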

How we built it

We built a full-stack web app with:

Backend: FastAPI + PyGithub + LangGraph + Gemini Flash model family
Frontend: Next.js + TypeScript + Tailwind CSS

Flow:

User submits a GitHub PR URL.
Backend fetches PR metadata and the diff through the GitHub API (via PyGithub).
LangGraph orchestrates three specialized agents in sequence: Agent A (Architect), Agent B (Security), and Agent C (Manager).
Backend returns structured outputs plus runtime logs.
Frontend renders the Mission Control UI: visual result cards, the legend, and the thought stream.
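The flow above can be sketched in plain Python. This is a simplified stand-in for the LangGraph orchestration, not the project's code: `parse_pr_url`, `run_pipeline`, and the three stub agents are hypothetical names, and the stubs return fixed values where the real agents would call the model on the fetched diff.

```python
import re
from typing import Any, Callable, Dict, List, Tuple

# Accepts URLs such as https://github.com/<owner>/<repo>/pull/<number>,
# optionally followed by a sub-path like /files.
PR_URL = re.compile(r"^https://github\.com/([^/]+)/([^/]+)/pull/(\d+)(?:[/?#].*)?$")

State = Dict[str, Any]
Agent = Callable[[State], State]


def parse_pr_url(url: str) -> Tuple[str, str, int]:
    """Split a pull request URL into (owner, repo, number)."""
    match = PR_URL.match(url.strip())
    if match is None:
        raise ValueError(f"not a GitHub pull request URL: {url!r}")
    owner, repo, number = match.groups()
    return owner, repo, int(number)


def run_pipeline(state: State, agents: List[Tuple[str, Agent]]) -> State:
    """Run agents in order, merging each result into the shared state."""
    logs: List[str] = []
    for name, agent in agents:
        logs.append(f"[{name}] started")
        state = {**state, **agent(state)}
        logs.append(f"[{name}] finished")
    return {**state, "logs": logs}


# Stub agents with fixed outputs (assumptions for illustration only).
def architect(state: State) -> State:
    return {"architect": {"grade": "B"}}


def security(state: State) -> State:
    return {"security": {"risk_status": "warn"}}


def manager(state: State) -> State:
    # The manager runs last so it can read the earlier agents' findings.
    ready = state["security"]["risk_status"] != "fail"
    return {"manager": {"release_ready": ready}}


owner, repo, number = parse_pr_url("https://github.com/octo/demo/pull/42")
result = run_pipeline(
    {"owner": owner, "repo": repo, "number": number},
    [("architect", architect), ("security", security), ("manager", manager)],
)
```

Running the agents in a fixed order lets the Manager step summarize what the Architect and Security steps found, and the `logs` list plays the role of the thought stream returned to the frontend.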

We also added model failover and graceful fallback behavior for reliability under quota/rate-limit conditions.
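One way to picture the failover behavior is the sketch below. It is an assumption about the general pattern, not the project's implementation: `RateLimited`, `call_with_failover`, the `call_model` callback, and the model names in the usage example are all hypothetical.

```python
from typing import Callable, List, Sequence, Tuple


class RateLimited(Exception):
    """Hypothetical error for a provider quota or rate-limit response."""


def call_with_failover(
    prompt: str,
    models: Sequence[str],
    call_model: Callable[[str, str], str],
    fallback: str,
) -> Tuple[str, List[str]]:
    """Try each model in order; return the first reply plus a readable log."""
    log: List[str] = []
    for model in models:
        try:
            reply = call_model(model, prompt)
        except RateLimited as exc:
            log.append(f"{model}: rate limited ({exc}); trying next model")
            continue
        log.append(f"{model}: ok")
        return reply, log
    # Every model failed: degrade gracefully instead of raising to the user.
    log.append("all models unavailable; returning fallback")
    return fallback, log


def fake_call_model(model: str, prompt: str) -> str:
    # Test double: the first model is always over quota.
    if model == "model-primary":
        raise RateLimited("quota exceeded")
    return f"{model} reviewed: {prompt}"


reply, log = call_with_failover(
    "summarize diff",
    ["model-primary", "model-secondary"],
    fake_call_model,
    fallback="Review unavailable; please retry later.",
)
```

Returning the log alongside the reply means a model switch can be surfaced in the thought stream rather than hidden from the user.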

Challenges we ran into

Model availability and quota behavior varied by model alias.
Early versions could fail outright when the provider rate-limited requests.
We needed to avoid black-box AI behavior and improve trust.
The UI had to stay readable on mobile while showing dense technical output.

Accomplishments that we're proud of

Built a true multi-agent PR auditing flow, not a single prompt wrapper.
Added transparent Glass Box logs to make AI behavior understandable.
Designed clear role-specific outputs for developers and non-technical stakeholders.
Implemented reliability improvements (model failover + fallback) for demo stability.
Delivered a polished Mission Control interface with responsive UX and interpretation legend.
Kept the system lightweight and hackathon-clean with no external database.

What we learned

Agent specialization improves consistency and clarity over monolithic prompting.
Trust is a product feature: transparency and explainability matter as much as raw model quality.
Real-world AI systems need operational safeguards (fallbacks, retries, clear error states).
Product framing for judges/stakeholders is critical: governance value is the differentiator.

What's next for Code Jury: PR Auditor

Add support for GitLab, Bitbucket, and Azure DevOps PRs
Add organization policy packs (security-sensitive, fintech, healthcare, OSS maintainer mode)
Add persistent run history and quality trend analytics
Add webhook-triggered automatic audits on PR events
Add integration with static analyzers and dependency scanners
Add collaboration surfaces (Slack/Teams notifications, inline PR comments)
Evolve toward enterprise-grade compliance reporting and audit export

Code Jury: PR Auditor is our first step toward a practical AI governance layer for modern software delivery.