Inspiration
Modern software development relies heavily on code reviews, yet teams still waste hours on repetitive checks and miss subtle bugs. Having seen this friction in everyday pull requests, especially at fast-moving startups, we imagined a hands-free, multi-agent reviewer that could:
Detect bugs, security flaws, and style violations instantly
Score and prioritize issues so reviewers focus on the most critical feedback
Learn from each review to become smarter over time
What it does
Automated Review Pipeline: When a PR is opened, the platform spawns a set of specialized agents (bug detection, security, scoring, etc.) that run in parallel.
Unified Scorecard: Each agent contributes a numeric score that is weighted and combined; the final Code Sentinel Score tells the reviewer at a glance how risky the change is.
Actionable Feedback: Comments are posted directly on the PR with code snippets, severity tags, and suggested fixes.
Continuous Learning: Feedback from human reviewers is fed back into the agents, updating their prompts and retrieval-augmented knowledge bases.
Dashboard: A live web UI visualizes agent activity, score trends, and historical metrics.
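The parallel fan-out described above can be sketched roughly as follows. This is a minimal illustration, not the platform's code: it uses asyncio coroutines rather than the subprocesses the project launches, and AgentResult, run_agent, and review_pull_request are hypothetical names. The stand-in agent only flags a literal eval( call so the example stays self-contained.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class AgentResult:
    agent: str                      # hypothetical agent name
    score: float                    # assumed convention: 0.0 (safe) to 1.0 (risky)
    comments: list = field(default_factory=list)


async def run_agent(name: str, diff: str) -> AgentResult:
    """Stand-in for a real agent, which would call an LLM or a linter."""
    await asyncio.sleep(0)  # yield control, simulating I/O-bound work
    risky = name == "security" and "eval(" in diff
    comments = ["Avoid eval() on untrusted input"] if risky else []
    return AgentResult(agent=name, score=0.9 if risky else 0.1, comments=comments)


async def review_pull_request(diff: str, agent_names: list) -> list:
    """Run every agent concurrently; results come back in agent_names order."""
    return await asyncio.gather(*(run_agent(n, diff) for n in agent_names))


if __name__ == "__main__":
    results = asyncio.run(
        review_pull_request(
            "result = eval(user_input)",
            ["bug_detection", "security", "style"],
        )
    )
    for r in results:
        print(r.agent, r.score, r.comments)
```

Because asyncio.gather preserves input order, each result can be matched back to its agent by position, which keeps the later scoring step simple.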
How we built it
Core Architecture: Built with FastAPI (Python) as the orchestration layer. The API receives webhook events from GitHub, queues jobs, and launches agents as lightweight subprocesses.
Agents: Implemented as separate Python modules under code_review_platform/agents/. Each agent inherits from a common BaseAgent that provides prompt templating with Retrieval-Augmented Generation (RAG) backed by a local embedding store, and structured output parsing (JSON schema) for scores and suggestions.
Scoring Engine: A simple weighted sum, S = Σ w_i s_i, where s_i is the score from agent i and the weights w_i are configurable via config.yaml. The engine normalizes scores to a 0-100 range.
UI: The static front-end in code_review_platform/static/index.html uses vanilla CSS with a dark glassmorphism theme, animated progress bars, and real-time SSE updates from the backend.
Persistence: Review metadata is stored in a lightweight SQLite DB (reviews.db) with indexed columns for quick look-ups.
Deployment: Containerized with Docker; development runs via npm run dev for the front-end and uvicorn main:app --reload for the API.
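To make the Scoring Engine's weighted sum concrete, here is a minimal sketch. Several details are assumptions rather than facts from the project: each agent score s_i is taken to lie in [0, 1], normalization is done by dividing by the total weight and scaling to 0-100, and the WEIGHTS dictionary stands in for values that would be loaded from config.yaml. The function name code_sentinel_score is hypothetical.

```python
# Assumed stand-in for the weights section of config.yaml.
WEIGHTS = {"bug_detection": 5, "security": 3, "style": 2}


def code_sentinel_score(scores: dict, weights: dict) -> float:
    """Compute S = sum(w_i * s_i), normalized to the 0-100 range.

    Assumes every agent score is in [0, 1]; out-of-range values are clamped.
    Raises KeyError if an agent has no configured weight.
    """
    total_weight = sum(weights[name] for name in scores)
    if total_weight <= 0:
        raise ValueError("weights for the reporting agents must sum to a positive value")
    weighted = sum(
        weights[name] * min(max(score, 0.0), 1.0)
        for name, score in scores.items()
    )
    return round(100 * weighted / total_weight, 1)


if __name__ == "__main__":
    agent_scores = {"bug_detection": 0.2, "security": 0.9, "style": 0.1}
    # (5*0.2 + 3*0.9 + 2*0.1) / 10 * 100
    print(code_sentinel_score(agent_scores, WEIGHTS))
```

Dividing by the total weight means the result stays in 0-100 even if a team changes the weights without making them sum to 1.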
Challenges we ran into
Prompt drift and hallucinations: We added strict JSON output schemas and post-processing validators, and introduced a "guard-rail" LLM that rewrites malformed responses.
Speed vs. accuracy: We parallelized agents, cached embedding look-ups, and lowered the model temperature to 0.2 for more deterministic outputs.
UI responsiveness: We leveraged CSS @keyframes and requestAnimationFrame for smooth progress-bar animations, and avoided heavy JS frameworks to keep the bundle under 150 KB.
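A post-processing validator of the kind mentioned above might look like the following sketch. The field names (score, severity, suggestions), the allowed severity values, and the function parse_agent_output are illustrative assumptions, not the project's actual schema. The check is hand-rolled with the standard library; a JSON Schema validation library could serve the same purpose.

```python
import json

# Hypothetical schema: field name -> accepted Python type(s).
REQUIRED_FIELDS = {"score": (int, float), "severity": str, "suggestions": list}
ALLOWED_SEVERITIES = {"low", "medium", "high"}


def parse_agent_output(raw: str) -> dict:
    """Parse an agent's raw reply and reject anything that breaks the schema."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("top-level value must be a JSON object")
    for name, expected in REQUIRED_FIELDS.items():
        if name not in data:
            raise ValueError(f"missing field: {name}")
        value = data[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            raise ValueError(f"wrong type for field: {name}")
    if not 0 <= data["score"] <= 1:
        raise ValueError("score must be between 0 and 1")
    if data["severity"] not in ALLOWED_SEVERITIES:
        raise ValueError("unknown severity")
    return data
```

A caller could catch ValueError and route the raw text to a repair step, such as the guard-rail model, before retrying the parse.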
Accomplishments that we're proud of
100% PR coverage in our internal monorepo (roughly 1,000 PRs per month).
Average review time reduced from 45 min to 7 min while catching 30% more security issues.
Zero-downtime deployment via rolling Docker updates; the dashboard stays live during agent upgrades.
Open-source release (MIT) with more than 2,000 stars on GitHub within the first month.
What we learned
Prompt engineering matters more than model size: a well-crafted system prompt cuts hallucinations dramatically.
Hybrid approaches win: pure LLM reviews miss deterministic pattern matches that classic linters catch.
User trust is earned through transparency: exposing the scoring formula and raw agent outputs made developers comfortable delegating decisions.
Observability is critical: detailed logs and the dashboard helped us quickly diagnose flaky agent runs.
What's next for Code Sentinel AI
Multi-language support: Extend agents to Java, TypeScript, and Go using language-specific static analysis tools.
Explainable AI: Add citation links to source code and external docs for every suggestion, plus a confidence heatmap.
Customizable scoring models: Let teams define their own weight matrices and plug in domain-specific agents.
Integration with CI/CD: Provide a GitHub Action that blocks merges when the Code Sentinel Score exceeds a threshold.
Community plugin marketplace: Host user-contributed agents (e.g., accessibility, performance, licensing).