Polaris - AI-powered PR review for DevOps and IaC

Inspiration

We have all been on teams where infrastructure code gets carefully reviewed for correctness but almost never for security. A developer pushes a Terraform file with an S3 bucket set to public read, a Dockerfile with hardcoded secrets, or a Kubernetes manifest running privileged containers. It looks fine in the diff, passes review, and ships to production. Nobody catches it until an audit or a breach.

The frustrating part is that these are not obscure bugs. They are well-documented patterns that any senior SRE would catch in 30 seconds. The problem is not knowledge. It is bandwidth. No team has a dedicated person reviewing every infrastructure PR for security, so it just does not get done.

We wanted to build the reviewer that never gets busy, never misses a PR, and always knows the CIS benchmark off the top of its head.

What it does

Polaris is an autonomous security agent that installs as a GitHub App and automatically scans every Pull Request containing infrastructure code. It covers Terraform HCL, Kubernetes YAML, Dockerfiles, and GitHub Actions workflows.

The moment a PR is opened, a pipeline fires automatically:

Parse: extracts modified IaC files from the PR diff
Scan: runs deterministic rules against known security anti-patterns
Reason: uses Gemini to map every finding to CIS and SOC2 frameworks and generate a minimal-diff code fix
Verify: runs a second Gemini agent to confirm each patch does not break existing functionality before presenting it
Report: posts inline PR comments on the exact vulnerable line, with fixes ready to commit
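
To make the Parse and Scan stages concrete, here is a minimal sketch of a deterministic rule pass. The rule IDs, regexes, and the classify and scan helpers are illustrative assumptions, not Polaris's actual rule set, and a real scanner would parse HCL and YAML rather than match lines.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Rule:
    rule_id: str
    kind: str            # file kind the rule applies to
    pattern: "re.Pattern"
    message: str


@dataclass(frozen=True)
class Finding:
    rule_id: str
    path: str
    line: int            # 1-based line number in the file
    message: str


# Hypothetical rules, one per example from the Inspiration section.
RULES = [
    Rule("TF001", "terraform",
         re.compile(r'acl\s*=\s*"public-read'),
         "S3 bucket ACL allows public read"),
    Rule("DOCKER001", "dockerfile",
         re.compile(r"^\s*(ENV|ARG)\s+\S*(PASSWORD|SECRET|TOKEN)\S*[= ]\S+", re.I),
         "Possible hardcoded secret"),
    Rule("K8S001", "kubernetes",
         re.compile(r"privileged:\s*true"),
         "Container runs in privileged mode"),
]


def classify(path: str) -> Optional[str]:
    """Guess the IaC kind from the file path alone (a crude heuristic)."""
    name = path.rsplit("/", 1)[-1]
    if path.endswith(".tf"):
        return "terraform"
    if name == "Dockerfile" or name.startswith("Dockerfile."):
        return "dockerfile"
    if path.startswith(".github/workflows/") and path.endswith((".yml", ".yaml")):
        return "actions"
    if path.endswith((".yml", ".yaml")):
        return "kubernetes"  # assumption: any other YAML is a manifest
    return None


def scan(path: str, text: str) -> List[Finding]:
    """Run every rule for this file kind against each line of the file."""
    kind = classify(path)
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for rule in RULES:
            if rule.kind == kind and rule.pattern.search(line):
                findings.append(Finding(rule.rule_id, path, lineno, rule.message))
    return findings
```

Because each finding carries a file path and line number, the later stages have what they need to request a fix and place an inline comment.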

From the dashboard, a developer can approve a fix with one click. Polaris commits it directly to the PR branch with a full audit trail. No copy-paste, no context switching.

How we built it

The frontend is built with Next.js 15 and Tailwind CSS, using NextAuth.js with GitHub OAuth for authentication. Each user only sees scans for their own repositories, enforced at the session level.

The backend is a Python FastAPI server that receives GitHub webhooks and orchestrates the scan pipeline. We use Gemini 2.5 Flash for both the reasoning and verification agents, with the model configurable via an environment variable.

PR Opened -> GitHub Webhook -> FastAPI Backend -> Deterministic Scan
         -> Gemini (Reasoning Agent) -> Gemini (Verification Agent)
         -> Inline PR Comments + Commit Status -> Dashboard Updated

For local development we used Smee.io to forward GitHub webhooks to localhost. The database is SQLite locally with the schema designed to be PostgreSQL-ready for production.
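
The per-user scoping mentioned earlier comes down to filtering every query by the signed-in user. A minimal sketch with the standard library's sqlite3 module follows; the scans table, its columns, and the scans_for_user helper are hypothetical names for illustration, not the actual Polaris schema.

```python
import sqlite3

# Hypothetical schema: one row per scanned pull request.
SCHEMA = """
CREATE TABLE scans (
    id INTEGER PRIMARY KEY,
    owner_login TEXT NOT NULL,
    repo TEXT NOT NULL,
    pr_number INTEGER NOT NULL,
    findings_count INTEGER NOT NULL DEFAULT 0
);
"""


def scans_for_user(conn: sqlite3.Connection, login: str) -> list:
    """Return only the scans that belong to the given GitHub login."""
    rows = conn.execute(
        "SELECT repo, pr_number, findings_count FROM scans "
        "WHERE owner_login = ? ORDER BY id",
        (login,),  # parameterized, so the login is never spliced into SQL
    )
    return rows.fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO scans (owner_login, repo, pr_number, findings_count) "
    "VALUES (?, ?, ?, ?)",
    [("alice", "alice/infra", 12, 3), ("bob", "bob/platform", 7, 1)],
)
print(scans_for_user(conn, "alice"))
```

The login passed in should come from the authenticated session, never from a request parameter the client controls.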

Challenges we faced

The 4-second wall

When we first wired up the full pipeline including both Gemini agents, end-to-end latency was above 20 seconds. We profiled every stage and found the bottleneck: the reasoning agent was waiting for the full deterministic scan to finish before starting. We restructured the flow so findings are streamed to Gemini incrementally as the scanner produces them. That alone cut latency by more than half.
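
The restructuring can be sketched with asyncio: the scanner is an async generator, and a reasoning task starts as soon as each finding is yielded instead of after the whole scan. The function names and the sleep calls standing in for scan work and model calls are assumptions for illustration only.

```python
import asyncio
from typing import AsyncIterator, List


async def scan_files(files: List[str]) -> AsyncIterator[str]:
    """Yield findings one at a time as each file finishes scanning."""
    for path in files:
        await asyncio.sleep(0.01)  # stand-in for deterministic scan work
        yield f"finding in {path}"


async def reason(finding: str) -> str:
    """Stand-in for the call to the reasoning agent."""
    await asyncio.sleep(0.05)
    return f"fix for {finding}"


async def run_pipeline(files: List[str]) -> List[str]:
    tasks = []
    async for finding in scan_files(files):
        # Start reasoning immediately; do not wait for the scan to finish.
        tasks.append(asyncio.create_task(reason(finding)))
    # gather returns results in the order the tasks were created.
    return await asyncio.gather(*tasks)


if __name__ == "__main__":
    print(asyncio.run(run_pipeline(["main.tf", "deploy.yaml"])))
```

With this shape, reasoning on the first finding overlaps with scanning the remaining files, which is where the latency saving comes from.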

Gemini over-fixing everything

Our first version of the reasoning agent was too eager. Ask it to fix an open security group and it would rewrite the entire Terraform module with opinionated changes nobody asked for. We fixed this with a strict prompt constraint: generate the smallest possible diff that fully remediates the finding, preserve all variable names, preserve all surrounding logic, and change nothing that is not directly responsible for the vulnerability.
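
As a rough illustration, the constraint can be kept as a fixed list that is rendered into every fix request. The wording of the rules follows the paragraph above; the build_fix_prompt helper and the overall prompt layout are assumptions, not our exact production prompt.

```python
# The four constraints from the paragraph above, kept as data so they
# are rendered identically into every request.
FIX_CONSTRAINTS = [
    "Generate the smallest possible diff that fully remediates the finding.",
    "Preserve all variable names.",
    "Preserve all surrounding logic.",
    "Change nothing that is not directly responsible for the vulnerability.",
]


def build_fix_prompt(finding: str, path: str, snippet: str) -> str:
    """Assemble a fix request for one finding (hypothetical layout)."""
    rules = "\n".join(f"- {rule}" for rule in FIX_CONSTRAINTS)
    return (
        f"Finding: {finding}\n"
        f"File: {path}\n\n"
        f"Constraints:\n{rules}\n\n"
        f"Code:\n{snippet}\n\n"
        "Return a unified diff only."
    )


print(build_fix_prompt(
    "Security group allows ingress from 0.0.0.0/0",
    "network.tf",
    'cidr_blocks = ["0.0.0.0/0"]',
))
```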

GitHub webhook timing races

GitHub fires the webhook the moment a PR is opened, but the diff is not always available via the API at that exact instant. Our parser was hitting the diff endpoint milliseconds too early and getting empty results, causing the pipeline to complete silently with zero findings. We added an exponential backoff retry with a cap of 5 attempts. The harder problem was writing a reliable test that reproduced the race condition so we could verify the fix held.
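
A minimal sketch of the retry, assuming a hypothetical fetch callable that returns the raw diff text. The 0.5 second base delay is an illustrative value; only the cap of 5 attempts comes from our implementation. Injecting fetch and sleep is one way to reproduce the race in a test without touching the network.

```python
import time
from typing import Callable


def fetch_diff_with_backoff(
    fetch: Callable[[], str],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call fetch until it returns a non-empty diff, doubling the wait each time."""
    for attempt in range(max_attempts):
        diff = fetch()
        if diff.strip():
            return diff
        if attempt < max_attempts - 1:
            # Waits of base_delay, 2x, 4x, ... between attempts.
            sleep(base_delay * (2 ** attempt))
    # Fail loudly rather than reporting a clean scan with zero findings.
    raise RuntimeError(f"diff still empty after {max_attempts} attempts")


# Simulate the race: the diff endpoint returns nothing on the first two calls.
responses = iter(["", "", "@@ -1 +1 @@\n-a\n+b\n"])
recorded_delays = []
diff = fetch_diff_with_backoff(lambda: next(responses), sleep=recorded_delays.append)
print(recorded_delays)
```

One trade-off in this sketch: a pull request whose diff is genuinely empty also ends in an error, so the caller has to decide how to treat that case.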

Inline comment positioning on GitHub PR diffs

GitHub's review API does not accept arbitrary file line numbers. It requires a position relative to the diff output itself: the number of lines below the file's first hunk header, which keeps increasing across later hunks and so does not match the file's own line numbers. We had to write a diff position resolver that parses the raw unified diff, tracks hunk offsets, and maps every file line back to its diff-relative position. If this is off by one line, GitHub rejects the comment with a 422 and no useful explanation. This single issue took a full day to debug.
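
A minimal sketch of such a resolver, assuming the input is the unified diff for a single file. It follows the position rule as we read it in GitHub's REST documentation: the line just below the first "@@" header is position 1, and the count keeps increasing through later hunk headers. The function name and the handling of the "\ No newline at end of file" marker are our assumptions.

```python
import re
from typing import Dict, Optional

# Matches a hunk header such as "@@ -10,2 +11,3 @@" and captures the
# starting line number on the new side of the diff.
HUNK_RE = re.compile(r"^@@ -\d+(?:,\d+)? \+(\d+)(?:,\d+)? @@")


def build_position_map(patch: str) -> Dict[int, int]:
    """Map new-file line numbers to diff positions for one file's patch."""
    positions: Dict[int, int] = {}
    position: Optional[int] = None  # None until the first hunk header
    new_line = 0
    for raw in patch.splitlines():
        match = HUNK_RE.match(raw)
        if match:
            # The first header is position 0; later headers take up a position.
            position = 0 if position is None else position + 1
            new_line = int(match.group(1))
            continue
        if position is None:
            continue  # skip any file headers before the first hunk
        position += 1
        if raw.startswith("\\"):
            continue  # "\ No newline" marker: assumed to occupy a position only
        if raw.startswith("-"):
            continue  # deleted line: has no line number in the new file
        positions[new_line] = position  # added or context line
        new_line += 1
    return positions


patch = "\n".join([
    "@@ -1,3 +1,4 @@",
    " a",
    "+b",
    " c",
    " d",
    "@@ -10,2 +11,3 @@",
    " x",
    "+y",
    " z",
])
print(build_position_map(patch))
```

Lines that do not appear in the diff are absent from the map, so a finding on such a line cannot be posted as an inline comment and needs a fallback such as a general PR comment.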

What we learned

The dual-agent architecture was one of the best decisions we made. Having a separate Gemini agent verify every proposed fix before it reaches the developer caught cases where the reasoning agent produced a patch that fixed the finding but would have introduced a regression.

The AI was working well within the first day. What took the most time was GitHub's webhook lifecycle, the diff positioning API, OAuth session scoping, and making sure each user only ever sees their own data. The AI was the easy part.

What's next for Polaris

Production deployment on Vercel and Railway with PostgreSQL
Organization-wide dashboard showing security posture across all repos
Custom YAML-based policy rules so teams can define their own scan patterns
Slack and Teams notifications for critical severity findings
Drift detection that alerts when live infrastructure diverges from the last clean PR scan
Self-healing mode where Polaris proactively opens PRs to fix dormant vulnerabilities in existing codebases