AI Audit for RLS in Merge Requests: Policy Sentinel

Inspiration

Between January 2025 and February 2026, documented AI app breaches kept repeating the same preventable mistakes. Missing Supabase Row Level Security showed up again and again as one of the most damaging ones. On January 31, 2026, Moltbook exposed roughly 1.5 million API tokens because RLS was never enabled. CVE-2025-48757 showed the same failure pattern in Lovable-generated apps. In one 2026 Supabase scanning dataset, 83% of exposed databases involved RLS misconfigurations.

That made the opportunity obvious. This is not a vague “security is important” problem. It is a narrow, recurring, high-blast-radius failure mode that keeps shipping. We built Policy Sentinel because this kind of mistake should be caught in merge requests, before it ever reaches production.

What it does

Policy Sentinel is a GitLab Duo Agent Platform audit flow for Row Level Security changes in merge requests. When it is assigned as a reviewer or mentioned on an MR, it inspects SQL migrations, Supabase policy changes, schema updates, and Postgres functions that affect authorization.

It looks for high-risk patterns: overly permissive policies such as USING (true), dropped or weakened RLS protections, policies whose SQL no longer matches the developer’s stated intent, and subtler backdoors such as SECURITY DEFINER functions that bypass RLS while trusting caller-supplied user IDs. It then posts a structured GitLab merge request review that explains the risk, shows the exploit path, and recommends a safer fix.
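To make the simplest of these patterns concrete, here is a minimal Python sketch of a check for always-true policy clauses. It is not Policy Sentinel's implementation, which reasons about semantics rather than keywords. The function name and regexes are hypothetical, and the naive split on semicolons would break on function bodies or string literals that contain them.

```python
import re

# Hypothetical heuristic: a USING or WITH CHECK clause whose entire expression is
# the literal true. It does not parse SQL and misses equivalents such as (1 = 1).
_ALWAYS_TRUE = re.compile(r'\b(?:using|with\s+check)\s*\(\s*true\s*\)', re.IGNORECASE)
_CREATE_POLICY = re.compile(
    r'create\s+policy\s+("[^"]+"|\w+)\s+on\s+([\w."]+)', re.IGNORECASE
)


def find_always_true_policies(sql: str) -> list[tuple[str, str]]:
    """Return (policy_name, table) for CREATE POLICY statements with an always-true clause."""
    hits = []
    # Naive statement split; assumes no semicolons inside literals or function bodies.
    for statement in sql.split(";"):
        header = _CREATE_POLICY.search(statement)
        if header and _ALWAYS_TRUE.search(statement):
            hits.append((header.group(1).strip('"'), header.group(2)))
    return hits


example_sql = (
    'create policy "read all" on public.profiles for select using (true);\n'
    "create policy own_rows on public.notes for select using (auth.uid() = user_id);"
)
flagged = find_always_true_policies(example_sql)
```

In this example only the first policy is flagged. The second compares the row owner against auth.uid(), so it is left alone.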

This is intentionally not a generic “AI security reviewer.” It is a focused reviewer for one of the highest-signal security failure classes in modern AI apps.

How we built it

We built Policy Sentinel as a GitLab-native review workflow on the GitLab Duo Agent Platform. The key design choice was to make it trigger from normal GitLab collaboration, not from a separate security dashboard. The merge request is where risky policy changes are proposed, discussed, and approved, so that is where the audit belongs.

The flow starts from MR context and narrows its scope to the files that matter most for RLS: migrations, policy SQL, schema definitions, and database functions. From there it reasons about the semantic effect of a change, not just keywords. That matters because the dangerous cases are often small. A single permissive clause can expose every row in a table, while a seemingly clean helper function can silently bypass every table policy underneath it.
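The scoping step can be sketched as a path filter over the files changed in a merge request. The directory patterns below are assumptions for illustration, not the paths Policy Sentinel actually uses, and a real project would tune them to its own layout.

```python
from fnmatch import fnmatchcase

# Hypothetical path patterns for authorization-relevant files.
RLS_RELEVANT_PATTERNS = (
    "supabase/migrations/*.sql",
    "db/policies/*.sql",
    "db/schema/*.sql",
    "db/functions/*.sql",
)


def select_rls_files(changed_paths: list[str]) -> list[str]:
    """Keep only the changed files that match an authorization-relevant pattern."""
    return [
        path
        for path in changed_paths
        if any(fnmatchcase(path, pattern) for pattern in RLS_RELEVANT_PATTERNS)
    ]


changed = [
    "supabase/migrations/2026_add_rls.sql",
    "src/app/page.tsx",
    "db/functions/get_orders.sql",
    "README.md",
]
in_scope = select_rls_files(changed)
```

Filtering first keeps the semantic analysis focused on a handful of files, which is where a small change can have the largest effect.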

We also built the project around realistic examples. One demo branch shows the classic USING (true) mistake. Another shows a much subtler SECURITY DEFINER function with a caller-controlled parameter that turns into an IDOR-style data leak even though the table policies look correct. A third branch is a secure refactor, included specifically to prove that the reviewer can tell the difference between “RLS changed” and “RLS got weaker.”
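The SECURITY DEFINER case can be approximated with a rough heuristic: flag a function that runs with definer rights, accepts a user-ID-like parameter, and never consults auth.uid(). This is a sketch under those assumptions, not the project's actual analysis. The helper name and the parameter naming convention are hypothetical, and the signature regex does not handle parameter types or defaults that contain parentheses.

```python
import re

_SIGNATURE = re.compile(r"function\s+[\w.]+\s*\(([^)]*)\)", re.IGNORECASE)
_SECURITY_DEFINER = re.compile(r"\bsecurity\s+definer\b", re.IGNORECASE)
# Assumed naming convention: parameters ending in user_id or userid.
_USER_ID_PARAM = re.compile(r"\b\w*user_?id\b", re.IGNORECASE)
_AUTH_UID = re.compile(r"\bauth\.uid\s*\(\s*\)", re.IGNORECASE)


def definer_trusts_caller_id(function_sql: str) -> bool:
    """Heuristic: SECURITY DEFINER function with a user-ID parameter and no auth.uid() check."""
    signature = _SIGNATURE.search(function_sql)
    if not signature or not _SECURITY_DEFINER.search(function_sql):
        return False
    takes_user_id = bool(_USER_ID_PARAM.search(signature.group(1)))
    checks_session = bool(_AUTH_UID.search(function_sql))
    return takes_user_id and not checks_session


risky_function = """
create or replace function get_orders(p_user_id uuid)
returns setof orders
language sql
security definer
as $$ select * from orders where user_id = p_user_id $$;
"""

safer_function = """
create or replace function get_my_orders()
returns setof orders
language sql
security definer
as $$ select * from orders where user_id = auth.uid() $$;
"""
```

The first function returns whichever user's rows the caller asks for, regardless of the policies on the table. The second derives the user from the session, so the heuristic does not flag it.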

Challenges we ran into

The hardest part was precision. RLS bugs are small in code but huge in impact, and developers will only trust a reviewer like this if it avoids noisy false positives. It is easy to build a bot that panics every time it sees DROP POLICY. It is much harder to build one that understands when a policy was actually strengthened.
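One way to avoid panicking on every dropped policy is to triage drops before judging them. The sketch below only separates drops that are followed by a new policy on the same table from drops with no replacement. It does not decide whether access got stronger or weaker, which still requires comparing the old and new expressions. The function name and labels are hypothetical.

```python
import re

_DROP = re.compile(
    r'drop\s+policy\s+(?:if\s+exists\s+)?("[^"]+"|\w+)\s+on\s+([\w."]+)', re.IGNORECASE
)
_CREATE = re.compile(r'create\s+policy\s+("[^"]+"|\w+)\s+on\s+([\w."]+)', re.IGNORECASE)


def triage_policy_drops(migration_sql: str) -> dict[str, str]:
    """Label each table with a dropped policy as 'replaced' or 'dropped_only'."""
    tables_with_new_policy = {m.group(2) for m in _CREATE.finditer(migration_sql)}
    triage = {}
    for match in _DROP.finditer(migration_sql):
        table = match.group(2)
        triage[table] = "replaced" if table in tables_with_new_policy else "dropped_only"
    return triage


migration = (
    'drop policy "old read" on public.notes;\n'
    'create policy "owner read" on public.notes for select using (auth.uid() = user_id);\n'
    "drop policy if exists team_read on public.docs;"
)
triage = triage_policy_drops(migration)
```

A 'replaced' table is a refactor candidate whose new policy deserves a closer comparison, while 'dropped_only' simply marks a change that should not pass without a second look.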

Another challenge was that not every RLS bug lives in a CREATE POLICY statement. Functions, RPCs, and SECURITY DEFINER behavior matter too, especially in Supabase projects that outgrow basic CRUD. We had to design the analysis to reason about authorization flow, not just table policies in isolation.

We also had to translate deep Postgres and Supabase security details into merge request feedback that a developer can act on quickly. A good finding is not just “this is insecure.” It explains what changed, why it matters, how it could be abused, and what the safer pattern looks like.
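The four parts of a good finding map naturally onto a small data structure. The following is an illustrative sketch only; the class, field names, and comment layout are assumptions rather than the format Policy Sentinel posts.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    """One review finding, mirroring the four questions a developer needs answered."""

    what_changed: str
    why_it_matters: str
    exploit_path: str
    safer_pattern: str

    def to_comment(self) -> str:
        """Render the finding as a plain-text merge request comment."""
        return "\n".join(
            [
                f"What changed: {self.what_changed}",
                f"Why it matters: {self.why_it_matters}",
                f"How it could be abused: {self.exploit_path}",
                f"Safer pattern: {self.safer_pattern}",
            ]
        )


finding = Finding(
    what_changed="Policy on public.profiles now uses USING (true).",
    why_it_matters="Every row becomes readable by any role the policy applies to.",
    exploit_path="A client can query the table directly and skip the frontend filter.",
    safer_pattern="Restrict the policy to rows where auth.uid() = user_id.",
)
comment = finding.to_comment()
```

Keeping the fields mandatory means a finding cannot be posted as a bare "this is insecure" without the explanation and the fix.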

Accomplishments that we're proud of

We are proud that Policy Sentinel is aimed at a real breach pattern, not a generic hackathon-safe security theme. It focuses on a small slice of code that often matters more than the rest of the feature combined.

We are also proud of the range of issues it can distinguish. It catches obvious mistakes like public-read-everything policies, but it also surfaces deeper authorization flaws hiding behind otherwise professional-looking SQL. Just as importantly, it can recognize a safe refactor and avoid punishing a team for improving their policy model.

Most of all, we are proud that the project keeps the review inside GitLab, inside the merge request, and inside the developer workflow. The goal is not to create another dashboard to ignore. The goal is to stop a breach-class mistake at the exact point where it can still be fixed cheaply.

What we learned

We learned that the most valuable AI security workflows are often the narrowest ones. Broad “review my code for security issues” experiences are crowded, vague, and hard to trust. But a reviewer specialized in one breach-prone class of mistakes can be surprisingly effective.

We also learned that app-layer behavior hides a lot of database security failures. A UI can appear to work correctly because the frontend filters to the current user, while the underlying database is still wide open to anyone who calls it directly. That gap between visible behavior and real authorization is exactly where automated review can help.

Finally, we learned that GitLab Duo flows are strongest when the trigger, context, and action are tightly scoped. In this case the best workflow was simple: trigger on merge request review, analyze only the authorization-relevant changes, and post a concrete security verdict where the team is already making decisions.

What's next for Technical Debt by Impact

Policy Sentinel is our first proof point for a broader idea: the most important technical debt is not the code that looks messy, it is the code that creates the biggest blast radius when it is wrong. A few lines of misconfigured security policy can expose millions of records. That is technical debt by impact.

Next, we want to expand beyond Supabase and Postgres RLS into adjacent BaaS security rules, especially Firebase-style database and storage policy mistakes that have shown the same failure pattern in real breaches. We also want to add safer autofix support for straightforward cases, team-specific secure policy baselines, and stronger GitLab-native enforcement such as labels, approval recommendations, and remediation issue creation for high-confidence findings.


Fact basis used for the incident framing: Moltbook / Wiz, CVE-2025-48757 / NVD, AI breach roundup, Tea / Reuters, Supabase RLS stat.

Built With

claude-on-gcp
duo
gcp
gitlab
gitlab-duo

Updates

Nick Spreen started this project — Mar 25, 2026 01:09 PM EDT
