Privacy Code Review

Inspiration

Privacy violations have cost businesses over €6.2 billion in GDPR fines since 2018. Meta paid €1.2 billion. Amazon paid €746 million. Yet 83% of data breaches still involve PII leaking through logs and analytics, issues that lived in source code long before anyone noticed.

The pattern is always the same: privacy reviews happen too late. Existing security tools like Snyk and SonarQube catch SQLi and XSS but ignore privacy-specific risks: tracking before consent, PII in logs, third-party data sharing without agreements, LLM prompt leakage, or code that directly contradicts a company's own privacy policy. Manual privacy reviews don't scale, and developers lack the regulatory context to know whether their code violates GDPR Article 32 or CCPA §1798.100.

When Google released Gemini 3, we saw the opportunity: a model with enough reasoning capability and context window to understand how personal data flows across an entire codebase, detect developer intent beyond regex, compare code behavior against plain-text privacy policies, and map findings to specific legal articles across jurisdictions. We could build what the industry has never had: an automated senior privacy engineer that works at the speed of a linter.

What it does

Privacy Code Review is a unified ecosystem of two complementary products that catch privacy violations in code before they reach production. Both are built with and powered by Gemini 3.

Privacy Code Review (Web Dashboard): A Next.js application built using Google AI Studio with Gemini 3 that acts as an automated senior privacy engineer. Upload source files or a .zip archive, and Gemini 3 Pro performs deep, context-aware analysis across your entire codebase. It provides two developer-native views: a PR Review Mode with a file tree sidebar, Monaco Editor with inline code annotations, and before/after unified diff patches for every finding; and a Diagnostics Explorer with findings that can be filtered and sorted by severity, category, and regulation. Every finding includes data flow source-to-sink visualization, "Why Risky" explanations, confidence scores, and regulatory mapping to GDPR, CCPA, India's DPDPA, and the ePrivacy Directive. Reports can be exported as audit-ready PDF or JSON. Zero data retention: code is processed in memory and immediately discarded.

PrivacySDK (CI/CD Scanner): A comprehensive privacy vulnerability scanner that integrates directly into GitHub Actions, GitLab CI/CD, or any DevOps pipeline in minutes. It uses a hybrid architecture: 10 hardcoded rule engines with 50+ PII detection patterns provide deterministic, always-on detection, while Gemini 3 Pro Preview via Vertex AI adds intelligent context-aware analysis on top. It scans 12+ programming languages, posts real-time merge request comments, automatically creates issues for high-severity violations, and blocks pipelines when critical privacy risks are detected. The web interface supports drag-and-drop scanning, an interactive Ask Gemini chat for follow-up questions on any violation, cross-file data flow insights, and a privacy policy contradiction checker that compares your policy text against what your code actually does.

Together they form a complete Detect → Map → Fix pipeline: Privacy Code Review provides the deep analysis dashboard, while PrivacySDK automates enforcement on every commit.
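As a rough illustration of the enforcement step described above (issues for high-severity violations, a blocked pipeline for critical ones), the sketch below turns a list of violations into a CI decision. The `Violation` shape, the severity names, and the `gate` function are hypothetical and are not PrivacySDK's actual API.

```typescript
// Hypothetical severity levels; PrivacySDK's real names may differ.
type Severity = "low" | "medium" | "high" | "critical";

interface Violation {
  file: string;
  line: number;
  severity: Severity;
  message: string;
}

interface GateResult {
  blockPipeline: boolean;     // true when at least one critical violation exists
  openIssuesFor: Violation[]; // high and critical violations get a tracked issue
  exitCode: 0 | 1;            // a non-zero exit code fails the CI job
}

const RANK: Record<Severity, number> = { low: 0, medium: 1, high: 2, critical: 3 };

function gate(violations: Violation[]): GateResult {
  const openIssuesFor = violations.filter((v) => RANK[v.severity] >= RANK.high);
  const blockPipeline = violations.some((v) => v.severity === "critical");
  return { blockPipeline, openIssuesFor, exitCode: blockPipeline ? 1 : 0 };
}
```

A CI wrapper would exit with `exitCode`; GitHub Actions and GitLab CI/CD both treat a non-zero exit status as a failed job.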

How we built it

We used Google AI Studio with Gemini 3 Pro to build the app. We leverage Gemini 3's massive context window by aggregating multiple source files into a single prompt. This enables cross-file data flow analysis: tracing how a variable defined in user.ts gets logged in logger.ts or sent to a third-party API in analytics.js. Regex-based tools that match one line or one file at a time cannot do this.

We use the native responseSchema capability of the @google/genai SDK to force strict, type-safe JSON output. Every finding includes machine-readable fields for severity, lineStart, lineEnd, regulatoryMapping, and confidence, enabling precise code annotations and the filterable diagnostics grid.

Gemini 3 is primed with a Senior Privacy Engineer persona and instructed to map abstract code patterns to concrete legal articles (GDPR Art. 32 vs CCPA §1798.100 vs India DPDPA Sec. 4). It generates unified diff patches for every finding, automating the fix phase of the review.

For PrivacySDK, Gemini 3 Pro Preview connects through Vertex AI for production-grade scanning in CI/CD pipelines, with intelligent 200-line chunking for large codebases. The Ask Gemini feature sends violation context back to the model for interactive follow-up explanations. The privacy policy checker combines policy text and codebase snapshots into a single prompt, using Gemini's reasoning to find contradictions between what a company promises and what its code does.

Hybrid Architecture: PrivacySDK combines 10 deterministic rule engines (PII detection, consent validation, encryption checks, data flow rules, GDPR/CCPA/HIPAA compliance) with Gemini 3's AI analysis. Rules provide speed and reliability; Gemini provides reasoning and context. If AI is unavailable, hardcoded rules still catch violations, so the system never fails silently.
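To make the 200-line chunking concrete, here is a minimal sketch of fixed-size chunking that keeps track of where each chunk sits in the original file, so line numbers reported for a chunk can be mapped back. It is a naive version: the "intelligent" chunking mentioned above presumably does more (for example, respecting function boundaries), and the `Chunk`, `chunkFile`, and `toFileLine` names are assumptions made for illustration.

```typescript
interface Chunk {
  file: string;
  startLine: number; // 1-based line of the chunk's first line in the original file
  endLine: number;   // 1-based, inclusive
  text: string;
}

// Split a file into consecutive chunks of at most `maxLines` lines.
function chunkFile(file: string, source: string, maxLines = 200): Chunk[] {
  const lines = source.split("\n");
  const chunks: Chunk[] = [];
  for (let i = 0; i < lines.length; i += maxLines) {
    const slice = lines.slice(i, i + maxLines);
    chunks.push({
      file,
      startLine: i + 1,
      endLine: i + slice.length,
      text: slice.join("\n"),
    });
  }
  return chunks;
}

// Convert a line number reported relative to a chunk (1-based)
// back to the corresponding line in the original file.
function toFileLine(chunk: Chunk, lineInChunk: number): number {
  return chunk.startLine + lineInChunk - 1;
}
```

Carrying `startLine` with every chunk is what lets findings from the third chunk of a file still annotate the right line in an editor or merge request comment.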

Features: PrivacySDK uses Gemini 3 Pro Preview to transform privacy compliance from reactive legal work into proactive, AI-powered code intelligence.

Unlike traditional static analyzers that rely on regex pattern matching, PrivacySDK leverages Gemini 3's advanced reasoning to perform cross-file data flow analysis, tracing how personal data moves through an entire codebase and flagging violations that no pattern matcher could detect. For example, Gemini can identify that a user object created in one file is silently forwarded to a third-party tracking API in another file without consent, violating GDPR Article 6.

Key Gemini 3 integrations:
• Cross-file PII flow reasoning across 12+ programming languages
• Context-aware violation detection that understands code semantics, not just syntax
• Interactive remediation: developers chat with Gemini about specific violations to get tailored fixes
• Natural language privacy policy verification: Gemini checks whether code behavior matches privacy policy promises

Gemini 3's long-context window and multimodal reasoning make these capabilities possible for the first time. The tool runs as a web app, GitHub Action, or CLI, and has been organically adopted across 55+ countries. With EU AI Act obligations phasing in since 2025, AI-powered privacy compliance is no longer optional; it's essential infrastructure.

Tech Stack: Next.js 14 (App Router), Google Gemini 3 Pro via @google/genai SDK, Vertex AI, Monaco Editor, Tailwind CSS, shadcn/ui, jsPDF, Node.js/TypeScript, Google Cloud Run, Docker.

Challenges we ran into

Cross-file reasoning at scale was the biggest technical challenge. Privacy violations rarely live in a single file: PII collected in a user service might flow through a processing pipeline and leak in a logging module three files away. Getting Gemini 3 to reliably trace these data flows required careful prompt engineering and file aggregation strategies to maximize the context window without hitting token limits.
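One simple aggregation strategy, sketched here under stated assumptions, is to concatenate files with path headers and line numbers until an estimated token budget is reached. The 4-characters-per-token estimate is a rough heuristic rather than Gemini's real tokenizer, and `SourceFile`, `estimateTokens`, and `buildPrompt` are hypothetical names.

```typescript
interface SourceFile {
  path: string;
  content: string;
}

// Rough heuristic (an assumption): about 4 characters per token.
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

function buildPrompt(
  files: SourceFile[],
  tokenBudget: number
): { prompt: string; skipped: string[] } {
  const parts: string[] = [];
  const skipped: string[] = [];
  let used = 0;
  for (const f of files) {
    // Prefix each line with its number so the model can cite exact locations.
    const numbered = f.content
      .split("\n")
      .map((line, i) => `${i + 1}: ${line}`)
      .join("\n");
    const block = `=== FILE: ${f.path} ===\n${numbered}\n`;
    const cost = estimateTokens(block);
    if (used + cost > tokenBudget) {
      // Over budget: record the file and keep going, since a later,
      // smaller file may still fit.
      skipped.push(f.path);
      continue;
    }
    parts.push(block);
    used += cost;
  }
  return { prompt: parts.join("\n"), skipped };
}
```

Returning the `skipped` list makes the trade-off visible: files that did not fit can be sent in a second pass or handled by the deterministic rules alone.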

Structured output reliability required significant effort. We needed Gemini 3 to return precise line numbers, severity levels, confidence scores, and regulatory mappings in strict JSON, not just natural language. Getting consistent, parseable output required extensive use of the responseSchema feature and robust fallback parsing for edge cases where the model returned slightly malformed responses.

Balancing AI intelligence with deterministic reliability was a core architectural decision. We couldn't ship a tool that misses a hardcoded SSN because the model made a bad inference. Building the hybrid architecture, where hardcoded rules always run as a foundation with AI enhancement layered on top, required careful deduplication logic so violations aren't double-reported.
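The fallback parsing mentioned above could look something like the following sketch. It tries strict JSON first, then two common recoverable shapes: a payload wrapped in a Markdown code fence, and JSON surrounded by stray prose. Which malformed responses the project actually saw is not stated, so these cases and the `parseModelJson` name are assumptions.

```typescript
type ParseResult = { ok: true; value: unknown } | { ok: false };

// Three backticks, built at runtime so this example can itself sit in a fence.
const FENCE = "`".repeat(3);
const FENCED_BODY = new RegExp(FENCE + "(?:json)?\\s*([\\s\\S]*?)" + FENCE);

function parseModelJson(raw: string): ParseResult {
  // Candidate 1: the response is already valid JSON.
  const candidates: string[] = [raw.trim()];

  // Candidate 2: the JSON is wrapped in a Markdown code fence.
  const fenced = raw.match(FENCED_BODY);
  if (fenced) candidates.push(fenced[1].trim());

  // Candidate 3: the JSON is embedded in prose; take the outermost
  // bracketed span from the first "{" or "[" to the last "}" or "]".
  const first = raw.search(/[\[{]/);
  const last = Math.max(raw.lastIndexOf("}"), raw.lastIndexOf("]"));
  if (first !== -1 && last > first) candidates.push(raw.slice(first, last + 1));

  for (const candidate of candidates) {
    try {
      return { ok: true, value: JSON.parse(candidate) };
    } catch {
      // Not valid JSON; fall through to the next candidate.
    }
  }
  return { ok: false };
}
```

Returning an explicit `ok` flag, rather than `null`, keeps a legitimately parsed `null` distinguishable from a parse failure, so the caller can fall back to rule-only results when `ok` is false.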

Multi-language support across 12+ languages meant ensuring PII patterns, consent checks, and encryption rules worked correctly across JavaScript, TypeScript, Python, Java, Go, C#, PHP, Ruby, Swift, Kotlin, Rust, and Scala, each with different syntax for the same privacy anti-patterns.
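To show why the same anti-pattern needs language-aware rules, here is a deliberately small sketch of a deterministic rule pass: one rule flags a hardcoded US SSN literal, and another flags a logging call on a line that also mentions a PII-like identifier. The regular expressions cover only a few logging idioms and are illustrative assumptions, not PrivacySDK's actual 50+ patterns.

```typescript
interface RuleHit {
  rule: string;
  line: number; // 1-based
  excerpt: string;
}

// Logging calls in a few languages (illustrative, not exhaustive):
// console.log(...), print(...)/println(...)/printf(...),
// logger.info(...), logging.info(...), log.Printf(...), fmt.Println(...).
const LOG_CALL =
  /\b(console\.(log|info|warn|error)|print(ln|f)?|(logger|logging|log)\.\w+|fmt\.Print\w*)\s*\(/;

// Substrings that commonly appear in identifiers holding personal data.
const PII_NAME = /(email|ssn|phone|password)/i;

// A US Social Security number written as a literal, e.g. 123-45-6789.
const SSN_LITERAL = /\b\d{3}-\d{2}-\d{4}\b/;

function scan(source: string): RuleHit[] {
  const hits: RuleHit[] = [];
  source.split("\n").forEach((text, i) => {
    if (SSN_LITERAL.test(text)) {
      hits.push({ rule: "hardcoded-ssn", line: i + 1, excerpt: text.trim() });
    }
    if (LOG_CALL.test(text) && PII_NAME.test(text)) {
      hits.push({ rule: "pii-in-log", line: i + 1, excerpt: text.trim() });
    }
  });
  return hits;
}
```

Line-level matching like this is fast and predictable but blind to data that is renamed or passed between files, which is the gap the model-based analysis is meant to cover.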

Privacy-first architecture for a privacy tool was non-negotiable. Developers won't upload source code to a tool that stores it. Building the zero-retention, in-memory processing pipeline where code never hits disk and is discarded immediately after analysis was essential for trust, but added complexity to the session management and results caching.

Accomplishments that we're proud of

Built the entire Privacy Code Review web dashboard using Google AI Studio with Gemini 3, demonstrating Gemini 3 as both a development tool and a production AI engine in the same project.

Achieved cross-file data flow analysis that traces PII from source to sink across multiple files, something no regex-based linter can do and a capability that only became feasible with Gemini 3's context window and reasoning.

Created a complete Detect → Map → Fix pipeline with two complementary products: a deep analysis dashboard with IDE-like PR review experience (Monaco Editor, inline annotations, unified diff patches) and an automated CI/CD scanner that blocks bad code before it merges.

Built privacy policy contradiction detection: paste your privacy policy and Gemini compares it against what your code actually does, catching gaps between promises and implementation.

Mapped every finding to specific regulatory articles across GDPR, CCPA, HIPAA, PCI DSS, India's DPDPA, and the ePrivacy Directive with confidence scores, making findings immediately audit-ready.

Designed a zero-retention, privacy-first architecture: a privacy scanner that practices what it preaches, with in-memory processing, no database, and code discarded after analysis.

Supported 12+ programming languages with 50+ PII detection patterns and 10 hardcoded rule engines that provide reliable detection even when AI is unavailable.

What we learned

Gemini 3's structured output capabilities (responseSchema) are a game-changer for building production AI tools. The ability to force type-safe JSON output means you can build real software on top of LLM responses, not just chat interfaces but parseable, machine-readable data pipelines.
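Even with a schema-constrained response, it is reasonable to validate parsed output before it reaches the UI. The sketch below is a runtime type guard for the finding fields named in this write-up (severity, lineStart, lineEnd, regulatoryMapping, confidence). The field types, the severity values, and the 0 to 1 confidence range are assumptions, since the actual schema is not shown here.

```typescript
// Assumed field types; only the field names come from the project description.
interface Finding {
  severity: string;
  lineStart: number;
  lineEnd: number;
  regulatoryMapping: string[];
  confidence: number;
}

// Hypothetical severity vocabulary.
const SEVERITIES: readonly string[] = ["low", "medium", "high", "critical"];

function isFinding(x: unknown): x is Finding {
  if (typeof x !== "object" || x === null) return false;
  const { severity, lineStart, lineEnd, regulatoryMapping, confidence } =
    x as Record<string, unknown>;
  return (
    typeof severity === "string" &&
    SEVERITIES.indexOf(severity) !== -1 &&
    typeof lineStart === "number" &&
    Number.isInteger(lineStart) &&
    lineStart >= 1 &&
    typeof lineEnd === "number" &&
    Number.isInteger(lineEnd) &&
    lineEnd >= lineStart &&
    Array.isArray(regulatoryMapping) &&
    regulatoryMapping.every((r) => typeof r === "string") &&
    typeof confidence === "number" &&
    confidence >= 0 &&
    confidence <= 1
  );
}
```

A guard like this catches values that are well-formed JSON but semantically off, such as a `lineEnd` that precedes `lineStart`, before they turn into misplaced editor annotations.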

The hybrid AI + rules architecture is the right pattern for compliance tooling. Pure AI is too unpredictable for security-critical scanning; pure rules miss context. Combining both gives you the reliability enterprises need with the intelligence that makes the tool actually useful.
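A minimal sketch of this pattern, assuming a simple notion of "same violation" (same file, same category, and lines within a small tolerance): rule findings are always kept, and model findings are added only when nothing already kept covers them. The `MergedFinding` shape and the `mergeFindings` name are hypothetical, and the project's real deduplication logic is not shown in this write-up.

```typescript
interface MergedFinding {
  file: string;
  line: number;
  category: string;
  source: "rule" | "ai";
  detail: string;
}

// Rule findings form the baseline and are never dropped. An AI finding is
// added only if no finding already kept matches its file and category
// within `lineTolerance` lines, which also removes AI-vs-AI duplicates.
function mergeFindings(
  ruleFindings: MergedFinding[],
  aiFindings: MergedFinding[],
  lineTolerance = 2
): MergedFinding[] {
  const merged = [...ruleFindings];
  for (const ai of aiFindings) {
    const duplicate = merged.some(
      (f) =>
        f.file === ai.file &&
        f.category === ai.category &&
        Math.abs(f.line - ai.line) <= lineTolerance
    );
    if (!duplicate) merged.push(ai);
  }
  return merged;
}
```

If the model call fails, passing an empty array for `aiFindings` returns the rule findings unchanged, which matches the requirement that deterministic detection keeps working when AI is unavailable.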

Cross-file reasoning is where LLMs unlock genuinely new capabilities in static analysis. Single-file pattern matching has existed for decades. Understanding how data flows across service boundaries and comparing code behavior against natural-language policies is something only a model like Gemini 3 can do.

Privacy engineering is a massively underserved space in developer tooling. Security tools are everywhere; privacy-specific tools that understand regulatory context barely exist. The $5B privacy management software market is growing at 23.5% CAGR, and almost none of it operates at the code level.

What's next for Privacy Code Review

IDE Extensions: VS Code, Antigravity, Cursor and JetBrains plugins that bring Privacy Code Review findings directly into the editor with real-time inline annotations as developers type.

GitHub App: A one-click GitHub App that automatically reviews every PR for privacy violations with Gemini-powered comments, eliminating manual CI/CD configuration.

Custom Policy Engine: Let organizations define their own privacy policies and data handling rules, with Gemini automatically enforcing them against the codebase.

Expanded Regulation Coverage: Adding support for Brazil's LGPD, China's PIPL, Japan's APPI, and the EU AI Act as privacy regulations continue expanding globally.

Enterprise Dashboard: Aggregate privacy risk scores across an entire organization's repositories, tracking compliance posture over time with trend analysis and team-level reporting.

LLM Pipeline Scanning: Specialized detection for AI/ML codebases, identifying when raw PII flows into model training data, prompt templates, or fine-tuning pipelines, a category of privacy risk that is exploding as every company ships AI features.

Built With

appstudio
gemini

Updates

Nabanita De started this project — Feb 09, 2026 07:02 PM EST
