Inspiration

I continue to receive data breach notifications. In fact, while building this for the competition, Capital One CreditWise notified me that my data had been found on the dark web (again). Four days ago, an unsecured database was found online containing 149 million usernames and passwords, including millions of Gmail, Facebook, banking, and other logins.

I started building RegData 3 weeks ago.

After 30 years working in regulatory compliance and government technology, I'd watched the same pattern repeat: Equifax. Target. Yahoo. Billions in damages. Companies thought encryption was enough. It wasn't. The fundamental problem? We're trying to prevent every breach. But attackers only need to win once. Defenders need to win every time.

I realized we needed a different approach: accept that breaches happen, but make stolen data worthless.

I read the report on that exposed database and thought: this is exactly why RegData exists.

If those companies had been using tokenization, that breach would have been a non-event. Attackers would have dumped the database and gotten worthless tokens. No credit card numbers. No real emails. Nothing they could sell on the dark web.

This validated why RegData is necessary.


What it does

RegData is a two-sided privacy platform that fundamentally separates data utility from data storage. Gemini 3 Flash is the intelligence layer that transforms RegData from a manual tokenization tool into an autonomous compliance platform. By leveraging Structured Outputs and JSON Schema enforcement, we bypass fragile parsing logic and feed complex analysis directly into our React UI. This enables three critical automations—schema security auditing, compliance reporting, and real-time data valuation—that would otherwise require teams of security engineers. Without Gemini, RegData is a crypto library. With Gemini, RegData is an intelligent enterprise platform.

1. For Companies (The Vault):

It acts as a tokenization layer that sits between an application (like our demo store, ShopCo) and its database. Instead of storing raw credit card numbers or emails, the app stores useless tokens (e.g., tkn_cc_8x92m...). The actual sensitive data is encrypted using AES-GCM and stored in the isolated RegData Vault. We built a Security Dashboard powered by Google Gemini that:

  • Analyzes database schemas to automatically identify PII.
  • Monitors for breach patterns in real-time.
  • Generates instant audit reports for GDPR, CCPA, and HIPAA compliance.

2. For Users (The Wallet):

Users get a "Data Wallet" where they can see exactly which companies hold their tokens. Crucially, it gives them a "Kill Switch." If a user clicks "Revoke Access" in their wallet, RegData destroys the decryption key in the Vault. Even if the company has the token, it resolves to NULL. The data effectively evaporates, enforcing the "Right to Erasure" instantly across the ecosystem.
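
The kill switch can be sketched roughly as follows. This is a minimal in-memory illustration, not RegData's actual Vault: `SketchVault`, `tokenize`, `resolve`, and `revoke` are hypothetical names, and a plain string stands in for the AES-GCM ciphertext. A key id stands in for the real key material, so destroying it makes every token sealed under it resolve to null.

```typescript
// Hypothetical sketch: names and shapes are assumptions, not the project's code.
type TokenRecord = { userId: string; keyId: number; sealedValue: string };

class SketchVault {
  private activeKeys = new Map<string, number>();   // userId -> current key id
  private records = new Map<string, TokenRecord>(); // token -> sealed record
  private nextId = 1;

  // Swap a sensitive value for a token; the caller stores only the token.
  tokenize(userId: string, kind: string, value: string): string {
    let keyId = this.activeKeys.get(userId);
    if (keyId === undefined) {
      keyId = this.nextId++;
      this.activeKeys.set(userId, keyId);
    }
    const token = `tkn_${kind}_${(this.nextId++).toString(36)}`;
    // A real vault would store ciphertext here, never the plaintext.
    this.records.set(token, { userId, keyId, sealedValue: value });
    return token;
  }

  // Returns the value only while the key it was sealed under still exists.
  resolve(token: string): string | null {
    const record = this.records.get(token);
    if (record === undefined) return null;
    return this.activeKeys.get(record.userId) === record.keyId
      ? record.sealedValue
      : null;
  }

  // The kill switch: drop the user's key; existing tokens become dead references.
  revoke(userId: string): void {
    this.activeKeys.delete(userId);
  }
}
```

In this sketch a user who returns after revoking gets a fresh key id, so tokens issued before the revocation stay unresolvable.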


How we built it

Built Entirely in Google AI Studio: We built RegData from concept to production in a few days using Google AI Studio's code generation capabilities. As a founder with 30 years of regulatory compliance expertise but limited recent coding experience, AI Studio allowed me to translate domain knowledge directly into a 2,180-line TypeScript application. This demonstrates AI Studio's power to enable domain experts to build production-grade software without traditional engineering teams. The entire platform (React components, AES-GCM encryption engine, Gemini API integration, and breach simulation) was architected and implemented through natural language collaboration with AI Studio. This isn't just a project that uses Gemini; it's a project built BY Gemini.

We built the entire platform using React and TypeScript, simulating a complex microservices architecture within the browser.

The Crypto Engine: We utilized the Web Crypto API to implement robust AES-GCM encryption. We built a VaultService that handles key derivation, encryption (returning ciphertext + IV), and decryption. This isn't just a database lookup; we are actually encrypting data in memory.
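
For readers who want to see the shape of that encryption step, here is a minimal sketch using the standard Web Crypto API. It is an illustration under assumptions, not the project's VaultService: the function names `generateVaultKey`, `seal`, and `unseal` are hypothetical, and key derivation is skipped in favor of a randomly generated 256-bit key.

```typescript
// Hypothetical helpers: only the crypto.subtle calls are the real Web Crypto API.
async function generateVaultKey(): Promise<CryptoKey> {
  // Non-extractable AES-GCM key (assumption: the real engine derives its key instead).
  return crypto.subtle.generateKey(
    { name: "AES-GCM", length: 256 },
    false,
    ["encrypt", "decrypt"],
  );
}

// Encrypts a string and returns ciphertext + IV, as described above.
async function seal(key: CryptoKey, plaintext: string) {
  const iv = crypto.getRandomValues(new Uint8Array(12)); // fresh 96-bit IV per message
  const data = new TextEncoder().encode(plaintext);
  const ciphertext = await crypto.subtle.encrypt({ name: "AES-GCM", iv }, key, data);
  return { ciphertext, iv };
}

type Sealed = Awaited<ReturnType<typeof seal>>;

// Decrypts; rejects if the key is wrong or the ciphertext was tampered with.
async function unseal(key: CryptoKey, sealed: Sealed): Promise<string> {
  const plain = await crypto.subtle.decrypt(
    { name: "AES-GCM", iv: sealed.iv },
    key,
    sealed.ciphertext,
  );
  return new TextDecoder().decode(plain);
}
```

AES-GCM is authenticated, so decrypting with a different key fails outright instead of returning garbage. That is the property the revocation flow relies on once a key is destroyed.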

Google Gemini Integration: We leveraged the Gemini 1.5/3 Flash models via the @google/genai SDK for the intelligent layers:

  • Schema Analysis: We feed raw JSON/SQL schemas to Gemini, which identifies security risks and recommends fields for tokenization.
  • Forensics: The "Breach Simulation" logs are parsed by Gemini to explain why the data was safe (e.g., "Attack vectors neutralized by tokenization").
  • Data Valuation: We use Gemini to analyze user profiles and estimate their monthly monetary value to data brokers, giving users a tangible reason to care about privacy.

The Simulation: We built a "ShopCo" e-commerce demo that interacts with the MockDatabase and VaultService in real-time. When you buy a product in ShopCo, it seamlessly calls the Vault to tokenize your info before "saving" it.


Challenges we ran into

Visualizing the Invisible: Security is hard to demo because when it works, nothing happens. We had to build a specific "Breach Simulation" mode in the ShopCo app that executes a mock SQL injection. It dumps the database to the screen so judges can see that the "hacker" only gets useless tokens, while the Admin dashboard lights up with alerts.

The Revocation Race Condition: Implementing the "Kill Switch" was tricky. We had to ensure that when a user revokes a key in the User Dashboard, the "Breach Dump" in the ShopCo app immediately reflects that data as "🔴 DATA NULLIFIED" rather than just encrypted. It required tight synchronization between our mock database and the encryption vault state.
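
One way to get that kind of synchronization is to have the dump view read revocation state at render time and re-render whenever the vault announces a change. The sketch below assumes a simple observer pattern; `RevocationFeed` and `renderDump` are hypothetical names, not the project's code.

```typescript
// Hypothetical sketch of vault-to-view synchronization.
type RevocationListener = (token: string) => void;

class RevocationFeed {
  private revoked = new Set<string>();
  private listeners = new Set<RevocationListener>();

  // Returns an unsubscribe function so views can clean up.
  subscribe(listener: RevocationListener): () => void {
    this.listeners.add(listener);
    return () => {
      this.listeners.delete(listener);
    };
  }

  revoke(token: string): void {
    this.revoked.add(token); // update state first...
    this.listeners.forEach((listener) => listener(token)); // ...then notify views
  }

  isRevoked(token: string): boolean {
    return this.revoked.has(token);
  }
}

// Builds the "breach dump" lines from live state rather than a cached copy.
function renderDump(tokens: string[], feed: RevocationFeed): string[] {
  return tokens.map((token) =>
    feed.isRevoked(token) ? "🔴 DATA NULLIFIED" : token,
  );
}
```

Updating state before notifying means a listener that re-renders inside the callback already sees the revoked token, which avoids showing a stale row.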

Gemini JSON Strictness: Getting the AI to return consistent, strictly typed JSON for the Security Analysis scores took several iterations of prompt engineering to ensure the dashboard charts wouldn't crash on malformed data.
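
Even with schema-constrained output, a defensive parse step in front of the charts is cheap insurance. The sketch below shows one possible guard. The response shape (`riskScore` from 0 to 100 plus a `findings` list) is an assumption for illustration, since the write-up does not spell out the actual fields.

```typescript
// Assumed response shape; the dashboard's real fields may differ.
type Severity = "low" | "medium" | "high";
type Finding = { field: string; severity: Severity };
type SecurityAnalysis = { riskScore: number; findings: Finding[] };

const SEVERITIES: readonly string[] = ["low", "medium", "high"];

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

function isFinding(value: unknown): value is Finding {
  return (
    isRecord(value) &&
    typeof value.field === "string" &&
    typeof value.severity === "string" &&
    SEVERITIES.includes(value.severity)
  );
}

// Returns a typed analysis, or null so the UI can render a fallback state.
function parseAnalysis(raw: string): SecurityAnalysis | null {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return null; // not JSON at all
  }
  if (!isRecord(parsed)) return null;
  const { riskScore, findings } = parsed;
  if (typeof riskScore !== "number" || !Number.isFinite(riskScore)) return null;
  if (riskScore < 0 || riskScore > 100) return null;
  if (!Array.isArray(findings) || !findings.every(isFinding)) return null;
  return { riskScore, findings };
}
```

A chart component can then branch on `null` and show an "analysis unavailable" state instead of throwing on a missing or mistyped field.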


Accomplishments that we're proud of

The "Smart Breach" Demo: Watching the SQL dump change colors in real-time—seeing valid tokens turn into "DATA NULLIFIED" the moment a user revokes access in their wallet—is a powerful visual of data sovereignty.

Seamless AI Integration: The Compliance Reports don't feel like "AI chat"; they feel like native features. The integration is so smooth that the AI acts as a backend logic layer rather than just a chatbot.

Real Encryption: We didn't just pretend to hide the data; we actually implemented the AES-GCM logic. The design mirrors what a production system would use, scaled down for the hackathon.


What we learned

Privacy is a UX problem: Users don't care about encryption keys; they care about value and control. By showing them the "Dollar Value" of their data via Gemini, privacy becomes an asset class they want to protect.

Tokenization solves the "Right to Erasure": GDPR compliance is usually a nightmare of finding backups and rows. With tokenization, you delete one key, and the data is gone everywhere, forever. It turns a legal compliance problem into a simple key management problem.


What's Next for RegData

Privacy Protection Infrastructure

Automated Data Broker Intelligence:

To power RegData's privacy protection capabilities, we built an autonomous scraping pipeline using Gemini AI and Playwright that analyzed entire opt-out guide libraries.

Results:

  • 289 data broker removal workflows fully documented
  • 847 individual steps across all guides
  • 32 CAPTCHA-protected opt-out processes identified
  • 80 email confirmation workflows catalogued
  • Difficulty & speed ratings for each broker (1-5 scale)

Technical Implementation:

  • Scraper: Playwright + Cheerio (Node.js/TypeScript)
  • Rate Limiting: 2 seconds between requests
  • Total Runtime: 25 minutes
  • Success Rate: 100% (all 289 guides successfully extracted)
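
The rate limiting above can be sketched as a sequential loop with a fixed pause between requests. This is an illustration only: `scrapeSequentially`, `FetchGuide`, and `Sleep` are hypothetical names, and the fetch step is injected so the sketch does not depend on Playwright or the network.

```typescript
// Hypothetical sketch: the fetcher and sleeper are injected stand-ins.
type FetchGuide = (url: string) => Promise<string>;
type Sleep = (ms: number) => Promise<void>;
type ScrapeResult = { url: string; ok: boolean; body?: string };

const timerSleep: Sleep = (ms) =>
  new Promise<void>((resolve) => {
    setTimeout(() => resolve(), ms);
  });

async function scrapeSequentially(
  urls: string[],
  fetchGuide: FetchGuide,
  delayMs: number = 2000, // the 2-second spacing from the list above
  sleep: Sleep = timerSleep,
): Promise<ScrapeResult[]> {
  const results: ScrapeResult[] = [];
  let first = true;
  for (const url of urls) {
    if (!first) await sleep(delayMs); // pause between requests, not before the first
    first = false;
    try {
      results.push({ url, ok: true, body: await fetchGuide(url) });
    } catch {
      results.push({ url, ok: false }); // record the failure and keep going
    }
  }
  return results;
}
```

With 289 guides and 2 seconds between requests, the pauses alone add up to 288 × 2 = 576 seconds, or roughly 9.6 minutes of the 25-minute run.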

How It Works:

  1. User provides basic info (name, location, email)
  2. Gemini scans 289 broker sites to detect actual exposure
  3. AI agents autonomously navigate opt-out pages
  4. Gemini Vision analyzes forms and fills them correctly
  5. AI solves CAPTCHAs (32 brokers require them)
  6. Handles email confirmations (80 brokers require them)
  7. Verifies successful removal

Impact:

This infrastructure enables RegData to offer privacy protection at scale by automating removals.

Enterprise Scaling

Current Demo: In-browser tokenization
Production Needs: Enterprise-grade performance

Architecture Requirements:

  • Distributed worker pool processing 100,000 records/second
  • Multi-region vault with automatic failover
  • Zero-downtime migration for existing databases

Real-World Timeline (10M customer database):

  • Schema analysis with Gemini: 30 seconds
  • Hot data (active users): 1-2 minutes
  • Legacy data: Lazy tokenization over weeks
  • Total downtime: 0 minutes

Migration Strategy:

  1. Shadow mode - tokenize new records only
  2. Backfill active customers (10% of database)
  3. Lazy migration - tokenize on first access
  4. Final sweep of dormant accounts

Companies can migrate production databases without interrupting service.
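
Steps 3 and 4 of the migration strategy can be illustrated with a read path that tokenizes legacy rows on first access, plus a sweep for rows nobody touched. This is a sketch under assumptions: the `Row` shape, the `tkn_` prefix check, and the function names are hypothetical, and a `Map` stands in for the real database.

```typescript
// Hypothetical sketch of lazy tokenization; a Map stands in for the database.
type Row = { id: number; email: string };
type Tokenizer = (value: string) => string;

const TOKEN_PREFIX = "tkn_"; // assumption: tokens are recognizable by prefix

function isToken(value: string): boolean {
  return value.startsWith(TOKEN_PREFIX);
}

// Step 3: migrate a legacy row the first time it is read.
function readRow(db: Map<number, Row>, id: number, tokenizer: Tokenizer): Row | undefined {
  const row = db.get(id);
  if (row === undefined) return undefined;
  if (isToken(row.email)) return row; // already migrated, nothing to do
  const migrated: Row = { ...row, email: tokenizer(row.email) };
  db.set(id, migrated); // write back so the raw value is no longer stored
  return migrated;
}

// Step 4: final sweep over dormant rows; returns how many were migrated.
function sweepDormant(db: Map<number, Row>, tokenizer: Tokenizer): number {
  let migratedCount = 0;
  db.forEach((row, id) => {
    if (!isToken(row.email)) {
      db.set(id, { ...row, email: tokenizer(row.email) });
      migratedCount += 1;
    }
  });
  return migratedCount;
}
```

Because both paths check `isToken` first, running them repeatedly is harmless, which is what lets the migration proceed while the application keeps serving traffic. A real system would need a more robust marker than a string prefix, since a legitimate value could happen to start with `tkn_`.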

Built With

  • aes-gcm
  • google-gemini-api
  • lucidreact
  • react19
  • recharts
  • tailwind
  • typescript