Transparency Lens

Inspiration

We kept hearing "if the product is free, you are the product" but nobody could ever show us what that actually looks like. How many trackers? Which companies? What do they actually collect? What is your data worth in dollars?

We wanted to make it visible in real time, not as a privacy report you read later and ignore. The idea was simple: intercept every tracker as it happens, explain it in plain English, draw it on a map, and put a dollar figure on it. When we figured out we could do all of that with mitmproxy's CONNECT interception plus Gemma 4 plus a live animated map, we had our project.

What it does

The Transparency Lens sits between a device and the internet. Connect to the hotspot, browse normally, and every tracker gets intercepted and displayed on the dashboard in real time before the page even finishes loading.

Each tracker is classified into a category (Advertising, Analytics, Fingerprinting, Social), explained in plain English by Gemma 4, and plotted as a curved arc on a world map from Union, NJ to the tracker's server. A Privacy Health Score starts at 100 and drops with every new exposure:

$$S_n = 100 - \sum_{i=1}^{n} w_{c_i}$$

where $c_i$ is the category of the $i$-th tracker and each category $c$ carries a penalty weight $w_c$:

$$w_c = \begin{cases} 8 & \text{Fingerprinting} \\ 4 & \text{Advertising} \\ 3 & \text{Social} \\ 2 & \text{Analytics} \end{cases}$$
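
The scoring rule above fits in a few lines. This is a minimal sketch, not the project's code: the function name `privacy_score` and the clamp at zero are assumptions (the formula as written has no lower bound).

```python
# Penalty weights per category, copied from the formula above.
PENALTY = {"Fingerprinting": 8, "Advertising": 4, "Social": 3, "Analytics": 2}


def privacy_score(categories, floor=0):
    """Return 100 minus the summed penalty weights of the given categories.

    `floor` clamps the result; this clamp is an assumption added for display
    purposes and is not part of the stated formula.
    """
    total = sum(PENALTY[c] for c in categories)
    return max(floor, 100 - total)


events = ["Advertising", "Analytics", "Fingerprinting", "Social"]
print(privacy_score(events))  # 100 - (4 + 2 + 8 + 3) = 83
```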

A Data Value Estimator converts that exposure into a live dollar figure using ad industry CPM rates. Every 5 minutes, Gemma 4 reads the full tracker list and generates a 2-sentence behavioral profile of what a data broker could infer about you. Cross-session data goes to Snowflake, where Cortex AI generates insights in pure SQL.
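
For the dollar figure, CPM means price per 1,000 impressions, so the arithmetic is a single division and multiplication. The sketch below assumes each tracker event counts as one impression, which the writeup does not state. The function name is hypothetical, and the CPM in the example call is a placeholder, not a real industry rate.

```python
def estimated_value_usd(tracker_events: int, cpm_usd: float) -> float:
    """Convert a count of tracker events into dollars at a given CPM.

    Assumes one tracker event equals one ad impression (an assumption).
    CPM is the price per 1,000 impressions, so value = events / 1000 * CPM.
    """
    if tracker_events < 0 or cpm_usd < 0:
        raise ValueError("tracker_events and cpm_usd must be non-negative")
    return tracker_events / 1000 * cpm_usd


# 23 events at a placeholder CPM of $2.00 (not a real rate).
print(estimated_value_usd(23, cpm_usd=2.0))
```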

How we built it

Network layer. A Raspberry Pi 4 runs as a wireless access point. All traffic routes through Ethernet to a MacBook running mitmproxy on port 8080. Our narrator.py addon hooks into http_connect, which fires before any encryption starts. The CONNECT request exposes the target hostname in plaintext. That's all we need. No certificate installation on the browsing device.
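
A minimal sketch of what an `http_connect` addon can look like. `http_connect` is a real mitmproxy event hook, and `flow.request.host` and `flow.request.port` carry the CONNECT target. The class name `Narrator` and the `seen` list are assumptions for illustration, not the contents of the actual narrator.py.

```python
"""Sketch of a mitmproxy addon that records the target of every CONNECT request."""


class Narrator:
    def __init__(self):
        self.seen = []  # (host, port) pairs in arrival order

    def http_connect(self, flow):
        # mitmproxy calls this hook when a client sends an HTTP CONNECT request.
        # The target hostname is read from the request itself; nothing is decrypted.
        target = (flow.request.host, flow.request.port)
        self.seen.append(target)
        print(f"CONNECT {target[0]}:{target[1]}")


# mitmproxy picks up a module-level `addons` list when the file is loaded with -s.
addons = [Narrator()]
```

Saved as a file (the name `narrator_sketch.py` is hypothetical), it would be loaded with something like `mitmdump -s narrator_sketch.py -p 8080`.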

Classification and AI. narrator.py matches the hostname against 60+ regex patterns to assign a category, resolves the server's location via ip-api.com, and sends the hostname to Gemma 4 for a plain-English summary. Then it POSTs the full event to the backend.
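
The hostname matching step could look roughly like this. The four patterns are illustrative stand-ins chosen for the example, not the project's 60+ pattern list, and `classify` is a hypothetical name.

```python
import re

# Illustrative patterns only. Order matters: the first match wins.
PATTERNS = [
    ("Fingerprinting", re.compile(r"(^|\.)fpjs\.io$")),
    ("Advertising", re.compile(r"(^|\.)doubleclick\.net$")),
    ("Social", re.compile(r"(^|\.)facebook\.net$")),
    ("Analytics", re.compile(r"(^|\.)google-analytics\.com$")),
]


def classify(hostname):
    """Return the first matching category, or None for an unknown host."""
    host = hostname.lower().rstrip(".")
    for category, pattern in PATTERNS:
        if pattern.search(host):
            return category
    return None


print(classify("stats.g.doubleclick.net"))  # Advertising
print(classify("example.org"))              # None
```

Anchoring each pattern on a leading dot or the start of the string keeps a lookalike such as `notdoubleclick.net` from matching.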

Data layer. Express ingests the event, writes it to MongoDB Atlas for the live session, and asynchronously writes to Snowflake for cross-session analytics. Socket.io broadcasts to every connected dashboard immediately.

Frontend. React 19, Vite, Tailwind, Framer Motion. Leaflet.js with CartoDB Dark tiles for the map. Recharts for the donut chart. Every component updates live as events arrive.

Challenges we ran into

HTTPS interception without installing a cert. Our original plan was full MITM decryption with mitmproxy certificates on the browsing device. That breaks immediately in a demo because phones reject unknown CAs and nobody installs a certificate from a stranger. The fix was realizing we didn't need to decrypt anything. The hostname in the CONNECT request is everything we need. We switched narrator.py to hook http_connect only and the cert problem disappeared.

Gemma 4 cold start latency. The first Gemma 4 call of a session, made through the Gemini API, takes 3 to 6 seconds. A page loading 15 trackers at once made the feed feel frozen. We fixed it by posting the event to the backend immediately with a placeholder summary and resolving the Gemma 4 call asynchronously. The card shows up instantly and the explanation fills in seconds later.
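
The placeholder-then-fill pattern can be sketched with asyncio. Everything here is a stand-in: `post` and `summarize` are hypothetical callables replacing the backend POST and the model call, and the placeholder text is invented for the example.

```python
import asyncio

PLACEHOLDER = "Generating explanation..."


async def narrate(event, post, summarize):
    """Post the event at once, then post again once the summary is ready."""
    await post({**event, "summary": PLACEHOLDER})  # card appears immediately
    summary = await summarize(event["hostname"])   # slow model call
    await post({**event, "summary": summary})      # card fills in later


async def demo():
    sent = []

    async def post(payload):        # stand-in for the backend POST
        sent.append(payload)

    async def summarize(hostname):  # stand-in for the model call
        await asyncio.sleep(0.05)
        return f"Explanation for {hostname}"

    events = [{"id": i, "hostname": f"tracker{i}.example"} for i in range(3)]
    # Run all events concurrently so one slow summary never blocks another card.
    await asyncio.gather(*(narrate(e, post, summarize) for e in events))
    return sent


sent = asyncio.run(demo())
for payload in sent:
    print(payload["id"], payload["summary"])
```

All three placeholder posts go out before any summary returns, which is what keeps the feed from looking frozen.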

Snowflake connection pooling. Snowflake's Node.js driver doesn't pool connections by default. Under burst load, 15 simultaneous tracker events each spawned a new connection and the warehouse timed out on auth. We serialized Snowflake writes through one persistent connection with reconnect logic, isolated from the MongoDB and Socket.io paths so warehouse latency never touched the live feed.
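
The serialization idea is independent of the driver. The sketch below shows it in Python rather than the Node.js the backend uses: one queue, one worker, one connection, and a reconnect on failure. `SerializedWriter`, `FakeConn`, and the `connect`/`insert` interface are all hypothetical.

```python
import asyncio


class SerializedWriter:
    """Funnel every write through a single connection, reconnecting on failure."""

    def __init__(self, connect, max_retries=3):
        self._connect = connect          # async factory returning a connection
        self._max_retries = max_retries
        self._queue = asyncio.Queue()
        self._conn = None

    def submit(self, row):
        self._queue.put_nowait(row)      # callers never wait on the warehouse

    async def join(self):
        await self._queue.join()

    async def run(self):
        while True:
            row = await self._queue.get()
            try:
                await self._write(row)
            finally:
                self._queue.task_done()

    async def _write(self, row):
        for _ in range(self._max_retries):
            try:
                if self._conn is None:
                    self._conn = await self._connect()
                await self._conn.insert(row)
                return
            except ConnectionError:
                self._conn = None        # drop the dead connection and retry
        # After max_retries the row is dropped; a real system would log it.


class FakeConn:
    def __init__(self, log):
        self.log = log

    async def insert(self, row):
        self.log.append(row)


async def demo():
    log, connects = [], []

    async def connect():
        connects.append(1)
        return FakeConn(log)

    writer = SerializedWriter(connect)
    worker = asyncio.create_task(writer.run())
    for i in range(15):                  # a burst of 15 events
        writer.submit({"id": i})
    await writer.join()
    worker.cancel()
    await asyncio.gather(worker, return_exceptions=True)
    return log, len(connects)


log, connection_count = asyncio.run(demo())
print(len(log), connection_count)
```

Because `submit` only enqueues, the live-feed path returns immediately no matter how slow the warehouse is.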

Bezier arc rendering at scale. Drawing 40+ SVG arcs at once caused frame drops on lower-end machines. We staggered new arc renders by 150ms and capped the highlighted state (full opacity and glow) to the 5 most recent connections, fading older arcs to 20%. The map stayed smooth for the whole demo.
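
The stagger and highlight rules reduce to two small functions. This is a language-neutral sketch written in Python, not the React/Leaflet code; the function and constant names are assumptions, while the 150 ms, 5-arc, and 20% values come from the paragraph above.

```python
STAGGER_MS = 150      # delay between consecutive new arcs
HIGHLIGHT_COUNT = 5   # most recent arcs kept at full opacity
FADED_OPACITY = 0.2   # opacity for everything older


def render_delay_ms(position_in_batch):
    """Delay before drawing the arc at this position in a burst of new arcs."""
    return position_in_batch * STAGGER_MS


def arc_opacity(index, total):
    """Opacity for one arc; index 0 is the oldest, total - 1 the newest."""
    return 1.0 if index >= total - HIGHLIGHT_COUNT else FADED_OPACITY


opacities = [arc_opacity(i, 8) for i in range(8)]
print(opacities)                               # three faded arcs, then five highlighted
print([render_delay_ms(i) for i in range(4)])  # [0, 150, 300, 450]
```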

Accomplishments that we're proud of

It works with real devices browsing real websites. Loading one news homepage produced 23 tracker events in 8 seconds. Watching arcs draw to servers in Virginia, Frankfurt, Singapore, and São Paulo while the Privacy Score dropped from 100 to 54 is the most effective privacy demo we've ever seen.

A judge connected their personal phone, browsed Instagram, and had a Gemma 4 profile describing them in under 30 seconds. No app. No certificate. No setup. That was the moment that made it feel real.

We're also proud of how Gemma 4 works at the card level, not just as a periodic summary. Every single tracker gets its own explanation. We made over 200 individual Gemma 4 calls in a 30-minute session and every one came back different, accurate, and readable.

What we learned

Network-level interception is fundamentally different from a browser extension. A proxy sees connections from every app on the device at the moment they are opened, before any TLS handshake completes, and it does not depend on code running inside a browser. One proxy covers the browser, the news app, the weather widget, all of it.

Snowflake Cortex running inference directly inside a SQL query against aggregated data, with no external API call, is cleaner than we expected. It changes how you think about where AI belongs in a data pipeline.

The most effective privacy education is specificity. "This site uses cookies" does nothing. Showing someone that a connection just went to doubleclick.net in Ashburn, VA and their data was auctioned in 47 milliseconds, with a map arc and a dollar figure, actually lands.

What's next for Transparency Lens

Browser extension port. A Chrome and Firefox extension using the webRequest API would bring the live feed and map to users who can't run a proxy. Most of the educational value stays intact.

Longitudinal profiles. Snowflake already stores every session. The next step is showing week-over-week tracker exposure, which companies have seen you the most across sessions, and how your inferred interest profile has shifted over time.

Opt-out automation. Right now it only shows. The next version would act: drafting GDPR subject access requests, generating opt-out emails to the top 10 detected data brokers, or configuring a DNS sinkhole on the Pi to block future connections to detected trackers.

Classroom kit. The whole setup runs on a $35 Raspberry Pi 4. We want to package it as a school deployment: one Pi, a 10-minute setup guide, and a curriculum module on digital privacy for high school students. Privacy literacy needs actual tools, not just slides.