Inspiration
Data leaks and oversharing are on the rise, whether from individuals posting on social media or employees pasting company and customer data into public LLMs, and sensitive information is often exposed unintentionally. We set out to build a lightweight, privacy-first solution that protects data without relying on external servers. By running compact ML models directly in the web browser, we showed that this is possible. Our work fits both the Main Track and the Business Productivity Track, since enterprises and individuals alike can use it to improve their security.
What it does
PII 360 is an open-source Chrome extension that detects and masks multilingual Personally Identifiable Information (PII) in images, text, and PDFs. It uses ONNX-based machine learning models to run fully on the client’s device, ensuring accuracy, speed, and privacy.
How we built it
We built the extension with Transformers.js, React, TailwindCSS, JavaScript, and TypeScript. Model inference is accelerated with WebGPU, which is widely supported on commodity hardware. To reduce storage and runtime cost, we applied 8-bit post-training quantization, which shrinks the models. The pipeline integrates two models: (i) the recently released Apple VLM and (ii) Piiranha, a fine-tuned variant of microsoft/mdeberta-v3-base for PII detection. Piiranha works with a maximum context length of 256 tokens, so longer inputs are split into non-overlapping 256-token chunks that are processed separately.
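The chunking step can be illustrated with a minimal sketch. It assumes the text has already been tokenized into an array of token IDs; `chunkTokens` and `MAX_TOKENS` are hypothetical names chosen for this example and are not taken from the extension's source.

```typescript
// Hypothetical constant mirroring the 256-token limit described above.
const MAX_TOKENS = 256;

// Split a token-ID sequence into consecutive, non-overlapping windows.
// The last window may be shorter than maxTokens.
function chunkTokens(tokenIds: number[], maxTokens: number = MAX_TOKENS): number[][] {
  if (!Number.isInteger(maxTokens) || maxTokens <= 0) {
    throw new RangeError("maxTokens must be a positive integer");
  }
  const chunks: number[][] = [];
  for (let start = 0; start < tokenIds.length; start += maxTokens) {
    chunks.push(tokenIds.slice(start, start + maxTokens));
  }
  return chunks;
}

// Usage: 600 dummy token IDs yield windows of 256, 256, and 88 tokens.
const ids = Array.from({ length: 600 }, (_, i) => i);
const chunks = chunkTokens(ids);
console.log(chunks.map((c) => c.length));
```

Two caveats apply to this sketch. Special tokens added by a tokenizer count toward the limit, so a real implementation would likely reserve room for them. Non-overlapping windows are also simple, but an entity that straddles a chunk boundary can be cut in two.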
Supported languages
English, Spanish, French, German, Italian, Dutch
Supported PII types
Account Number, Building Number, City, Credit Card Number, Date of Birth, Driver's License, Email, First Name, Last Name, ID Card, Password, Social Security Number, Street Address, Tax Number, Phone Number, Username, Zipcode.
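Once entities of these types are detected, masking amounts to replacing each detected span with a placeholder. The sketch below shows one way this could work, assuming the detector yields character offsets into the original text. The `PiiSpan` shape, the label names, and `maskSpans` are illustrative assumptions, not the extension's actual types.

```typescript
// Hypothetical shape for one detected entity. Offsets are character indices
// into the original string, with `end` exclusive.
interface PiiSpan {
  label: string; // illustrative label, e.g. "EMAIL" or "FIRST_NAME"
  start: number;
  end: number;
}

// Replace every span with "[LABEL]". Spans are applied right to left so that
// the offsets of spans earlier in the text stay valid after each replacement.
function maskSpans(text: string, spans: PiiSpan[]): string {
  const ordered = [...spans].sort((a, b) => b.start - a.start);
  let masked = text;
  let lastStart = text.length;
  for (const span of ordered) {
    // Skip malformed spans and spans that overlap one already masked.
    if (span.start < 0 || span.end <= span.start || span.end > lastStart) continue;
    masked = masked.slice(0, span.start) + `[${span.label}]` + masked.slice(span.end);
    lastStart = span.start;
  }
  return masked;
}

const input = "Contact Ana at ana@example.com";
const output = maskSpans(input, [
  { label: "FIRST_NAME", start: 8, end: 11 },
  { label: "EMAIL", start: 15, end: 30 },
]);
console.log(output); // Contact [FIRST_NAME] at [EMAIL]
```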
Challenges we ran into
The main challenge was deploying pre-trained models in a web browser while keeping them compatible across platforms. We tested our solution on Chrome across Linux, Windows, and macOS. On Linux, we encountered WebGL issues due to limited driver support, so we extended our Chrome extension to leverage WebAssembly for improved inference performance. Another challenge was choosing parameter data types that minimize model size while preserving accuracy, which we resolved through repeated experiments.
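One way to choose between a GPU backend and WebAssembly at runtime is sketched below. It assumes the GPU path is WebGPU, the backend named under "How we built it", and probes it through the standard `navigator.gpu.requestAdapter()` call. `pickDevice` and `GpuLike` are hypothetical names, and this is not necessarily how the extension decides.

```typescript
type Device = "webgpu" | "wasm";

// Minimal structural type so the sketch compiles without WebGPU typings.
// In a browser, `navigator.gpu` satisfies this shape when WebGPU is exposed.
interface GpuLike {
  requestAdapter(): Promise<unknown>;
}

// Prefer WebGPU when an adapter is actually available; otherwise use WASM.
async function pickDevice(gpu: GpuLike | undefined): Promise<Device> {
  if (!gpu) return "wasm"; // API not exposed at all
  try {
    const adapter = await gpu.requestAdapter();
    return adapter ? "webgpu" : "wasm"; // null means no usable adapter
  } catch {
    return "wasm"; // driver or permission failure
  }
}

// Usage outside a browser: no GPU object, so the fallback is chosen.
// In a browser one might call: pickDevice((navigator as { gpu?: GpuLike }).gpu)
pickDevice(undefined).then((device) => console.log(device)); // wasm
```

The returned string could then be passed as the `device` option when creating a Transformers.js pipeline, alongside a quantized `dtype`, though the exact options used by PII 360 are not stated here.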
Accomplishments that we're proud of
We created an open-source tool that anyone can use, audit, or contribute to. Our goal was to explore the efficiency of lightweight models and deliver true on-device security for users. Along the way, we found that regular expressions alone cannot reliably detect PII, whereas lightweight ML models achieve higher accuracy. We also tested vision-language models (VLMs) in the browser, which lets individuals detect PII in images, a capability that was previously uncommon. With the rapid rise of LLMs introducing new security challenges, tools like PII 360 are becoming increasingly essential.
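The limitation of pattern matching can be seen in a small illustrative example. The regular expression and sample sentence below are made up for this sketch and are not part of PII 360: an email address has a fixed shape that a pattern can catch, while a person's name or a city has none.

```typescript
// Illustrative email pattern; intentionally simple, not RFC-complete.
const EMAIL_RE = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;

// Return all substrings that look like email addresses.
function findEmails(text: string): string[] {
  return text.match(EMAIL_RE) ?? [];
}

const sample = "Maria Lopez lives in Utrecht; reach her at m.lopez@example.org";
// Finds only the email address. The name and the city go undetected,
// because nothing about their form distinguishes them from ordinary words.
console.log(findEmails(sample));
```

Detecting free-form entities such as names and cities depends on context, which is where a token-classification model helps.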
What we learned
- Lightweight ML models can run efficiently across different web browsers, enabling real-time PII detection without heavy computational resources.
- Browser and device configurations significantly affect performance; optimizing for memory and CPU usage is key to maintaining responsiveness.
- On-device inference ensures privacy, but requires careful model selection and compression to balance speed, accuracy, and compatibility.
What's next for PII 360
- Implementing heuristic and non-heuristic token-optimization techniques that simplify text before it is sent to an LLM API, reducing the number of input tokens.
- Expanding support to more document types (e.g., Word, Excel, scanned docs).
- Improving detection accuracy by fine-tuning and adding more ML models.
- Adding customizable masking options and integration with collaboration tools.
Built With
- javascript
- react
- tailwindcss
- transformers.js
- typescript