Sentinel

face & text & element blurring

Inspiration

Modern sharing workflows (short-form videos, livestream clips, classroom recordings, screenshots of chats/dashboards, captions, prompts sent to LLMs) routinely leak:

Bystander faces and creator identity linkages
Emails, phone numbers, IDs, addresses, credentials
Internal project names, partial tokens, API keys (in screenshots / prompts)
Street & shop signage, license plates (location inference)
Screen overlays (notifications, chat bubbles) captured unintentionally

Manual blurring is slow, inconsistent, and error-prone; cloud redaction services introduce a second exposure surface (logging, retention, model retraining).

What it does

Sentinel brings a multilayer privacy firewall directly onto your device.

Comprehensive Multi-Modal Privacy Removal with Flexible Modes: Our privacy protection model provides end-to-end privacy removal for images, text, and audio, so sensitive information is handled across all of these data types.
Fully On-Device Operation: We deploy models directly on the device, so all privacy protection happens locally. There is no need to upload data to a server, which eliminates risks from untrusted servers and keeps your privacy under your control.
Advanced Semantic Privacy Removal: Traditional privacy removal techniques mainly focus on text, while image-based privacy removal often relies on OCR to extract text or on object detection (e.g., license plates, faces) for blurring.

In contrast, we leverage advanced large multi-modal language models (MLLMs) with strong general-purpose privacy detection capabilities.
Even when faced with undefined privacy categories, the model can automatically identify and remove sensitive content through high-level semantic understanding.

Additionally, it offers flexible modes to suit different needs:

Fast Mode: Quick removal, cleaning sensitive information in images within one second—ideal for rapid processing.
Deep Mode: Thorough detection, examining every detail to ensure no private information is missed—perfect for high-security scenarios.

How we built it

We fine-tuned an Ettin encoder for textual PII detection and combined OCR, YOLOv12, and Qwen2.5-VL for image grounding and processing. With 3 Americanos, till 3am.
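As a rough illustration of the text side, here is a minimal sketch of how per-token predictions from a PII tagger could be turned into masked text. It assumes the encoder emits BIO-style tags with character offsets; the function names, the label names (`EMAIL`, `PHONE`), and the `[LABEL]` placeholder format are hypothetical and not taken from the Sentinel codebase.

```python
from typing import List, Tuple

Span = Tuple[int, int, str]  # (start_char, end_char, label)


def bio_to_spans(offsets: List[Tuple[int, int]], tags: List[str]) -> List[Span]:
    """Merge per-token BIO tags into character-level entity spans."""
    spans: List[Span] = []
    current = None  # [start, end, label] of the entity being built
    for (start, end), tag in zip(offsets, tags):
        prefix, _, label = tag.partition("-")
        starts_new = prefix == "B" or (
            prefix == "I" and (current is None or current[2] != label)
        )
        if starts_new:
            # A new entity begins (a stray I- tag is treated like a B- tag).
            if current is not None:
                spans.append(tuple(current))
            current = [start, end, label]
        elif prefix == "I":
            current[1] = end  # extend the running entity
        else:
            # "O" tag: close any open entity.
            if current is not None:
                spans.append(tuple(current))
            current = None
    if current is not None:
        spans.append(tuple(current))
    return spans


def mask_text(text: str, spans: List[Span]) -> str:
    """Replace each span with a [LABEL] placeholder, working right to left
    so that earlier character offsets stay valid."""
    for start, end, label in sorted(spans, reverse=True):
        text = text[:start] + f"[{label}]" + text[end:]
    return text


# Hand-written example input standing in for real model output.
text = "Mail jane@example.com or call 555 0100"
offsets = [(0, 4), (5, 21), (22, 24), (25, 29), (30, 33), (34, 38)]
tags = ["O", "B-EMAIL", "O", "O", "B-PHONE", "I-PHONE"]
print(mask_text(text, bio_to_spans(offsets, tags)))
# -> Mail [EMAIL] or call [PHONE]
```

Masking from the last span backwards avoids having to recompute offsets after each replacement.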

Challenges we ran into

We initially struggled with shortages in computing power during model training, but solved this by leveraging the NUS school computing cluster, which enabled us to finish fine-tuning effectively.

Accomplishments that we're proud of

We’re proud that we built a fully working multimodal privacy firewall that runs completely on-device, protecting both text and image content in TikTok-style posts and comments. Through quantization and fallback paths, we optimized Sentinel to run smoothly even on lower-end GPUs and CPUs. We also created a user experience that feels seamless, allowing one-click auto-masking or selective anonymization for finer control.

What we learned

OCR + regex rules or face/license plate detectors only cover well-defined categories; real-world content leaks are much more subtle. Furthermore, even the most advanced models won’t matter if users find the workflow clunky. One-click anonymization and selective redaction modes were crucial.
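To make the first point concrete, here is a small sketch of a regex-only redactor. The two patterns are deliberately simple and purely illustrative (they are not the rules Sentinel uses); the aim is only to show that a well-formed email or phone number is caught while a sentence that leaks a location passes through untouched.

```python
import re

# Illustrative patterns only; real-world formats vary far more than this.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PHONE": re.compile(r"\+?\d[\d\s-]{6,}\d"),
}


def regex_redact(text: str) -> str:
    """Replace every pattern match with a [LABEL] placeholder."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


# Well-defined categories are caught.
print(regex_redact("Reach me at jane@example.com or +65 5550 0100."))
# -> Reach me at [EMAIL] or [PHONE].

# A location leak has no fixed format, so nothing matches.
print(regex_redact("I live above the bakery opposite the school gate."))
# -> I live above the bakery opposite the school gate.
```

The second sentence is the kind of case where semantic understanding is needed rather than pattern matching.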

What's next for Sentinel

Integrate with TIKTOKKKKKK!!!!!!!

Built With

mllm
python
small-lms
tkinter
yolo

Submitted to

TikTok TechJam 2025

Created by

I fine-tuned four models for text PII detection and evaluated them on real-life text scenarios using a hand-crafted test set. I also developed the YOLO pipeline and helped integrate it into my teammates' codebase.

Ervinoreo Yeoh
I explored general-purpose image privacy removal, aiming not to limit the task to a few fixed categories but to build a system that can process any image and automatically detect and remove potential privacy-related elements. To achieve such zero-shot capability, I deployed a lightweight multimodal large language model (MLLM) on-device, specifically a fine-tuned Qwen2.5-VL with 3B parameters. This model provides both element localization and OCR functionality, making it well-suited for privacy detection tasks.

To evaluate its performance, I constructed a small test dataset that included common privacy-sensitive elements such as student cards, house numbers, phone screens, and license plates. The model performed reliably—even when phone screens were blurred, it could still identify and accurately localize them as private information.

Finally, I integrated this model with my teammate’s YOLO-based system to enhance face detection and processing. This combination became our final image privacy protection solution, effectively balancing broad zero-shot capability with targeted improvements on critical categories like faces.
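One plausible way to combine the two detectors is sketched below: keep every box from the face detector and add only those MLLM boxes that do not overlap an already kept box too strongly. This is an assumption about how such a fusion could work, not the team's actual integration code; the `fuse` function, the `(x1, y1, x2, y2)` box format, and the 0.5 IoU threshold are all hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def fuse(face_boxes: List[Box], mllm_boxes: List[Box], thr: float = 0.5) -> List[Box]:
    """Keep all face-detector boxes, then add MLLM boxes that are not
    near-duplicates of any box kept so far."""
    fused = list(face_boxes)
    for box in mllm_boxes:
        if all(iou(box, kept) < thr for kept in fused):
            fused.append(box)
    return fused


faces = [(10, 10, 50, 50)]
mllm = [(12, 12, 52, 52), (100, 100, 200, 140)]  # first one duplicates the face
print(fuse(faces, mllm))
# -> [(10, 10, 50, 50), (100, 100, 200, 140)]
```

Giving the specialized detector priority on overlapping regions reflects the idea of targeted improvements on faces, while the MLLM still contributes everything the face detector cannot see.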

tianhe chen
I built the initial face detection system, then switched to YOLO after discovering its superior performance through testing with my teammate. I developed multi-mode blurring (Gaussian, Box, Median) with customizable parameters, but realized traditional blurring looked unnatural and artificial.
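For readers unfamiliar with region blurring, here is a dependency-free sketch of the box and median modes applied to a rectangular region of a grayscale image. It is a teaching example under stated assumptions, not the project's implementation (which would more likely rely on an image library); the `blur_region` name, the list-of-lists image format, and the `radius` parameter are hypothetical, and the Gaussian mode is omitted for brevity.

```python
from statistics import median
from typing import List, Tuple

Image = List[List[int]]  # grayscale pixels, row-major


def blur_region(img: Image, box: Tuple[int, int, int, int],
                mode: str = "box", radius: int = 1) -> Image:
    """Return a copy of img with the pixels inside box (x1, y1, x2, y2,
    end-exclusive) replaced by the mean or median of their neighbourhood."""
    reducers = {
        "box": lambda vals: sum(vals) // len(vals),   # integer mean
        "median": lambda vals: int(median(vals)),
    }
    reduce_fn = reducers[mode]
    h, w = len(img), len(img[0])
    x1, y1, x2, y2 = box
    out = [row[:] for row in img]  # leave the input untouched
    for y in range(max(0, y1), min(h, y2)):
        for x in range(max(0, x1), min(w, x2)):
            # Neighbourhood window, clipped at the image border.
            window = [
                img[j][i]
                for j in range(max(0, y - radius), min(h, y + radius + 1))
                for i in range(max(0, x - radius), min(w, x + radius + 1))
            ]
            out[y][x] = reduce_fn(window)
    return out


image = [[0, 0, 0],
         [0, 90, 0],
         [0, 0, 0]]
print(blur_region(image, (1, 1, 2, 2), mode="box"))
# -> [[0, 0, 0], [0, 10, 0], [0, 0, 0]]
print(blur_region(image, (1, 1, 2, 2), mode="median"))
# -> [[0, 0, 0], [0, 0, 0], [0, 0, 0]]
```

A larger `radius` gives a stronger blur, which is the kind of customizable parameter mentioned above.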

To solve the authenticity problem, I implemented synthetic face replacement using a modified InSwapper model with faces from ThisPersonDoesNotExist.com. This created natural-looking anonymization that maintains image authenticity instead of obvious blurring.

I contributed to UI design, used school computer clusters for model training and evaluation, and helped integrate all anonymization functions into a unified system.

Xiaoxiao Ma
I started by picking up Lynx and tried to create a UI for our app. However, we later decided that we didn't need a UI, so I moved on to help with the backend. I fine-tuned a baseline model, but the results were quite poor. I then helped test a baseline model on a given set of test cases and analyzed its results compared to our model. Finally, I created documentation for GitHub and edited videos during the final integration phase.

Siliang Sun