Inspiration

Modern sharing workflows (short-form videos, livestream clips, classroom recordings, screenshots of chats/dashboards, captions, prompts sent to LLMs) routinely leak:

  • Bystander faces and creator identity linkages
  • Emails, phone numbers, IDs, addresses, credentials
  • Internal project names, partial tokens, API keys (in screenshots / prompts)
  • Street & shop signage, license plates (location inference)
  • Screen overlays (notifications, chat bubbles) captured unintentionally

Manual blurring is slow, inconsistent, and error-prone; cloud redaction services introduce a second exposure surface (logging, retention, model retraining).

What it does

Sentinel brings a multilayer privacy firewall directly onto your device.

  1. Comprehensive Multi-Modal Privacy Removal with Flexible Modes: Our privacy protection model provides end-to-end privacy removal for images, text, and audio, ensuring sensitive information in all types of data is effectively handled.

  2. Fully On-Device Operation: We deploy models directly on the device, so all privacy protection happens locally.
    There’s no need to upload data to a server, eliminating risks from untrusted servers and ensuring that your privacy always stays under your control.

  3. Advanced Semantic Privacy Removal: Traditional privacy removal techniques mainly focus on text, while image-based privacy removal often relies on OCR to extract text or on object detection (e.g., license plates, faces) for blurring.

In contrast, we leverage advanced large multi-modal language models (MLLMs) with strong general-purpose privacy detection capabilities.
Even when faced with undefined privacy categories, the model can automatically identify and remove sensitive content through high-level semantic understanding.

Additionally, it offers flexible modes to suit different needs:

  • Fast Mode: Quick removal, cleaning sensitive information in images within one second—ideal for rapid processing.
  • Deep Mode: Thorough detection, examining every detail to ensure no private information is missed—perfect for high-security scenarios.
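
One way such a mode switch could be wired up is sketched below. This is an assumption for illustration, not Sentinel's actual design: it supposes Fast Mode runs only cheap detectors while Deep Mode also runs slower ones. The names `Mode`, `run_detectors`, `find_emails`, and `find_flagged_terms` are hypothetical, and the toy detectors stand in for real models.

```python
from enum import Enum
from typing import Callable, List

Detector = Callable[[str], List[str]]


class Mode(Enum):
    FAST = "fast"
    DEEP = "deep"


def run_detectors(item: str, mode: Mode,
                  cheap: List[Detector], expensive: List[Detector]) -> List[str]:
    """Always run the cheap detectors; add the expensive ones only in DEEP mode."""
    detectors = list(cheap)
    if mode is Mode.DEEP:
        detectors += expensive
    findings: List[str] = []
    for detect in detectors:
        for finding in detect(item):
            if finding not in findings:  # de-duplicate while keeping order
                findings.append(finding)
    return findings


# Toy stand-ins (hypothetical): a cheap rule and a "slow" semantic check.
def find_emails(text: str) -> List[str]:
    return [word for word in text.split() if "@" in word]


def find_flagged_terms(text: str) -> List[str]:
    return [word for word in text.split() if word.lower() in {"bluebird"}]


text = "mail bob@example.com about Bluebird"
fast_findings = run_detectors(text, Mode.FAST, [find_emails], [find_flagged_terms])
deep_findings = run_detectors(text, Mode.DEEP, [find_emails], [find_flagged_terms])
print(fast_findings)
print(deep_findings)
```

Keeping the cheap detectors in both modes means Deep Mode can only add findings to what Fast Mode reports, which keeps the two modes easy to compare.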

How we built it

An Ettin encoder fine-tuned for textual PII detection. OCR + YOLOv12 + Qwen2.5-VL for detecting and grounding sensitive regions in images. With 3 Americanos, till 3am.
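
As a rough illustration of the final step in an image pipeline like this, the sketch below masks detected regions on a pixel grid. It assumes each detector (OCR, YOLO, the MLLM) ultimately yields bounding boxes. The `Box` type, the `mask_regions` function, and the sample boxes are hypothetical and not taken from Sentinel's code, and a plain list of lists stands in for a real image.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Box:
    x0: int
    y0: int
    x1: int  # exclusive
    y1: int  # exclusive
    label: str


def mask_regions(image: List[List[int]], boxes: List[Box], fill: int = 0) -> List[List[int]]:
    """Return a copy of a 2D pixel grid with every box area overwritten by `fill`."""
    height = len(image)
    width = len(image[0]) if height else 0
    out = [row[:] for row in image]  # leave the input untouched
    for box in boxes:
        # Clamp each box to the image bounds before filling.
        for y in range(max(0, box.y0), min(height, box.y1)):
            for x in range(max(0, box.x0), min(width, box.x1)):
                out[y][x] = fill
    return out


# Hypothetical detector outputs: one text region and one object region.
ocr_boxes = [Box(0, 0, 2, 1, "email")]
object_boxes = [Box(2, 1, 4, 3, "face")]

image = [[9] * 4 for _ in range(3)]  # 4x3 grid of "pixels"
masked = mask_regions(image, ocr_boxes + object_boxes)
for row in masked:
    print(row)
```

Because every detector's output is reduced to the same box type, the masking step does not need to know which model produced a region.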

Challenges we ran into

We initially lacked the computing power needed for model training, but solved this by using the NUS school computing cluster, which let us finish fine-tuning.

Accomplishments that we're proud of

We’re proud that we built a fully working multimodal privacy firewall that runs completely on-device, protecting both text and images in TikTok-style posts and comments. Through quantization and fallback paths, we optimized Sentinel to run smoothly even on lower-end GPUs and CPUs. We also created a user experience that feels seamless, allowing one-click auto-masking or selective anonymization for finer control.

What we learned

OCR + regex rules or face/license plate detectors only cover well-defined categories; real-world content leaks are much more subtle. Furthermore, even the most advanced models won’t matter if users find the workflow clunky. One-click anonymization and selective redaction modes were crucial.
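
The limitation of rule-based redaction can be seen in a minimal sketch like the one below. The two patterns and the sample sentence are illustrative assumptions, not Sentinel's rules, and "Project Bluebird" is a made-up internal project name.

```python
import re

# Illustrative patterns only; real PII rules would need to be far more thorough.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "PHONE": re.compile(r"\+?\d[\d\s-]{6,}\d"),
}


def regex_redact(text: str) -> str:
    """Replace every match of each pattern with a [LABEL] placeholder."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


sample = "Ping jane.doe@example.com or +65 9123 4567 about Project Bluebird before launch."
redacted = regex_redact(sample)
print(redacted)
```

The email and phone number are replaced, but the project name passes through untouched because no rule describes it. That gap is the kind of leak a semantic model is meant to catch.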

What's next for Sentinel

Integrate with TikTok!

Built With

  • mllm
  • python
  • small-lms
  • tkinter
  • yolo