About the Project

We built WatchDogAI, an Edge AI-powered Data Loss Prevention (DLP) system that safeguards enterprise and personal data in real time — without limiting productivity or access to modern tools.


What It Does

Our system continuously monitors domains, clipboard content, keystrokes, and outgoing messages to detect sensitive information such as API keys, access tokens, emails, credentials, or other confidential data.
Whenever a risk is detected, WatchDogAI automatically masks or replaces the content with placeholders — preventing accidental data leaks at the source.

Unlike traditional DLP tools that block websites or restrict access, WatchDogAI empowers employees to use modern online tools safely — without needing to rely on expensive enterprise-only AI models or outdated in-house systems.

It also provides AI-based dashboards, integrated with Amplitude AI, that let enterprise IT desks review and manage alerts.

System Diagram


How It Works

Clipboard (Copy-Paste) Monitoring

When a user copies content (text, code, or an image), WatchDogAI runs multiple Edge AI models in consensus to identify potentially sensitive information.

Key Steps:

  • Detection and Consensus

    • Multiple lightweight LLMs and edge models analyze the copied data.
    • Each model outputs a confidence score, and action is taken only when a majority of the models agree.
  • Data Processing

    • Sensitive content is replaced with placeholders locally.
    • A secure mapping is stored on the device, linking placeholders to original values.
  • Pasting Logic

    • Before pasting, WatchDogAI checks if the target domain or application is safe.
    • If safe → original content is restored.
    • If unsafe → masked content is pasted instead.

All of this happens in milliseconds, entirely locally, ensuring no data leaves the device.
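
The clipboard flow above can be sketched roughly as follows. This is a minimal illustration, not our production code: the regex patterns stand in for the model ensemble, and the names `PATTERNS`, `APPROVED_DOMAINS`, `open_store`, `mask`, `restore`, and `text_to_paste` are hypothetical. An in-memory SQLite database plays the role of the on-device mapping store.

```python
import re
import sqlite3

# Hypothetical patterns standing in for the model ensemble.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "API_KEY": re.compile(r"\bsk-[A-Za-z0-9]{16,}\b"),
}
APPROVED_DOMAINS = {"intranet.example.com"}  # assumed allowlist


def open_store() -> sqlite3.Connection:
    """Create an in-memory mapping store (a file path would persist it)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE mapping (placeholder TEXT PRIMARY KEY, original TEXT)"
    )
    return conn


def mask(text: str, conn: sqlite3.Connection) -> str:
    """Replace each match with a numbered placeholder and record the mapping.

    Numbering restarts on every call, so this sketch only tracks the most
    recent clipboard entry.
    """
    for label, pattern in PATTERNS.items():
        counter = 0

        def substitute(match):
            nonlocal counter
            counter += 1
            placeholder = f"[{label}_{counter}]"
            conn.execute(
                "INSERT OR REPLACE INTO mapping VALUES (?, ?)",
                (placeholder, match.group(0)),
            )
            return placeholder

        text = pattern.sub(substitute, text)
    return text


def restore(text: str, conn: sqlite3.Connection) -> str:
    """Swap every stored placeholder back to its original value."""
    rows = conn.execute("SELECT placeholder, original FROM mapping")
    for placeholder, original in rows:
        text = text.replace(placeholder, original)
    return text


def text_to_paste(masked: str, target_domain: str, conn: sqlite3.Connection) -> str:
    """Restore originals only when the paste target is on the allowlist."""
    if target_domain in APPROVED_DOMAINS:
        return restore(masked, conn)
    return masked
```

Keeping the mapping in a local table means the masked text can travel anywhere while the originals never leave the device.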



Real-Time Keyword / Keystroke Monitoring

In addition to clipboard monitoring, WatchDogAI tracks keystrokes and keyword sequences in real time to catch sensitive information before it’s copied or sent.

How It Works:

  1. Keyword Detection

    • The system continuously scans typed input for sensitive keywords or patterns (emails, API keys, PII, etc.).
  2. Debounce Mechanism

    • Detection is temporarily delayed (debounced) to capture context and avoid false positives.
    • Only confirmed sensitive sequences trigger masking or alerts.
  3. Target & Domain Check

    • If a user types sensitive info into a non-approved domain or application, the system either blocks it or automatically replaces it with a placeholder.
    • Approved domains are allowed, preserving productivity.

Effect: Sensitive data typed directly into an unapproved target is caught, even if the user never copy-pastes it.
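
As a rough sketch of the debounce step, the class below buffers keystrokes and scans them only after a quiet period. The class name `DebouncedScanner`, the 0.5 second default, and the single email pattern are assumptions made for illustration. The clock is injectable so the behaviour can be checked without real delays.

```python
import re
import time
from typing import Callable, List, Optional

# Example pattern only; the real system scans for many kinds of data.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


class DebouncedScanner:
    """Buffers keystrokes and scans only after the user pauses typing."""

    def __init__(
        self,
        quiet_seconds: float = 0.5,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.quiet_seconds = quiet_seconds
        self.clock = clock
        self.buffer = ""
        self.last_key_time: Optional[float] = None

    def on_key(self, char: str) -> None:
        """Record one typed character and restart the quiet-period timer."""
        self.buffer += char
        self.last_key_time = self.clock()

    def poll(self) -> List[str]:
        """Return matches once the quiet period has elapsed, else an empty list."""
        if self.last_key_time is None:
            return []
        if self.clock() - self.last_key_time < self.quiet_seconds:
            return []  # still typing: wait for more context
        matches = EMAIL.findall(self.buffer)
        self.buffer = ""
        self.last_key_time = None
        return matches
```

Scanning only after a pause means a half-typed address such as `bob@exa` is never evaluated on its own, which is where many false positives would otherwise come from.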


Detection & Consensus

Models Used:

| Model | Type | Notes |
|---|---|---|
| obi/deid_roberta_i2b2 (RoBERTa) | Transformer | Specialized for medical PII detection; used in consensus/ensemble mode; based on i2b2 medical dataset |
| Qwen / Qwen2-0.5B-Instruct | Causal LLM | Optional; uses prompt engineering for PII extraction; not actively used in main pipeline |
| betterdataai/PII_DETECTION_MODEL | Qwen-based | Optional; experimental |
| akshyakh93/deberta_finetuned_pii (DeBERTa) | Transformer | Fine-tuned PII detection; used in ensemble with RoBERTa |
| spaCy NER Models | CPU-based NER | en_core_web_sm (fast, less accurate), en_core_web_lg (more accurate, slightly slower) |
| Regex Pattern Matching | Rule-based | Detects emails, phone numbers, and other structured PII |

Consensus Mechanism:

  • Each model assigns a confidence score to potential sensitive content.
  • Only when a majority of models agree is content flagged for masking.
  • Reduces false positives and improves reliability across different data types.
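
A majority vote of this kind can be sketched in a few lines. The three detectors below are toy stand-ins for the real models, and the 0.5 confidence threshold and the function name `is_sensitive` are assumptions chosen for illustration.

```python
import re
from typing import Callable, List

# A detector takes text and returns a confidence in [0, 1].
Detector = Callable[[str], float]

EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def regex_detector(text: str) -> float:
    """Rule-based stand-in: certain when a full email address is present."""
    return 1.0 if EMAIL.search(text) else 0.0


def keyword_detector(text: str) -> float:
    """Toy stand-in for a model that reacts to the word 'password'."""
    return 0.9 if "password" in text.lower() else 0.1


def at_sign_detector(text: str) -> float:
    """Deliberately over-eager stand-in that fires on any '@'."""
    return 0.7 if "@" in text else 0.0


def is_sensitive(text: str, detectors: List[Detector], threshold: float = 0.5) -> bool:
    """Flag text only when a strict majority of detectors pass the threshold."""
    votes = sum(1 for detect in detectors if detect(text) >= threshold)
    return votes > len(detectors) / 2
```

With these stand-ins, `"contact bob@example.com"` gets two of three votes and is flagged, while `"meet @ noon"` triggers only the over-eager detector and is left alone.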

What if the system hallucinates?

  • Even if one model incorrectly identifies something as sensitive, the consensus mechanism requires other models to agree before any action is taken.
  • Because masked content stays recoverable through the local mapping and restore logic, a false positive does not destroy safe content, which preserves both security and usability.

Image and Screenshot Protection

The same logic applies to images and screenshots.
Before any upload or paste, WatchDogAI’s vision model analyzes the image locally to detect confidential text or embedded credentials — blurring or masking them before they can leak.


File Upload Protection

WatchDogAI also protects files before they are uploaded.
When a user attempts to upload a document, PDF, or dataset:

  • The Edge AI engine scans the file locally for sensitive information such as API keys, PII, or internal data.
  • A secure version of the file is automatically created — sensitive portions are replaced with placeholders or masked.
  • This secure version is uploaded, while the original file remains safely stored on the user’s device.

This ensures that even large file uploads remain privacy-safe, with no raw data leaving the local system.
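
For plain-text files, the secure-copy step might look like the sketch below. The `SECRET` pattern and the function name `make_secure_copy` are hypothetical, and formats such as PDF would need a text-extraction step that is not shown here.

```python
import re
import tempfile
from pathlib import Path

# Hypothetical API-key shape used only for this example.
SECRET = re.compile(r"\bsk-[A-Za-z0-9]{16,}\b")


def make_secure_copy(original: Path) -> Path:
    """Write a masked copy to a fresh temp directory; the original is untouched."""
    text = original.read_text(encoding="utf-8")
    masked = SECRET.sub("[API_KEY]", text)
    secure_dir = Path(tempfile.mkdtemp(prefix="watchdog_"))
    secure_path = secure_dir / original.name
    secure_path.write_text(masked, encoding="utf-8")
    return secure_path
```

The upload handler would then send the returned path instead of the file the user selected.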


Smart Restore: Context-Aware Recovery

When a user copies results from an LLM or AI tool that include placeholders (e.g., [API_KEY]),
WatchDogAI automatically maps these placeholders back to their original values — but only if the paste target is an approved or secure location.

This ensures users can continue working productively while maintaining airtight control over where sensitive data reappears.
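
A minimal sketch of this restore step, assuming placeholders are bracketed upper-case tokens and the mapping is available as a dictionary. The names `PLACEHOLDER` and `smart_restore` and the example targets are illustrative only.

```python
import re
from typing import Dict, Set

# Assumed placeholder shape: bracketed upper-case tokens such as [API_KEY].
PLACEHOLDER = re.compile(r"\[[A-Z][A-Z0-9_]*\]")


def smart_restore(
    text: str,
    mapping: Dict[str, str],
    target: str,
    approved_targets: Set[str],
) -> str:
    """Swap known placeholders back to originals, but only for approved targets."""
    if target not in approved_targets:
        return text  # unapproved target: keep the masked form
    # Bracketed tokens that are not in the mapping are left unchanged.
    return PLACEHOLDER.sub(lambda m: mapping.get(m.group(0), m.group(0)), text)
```

Looking up each token in the mapping, instead of replacing blindly, keeps ordinary bracketed text in the AI tool's output intact.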


Challenges We Ran Into

  • Integrating browser extensions with native system agents securely and efficiently.
  • Managing real-time message passing between components without performance loss.
  • Running lightweight AI inference directly on the edge with minimal resource usage.
  • Designing a universal detection framework that works across text, images, and files seamlessly.

Accomplishments That We're Proud Of

  • Built a fully functional Edge-based DLP system that runs without cloud dependency.
  • Developed a consensus-based AI detection mechanism for higher accuracy and reduced false positives.
  • Implemented local placeholder mapping, enabling seamless safe copy-paste and file handling.
  • Designed a privacy-first architecture that users can trust — no data leaves the device.

What's Next for WatchDogAI

We envision WatchDogAI becoming the new standard for Data Loss Prevention systems.
Our goals include:

  • Integrating directly with operating systems as a built-in security layer.
  • Expanding into enterprise dashboards for IT teams to visualize and manage protections.
  • Supporting real-time monitoring across more file types and collaboration tools.

Ultimately, WatchDogAI aims to evolve into a seamless, OS-integrated DLP solution that redefines how privacy and productivity coexist.


Why It Matters

Modern workforces need freedom — to experiment, use online AI tools, and collaborate globally.
But freedom shouldn’t mean risk.

WatchDogAI bridges that gap by providing privacy-first data protection right where it matters most:
on your device, in real time.


Built With

  • Python – Backend and AI logic
  • Flask – Local communication and APIs
  • TensorFlow Lite / OpenCV – Edge AI and vision inference
  • WebExtensions API – Browser monitoring and message interception
  • SQLite3 – Local mapping and placeholder storage
  • JavaScript – Frontend interaction and extension logic