About the Project
We built WatchDogAI, an Edge AI-powered Data Loss Prevention (DLP) system that safeguards enterprise and personal data in real time — without limiting productivity or access to modern tools.
What It Does
Our system continuously monitors domains, clipboard content, keystrokes, and outgoing messages to detect sensitive information such as API keys, access tokens, emails, credentials, or other confidential data.
Whenever a risk is detected, WatchDogAI automatically masks or replaces the content with placeholders — preventing accidental data leaks at the source.
Unlike traditional DLP tools that block websites or restrict access, WatchDogAI empowers employees to use modern online tools safely — without needing to rely on expensive enterprise-only AI models or outdated in-house systems.
It also provides AI-based dashboards, integrated with Amplitude AI, that let enterprise IT desks review and manage alerts.

How It Works
Clipboard (Copy-Paste) Monitoring
When a user copies content — text, code, or an image — WatchDogAI runs multiple Edge AI models in consensus to identify potential sensitive information.
Key Steps:
Detection and Consensus
- Multiple lightweight LLMs and edge models analyze the copied data.
- Each model outputs a confidence score, and action is taken only when a majority of the models agree.
Data Processing
- Sensitive content is replaced with placeholders locally.
- A secure mapping is stored on the device, linking placeholders to original values.
Pasting Logic
- Before pasting, WatchDogAI checks if the target domain or application is safe.
- If safe → original content is restored.
- If unsafe → masked content is pasted instead.
All of this happens in milliseconds, entirely locally, ensuring no data leaves the device.
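The copy and paste steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the email regex stands in for the model ensemble, and `APPROVED_HOSTS`, `open_store`, `mask_on_copy`, and `text_for_paste` are hypothetical names. SQLite is used for the local mapping because it appears in the Built With list.

```python
import re
import sqlite3
from urllib.parse import urlparse

# Hypothetical allowlist and pattern; neither is taken from the project.
APPROVED_HOSTS = {"intranet.example.com"}
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def open_store(path=":memory:"):
    """Open the local placeholder-to-original mapping table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS mapping ("
        "placeholder TEXT PRIMARY KEY, original TEXT NOT NULL)"
    )
    return conn


def mask_on_copy(conn, text):
    """Replace each detected value with a numbered placeholder and record it."""
    def _swap(match):
        count = conn.execute("SELECT COUNT(*) FROM mapping").fetchone()[0]
        placeholder = f"[EMAIL_{count + 1}]"
        conn.execute(
            "INSERT INTO mapping VALUES (?, ?)", (placeholder, match.group(0))
        )
        return placeholder

    return EMAIL_RE.sub(_swap, text)


def text_for_paste(conn, masked, target_url):
    """Return the original text for approved hosts, the masked text otherwise."""
    host = urlparse(target_url).hostname
    if host not in APPROVED_HOSTS:
        return masked
    rows = conn.execute("SELECT placeholder, original FROM mapping").fetchall()
    restored = masked
    for placeholder, original in rows:
        restored = restored.replace(placeholder, original)
    return restored
```

In this sketch the mapping never leaves the SQLite store on the device; only the masked string is handed to an unapproved target.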

Real-Time Keyword / Keystroke Monitoring
In addition to clipboard monitoring, WatchDogAI tracks keystrokes and keyword sequences in real time to catch sensitive information before it’s copied or sent.
How It Works:
Keyword Detection
- The system continuously scans typed input for sensitive keywords or patterns (emails, API keys, PII, etc.).
Debounce Mechanism
- Detection is temporarily delayed (debounced) to capture context and avoid false positives.
- Only confirmed sensitive sequences trigger masking or alerts.
Target & Domain Check
- If a user types sensitive info into a non-approved domain or application, the system either blocks it or automatically replaces it with a placeholder.
- Approved domains are allowed, preserving productivity.
Effect: This keeps sensitive data from reaching non-approved destinations, even when the user types it directly instead of copy-pasting it.
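One way to picture the debounce step is a scanner that buffers keystrokes and only runs detection after typing has paused. The sketch below is an assumption about how this could look: `DebouncedScanner`, the 0.4 second quiet period, and the `sk_` key format are all hypothetical, and a single regex stands in for the real detectors.

```python
import re
import time

# Hypothetical key format used only for illustration.
API_KEY_RE = re.compile(r"\bsk_[A-Za-z0-9]{16,}\b")


class DebouncedScanner:
    """Buffer keystrokes and scan only after a quiet period."""

    def __init__(self, quiet_seconds=0.4, clock=time.monotonic):
        self.quiet_seconds = quiet_seconds
        self.clock = clock  # injectable so the timing can be tested
        self.buffer = []
        self.last_key_at = None

    def on_key(self, char):
        """Record one typed character and restart the quiet-period timer."""
        self.buffer.append(char)
        self.last_key_at = self.clock()

    def poll(self):
        """Return matches once typing has paused; return None otherwise."""
        if self.last_key_at is None:
            return None
        if self.clock() - self.last_key_at < self.quiet_seconds:
            return None
        self.last_key_at = None  # wait for new input before scanning again
        return API_KEY_RE.findall("".join(self.buffer))
```

Waiting for the pause means the scanner sees the whole token rather than a partial prefix, which is where the reduction in false positives would come from.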
Detection & Consensus
Models Used:
| Model | Type | Notes |
|---|---|---|
| obi/deid_roberta_i2b2 (RoBERTa) | Transformer | Specialized for medical PII detection; used in consensus/ensemble mode; based on i2b2 medical dataset |
| Qwen/Qwen2-0.5B-Instruct | Causal LLM | Optional; uses prompt engineering for PII extraction; not actively used in main pipeline |
| betterdataai/PII_DETECTION_MODEL | Qwen-based | Optional; experimental |
| akshyakh93/deberta_finetuned_pii (DeBERTa) | Transformer | Fine-tuned PII detection; used in ensemble with RoBERTa |
| spaCy NER Models | CPU-based NER | en_core_web_sm (fast, less accurate), en_core_web_lg (more accurate, slightly slower) |
| Regex Pattern Matching | Rule-based | Detects emails, phone numbers, and other structured PII |
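As an illustration of the rule-based row in the table, a regex detector for structured PII might look like the sketch below. The two patterns are simplified assumptions (the phone pattern only covers a `555-123-4567` style layout), and `PATTERNS` and `find_structured_pii` are hypothetical names.

```python
import re

# Simplified, illustrative patterns; real-world formats vary far more.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}


def find_structured_pii(text):
    """Return (label, start, end, value) tuples ordered by position."""
    hits = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((label, match.start(), match.end(), match.group(0)))
    return sorted(hits, key=lambda hit: hit[1])
```

Returning character offsets alongside the label makes it straightforward to combine these hits with spans reported by the model-based detectors.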
Consensus Mechanism:
- Each model assigns a confidence score to potential sensitive content.
- Only when a majority of models agree is content flagged for masking.
- Reduces false positives and improves reliability across different data types.
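The majority rule can be sketched in a few lines. Here each detector is modeled as a function returning a confidence between 0 and 1; the 0.5 threshold, the `Detector` type, and `majority_flags` are assumptions made for illustration, and stub functions would stand in for the actual models.

```python
from typing import Callable, Dict

# A detector maps a text span to a confidence in [0, 1].
Detector = Callable[[str], float]


def majority_flags(
    span: str, detectors: Dict[str, Detector], threshold: float = 0.5
) -> bool:
    """Flag the span only if more than half of the detectors are confident."""
    votes = sum(1 for detect in detectors.values() if detect(span) >= threshold)
    return votes > len(detectors) / 2
```

With a strict majority, a tie between two detectors does not flag the span, so a single over-eager model cannot trigger masking on its own.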
What if the system hallucinates?
- Even if one model incorrectly identifies something as sensitive, the consensus mechanism requires other models to agree before any action is taken.
- The local mapping & restore logic prevents accidental deletion of safe content, maintaining both security and usability.
Image and Screenshot Protection
The same logic applies to images and screenshots.
Before any upload or paste, WatchDogAI’s vision model analyzes the image locally to detect confidential text or embedded credentials — blurring or masking them before they can leak.
File Upload Protection
WatchDogAI also protects files before they are uploaded.
When a user attempts to upload a document, PDF, or dataset:
- The Edge AI engine scans the file locally for sensitive information such as API keys, PII, or internal data.
- A secure version of the file is automatically created — sensitive portions are replaced with placeholders or masked.
- This secure version is uploaded, while the original file remains safely stored on the user’s device.
This ensures that even large file uploads remain privacy-safe, with no raw data leaving the local system.
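For a plain-text file, the "secure version" step could be sketched like this. It is an assumption-heavy illustration: `make_safe_copy` and the `.safe` naming are hypothetical, a single regex for `api_key = value` lines stands in for the Edge AI scan, and formats such as PDF would need text extraction first.

```python
import re
from pathlib import Path

# Illustrative pattern for "api_key = value" style assignments.
SECRET_RE = re.compile(r"(?i)\b(api[_-]?key\s*[=:]\s*)(\S+)")


def make_safe_copy(source: Path) -> Path:
    """Write a masked copy next to the original and return its path."""
    text = source.read_text(encoding="utf-8")
    # Keep the key name and separator, replace only the secret value.
    masked = SECRET_RE.sub(lambda m: m.group(1) + "[API_KEY]", text)
    safe_path = source.with_name(f"{source.stem}.safe{source.suffix}")
    safe_path.write_text(masked, encoding="utf-8")
    return safe_path
```

The original file is only read, never modified, so the upload can be pointed at the returned path while the source stays untouched on disk.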
Smart Restore: Context-Aware Recovery
When a user copies results from an LLM or AI tool that include placeholders (e.g., [API_KEY]), WatchDogAI automatically maps these placeholders back to their original values, but only if the paste target is an approved or secure location.
This ensures users can continue working productively while maintaining airtight control over where sensitive data reappears.
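A rough sketch of the restore step, under the assumption that placeholders look like `[API_KEY]` and that the mapping is available as a dictionary. `restore_placeholders` and its parameters are hypothetical names; placeholders with no known original are left as they are.

```python
import re
from urllib.parse import urlparse

# Matches bracketed upper-case tokens such as [API_KEY] or [EMAIL_1].
PLACEHOLDER_RE = re.compile(r"\[[A-Z0-9_]+\]")


def restore_placeholders(text, mapping, target_url, approved_hosts):
    """Swap known placeholders back in, but only for approved paste targets."""
    host = urlparse(target_url).hostname
    if host not in approved_hosts:
        return text
    return PLACEHOLDER_RE.sub(
        lambda m: mapping.get(m.group(0), m.group(0)), text
    )
```

Because unknown tokens fall through unchanged, bracketed text that the AI tool produced on its own is not accidentally rewritten.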
Challenges We Ran Into
- Integrating browser extensions with native system agents securely and efficiently.
- Managing real-time message passing between components without performance loss.
- Running lightweight AI inference directly on the edge with minimal resource usage.
- Designing a universal detection framework that works across text, images, and files seamlessly.
Accomplishments That We're Proud Of
- Built a fully functional Edge-based DLP system that runs without cloud dependency.
- Developed a consensus-based AI detection mechanism for higher accuracy and reduced false positives.
- Implemented local placeholder mapping, enabling seamless safe copy-paste and file handling.
- Designed a privacy-first architecture that users can trust — no data leaves the device.
What's Next for WatchDogAI
We envision WatchDogAI becoming the new standard for Data Loss Prevention systems.
Our goals include:
- Integrating directly with operating systems as a built-in security layer.
- Expanding into enterprise dashboards for IT teams to visualize and manage protections.
- Supporting real-time monitoring across more file types and collaboration tools.
Ultimately, WatchDogAI aims to evolve into a seamless, OS-integrated DLP solution that redefines how privacy and productivity coexist.
Why It Matters
Modern workforces need freedom — to experiment, use online AI tools, and collaborate globally.
But freedom shouldn’t mean risk.
WatchDogAI bridges that gap by providing privacy-first data protection right where it matters most:
on your device, in real time.
Built With
- Python – Backend and AI logic
- Flask – Local communication and APIs
- TensorFlow Lite / OpenCV – Edge AI and vision inference
- WebExtensions API – Browser monitoring and message interception
- SQLite3 – Local mapping and placeholder storage
- JavaScript – Frontend interaction and extension logic