VantaPrompt - AI Data Loss Prevention System

Inspiration

We noticed a growing risk of sensitive data leakage when teams use AI models for internal workflows. Traditional tools mask PII, but semantic meaning and context can still expose private information. VantaPrompt was inspired by the need for a robust, AI-aware data loss prevention system that not only masks sensitive data but also analyzes prompts for hidden risks before they are sent to external AI models.

What it does

VantaPrompt scans all text input in real time, identifying sensitive information like emails, phone numbers, financial IDs, or API keys. It masks sensitive content, logs incidents in a secure database, and uses a Layer 2 AI analysis to assess semantic risk. Based on severity, it either allows, rewrites, or blocks content, providing safe alternatives where necessary. This ensures AI workflows can run safely without exposing confidential data.

How we built it

We used a two-layer architecture:

Layer 1 – Pattern-Based Masking: Detects PII using regex and schema-driven rules. Logs all incidents in MongoDB with metadata like severity, type, and context.

Layer 2 – Semantic Analysis: Uses Anthropic’s AI models to evaluate the risk of sanitized prompts, analyzing intent and context. Returns structured JSON with decisions (ALLOW, REWRITE, BLOCK) and safe alternatives when needed.
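To make Layer 1 concrete, here is a minimal sketch of regex-driven masking. It is not the project's actual rule set: the rule names, the patterns (including the `sk-` API key prefix), the severity levels, and the `maskPrompt` helper are all assumptions chosen for illustration. Each match is replaced with a typed placeholder, and the findings it returns could feed an incident log.

```typescript
// Hypothetical rule set: types, patterns, and severities are illustrative assumptions,
// deliberately simple and not exhaustive.
type Severity = "low" | "medium" | "high";

interface Rule {
  type: string;
  pattern: RegExp; // needs the global flag so every occurrence is replaced
  severity: Severity;
}

interface Finding {
  type: string;
  severity: Severity;
  count: number;
}

const RULES: Rule[] = [
  { type: "EMAIL", pattern: /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g, severity: "medium" },
  { type: "PHONE", pattern: /\b\d{3}[-. ]\d{3}[-. ]\d{4}\b/g, severity: "medium" },
  { type: "API_KEY", pattern: /\bsk-[A-Za-z0-9_-]{16,}\b/g, severity: "high" },
];

// Replaces every match with a placeholder such as [EMAIL] and counts hits per rule.
function maskPrompt(
  text: string,
  rules: Rule[] = RULES,
): { masked: string; findings: Finding[] } {
  let masked = text;
  const findings: Finding[] = [];
  for (const rule of rules) {
    let count = 0;
    masked = masked.replace(rule.pattern, () => {
      count += 1;
      return `[${rule.type}]`;
    });
    if (count > 0) {
      findings.push({ type: rule.type, severity: rule.severity, count });
    }
  }
  return { masked, findings };
}
```

The findings carry only the type, severity, and count, never the matched value, so a log entry built from them would not itself leak the data being masked.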

The backend is built in Node.js, stores logs in MongoDB, and calls external AI APIs for the Layer 2 analysis. The system also reads the local repo context to understand project-specific conventions, which improves analysis accuracy.
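As a rough illustration of how a backend might act on a Layer 2 decision, the sketch below maps ALLOW, REWRITE, and BLOCK to the text that is actually forwarded. Only the three decision values come from the description above. The `RiskAssessment` shape, its `reason` and `safeAlternative` fields, and the `applyDecision` function are assumed names, not the project's schema.

```typescript
type Decision = "ALLOW" | "REWRITE" | "BLOCK";

// Assumed shape of the Layer 2 result; field names are hypothetical.
interface RiskAssessment {
  decision: Decision;
  reason: string;
  safeAlternative?: string;
}

// Returns the text that may be sent to the external model,
// or null when nothing should be sent.
function applyDecision(
  sanitizedPrompt: string,
  assessment: RiskAssessment,
): string | null {
  switch (assessment.decision) {
    case "ALLOW":
      return sanitizedPrompt;
    case "REWRITE": {
      // A REWRITE with no usable alternative is treated as a BLOCK.
      const alternative = assessment.safeAlternative?.trim();
      return alternative ? alternative : null;
    }
    case "BLOCK":
      return null;
  }
}
```

Treating a REWRITE without an alternative as a BLOCK is a design choice in this sketch: when the system is unsure what is safe to send, it sends nothing.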

Challenges we ran into

Balancing masking and usability: Masking too aggressively can make prompts unusable; masking too leniently risks exposing sensitive data.

Semantic analysis: Determining intent from masked prompts is tricky; false positives and false negatives had to be minimized.

Structured JSON output: Ensuring AI responses adhere strictly to the JSON schema required careful prompt engineering.

Integration: Combining real-time detection, logging, and AI analysis while maintaining performance and scalability was difficult.
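For the structured JSON challenge above, prompt engineering can be paired with a strict check on whatever the model returns. The sketch below is one possible approach, not the project's implementation: `parseAssessment` and the `reason` and `safeAlternative` field names are assumptions, and the fail-closed behavior (any malformed reply becomes a BLOCK) is a design choice made for this example.

```typescript
type Decision = "ALLOW" | "REWRITE" | "BLOCK";

// Assumed response shape; only the three decision values come from the write-up.
interface RiskAssessment {
  decision: Decision;
  reason: string;
  safeAlternative?: string;
}

const DECISIONS: readonly string[] = ["ALLOW", "REWRITE", "BLOCK"];

// Parses raw model text and validates it. Anything malformed fails closed to BLOCK.
function parseAssessment(raw: string): RiskAssessment {
  const failClosed = (why: string): RiskAssessment => ({
    decision: "BLOCK",
    reason: `invalid model output: ${why}`,
  });

  let data: unknown;
  try {
    data = JSON.parse(raw.trim());
  } catch {
    return failClosed("not valid JSON");
  }
  if (typeof data !== "object" || data === null || Array.isArray(data)) {
    return failClosed("not a JSON object");
  }

  const fields = data as Record<string, unknown>;
  const decision = fields["decision"];
  const reason = fields["reason"];
  const alternative = fields["safeAlternative"];

  if (typeof decision !== "string" || !DECISIONS.includes(decision)) {
    return failClosed("unknown decision");
  }
  if (typeof reason !== "string") {
    return failClosed("missing reason");
  }
  if (decision === "REWRITE" && typeof alternative !== "string") {
    return failClosed("REWRITE without safeAlternative");
  }

  const result: RiskAssessment = { decision: decision as Decision, reason };
  if (typeof alternative === "string") {
    result.safeAlternative = alternative;
  }
  return result;
}
```

Failing closed means a chatty or truncated model reply can never be mistaken for an ALLOW, at the cost of occasionally blocking a harmless prompt.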

Accomplishments that we're proud of

Built a fully automated AI data loss prevention pipeline that handles masking, logging, and semantic risk assessment.

Created a flexible system that can be extended to new PII types or AI models.

Achieved high accuracy in detecting risky prompts while maintaining usability.

Delivered a structured, actionable JSON output that backend systems can parse and act on automatically.

What we learned

Semantic AI analysis is crucial for truly safe AI workflows; pattern masking alone isn’t enough.

Clear schema-driven logging enables traceability, debugging, and audit compliance.

Prompt engineering is a skill that affects not just AI output but also security and compliance.

Building safety systems requires thinking about both technical and human factors, like usability and workflow integration.

What's next for VantaPrompt

Enhance context-awareness: Integrate more project-specific knowledge to improve semantic accuracy.

Real-time monitoring dashboards: Visualize alerts, risks, and actions taken.

Adaptive learning: Use feedback from blocked or rewritten prompts to improve detection.

Broader AI integration: Support multiple AI models and enterprise platforms.

Policy enforcement: Allow configurable rules for organizations to enforce data safety policies automatically.

Updates

Ariiqman Naufal started this project — Dec 06, 2025 04:57 PM EST
