Inspiration
Security teams are drowning in log data. Existing tools either surface everything (information overload) or nothing (missed threats). I wanted to build a RAG system that doesn’t just retrieve logs — it enforces who can see what, blocks adversarial inputs, and leaves a full audit trail for compliance teams.
What it does
Secure RAG Auditor is a security intelligence system built on three layers:
Pre-Query Defense
A 5-category prompt injection detector using 19 regex patterns blocks adversarial queries before they reach the database or LLM.
Secure Retrieval
ChromaDB’s $lte metadata filter enforces attribute-based access control directly at the database layer rather than in application code, so documents above the user’s clearance level are never returned and a bug in post-retrieval code cannot leak them.
Automated Governance
Every query is recorded in a SQLite audit ledger, including:
- who made the request
- clearance level
- detected risk
- whether the request was blocked
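As a rough illustration of the ledger described above, here is a minimal sketch using Python's built-in sqlite3. The table name `audit_log`, the column names, and the helper functions are assumptions made for this example, not the project's actual schema.

```python
import sqlite3
from datetime import datetime, timezone


def init_ledger(conn: sqlite3.Connection) -> None:
    """Create the audit table if it does not exist (hypothetical schema)."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS audit_log (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            ts TEXT NOT NULL,
            user_id TEXT NOT NULL,
            clearance_level INTEGER NOT NULL,
            risk TEXT NOT NULL,
            blocked INTEGER NOT NULL
        )
        """
    )
    conn.commit()


def record_query(conn, user_id: str, clearance_level: int, risk: str, blocked: bool) -> None:
    """Append one row per request; parameters are bound, never string-formatted."""
    conn.execute(
        "INSERT INTO audit_log (ts, user_id, clearance_level, risk, blocked) "
        "VALUES (?, ?, ?, ?, ?)",
        (datetime.now(timezone.utc).isoformat(), user_id, clearance_level, risk, int(blocked)),
    )
    conn.commit()


# In-memory database for the example; a real ledger would use a file path.
conn = sqlite3.connect(":memory:")
init_ledger(conn)
record_query(conn, "analyst-7", 2, "none", False)
record_query(conn, "guest-1", 1, "instruction_override", True)

rows = conn.execute(
    "SELECT user_id, clearance_level, risk, blocked FROM audit_log ORDER BY id"
).fetchall()
```

Blocked requests are written to the ledger as well, so the audit trail covers rejected queries and not only the ones that reached the database.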
The LLM layer (GPT-4o-mini) analyzes retrieved logs and generates a structured Security Summary Report containing:
- risk_level
- key_findings
- recommendation
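A report with these three fields could be validated with a Pydantic model along the following lines. This is a sketch only: the class name, the allowed `risk_level` values, and the `parse_report` helper are assumptions for illustration, since the write-up does not specify them.

```python
import json
from typing import List, Literal, Optional

from pydantic import BaseModel, ValidationError


class SecuritySummaryReport(BaseModel):
    # The four-step scale is an assumed example, not the project's actual enum.
    risk_level: Literal["low", "medium", "high", "critical"]
    key_findings: List[str]
    recommendation: str


def parse_report(raw: str) -> Optional[SecuritySummaryReport]:
    """Return a validated report, or None if the LLM output does not fit the schema."""
    try:
        return SecuritySummaryReport(**json.loads(raw))
    except (ValidationError, json.JSONDecodeError, TypeError):
        return None


good = parse_report(
    '{"risk_level": "high", "key_findings": ["Repeated failed SSH logins"], '
    '"recommendation": "Lock the affected account"}'
)
bad = parse_report('{"risk_level": "extreme", "key_findings": [], "recommendation": "n/a"}')
```

Rejecting malformed output at this boundary means downstream code only ever sees reports whose fields have the expected types.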
How I built it
- FastAPI for async API routing with modular architecture
- ChromaDB as the embedded vector database with persistent storage
- Pydantic for type-safe request and response validation
- OpenAI GPT-4o-mini with a strict analyst system prompt and token budget management
- SQLite audit ledger using Python’s built-in sqlite3 module
Challenges
The biggest challenge was enforcing access control at the correct layer. Filtering after retrieval in application code is insecure because bugs could expose restricted logs. By moving the $lte clearance filter directly into the ChromaDB query, the database never returns documents above the user’s clearance level.
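The idea can be sketched as follows. ChromaDB's `collection.query` accepts a `where` argument with operators such as `$lte`; the metadata field name `clearance_level` and the two helper functions are assumptions for this example, and `collection` is expected to be a ChromaDB collection created elsewhere.

```python
from typing import Any, Dict


def clearance_filter(user_clearance: int) -> Dict[str, Any]:
    """Build a metadata filter matching documents at or below the user's clearance.

    `clearance_level` is a hypothetical metadata key; it must match whatever
    key was stored on each document at ingestion time.
    """
    return {"clearance_level": {"$lte": user_clearance}}


def secure_query(collection: Any, query_text: str, user_clearance: int, n_results: int = 5):
    """Run a similarity search with the clearance filter applied inside the query.

    The filter travels with the query itself, so there is no separate
    post-retrieval step that could be skipped or get out of sync.
    """
    return collection.query(
        query_texts=[query_text],
        n_results=n_results,
        where=clearance_filter(user_clearance),
    )


where = clearance_filter(2)
```

Because the caller never receives higher-clearance documents in the first place, there is nothing for later application code to leak by mistake.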
Another challenge was building prompt injection detection that doesn’t over-block legitimate requests. Overly aggressive filtering creates false positives. The solution was categorizing attacks into 5 injection types and tuning thresholds independently for each category.
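A simplified sketch of per-category thresholds is shown below. The category names, the patterns, and the threshold values are all illustrative assumptions; they are not the project's actual 19 patterns. Here a threshold is the minimum number of distinct patterns in a category that must match before the query is blocked.

```python
import re
from typing import Optional

# Hypothetical categories and patterns; each category has its own threshold.
CATEGORIES = {
    "instruction_override": {
        "threshold": 1,
        "patterns": [
            r"ignore (all |any )?(previous|prior|above) instructions",
            r"disregard (the |your )?(system|previous) prompt",
        ],
    },
    "role_manipulation": {
        "threshold": 1,
        "patterns": [r"you are now\b", r"act as (an? )?(admin|root|developer)"],
    },
    "prompt_exfiltration": {
        "threshold": 1,
        "patterns": [r"(reveal|print|show) (me )?(your|the) (system )?prompt"],
    },
    "privilege_escalation": {
        "threshold": 1,
        "patterns": [r"(bypass|override|disable) (the )?(clearance|access control|filter)"],
    },
    # Weak signals on their own, so two distinct matches are required.
    "encoding_obfuscation": {
        "threshold": 2,
        "patterns": [r"base64", r"\\x[0-9a-f]{2}", r"rot13"],
    },
}

# Compile once at import time, case-insensitively.
COMPILED = {
    name: (cfg["threshold"], [re.compile(p, re.IGNORECASE) for p in cfg["patterns"]])
    for name, cfg in CATEGORIES.items()
}


def detect_injection(query: str) -> Optional[str]:
    """Return the first category whose match count reaches its threshold, else None."""
    for name, (threshold, patterns) in COMPILED.items():
        hits = sum(1 for pattern in patterns if pattern.search(query))
        if hits >= threshold:
            return name
    return None
```

With these example settings, a query that merely mentions base64 passes, while one that combines base64 and rot13 is blocked. That is the kind of independent tuning that keeps weak signals from causing false positives.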
What I learned
- Data-layer security is more reliable than application-layer filtering
- Structured schemas (via Pydantic) make LLM outputs production-ready
- Observability and audit trails are what separate prototypes from systems that compliance teams can actually trust