Inspiration
AI is being deployed everywhere (customer service, healthcare, education, finance), but almost no one is checking whether the outputs are actually safe. We kept seeing news about AI systems generating toxic responses, leaking private data, and being jailbroken with simple tricks. That frustrated us. We wanted to build something that sits between an AI and its users and acts as a real safety net: not just a content filter, but a full multi-layer validation system. Trident was born from that frustration.
What it does
Trident is a 12-layer AI safety scanner that analyzes any prompt or AI-generated response in real time. A single scan runs the input through sentiment analysis, toxicity detection, jailbreak recognition, PII scanning, bias checking, misinformation detection, hallucination flags, injection attack detection, and a machine learning classifier trained on 24,000 labeled real-world prompts. It returns an instant CLEARED or BLOCKED verdict with per-layer scores, and can optionally trigger a Gemini deep analysis for flagged content.
How we built it
We built Trident entirely in Python, using Streamlit for the UI. The ML layer uses scikit-learn: a TF-IDF vectorizer feeding a RandomForest classifier and a logistic regression model, trained on Nvidia's Aegis dataset of 24,000 labeled prompts balanced between safe and unsafe. Each of the 12 layers is an independent function with its own logic, and the layers run in sequence behind a live progress bar. The deep analysis is powered by Google Gemini 2.5 Flash via API. We also built a live chat auditor module in which Gemini monitors a response in a conversation format and generates a final structured audit report.
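A minimal sketch of the "independent layer functions run in sequence" idea, not Trident's actual code. The two toy layers, the `ScanReport` type, the 0.5 threshold, and the any-layer-blocks rule are all assumptions made for illustration; the write-up does not say how per-layer scores are combined into a verdict.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

# A layer takes the text and returns a risk score in [0.0, 1.0].
Layer = Callable[[str], float]


def toxicity_layer(text: str) -> float:
    """Toy heuristic (hypothetical): flag a couple of hostile phrases."""
    hostile = ("idiot", "hate you")
    return 1.0 if any(phrase in text.lower() for phrase in hostile) else 0.0


def pii_layer(text: str) -> float:
    """Toy heuristic (hypothetical): treat any '@' as a possible email."""
    return 0.8 if "@" in text else 0.0


@dataclass
class ScanReport:
    verdict: str              # "CLEARED" or "BLOCKED"
    scores: Dict[str, float]  # per-layer risk scores


def run_layers(
    text: str,
    layers: List[Tuple[str, Layer]],
    threshold: float = 0.5,  # assumed cut-off, not from the write-up
    on_progress: Optional[Callable[[float], None]] = None,
) -> ScanReport:
    """Run each layer in order; block if any score reaches the threshold."""
    scores: Dict[str, float] = {}
    for index, (name, layer) in enumerate(layers, start=1):
        scores[name] = layer(text)
        if on_progress is not None:
            on_progress(index / len(layers))  # fraction complete, up to 1.0
    blocked = any(score >= threshold for score in scores.values())
    return ScanReport("BLOCKED" if blocked else "CLEARED", scores)


LAYERS = [("toxicity", toxicity_layer), ("pii", pii_layer)]
```

Because each layer is just a function from text to score, adding a thirteenth layer means appending one entry to the list. In a Streamlit app, `on_progress` could plausibly be the `progress` method of the element returned by `st.progress`, which accepts a float between 0.0 and 1.0.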
Challenges we ran into
The biggest challenge was making the ML classifier actually catch short hostile phrases like "go and die" or "kys". TF-IDF struggles with two- and three-character tokens that don't appear in the training data. We solved it with a layered approach: regex phrase matching in L01 and L02 catches what the ML model misses, so no single layer is a single point of failure. We also had a Gemini API key auto-revoked by Google because it was accidentally hardcoded and pushed to a public repo, which made for a stressful midnight fix. Managing secrets properly under deadline pressure was a real lesson.
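The regex-plus-classifier fallback described above might look roughly like this. The pattern list is hypothetical (the real L01 and L02 patterns are not given here), and `ml_score` stands in for whatever probability the trained model returns.

```python
import re

# Hypothetical phrase list; word boundaries avoid matching inside longer words.
HOSTILE_PATTERNS = [
    re.compile(r"\bkys\b", re.IGNORECASE),
    re.compile(r"\bgo\s+(?:and\s+)?die\b", re.IGNORECASE),
]


def phrase_layer(text: str) -> bool:
    """Return True when any known hostile phrase appears in the text."""
    return any(pattern.search(text) for pattern in HOSTILE_PATTERNS)


def is_unsafe(text: str, ml_score: float, threshold: float = 0.5) -> bool:
    """Flag the text if either the phrase layer or the ML score trips."""
    return phrase_layer(text) or ml_score >= threshold
```

The point of the `or` is that a low classifier score cannot override an explicit phrase match, so a short phrase the model never saw in training is still caught.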
Accomplishments that we're proud of
We're proud that Trident actually works as a real safety tool, not just a demo. The 12-layer architecture with graceful fallbacks means it runs even with zero API keys and zero external service dependencies: pure heuristics kick in automatically. We're also proud of the UI, which looks and feels like a professional security product rather than a hackathon project. Training a model on a real 24k-row industry dataset and hitting 78% accuracy with a balanced F1 score in a hackathon timeframe felt like a genuine achievement.
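One way to get this kind of graceful fallback, sketched under assumptions: the environment variable name `GEMINI_API_KEY`, the `remote` callable, and the keyword heuristic are all placeholders for illustration, not Trident's real names or logic. Reading the key from the environment also keeps it out of the source tree.

```python
import os
from typing import Callable, Optional


def heuristic_analysis(text: str) -> str:
    """Local fallback: a trivial keyword check standing in for real heuristics."""
    risky = ("ignore previous instructions", "password")
    hits = [term for term in risky if term in text.lower()]
    return f"heuristic: {len(hits)} risky term(s) found"


def deep_analysis(
    text: str,
    remote: Optional[Callable[[str, str], str]] = None,
    env_var: str = "GEMINI_API_KEY",  # assumed variable name
) -> str:
    """Use the remote analyzer when a key is configured, else fall back."""
    api_key = os.environ.get(env_var)  # read from the environment, never hardcode
    if not api_key or remote is None:
        return heuristic_analysis(text)
    try:
        return remote(text, api_key)
    except Exception:
        # Network or quota failure: degrade to heuristics instead of crashing.
        return heuristic_analysis(text)
```

With no key set, `deep_analysis` never touches the network, which matches the "runs with zero API keys" behaviour described above.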
What we learned
We learned that AI safety is genuinely hard: edge cases are everywhere and no single technique catches everything. Layered defence is the only reliable approach. We also learned a lot about the Nvidia Aegis dataset and how real-world unsafe prompts are distributed across categories like jailbreaks, hate speech, PII leaks, and controlled substance queries. On the engineering side, we got much better at managing Streamlit state, caching heavy ML models properly with @st.cache_resource, and keeping a Streamlit app fast even with a 24k-row training pipeline running on startup.
What's next for Trident
We want to make Trident a proper API, so any developer can send a POST request with an AI response and get back a structured safety report in JSON. We also want to add real-time monitoring for live AI deployments, a webhook system that fires alerts when content is blocked, and sector-specific fine-tuned models for healthcare, legal, and finance. Longer term, we want to expand the ML model to 500k+ samples and add multilingual support, because unsafe AI outputs aren't just an English problem. We are also planning to add the Horizon Unrolling algorithm (HUA). This is Trident signing off!
Built With
- ai
- api
- machine-learning
- python
- vercel