Inspiration
As LLMs are increasingly deployed in real products, we noticed a gap between model-level safety and production-level control. While modern models include internal safeguards, teams lack visibility into prompt behavior, drift, and risk over time. We wanted to build a system that treats AI safety as infrastructure, not an afterthought.
What it does
PromptGuard is a model-agnostic safety and observability layer for LLMs. It intercepts prompts and responses in real time, evaluates safety and hallucination risk, visualizes trends in a dashboard, and automatically alerts or blocks high-risk outputs before they reach users.
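In outline, the intercept-evaluate-decide flow looks like the sketch below. The `assessRisk` and `generate` functions are injected stand-ins for PromptGuard's internal evaluator and model backend (in production both would be async network calls), and the threshold value is illustrative, not the real default.

```typescript
// Sketch of the intercept -> evaluate -> decide flow. `assessRisk` and
// `generate` are hypothetical stand-ins, not PromptGuard's actual API.

type Decision = { allowed: boolean; risk: number };

function guardedGenerate(
  prompt: string,
  assessRisk: (text: string) => number,
  generate: (prompt: string) => string,
  threshold = 0.8
): { response: string | null; decision: Decision } {
  // 1. Score the incoming prompt before it ever reaches the model.
  const promptRisk = assessRisk(prompt);
  if (promptRisk >= threshold) {
    return { response: null, decision: { allowed: false, risk: promptRisk } };
  }

  // 2. Generate, then score the response before it reaches the user.
  const response = generate(prompt);
  const responseRisk = assessRisk(response);
  if (responseRisk >= threshold) {
    return { response: null, decision: { allowed: false, risk: responseRisk } };
  }

  return {
    response,
    decision: { allowed: true, risk: Math.max(promptRisk, responseRisk) },
  };
}
```

Scoring both sides of the exchange is what lets the layer catch unsafe responses even when the prompt itself looked benign.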
How we built it
We built PromptGuard using Next.js and TypeScript for the frontend and API layer, Supabase for persistent storage and audit logs, and Ollama for running local LLMs. We used the Gemma model for text generation and a separate embedding model for risk analysis groundwork. The system combines rule-based blocking with scoring heuristics to make fast, explainable safety decisions.
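The rule-plus-heuristic combination can be sketched like this. The patterns and the `heuristicScore` function are illustrative placeholders, not the project's actual rule set; the key idea is that deterministic rules short-circuit to a maximal score, which keeps blocking decisions fast and explainable.

```typescript
// Sketch of combining deterministic guard rules with scoring heuristics.
// These patterns and scores are toy examples, not PromptGuard's real rules.

const HARD_BLOCK_RULES: RegExp[] = [
  /\b(make|build|synthesize)\b.*\b(explosive|weapon)\b/i,
  /\b(steal|generate)\b.*\bcredit card numbers?\b/i,
];

function heuristicScore(text: string): number {
  // Toy heuristic: prompts that try to override instructions score higher.
  return /ignore (all )?(previous|prior) instructions/i.test(text) ? 0.6 : 0.1;
}

function safetyDecision(text: string): { score: number; rule: boolean } {
  // Deterministic rules short-circuit: no model call, fully explainable.
  if (HARD_BLOCK_RULES.some((r) => r.test(text))) {
    return { score: 1, rule: true };
  }
  return { score: heuristicScore(text), rule: false };
}
```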
Challenges we ran into
One major challenge was that relying solely on LLM-based safety scoring proved too unreliable for consistent blocking. We addressed this by introducing deterministic guard rules for illegal or high-risk intent. Another challenge was visualizing risk trends meaningfully: the data had to be filtered and scaled correctly to avoid misleading flat graphs.
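The flat-graph fix amounts to averaging risk scores per time bucket and then normalizing the y-axis to the observed range rather than a fixed 0-1 scale. The types and bucketing scheme below are a simplified sketch of that idea, not the dashboard's actual code.

```typescript
// Sketch of the trend-scaling fix: average risk scores per time bucket,
// then stretch values to the observed min/max so small but real variation
// does not render as a flat line. Field names here are illustrative.

type RiskEvent = { ts: number; score: number };

function bucketAverages(events: RiskEvent[], bucketMs: number): number[] {
  if (events.length === 0) return [];
  const start = Math.min(...events.map((e) => e.ts));
  const buckets = new Map<number, number[]>();
  for (const e of events) {
    const key = Math.floor((e.ts - start) / bucketMs);
    const scores = buckets.get(key) ?? [];
    scores.push(e.score);
    buckets.set(key, scores);
  }
  const last = Math.max(...buckets.keys());
  return Array.from({ length: last + 1 }, (_, i) => {
    const b = buckets.get(i);
    return b ? b.reduce((sum, s) => sum + s, 0) / b.length : 0;
  });
}

// Normalize to the observed range so the chart uses its full height.
function scaleToRange(values: number[]): number[] {
  const lo = Math.min(...values);
  const hi = Math.max(...values);
  return hi === lo ? values.map(() => 0.5) : values.map((v) => (v - lo) / (hi - lo));
}
```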
Accomplishments that we're proud of
We successfully built an end-to-end safety layer that can detect, alert, and block unsafe prompts in real time. The project includes a polished dashboard, real-time risk trends, and clear guard statuses, making it usable as a real internal AI safety tool rather than just a demo.
What we learned
We learned that AI safety in production requires more than just trusting model-level safeguards. Combining deterministic rules, scoring heuristics, and human-visible dashboards creates more reliable and auditable systems. We also learned the importance of clear UX when communicating risk.
What's next for PromptGuard
Next, we plan to add multi-model comparisons, configurable safety thresholds, prompt drift detection, and alert resolution workflows. Long-term, PromptGuard could integrate with external notification systems and support multiple projects and teams from a single control plane.
Built With
- nextjs
- ollama
- supabase