Inspiration
As large language models move from experimentation into production systems, a major gap becomes clear: while we can monitor infrastructure and APIs, we lack reliable ways to observe, measure, and enforce trust in AI behavior. Issues like hallucinations, prompt injection attacks, and unexpected cost spikes often appear only after users are affected.
LLM Guardrail AI was inspired by the need to treat AI systems like first-class production services—with measurable health, clear failure signals, and actionable remediation—rather than black boxes.
What it does
LLM Guardrail AI is a production safety and observability layer for Gemini-powered applications. It continuously evaluates every LLM request, assigns a real-time LLM Trust Score, and streams structured telemetry to Datadog.
The system detects three high-impact risk categories:
- Hallucination Risk – abnormal output entropy and response instability
- Security Risk – prompt injection and jailbreak patterns
- Cost Risk – sudden token usage anomalies
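As a rough illustration of how these three signals might be computed, here is a minimal sketch. It is not the project's actual engine: the function names, the regex patterns, and the z-score cap are all hypothetical, and response instability is approximated by word overlap across repeated samples of the same prompt. Each function returns a risk in the range 0 to 1.

```python
import re
from statistics import mean, pstdev

# Hypothetical patterns for illustration; a real list would be larger and tested.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all |any )?(previous|prior|above) instructions", re.I),
    re.compile(r"reveal (your|the) system prompt", re.I),
]


def instability_risk(samples):
    """Hallucination proxy: 1 - mean pairwise Jaccard similarity of word sets."""
    sets = [set(s.lower().split()) for s in samples]
    pairs = [(a, b) for i, a in enumerate(sets) for b in sets[i + 1:]]
    if not pairs:
        return 0.0  # a single sample gives no instability evidence
    sims = [len(a & b) / len(a | b) if (a | b) else 1.0 for a, b in pairs]
    return 1.0 - sum(sims) / len(sims)


def security_risk(prompt):
    """Returns 1.0 if any known injection pattern matches the prompt, else 0.0."""
    return 1.0 if any(p.search(prompt) for p in INJECTION_PATTERNS) else 0.0


def cost_risk(history, current, z_cap=3.0):
    """Scales the z-score of the current token count against history into [0, 1]."""
    if len(history) < 2:
        return 0.0  # not enough data to call anything an anomaly
    mu, sigma = mean(history), pstdev(history)
    if sigma == 0:
        return 1.0 if current > mu else 0.0
    z = (current - mu) / sigma
    return max(0.0, min(1.0, z / z_cap))
```

Keeping every signal on the same 0 to 1 scale makes it straightforward to compare the signals and to combine them later.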
When risk thresholds are exceeded, Datadog automatically raises incidents, correlates signals, and surfaces actionable context so engineers can respond.
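One way the per-category risks could roll up into a single Trust Score with threshold checks is sketched below. The weights, the 0 to 100 scale, and the 0.7 threshold are assumptions chosen for illustration; the project does not state its actual formula.

```python
from dataclasses import dataclass, field

# Hypothetical weights and threshold, not taken from the project.
WEIGHTS = {"hallucination": 0.4, "security": 0.4, "cost": 0.2}
RISK_THRESHOLD = 0.7


@dataclass
class TrustResult:
    score: float                      # 0 (untrusted) to 100 (fully trusted)
    violations: list = field(default_factory=list)


def trust_score(risks):
    """Combines per-category risks in [0, 1] into a score and a violation list."""
    for name in WEIGHTS:
        if not 0.0 <= risks[name] <= 1.0:
            raise ValueError(f"{name} risk must be within [0, 1]")
    weighted = sum(WEIGHTS[name] * risks[name] for name in WEIGHTS)
    score = round(100 * (1 - weighted), 1)
    violations = sorted(n for n in WEIGHTS if risks[n] >= RISK_THRESHOLD)
    return TrustResult(score=score, violations=violations)
```

A weighted sum keeps the score deterministic and explainable: each point lost can be traced back to one category, and the violation list tells a monitor which guardrail tripped.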
How we built it
The application is composed of a frontend control panel and a backend trust enforcement service.
Frontend:
A React-based dashboard that visualizes the live LLM Trust Score, active guardrail violations, and recovery status using clear color-coded states.
Backend:
A Gemini-powered service running on Google Cloud Vertex AI. Each request is evaluated by a trust and risk engine that computes hallucination, security, and cost signals before calculating the overall Trust Score.
Observability:
Instead of using a Datadog Agent, the backend streams logs, metrics, and events directly to Datadog via APIs. Custom monitors detect guardrail violations and trigger incidents with root-cause summaries and remediation steps.
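For reference, agentless log submission of this kind can be sketched with only the Python standard library. The example targets Datadog's HTTP log intake (`/api/v2/logs`, authenticated with the `DD-API-KEY` header) and assumes the US1 site; the service name, tags, and the `trust_score` and `violations` attributes are hypothetical and not taken from the project.

```python
import json
import urllib.request

# Assumption: US1 Datadog site; other sites use a different intake host.
LOGS_URL = "https://http-intake.logs.datadoghq.com/api/v2/logs"


def build_log_entry(trust_score, violations, service="llm-guardrail"):
    """Builds one structured log entry; the custom attribute names are illustrative."""
    return {
        "ddsource": "python",
        "service": service,
        "ddtags": "env:demo",
        "status": "warn" if violations else "info",
        "message": f"trust_score={trust_score} violations={','.join(violations) or 'none'}",
        "trust_score": trust_score,
        "violations": violations,
    }


def send_logs(entries, api_key):
    """POSTs a list of log entries to the intake endpoint and returns the HTTP status."""
    request = urllib.request.Request(
        LOGS_URL,
        data=json.dumps(entries).encode("utf-8"),
        headers={"Content-Type": "application/json", "DD-API-KEY": api_key},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.status
```

Calling `send_logs([build_log_entry(42.0, ["security"])], api_key)` would ship one entry. A monitor could then alert on the `status` field or on the custom attributes, assuming they are indexed as facets.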
Authentication to Vertex AI is handled using Google Cloud Application Default Credentials, keeping secrets out of the frontend and aligning with production best practices.
Challenges we ran into
- Defining trust in a measurable way: Translating probabilistic LLM behavior into a deterministic, explainable Trust Score required careful balancing of simplicity and usefulness.
- Environment limitations: Our runtime environment did not support the Datadog Agent, so we implemented direct API-based telemetry while preserving full observability.
- Keeping the MVP focused: We intentionally limited the system to three guardrails to avoid noise and ensure clear, reproducible signals.
Accomplishments that we're proud of
- Designed a clear, mathematical LLM Trust Score that updates in real time
- Built a fully agentless Datadog integration using logs, metrics, and incidents
- Created reproducible guardrail violations that demonstrate real production risks
- Delivered an end-to-end system that moves from detection to remediation
What we learned
- Observability for AI systems requires new signals beyond latency and uptime
- Correlating multiple weak signals is more effective than relying on a single metric
- Actionable context is critical—alerts without guidance slow down response
- Production-ready AI systems need trust to be continuously monitored, not assumed
What's next for LLM Guardrail AI
Next, we plan to expand guardrails to include model drift and response quality trends, integrate automated remediation workflows, and support multiple LLM providers beyond Gemini. Long-term, LLM Guardrail AI could evolve into a standardized trust layer for deploying AI safely at scale.
Built With
- react
- supabase
- tailwind