Inspiration
As LLM-powered applications move rapidly from prototypes to production, teams face a critical gap: LLMs fail silently. Hallucinations, sudden latency spikes, and uncontrolled token usage often go unnoticed until users are impacted. Existing observability tools excel at infrastructure monitoring but lack visibility into AI behavior itself. SentinelAI was inspired by the need to bring production-grade reliability, transparency, and accountability to applications built with Gemini, enabling teams to trust and scale their LLM systems with confidence.
What it does
SentinelAI provides real-time observability for Gemini-powered LLM applications by monitoring both system-level metrics and LLM-specific signals. It captures telemetry such as response latency, token consumption, confidence scores, and hallucination indicators, and streams this data to Datadog. Intelligent detection rules continuously evaluate application health and AI quality, automatically generating alerts and incidents enriched with contextual information so AI engineers can quickly diagnose and resolve issues before end users are affected.
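As a rough illustration of what a detection rule evaluates, the sketch below checks one telemetry record against a few thresholds and returns alerts that carry the record as context. The actual rules in this project are defined in Datadog. The field names (`latencyMs`, `totalTokens`, `confidence`), rule names, and threshold values here are assumptions made for the example.

```javascript
// Hypothetical rules and thresholds; real values would be tuned per application.
const RULES = [
  { name: 'high_latency', severity: 'warning', test: (t) => t.latencyMs > 3000 },
  { name: 'token_budget_exceeded', severity: 'warning', test: (t) => t.totalTokens > 4000 },
  { name: 'low_confidence', severity: 'critical', test: (t) => t.confidence < 0.4 },
];

// Returns one alert per violated rule, each carrying the telemetry as context
// so an engineer can see what triggered it.
function evaluateRules(telemetry, rules = RULES) {
  return rules
    .filter((rule) => rule.test(telemetry))
    .map((rule) => ({
      rule: rule.name,
      severity: rule.severity,
      context: { ...telemetry },
    }));
}

const alerts = evaluateRules({
  requestId: 'req-123',
  latencyMs: 4200,
  totalTokens: 900,
  confidence: 0.35,
});
console.log(alerts.map((a) => a.rule)); // [ 'high_latency', 'low_confidence' ]
```

Attaching the full telemetry record to each alert is what makes the resulting incident "context-rich": the responder does not need to look up the request separately.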
How we built it
The platform consists of a lightweight Node.js backend that acts as an orchestration layer between the frontend and Gemini / Vertex AI. Each LLM request is instrumented to collect latency, token usage, prompt metadata, and confidence signals. This telemetry is streamed in real time to Datadog using custom metrics and logs, where dashboards and detection rules are defined. A React-based frontend provides a simple interface to interact with the LLM and visualize application health, while automated alerting ensures anomalies are immediately actionable.
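The per-request instrumentation described above could look roughly like the following. This is a minimal sketch, not the project's actual code: `callModel` and `emit` are hypothetical hooks standing in for the Gemini / Vertex AI client and the Datadog transport, the metric names are invented for the example, and the model result is assumed to expose a `totalTokens` field. The metric lines use the DogStatsD datagram format (`name:value|type|#tag:value,...`).

```javascript
// Format one metric as a DogStatsD datagram: name:value|type|#tag:value,...
function toDogStatsd(name, value, type, tags = {}) {
  const tagList = Object.entries(tags)
    .map(([key, val]) => `${key}:${val}`)
    .join(',');
  return `${name}:${value}|${type}` + (tagList ? `|#${tagList}` : '');
}

// Wrap a model call, time it, and hand the resulting metric lines to `emit`.
// `callModel` and `emit` are hypothetical hooks supplied by the caller.
async function instrumentedCall(callModel, prompt, emit, tags = {}) {
  const start = process.hrtime.bigint();
  let status = 'ok';
  try {
    const result = await callModel(prompt);
    // Assumes the result object reports total token usage.
    emit(toDogStatsd('sentinelai.llm.tokens.total', result.totalTokens, 'h', tags));
    return result;
  } catch (err) {
    status = 'error';
    throw err;
  } finally {
    // Latency is recorded for both successful and failed calls.
    const latencyMs = Number(process.hrtime.bigint() - start) / 1e6;
    emit(toDogStatsd('sentinelai.llm.latency', latencyMs.toFixed(1), 'ms', { ...tags, status }));
  }
}

// Usage with a stand-in for a real model client.
const fakeModel = async (prompt) => ({ text: `echo: ${prompt}`, totalTokens: 42 });
const sent = [];
const done = instrumentedCall(fakeModel, 'hello', (line) => sent.push(line), { model: 'gemini' })
  .then(() => console.log(sent));
```

Emitting latency in the `finally` block means failed requests are still measured, tagged with `status:error`, which keeps error spikes visible on the same dashboard as latency.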
Challenges we ran into
One of the main challenges was defining meaningful AI-quality signals, such as hallucination likelihood and confidence scoring, in a way that is both measurable and actionable. Balancing the level of telemetry collected without introducing performance overhead was another challenge. Additionally, mapping LLM-specific behaviors to traditional observability concepts like incidents and alerts required careful design to ensure the output remained useful for engineers rather than overwhelming them with noise.
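To make the idea of a measurable confidence signal concrete, here is one common heuristic, which is not necessarily the one SentinelAI uses: take the geometric mean of the token probabilities, computed as exp(mean(logprobs)). The sketch assumes the model API exposes per-token log probabilities. The function names and the 0.5 threshold are hypothetical. Low confidence is only a weak proxy for hallucination, so a flag like this is better treated as a prompt for review than as a verdict.

```javascript
// Geometric-mean token probability: exp(mean(logprobs)), a value in (0, 1].
function confidenceFromLogprobs(logprobs) {
  if (logprobs.length === 0) return null; // nothing to score
  const mean = logprobs.reduce((sum, lp) => sum + lp, 0) / logprobs.length;
  return Math.exp(mean);
}

// Flag a response for review when confidence falls below a hypothetical threshold.
function hallucinationRisk(logprobs, threshold = 0.5) {
  const confidence = confidenceFromLogprobs(logprobs);
  return { confidence, flagged: confidence !== null && confidence < threshold };
}

console.log(hallucinationRisk([-0.05, -0.1, -0.02])); // high confidence, not flagged
console.log(hallucinationRisk([-1.2, -2.3, -0.9]));   // low confidence, flagged
```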
Accomplishments that we're proud of
- Built an end-to-end observability pipeline tailored specifically for LLM applications
- Successfully integrated Gemini with Datadog for real-time AI telemetry
- Designed actionable detection rules that generate context-rich incidents
- Created clear dashboards that surface AI quality, performance, and cost signals
- Demonstrated how LLM failures can be detected before impacting users
What we learned
We learned that observability for LLMs requires more than traditional metrics: it demands a deep understanding of AI behavior, uncertainty, and risk. Treating prompts and responses as first-class observability data unlocks powerful insights into reliability and cost control. We also learned the importance of designing alerts that are actionable, so engineers can respond quickly rather than chase ambiguous signals.
What's next for SentinelAI - Platform for LLM Applications
Next, we plan to expand SentinelAI with advanced hallucination detection models, role-based alerting, and automated remediation workflows. Future versions will include multi-model support, cost optimization recommendations, and security-focused features such as prompt injection detection and compliance auditing. Ultimately, SentinelAI aims to become a standard reliability layer for teams deploying LLM applications at scale.
Built With
- apis
- cloud-services
- css3
- databases
- datadog
- express.js
- frameworks
- gcp
- gemini
- html5
- javascript
- node.js
- platforms
- react
- vertexai