Inspiration

As LLM-powered applications move rapidly from prototypes to production, teams face a critical gap: LLMs fail silently. Hallucinations, sudden latency spikes, and uncontrolled token usage often go unnoticed until users are impacted. Existing observability tools excel at infrastructure monitoring but lack visibility into AI behavior itself. SentinelAI was inspired by the need to bring production-grade reliability, transparency, and accountability to applications built with Gemini, enabling teams to trust and scale their LLM systems with confidence.

What it does

SentinelAI provides real-time observability for Gemini-powered LLM applications by monitoring both system-level metrics and LLM-specific signals. It captures telemetry such as response latency, token consumption, confidence scores, and hallucination indicators, and streams this data to Datadog. Intelligent detection rules continuously evaluate application health and AI quality, automatically generating alerts and incidents enriched with contextual information so AI engineers can quickly diagnose and resolve issues before end users are affected.
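To make the idea of detection rules concrete, here is a minimal sketch of threshold rules evaluated against one telemetry sample. It is an illustration only, not the rule definitions we run in Datadog: the field names, the rule names, the thresholds, and the `evaluateRules` helper are all assumptions made for this example.

```typescript
// Hypothetical telemetry shape; field names are assumptions for illustration.
interface LlmTelemetry {
  latencyMs: number;
  totalTokens: number;
  confidence: number;         // 0..1, higher is better
  hallucinationScore: number; // 0..1, higher is worse
}

interface DetectionRule {
  name: string;
  field: keyof LlmTelemetry;
  comparator: "above" | "below";
  threshold: number;
}

interface Alert {
  rule: string;
  observed: number;
  threshold: number;
  context: LlmTelemetry; // full sample attached so the alert carries context
}

// Example thresholds chosen only for illustration.
const exampleRules: DetectionRule[] = [
  { name: "high-latency", field: "latencyMs", comparator: "above", threshold: 2000 },
  { name: "token-spike", field: "totalTokens", comparator: "above", threshold: 4000 },
  { name: "low-confidence", field: "confidence", comparator: "below", threshold: 0.5 },
  { name: "hallucination-risk", field: "hallucinationScore", comparator: "above", threshold: 0.7 },
];

// Returns one alert per rule that the sample violates, in rule order.
function evaluateRules(sample: LlmTelemetry, rules: DetectionRule[]): Alert[] {
  return rules
    .filter((rule) =>
      rule.comparator === "above"
        ? sample[rule.field] > rule.threshold
        : sample[rule.field] < rule.threshold,
    )
    .map((rule) => ({
      rule: rule.name,
      observed: sample[rule.field],
      threshold: rule.threshold,
      context: sample,
    }));
}
```

Attaching the full sample to each alert is what makes it "enriched": an engineer sees the latency, token count, and quality signals of the offending request together, without having to query for them.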

How we built it

The platform consists of a lightweight Node.js backend that acts as an orchestration layer between the frontend and Gemini / Vertex AI. Each LLM request is instrumented to collect latency, token usage, prompt metadata, and confidence signals. This telemetry is streamed in real time to Datadog using custom metrics and logs, where dashboards and detection rules are defined. A React-based frontend provides a simple interface to interact with the LLM and visualize application health, while automated alerting ensures anomalies are immediately actionable.
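The per-request instrumentation can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not our production code: `instrumentedCall`, `ModelResult`, `RequestTelemetry`, and the `sink` callback are hypothetical names, the model call is passed in as a plain function instead of using a real Gemini client, and the sink stands in for whatever forwards records to Datadog.

```typescript
// Hypothetical result shape from whatever client actually calls the model.
interface ModelResult {
  text: string;
  promptTokens: number;
  completionTokens: number;
}

// Hypothetical telemetry record emitted once per request.
interface RequestTelemetry {
  model: string;
  latencyMs: number;
  promptTokens: number;
  completionTokens: number;
  totalTokens: number;
  promptChars: number; // prompt metadata only, not the prompt text itself
  ok: boolean;
}

type TelemetrySink = (record: RequestTelemetry) => void;

// Wraps a model call, measures its latency, and emits exactly one record,
// whether the call succeeds or throws. The clock is injectable for testing.
async function instrumentedCall(
  model: string,
  prompt: string,
  callModel: (prompt: string) => Promise<ModelResult>,
  sink: TelemetrySink,
  now: () => number = Date.now,
): Promise<ModelResult> {
  const start = now();
  let result: ModelResult | undefined;
  try {
    result = await callModel(prompt);
    return result;
  } finally {
    const promptTokens = result?.promptTokens ?? 0;
    const completionTokens = result?.completionTokens ?? 0;
    sink({
      model,
      latencyMs: now() - start,
      promptTokens,
      completionTokens,
      totalTokens: promptTokens + completionTokens,
      promptChars: prompt.length,
      ok: result !== undefined,
    });
  }
}
```

Emitting the record in a `finally` block means failed requests still produce telemetry (with `ok: false` and zero token counts), so errors show up on the same dashboards as slow or expensive calls.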

Challenges we ran into

One of the main challenges was defining meaningful AI-quality signals, such as hallucination likelihood and confidence scoring, in a way that is both measurable and actionable. Collecting enough telemetry without adding noticeable performance overhead was another. Finally, mapping LLM-specific behaviors to traditional observability concepts like incidents and alerts required careful design so that the output stayed useful to engineers rather than overwhelming them with noise.
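One common way to keep alert noise down is to fire only after a signal stays above its threshold for several consecutive samples, and then to stay quiet until it recovers. The sketch below illustrates that general technique. It is not a description of our exact rules, and the `makeBreachDetector` helper and the example values are assumptions.

```typescript
// Returns a function that observes one value at a time and reports `true`
// only on the sample where the alert first fires: the Nth consecutive value
// above the threshold. It will not fire again until a value at or below the
// threshold resets the streak.
function makeBreachDetector(
  threshold: number,
  requiredBreaches: number,
): (value: number) => boolean {
  let streak = 0;
  let firing = false;

  return (value: number): boolean => {
    if (value > threshold) {
      streak += 1;
      if (streak >= requiredBreaches && !firing) {
        firing = true;
        return true;
      }
      return false;
    }
    // Signal recovered: clear state so a later sustained breach can fire again.
    streak = 0;
    firing = false;
    return false;
  };
}
```

For example, with a hallucination-score threshold of 0.7 and three required breaches, a single bad response does not raise an alert, while three in a row raise exactly one.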

Accomplishments that we're proud of

Built an end-to-end observability pipeline tailored specifically for LLM applications

Successfully integrated Gemini with Datadog for real-time AI telemetry

Designed actionable detection rules that generate context-rich incidents

Created clear dashboards that surface AI quality, performance, and cost signals

Demonstrated how LLM failures can be detected before impacting users

What we learned

We learned that observability for LLMs requires more than traditional metrics: it demands a deep understanding of AI behavior, uncertainty, and risk. Treating prompts and responses as first-class observability data unlocks powerful insights into reliability and cost control. We also learned the importance of designing alerts that are actionable, so that engineers can respond quickly rather than chase ambiguous signals.

What's next for SentinelAI - Platform for LLM Applications

Next, we plan to expand SentinelAI with advanced hallucination detection models, role-based alerting, and automated remediation workflows. Future versions will include multi-model support, cost optimization recommendations, and security-focused features such as prompt injection detection and compliance auditing. Ultimately, SentinelAI aims to become a standard reliability layer for teams deploying LLM applications at scale.
