Inspiration

Every AI team has the same nightmare: your LLM costs spike 500% overnight, users complain about slow responses, and you have no idea why. We've been there.

LLM applications are black boxes. Unlike traditional software where you can trace a request through logs and metrics, LLMs produce unpredictable outputs at unpredictable costs. When things go wrong, you're flying blind.

We asked ourselves: What if LLM observability was as intuitive as traditional APM?

That question led to Sentinel, an AI-powered observability platform that brings transparency to the black box.

What it does

Sentinel wraps your LLM interactions and provides:

16+ metrics per request - tokens, costs, latency, quality indicators
Z-score anomaly detection - statistical analysis against rolling baselines
AI-powered root cause analysis - Gemini explains why something went wrong
Automated Datadog incidents - with severity levels and actionable recommendations
Real-time dashboard - premium UI showing live metrics, anomalies, and incidents

When latency spikes or costs explode, Sentinel detects it in seconds, analyzes the root cause using AI, and creates an incident in Datadog, all before your users notice.
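As a rough illustration of what "wrapping" an LLM call can mean, here is a minimal sketch that times one call and derives token and cost metrics from it. It is not Sentinel's actual code: the `observe` function, the `RequestMetrics` fields, the `fake_llm` stand-in, and the per-token prices are all hypothetical, and only a handful of the 16+ metrics are shown.

```python
import time
from dataclasses import dataclass
from typing import Callable, Tuple

# Hypothetical prices in USD per 1,000 tokens; real prices depend on the model.
PRICE_PER_1K_INPUT = 0.0001
PRICE_PER_1K_OUTPUT = 0.0004


@dataclass
class RequestMetrics:
    latency_ms: float
    input_tokens: int
    output_tokens: int
    total_tokens: int
    cost_usd: float


def observe(llm_call: Callable[[str], Tuple[str, int, int]], prompt: str):
    """Run one LLM call and return (response_text, RequestMetrics).

    llm_call is assumed to return (text, input_tokens, output_tokens).
    """
    start = time.perf_counter()
    text, input_tokens, output_tokens = llm_call(prompt)
    latency_ms = (time.perf_counter() - start) * 1000.0

    cost_usd = (
        (input_tokens / 1000.0) * PRICE_PER_1K_INPUT
        + (output_tokens / 1000.0) * PRICE_PER_1K_OUTPUT
    )
    metrics = RequestMetrics(
        latency_ms=latency_ms,
        input_tokens=input_tokens,
        output_tokens=output_tokens,
        total_tokens=input_tokens + output_tokens,
        cost_usd=cost_usd,
    )
    return text, metrics


def fake_llm(prompt: str) -> Tuple[str, int, int]:
    # Stand-in for a real model call; token counts are whitespace word counts.
    reply = "ok: " + prompt
    return reply, len(prompt.split()), len(reply.split())
```

Calling `observe(fake_llm, "hello world")` returns the reply together with a `RequestMetrics` record that could then be forwarded to a telemetry backend.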

How we built it

Architecture:

User Request → FastAPI Server → Gemini API
↓
Metrics Collector (16 metrics)
↓
Datadog Telemetry (real-time streaming)
↓
Anomaly Detector (Z-score analysis)
↓
Root Cause Analyzer (Gemini AI)
↓
Incident Creator (Datadog Incidents API)

Tech Stack:

Backend: Python, FastAPI, Uvicorn
AI: Google Gemini 2.0 Flash (chat + root cause analysis)
Observability: Datadog (metrics, incidents, events)
Detection: Custom Z-score anomaly detection with EWMA baselines
Frontend: Vanilla HTML/CSS/JS (no React, no bloat)
Deployment: Google Cloud Run, Docker

Key Design Decisions:

No dependencies on LLM frameworks - Works with any LLM, not just LangChain
Statistical detection over rules - Z-scores adapt to your traffic patterns
AI explains AI - Gemini analyzes why your Gemini calls failed
Accessibility-first dashboard - Magnification, TTS, high contrast modes
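To make "Z-score detection with EWMA baselines" concrete, here is a minimal sketch of one way such a detector can work. It is an illustration under assumptions, not Sentinel's implementation: the class name, the smoothing factor `alpha`, the threshold of 3 standard deviations, and the decision to fold every value (including anomalous ones) into the baseline are all hypothetical. The 30-point warm-up mirrors the baseline requirement described under the challenges.

```python
import math


class EwmaZScoreDetector:
    """Scores each value against an exponentially weighted mean and variance."""

    def __init__(self, alpha=0.1, threshold=3.0, warmup=30):
        self.alpha = alpha          # weight given to the newest observation
        self.threshold = threshold  # |z| above this counts as an anomaly
        self.warmup = warmup        # observations required before scoring
        self.count = 0
        self.mean = 0.0
        self.var = 0.0

    def update(self, value):
        """Score value against the current baseline, then fold it in.

        Returns (z_score, is_anomaly). During warm-up it returns (0.0, False).
        """
        z = 0.0
        is_anomaly = False
        if self.count >= self.warmup:
            std = math.sqrt(self.var)
            if std > 0:
                z = (value - self.mean) / std
                is_anomaly = abs(z) > self.threshold

        # Incremental EWMA update of mean and variance.
        if self.count == 0:
            self.mean = float(value)
        else:
            delta = value - self.mean
            self.mean += self.alpha * delta
            self.var = (1 - self.alpha) * (self.var + self.alpha * delta * delta)
        self.count += 1
        return z, is_anomaly
```

Because the baseline is updated on every call, the detector adapts as traffic drifts. A stricter variant might skip the update for values it flags, so that a single spike does not inflate the variance.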

Challenges we ran into

  1. Datadog Incidents API requires "unstable operations"
     The Incidents API is marked unstable and requires special configuration. We had to enable unstable_operations in the Datadog client, something that is not well documented.

  2. Async/sync function mismatches
     Our root cause analyzer and incident creator are synchronous, but FastAPI is async. Mixing await with sync functions caused silent failures. We refactored all integrations to handle this correctly.

  3. Anomaly detection needs baseline data
     Z-score detection requires 30+ data points before it can detect anomalies. For demos, we built a "synthetic trigger" that injects fake anomalies for immediate demonstration.

  4. Making the demo compelling in 3 minutes
     The biggest challenge wasn't technical; it was storytelling. We added a "Live Demo Zone" to the dashboard that walks through the entire workflow step by step with animated visuals.
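For the async/sync mismatch in challenge 2, one common pattern is to run the blocking function in a worker thread and await the result. The sketch below shows that pattern with the standard library only; it is not necessarily the refactor we ended up with. `analyze_root_cause` and `handle_anomaly` are hypothetical names, the sleep stands in for a blocking API call, and `asyncio.to_thread` requires Python 3.9 or later.

```python
import asyncio
import time


def analyze_root_cause(anomaly: dict) -> str:
    # Hypothetical synchronous function; the sleep simulates a blocking API call.
    time.sleep(0.05)
    return f"{anomaly['metric']} deviated (z={anomaly['z']:.1f})"


async def handle_anomaly(anomaly: dict) -> str:
    # `await analyze_root_cause(anomaly)` would raise TypeError, because the
    # function returns a plain str rather than an awaitable. A broad
    # `except Exception` around it can hide that error entirely.
    # Running the sync function in a thread keeps the event loop responsive.
    return await asyncio.to_thread(analyze_root_cause, anomaly)
```

From synchronous code, `asyncio.run(handle_anomaly({"metric": "latency_ms", "z": 4.2}))` drives the coroutine; inside an async request handler it can simply be awaited.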

Accomplishments that we're proud of

✅ 16 metrics captured per LLM request
✅ Real incidents created in Datadog (not mocked)
✅ AI-generated root cause analysis using Gemini
✅ Premium dashboard with accessibility features
✅ Deployed on Cloud Run, live and working

What we learned

Observability is harder for AI - LLMs don't return error codes. "Bad" responses are subjective. We had to invent new metrics like "refusal detection" and "truncation detection."
Datadog's API is powerful but complex - The Incidents v2 API has many moving parts. The Events API was our fallback hero.
Z-scores are elegant - A simple statistical method outperforms complex ML for anomaly detection when you have streaming data.
Accessibility matters - Adding magnification and TTS took 2 hours but made the product usable by everyone.
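Quality metrics such as refusal and truncation detection can start out as simple text heuristics. The sketch below shows one possible shape for them; the marker phrases, the function names, the finish-reason strings, and the punctuation check are all assumptions for illustration rather than the rules Sentinel uses, and heuristics like these will produce false positives.

```python
# Hypothetical phrases that often open a refusal; a real list would be tuned.
REFUSAL_MARKERS = (
    "i can't",
    "i cannot",
    "i'm unable to",
    "i am unable to",
    "i won't",
)


def looks_like_refusal(text: str) -> bool:
    """Return True if the start of the response contains a refusal phrase."""
    opening = text.strip().lower().replace("\u2019", "'")[:200]
    return any(marker in opening for marker in REFUSAL_MARKERS)


def looks_truncated(
    text: str,
    finish_reason: str = "",
    output_tokens: int = 0,
    max_output_tokens: int = 0,
) -> bool:
    """Return True if the response appears to have been cut off."""
    # Assumed finish-reason values that signal a token-limit stop.
    if finish_reason.upper() in ("MAX_TOKENS", "LENGTH"):
        return True
    # Hitting the configured output limit is a strong hint of truncation.
    if max_output_tokens and output_tokens >= max_output_tokens:
        return True
    # Weak fallback: text that does not end in closing punctuation.
    stripped = text.rstrip()
    return bool(stripped) and stripped[-1] not in ".!?\"')]}`"
```

Each function returns a boolean that can be emitted as a 0/1 metric per request, so the same anomaly detection used for latency and cost can also watch refusal and truncation rates.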

What's next for Sentinel

Multi-model support - OpenAI, Anthropic, Cohere
Cost forecasting - Predict monthly LLM spend
Prompt optimization - Detect inefficient prompts automatically
Open-source release - Production-ready package on PyPI
