SentinelLLM: Project Story
Inspiration
Large Language Models are rapidly becoming core infrastructure in modern applications, from copilots to autonomous agents.
Yet, while model capabilities have evolved dramatically, observability has not.
Today, when an LLM misbehaves or slows down, when costs spike, or when a malicious prompt slips through, engineers are often blind. Traditional monitoring tools show CPU and memory usage but reveal nothing about prompts, tokens, or model behavior.
We built SentinelLLM because we believe:
If LLMs are production systems, they deserve production-grade observability.
What it does
SentinelLLM is a production-ready gateway that sits between users and an LLM (Gemini via Vertex AI) and provides deep, LLM-aware observability and security using Datadog APM.
It enables teams to:
- Observe every LLM request end-to-end
- Measure latency, tokens, and cost
- Detect prompt injection and PII risks
- Debug failures with full distributed traces
- Turn LLM behavior from a black box into actionable signals
All without modifying application business logic.
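As a rough illustration of the security analysis step, the sketch below flags likely prompt injection and PII with regular expressions. The pattern lists and the `analyze_prompt` name are hypothetical and intentionally tiny; they are not SentinelLLM's actual rules, and a real deployment would need far broader coverage.

```python
import re

# Hypothetical, deliberately small pattern lists for illustration only.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all |any )?(previous|prior|above) instructions", re.IGNORECASE),
    re.compile(r"reveal (your |the )?system prompt", re.IGNORECASE),
]

PII_PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def analyze_prompt(prompt: str) -> dict:
    """Return simple risk flags for a prompt without modifying it."""
    injection = any(p.search(prompt) for p in INJECTION_PATTERNS)
    pii_types = sorted(name for name, p in PII_PATTERNS.items() if p.search(prompt))
    return {"prompt_injection": injection, "pii_types": pii_types}
```

Because the function only reads the prompt and returns flags, a gateway could run it before inference and decide separately whether to block, redact, or just record the result.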
How we built it
SentinelLLM is built as a FastAPI gateway with production-grade instrumentation.
Architecture Overview
Client
  → SentinelLLM Gateway (FastAPI)
      ├─ Security analysis (prompt injection, PII)
      ├─ Gemini 2.0 inference (Vertex AI)
      ├─ Token & latency measurement
      └─ Datadog APM instrumentation
  → Datadog Traces & Metrics
Key Technologies
- Google Cloud Vertex AI: Gemini 2.0 models
- Datadog APM: Host-based APM with real traces
- FastAPI: High-performance Python backend
- ddtrace-run: Automatic APM instrumentation
- Docker: Local Datadog Agent for telemetry ingestion
Every /generate request is traced end-to-end, allowing engineers to inspect exactly how the model behaved for each prompt.
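To make the per-request trace more concrete, here is a minimal sketch of attaching LLM metadata to a span. It relies only on the `set_tag` and `set_metric` methods that ddtrace spans expose; the tag names, the `annotate_llm_span` helper, and the `RecordingSpan` stand-in (used so the sketch runs without a tracer) are assumptions, not the project's actual code.

```python
class RecordingSpan:
    """Stand-in for a tracing span so the sketch runs without a tracer installed."""

    def __init__(self):
        self.tags = {}
        self.metrics = {}

    def set_tag(self, key, value):
        self.tags[key] = value

    def set_metric(self, key, value):
        self.metrics[key] = value


def annotate_llm_span(span, model, prompt, prompt_tokens, completion_tokens, latency_ms):
    """Attach LLM-specific metadata to a span (tag names are hypothetical)."""
    span.set_tag("llm.model", model)
    # Record the prompt size, not its text, so PII does not leak into traces.
    span.set_metric("llm.prompt.chars", len(prompt))
    span.set_metric("llm.tokens.prompt", prompt_tokens)
    span.set_metric("llm.tokens.completion", completion_tokens)
    span.set_metric("llm.tokens.total", prompt_tokens + completion_tokens)
    span.set_metric("llm.latency_ms", latency_ms)


span = RecordingSpan()
annotate_llm_span(span, "gemini-2.0-flash", "hello", 3, 7, 120.5)
```

In a real handler the same function could be handed the active span of the `/generate` request; the model name above is only an example value.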
Challenges we ran into
1. LLMs don't fit traditional monitoring
Standard metrics don't capture:
- Prompt complexity
- Token consumption
- Model-specific latency
We had to design LLM-native telemetry signals instead of generic infrastructure metrics.
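A sketch of what such LLM-native signals might look like as a data structure follows. The `LLMSignals` class and both price constants are hypothetical; the prices are placeholders, not Gemini's actual rates.

```python
from dataclasses import dataclass

# Hypothetical prices in USD per 1,000 tokens; substitute your provider's real rates.
PRICE_PER_1K_INPUT = 0.10
PRICE_PER_1K_OUTPUT = 0.40


@dataclass(frozen=True)
class LLMSignals:
    prompt_chars: int        # crude proxy for prompt complexity
    prompt_tokens: int
    completion_tokens: int
    latency_ms: float

    @property
    def total_tokens(self) -> int:
        return self.prompt_tokens + self.completion_tokens

    @property
    def estimated_cost_usd(self) -> float:
        return (self.prompt_tokens / 1000) * PRICE_PER_1K_INPUT + (
            self.completion_tokens / 1000
        ) * PRICE_PER_1K_OUTPUT

    @property
    def ms_per_output_token(self) -> float:
        # Normalizing latency by output length separates slow models from long answers.
        return self.latency_ms / max(self.completion_tokens, 1)
```

Normalizing latency per output token is one possible design choice: it keeps a long but healthy response from looking like a slow model.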
2. Gemini 2.0 API changes
Gemini 2.0 requires structured input formats, unlike earlier versions.
Plain string prompts fail silently.
We refactored our inference layer to use structured content payloads, ensuring compatibility with the latest models.
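The source does not show the refactored payload, so the following is only a sketch. It builds a `contents` list in the role/parts shape used by Gemini's `generateContent` request body; the `build_contents` helper name is hypothetical, and it raises on empty input so a bad prompt fails loudly rather than silently.

```python
def build_contents(prompt: str, role: str = "user") -> list:
    """Wrap a plain prompt string in a structured role/parts payload."""
    if not isinstance(prompt, str) or not prompt.strip():
        # Fail loudly here instead of sending an unusable request downstream.
        raise ValueError("prompt must be a non-empty string")
    return [{"role": role, "parts": [{"text": prompt}]}]


contents = build_contents("Summarize the attached incident report.")
```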
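3. Observability boundaries were non-obvious
We initially gated telemetry inside the application using API keys, which is incorrect for OTLP-based systems.
We learned that the agent is the security boundary, not the app: the Datadog Agent holds the API key and forwards telemetry, while the application only sends traces to the local agent.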
Fixing this unlocked seamless Datadog trace ingestion.
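One way to picture that boundary is in terms of which process holds which settings. In the sketch below, `DD_AGENT_HOST`, `DD_TRACE_AGENT_PORT`, and `DD_SERVICE` are environment variables read by ddtrace, and `DD_API_KEY` is the Datadog Agent's own setting; the helper names and the `sentinelllm` service name are hypothetical.

```python
# Credentials that belong to the Datadog Agent, never to the application process.
AGENT_ONLY_KEYS = {"DD_API_KEY", "DD_APP_KEY"}


def app_tracer_env(agent_host: str = "localhost", agent_port: int = 8126) -> dict:
    """Environment for the gateway: it only needs to know where the agent listens."""
    return {
        "DD_AGENT_HOST": agent_host,
        "DD_TRACE_AGENT_PORT": str(agent_port),  # 8126 is the agent's default trace port
        "DD_SERVICE": "sentinelllm",  # hypothetical service name
    }


def check_no_agent_secrets(env: dict) -> None:
    """Raise if the application environment carries agent credentials."""
    leaked = sorted(key for key in env if key in AGENT_ONLY_KEYS)
    if leaked:
        raise ValueError(f"agent-only settings found in app environment: {leaked}")


check_no_agent_secrets(app_tracer_env())
```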
Accomplishments that we're proud of
- Real Gemini 2.0 inference (no mocks)
- Live Datadog APM traces
- Host-based instrumentation
- End-to-end request visibility
- Production-grade failure handling
- Security-first design mindset
Most importantly, SentinelLLM is not a demo toy; it behaves like real infrastructure.
What we learned
- Observability for AI systems must be model-aware
- LLM cost and latency are first-class production concerns
- Datadog APM is powerful when used beyond basic metrics
- AI systems require the same rigor as distributed microservices
- Good observability changes how teams design systems
What's next for SentinelLLM
SentinelLLM is just the beginning.
Next steps include:
- Advanced prompt anomaly detection
- Cost-based alerting and budgets
- Multi-model support (Claude, GPT, open-source LLMs)
- Custom Datadog dashboards for AI teams
- Automatic incident summaries for LLM failures
Our vision is to make LLM observability a default, not an afterthought.
Final Thought
You can't secure what you can't observe.
You can't scale what you can't understand.
SentinelLLM makes LLMs observable, secure, and production-ready.