LLM Governance Monitor

LLM Governance Monitor in STANDARD MODE - real-time metrics: latency, tokens, cost, health score
STRICT MODE activated - unsafe request blocked, threat neutralized without API cost
Datadog Dashboard - monitoring tokens, cost, latency, safety score, and health score
Datadog Event Management - automatic incident creation with context for engineers
Five custom Datadog monitors: High Latency Alert, Anomaly Detection with ML, Low Safety Score, High Token Usage, and Critical Health Score
Real-time incident notifications -Threat Blocked and STRICT MODE Activated events with severity levels, enabling immediate engineer response

Inspiration

LLM applications are becoming critical to enterprise operations, but traditional monitoring solutions only watch problems happen — they don't prevent them. I wanted to build something that actively protects LLM infrastructure, not just observes it.

What it does

LLM Governance Monitor is an active, self-healing observability solution for LLM applications. It implements a closed-loop governance architecture that:

Detects threats in real-time (prompt injection, jailbreaking attempts)
Decides on appropriate response (allow or block)
Acts autonomously to neutralize threats before they reach the LLM
Adapts its security posture based on observed patterns

The system operates in two modes:

STANDARD MODE: Monitors and alerts on suspicious activity
STRICT MODE: Automatically blocks unsafe requests, saving API costs

Key Features

Real-time metrics: latency, tokens, cost, safety score, health score
Active Governance Engine with automatic threat blocking
Self-healing architecture with automatic mode escalation
5 Datadog monitors including ML-based anomaly detection
Event logging with actionable incidents for engineers
Cost optimization: blocked requests don't call the API

How I built it

Backend: Python with FastAPI for high-performance async API
AI Model: Google Vertex AI with Gemini 2.0 Flash
Hosting: Google Cloud Run for serverless deployment
Monitoring: Datadog APM, Metrics API, and Events API
Frontend: HTML/JavaScript with Tailwind CSS

Challenges I faced

Implementing the governance state machine that transitions between STANDARD and STRICT modes
Calculating a unified Health Score that combines performance, cost, safety, and reliability
Sending custom metrics and events to Datadog in real-time
Balancing security (blocking threats) with usability (not blocking legitimate requests)

What I learned

How to implement closed-loop governance for AI systems
Datadog's Metrics API and Event Management for custom observability
The importance of actionable incidents over simple alerts
How to build self-healing architectures that adapt autonomously

What's next

Add semantic safety analysis using Vertex AI's safety settings
Implement ECONOMY mode for cost optimization with token truncation
Add user-level tracking for abuse detection
Integrate with Datadog Incident Management for automated runbooks

Built With

ai
cloud
css
datadog
docker
fastapi
gemini
google
python
run
tailwind
vertex

Updates

Private user started this project — Dec 30, 2025 07:07 PM EST

Leave feedback in the comments!

Log in or sign up for Devpost to join the conversation.