Aethelgard: The LLM Surgeon

💡 The Spark

It started with a simple realization: An AI can destroy a company's reputation while returning a "200 OK" status code.

As we watched GenAI get adopted everywhere, we realized that traditional DevOps tools are blind to the meaning of text. They monitor latency and errors, but they don't know if your chatbot is helpful, hallucinating, or teaching a user how to commit tax fraud.

We asked ourselves: "What if we could give our observability platform a brain?" We wanted to build an immune system for LLMs: a "Surgeon" that actively watches the AI's thoughts and flags danger in real time on our Datadog dashboard.

🩺 What it does

Project Aethelgard acts as a semantic firewall for Enterprise AI. It doesn't just log requests; it diagnoses them.

The Patient (Gemini Pro): This is the user-facing bot. It answers questions (sometimes dangerously).
The Surgeon (Gemini Flash): This is our "Audit Model." It runs in the background, analyzing every response the Patient generates. It assigns a Risk Score (0-100) based on safety, legality, and hallucination metrics.
The Monitor (Datadog): We stream this Risk Score live over UDP to the local Datadog Agent, which forwards it to Datadog. This transforms abstract "Safety" into a concrete, graphable metric just like CPU usage.
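
To make the three roles concrete, here is a minimal sketch of the request flow. It is not our production code: `handle_request`, `Diagnosis`, and the `fake_*` stand-ins are hypothetical names, and the real Patient and Surgeon would be Gemini calls rather than plain functions.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Diagnosis:
    answer: str
    risk_score: int  # 0 (safe) to 100 (dangerous)


def handle_request(
    prompt: str,
    patient: Callable[[str], str],            # user-facing model
    surgeon: Callable[[str, str], int],       # audit model, returns 0-100
    monitor: Callable[[str, float], None],    # metric sink
) -> Diagnosis:
    """Answer the prompt, audit the answer, and report the risk score."""
    answer = patient(prompt)
    risk = surgeon(prompt, answer)
    monitor("llm.surgeon.risk_score", risk)
    return Diagnosis(answer=answer, risk_score=risk)


# Hypothetical stand-ins so the sketch runs without any cloud credentials.
def fake_patient(prompt: str) -> str:
    return f"Echo: {prompt}"


def fake_surgeon(prompt: str, answer: str) -> int:
    return 95 if "tax fraud" in prompt.lower() else 5


emitted = []


def fake_monitor(name: str, value: float) -> None:
    emitted.append((name, value))


result = handle_request(
    "How do I commit tax fraud?", fake_patient, fake_surgeon, fake_monitor
)
print(result.risk_score, emitted)
```

Passing the three roles in as callables keeps the flow easy to test, since each one can be swapped for a stub.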

⚙️ How we built it

We built the backend using FastAPI for high-performance async handling.

The Intelligence: We used Google Vertex AI. The user talks to gemini-1.5-pro (for quality), while the "Surgeon" uses gemini-1.5-flash with a temperature of 0. Flash keeps the Surgeon fast and cheap, and a temperature of 0 keeps its verdicts as consistent as possible from run to run.
The Nervous System: We ran the Datadog Agent locally and used the dogstatsd library to send custom metrics:
- llm.surgeon.risk_score (Gauge)
- llm.surgeon.latency (Histogram)
- llm.surgeon.errors (Counter)
The Interface: We built a Streamlit frontend with a dedicated "Attack Mode." This allows judges to intentionally try to jailbreak the AI to see the protection mechanism kick in live.
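
For readers curious what actually travels over the wire, the sketch below builds DogStatsD datagrams by hand for the three metrics listed above. It is illustrative only: we used a client library rather than raw sockets, the helper names `format_metric` and `send_metric` are made up for this example, and the `model` tag and sample values are assumptions. Port 8125 is the Agent's default DogStatsD port.

```python
import socket
from typing import Dict, Optional


def format_metric(
    name: str,
    value: float,
    metric_type: str,  # "g" = gauge, "h" = histogram, "c" = counter
    tags: Optional[Dict[str, str]] = None,
) -> str:
    """Build one DogStatsD datagram: name:value|type|#tag1:val1,tag2:val2"""
    line = f"{name}:{value}|{metric_type}"
    if tags:
        line += "|#" + ",".join(f"{key}:{val}" for key, val in tags.items())
    return line


def send_metric(line: str, host: str = "127.0.0.1", port: int = 8125) -> None:
    """Fire-and-forget UDP send to the local Agent (no delivery guarantee)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("utf-8"), (host, port))


# Sample values and the "model" tag are illustrative assumptions.
risk_line = format_metric(
    "llm.surgeon.risk_score", 72, "g", {"model": "gemini-1.5-flash"}
)
latency_line = format_metric("llm.surgeon.latency", 412.5, "h")
error_line = format_metric("llm.surgeon.errors", 1, "c")

print(risk_line)     # llm.surgeon.risk_score:72|g|#model:gemini-1.5-flash
print(latency_line)  # llm.surgeon.latency:412.5|h
print(error_line)    # llm.surgeon.errors:1|c
```

Calling `send_metric(risk_line)` would hand the datagram to a locally running Agent. Because UDP is fire-and-forget, a missing Agent does not slow down or break the request path.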

🚧 Challenges we ran into

The biggest headache was Latency. Adding a "second opinion" to every API call initially doubled our response time. We solved this by switching the Surgeon to Gemini Flash, which cut the audit time down significantly without losing accuracy.
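
Measuring the audit step on its own is what makes this kind of regression visible. The sketch below shows one way to time the Surgeon call and feed the `llm.surgeon.latency` histogram. The `timed` helper and the `record` callback are hypothetical, and the `time.sleep` call stands in for the real audit request.

```python
import time
from contextlib import contextmanager
from typing import Callable, Iterator, List, Tuple


@contextmanager
def timed(record: Callable[[str, float], None]) -> Iterator[None]:
    """Time the wrapped block and report the duration in milliseconds."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Runs even if the audit raises, so slow failures are measured too.
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        record("llm.surgeon.latency", elapsed_ms)


samples: List[Tuple[str, float]] = []


def record(name: str, value: float) -> None:
    # Stand-in for a histogram call on a DogStatsD client.
    samples.append((name, value))


with timed(record):
    time.sleep(0.01)  # placeholder for the Surgeon's audit call

print(samples)
```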

We also struggled with Prompt Engineering the Surgeon. At first, our "Surgeon" was too nice—it would let mild hallucinations slide. We had to rewrite the system instructions to make it a "paranoid, strict auditor" before it reliably caught our jailbreak attempts.
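
A rough sketch of what a strict audit prompt and a defensive score parser could look like is shown below. The instruction text is a hypothetical reconstruction, not our exact prompt, and `build_audit_prompt` and `parse_risk_score` are illustrative names. Treating an unparseable reply as maximum risk is an assumption made for this example.

```python
import json
import re

# Hypothetical wording; only the "paranoid, strict auditor" framing is ours.
SURGEON_INSTRUCTIONS = (
    "You are a paranoid, strict auditor. Score the assistant's answer for "
    "safety, legality, and hallucination risk. When in doubt, score higher. "
    'Reply with JSON only: {"risk_score": <integer 0-100>, "reason": "<short text>"}'
)


def build_audit_prompt(user_prompt: str, answer: str) -> str:
    """Combine the instructions with the exchange being audited."""
    return (
        f"{SURGEON_INSTRUCTIONS}\n\n"
        f"USER PROMPT:\n{user_prompt}\n\n"
        f"ASSISTANT ANSWER:\n{answer}"
    )


def parse_risk_score(reply: str, default: int = 100) -> int:
    """Extract the score from the Surgeon's reply, clamped to 0-100.

    Falls back to `default` (maximum risk) when the reply is not usable,
    so a confused auditor never silently reports "safe".
    """
    match = re.search(r"\{.*\}", reply, re.DOTALL)  # tolerate code fences
    if not match:
        return default
    try:
        score = int(json.loads(match.group(0))["risk_score"])
    except (ValueError, KeyError, TypeError):
        return default
    return max(0, min(100, score))


print(parse_risk_score('{"risk_score": 87, "reason": "tax evasion advice"}'))
print(parse_risk_score("I'm not sure how to rate this."))
```

Asking for JSON only and clamping the result keeps a stray sentence or an out-of-range number from corrupting the gauge.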

🏆 Accomplishments that we're proud of

The "Spike": The best moment was running our traffic_generator.py script and watching the Datadog dashboard light up. Seeing the "Risk Score" graph spike exactly when we simulated an attack was a massive "It works!" moment.
Full Compliance: We successfully implemented all of Datadog's hackathon requirements: Vertex AI integration, Custom Metrics, Detection Rules, and a functioning Dashboard.
Resilience: The system doesn't crash even if the AI refuses to answer; it gracefully degrades and logs the error.
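
The graceful degradation described under Resilience can be sketched as a thin wrapper around the audit call. This is a simplified illustration: `safe_audit`, `count_error`, and `refusing_surgeon` are hypothetical names, and returning `None` for "no score available" is an assumption made for the example.

```python
import logging
from typing import Callable, List, Optional

logger = logging.getLogger("aethelgard")


def safe_audit(
    surgeon: Callable[[str, str], int],
    prompt: str,
    answer: str,
    count_error: Callable[[str], None],
) -> Optional[int]:
    """Run the audit; on any failure, log it, count it, and return None."""
    try:
        return surgeon(prompt, answer)
    except Exception:
        logger.exception("Surgeon audit failed")
        count_error("llm.surgeon.errors")
        return None


errors: List[str] = []


def refusing_surgeon(prompt: str, answer: str) -> int:
    # Stand-in for a model call that comes back empty or refused.
    raise ValueError("model returned no content")


score = safe_audit(refusing_surgeon, "hi", "hello", errors.append)
print(score, errors)
```

The caller can then decide what a missing score means, for example serving the answer unscored or treating it as high risk.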

🧠 What we learned

Observability is Creative: We learned that "Metrics" aren't just for hardware. You can turn concepts (like Danger or Toxicity) into metrics if you have the right architecture.
The Agent is Powerful: Digging into datadog.yaml and configuring the local agent gave us a much deeper respect for how enterprise telemetry actually works.
Small Models Rule: You don't need a massive model to check the work of another model. Small, fast models are perfect for the "Supervisor" role.

🚀 What's next for Aethelgard

The Kill Switch: Right now, we only alert on high risk. The next step is to block the response entirely, before it reaches the user, whenever the score exceeds 90.
Vector Memory: Implementing RAG (Retrieval-Augmented Generation) to check answers against a trusted knowledge base, not just general safety rules.
PagerDuty Integration: Sending a wake-up call to the on-call engineer the moment the AI starts hallucinating.
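
The Kill Switch is not built yet, but the core of it could be as small as the sketch below. The threshold of 90 comes from the plan above. The function name `gate_response` and the fallback message are placeholders.

```python
BLOCK_THRESHOLD = 90
FALLBACK_MESSAGE = "Sorry, I can't help with that request."  # placeholder text


def gate_response(
    answer: str, risk_score: int, threshold: int = BLOCK_THRESHOLD
) -> str:
    """Return the answer, or a fallback when the score exceeds the threshold."""
    if risk_score > threshold:
        return FALLBACK_MESSAGE
    return answer


print(gate_response("Here is how FastAPI routing works...", 12))
print(gate_response("Step 1: hide your income...", 97))
```

Blocking this way puts the audit on the critical path of every request, so audit latency would matter even more than it does today.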

Built With

datadog
fastapi
gemini
google-cloud
llm
python
streamlit
vertex

Updates

Vinayak Kamat started this project — Dec 31, 2025 08:46 AM EST
