-
-
High Level Architecture
-
Dashboard: Tools section
-
Dashboard: Some Highlights which and provided context about the state of the LLM app
-
Dashboard: LLM Analysis
-
Dashboard: Traces
-
Automated Case with High Error Rate and context for the SRE
-
SLO Tool Reliability 1
-
Number of different cases encountered
-
Monitor: High LLM Response Latency which created automated case
-
Different SLOs used
-
4 custom monitors and all with cases
-
Overview of LLM App
-
Spans with multiple calls for LLM requests, tools and agents
-
Description of Trace
-
Failed tool call
-
Architecture of AI agents
-
AI recommendations
-
This Itinerary created with detailed planned activities
## ✨ Inspiration
Planning trips is complex—you need to research destinations, check weather, find hidden gems, and balance budgets. We wanted to create an AI assistant that could handle this complexity through **conversation**.
The **Datadog Challenge** made us ask: "What if we could see exactly how our AI agents work?" Traditional monitoring shows if an API is up or down, but doesn't reveal **which agent is thinking too slowly** or **why token costs are spiking**. We built **Travel Genius** to demonstrate that even complex multi-agent AI systems can be fully observable. While building and integrating datadog I was able to figure it out on the go what actually is happening behind the scenes and I happen to spot my biggest
## 🚀 What It Does
**Travel Genius** is a travel planner powered by Google Gemini. Users describe their trip ("3-day cultural trip to Kyoto under $1500") with some inputs, and specialized AI agents work together to create a detailed itinerary.
### The Real Observability Story
1. **Complete LLM Tracing**: Every user request generates a full trace showing agent orchestration, execution time, and token usage.
2. **Professional Datadog Dashboard** with real-time success rates, token tracking, and latency percentiles.
3. **Proactive Alerting System**: 3 Service Level Objectives (SLOs) and 4 intelligent monitors for errors, latency, tool failures, and token usage, case management with the context for the SRE to act on.
4. **Safety & Quality Monitoring**: Basic prompt analysis for toxicity and injection attempts via custom tagging.
## 🔧 How We Built It
### Architecture
User Request → FastAPI → Google ADK Agents → Gemini API → Response ↓ ↓ ↓ ↓ ↓ [Datadog LLM Observability - Full Trace Capture & Metrics]
### Tech Stack
* **Backend & Agents**: Python, FastAPI, Google ADK (Agent Development Kit), Gemini 2.0 Flash via Vertex AI.
* **Observability**: Datadog LLM Observability SDK (`ddtrace`), manually instrumented spans.
### Core Observability Implementation
```python
# Datadog LLMObs Initialization
LLMObs.enable(
ml_app="travel-genius-agents",
api_key=os.getenv("DD_API_KEY"),
site="us5.datadoghq.com",
agentless_enabled=True
)
What We Instrumented:
- End-to-end HTTP request/response cycles.
- Google ADK sequential agent execution (
weather_agent→itinerary_generator). - Custom tags for safety, user session, and prompt analysis.
💪 Challenges We Ran Into
- Multi-Agent Tracing: Getting clear traces from Google ADK's sequential agents required careful configuration to move beyond a single "black box" span.
- The Gemini Integration Gap: Datadog has no native integration for Gemini evaluations. We implemented a rule-based manual tagging system to classify prompts, which gave us more control.
- SLO Configuration: Datadog's SLO interface initially rejected our queries. We discovered the correct metric names (
ml_obs.trace.ok.count) through trial and error. - Time Pressure: With days to submit, we focused on core observability—SLOs, key monitors, and a clean dashboard—over every possible feature.
🏆 Accomplishments That We're Proud Of
What Actually Works
- A Production-Grade Dashboard: A single-pane view for LLM health, tracking success rate, token usage, tool calls, and errors.
- Meaningful SLOs: Three SLOs that protect real user experience:
-
llm-latency: 99% of requests under target latency. -
tool-reliability: 97% success rate for external APIs (like weather). -
llm-availability: 99.9% availability target.
-
- Intelligent, Actionable Alerting: Four monitors that alert on:
- LLM Error Rate > 10%
- P95 Latency > 10 seconds
- Tool Execution Failures
- Unusual Token Usage
- Safety Monitoring Foundation: A functional system to tag and track problematic prompts.
Real Issues Caught During Testing
- Identified the external weather API as the most likely failure point.
- Spotted latency outliers causing poor user experience.
- Verified token consumption remained within efficient ranges.
📚 What We Learned
Technical Insights
- Observability-First is Faster: Instrumenting from the start made debugging agent logic 10x easier.
- SLOs Require Pragmatism: Setting realistic targets (99% vs. 99.9%) is more valuable than perfection.
- Manual Instrumentation Has Value: When auto-instrumentation falls short, manual spans provide superior clarity and control.
Hackathon Realities
- A focused, working system beats an ambitious, incomplete one.
- Documentation and demos are as crucial as the code itself.
- Judges appreciate honesty about scope, challenges, and practical solutions.
🔮 What's Next for Travel Genius
Immediate Improvements
- Integrate ElevenLabs for a true voice-native conversational interface.
- Expand destination coverage and trip personalization.
- Enhance the safety classifier with a more sophisticated model.
Observability Evolution
- Connect frontend user actions to backend LLM traces using Datadog RUM.
- Implement canary deployments validated by SLO performance.
- Build automated regression testing with Datadog Synthetics.
The Vision: A fully observable, conversational AI travel assistant that users can trust, backed by an observability stack that makes every decision transparent.
📊 Submission Checklist
- Working multi-agent travel planner (Google ADK + Gemini).
- Comprehensive Datadog dashboard with 15+ metrics.
- 3 SLOs and 4 monitors with actionable alerts.
- Safety monitoring via custom prompt tagging.
- Clean code repository and deployment guide.
- 3-minute demo video showing real observability in action. ```
Built With
- fastapi
- google-adk
- google-cloud
- llm
- mcp
- nextjs
- postgresql
- vertex
- weatherapi
Log in or sign up for Devpost to join the conversation.