AI Travel Genius

Screenshots:

- High-level architecture
- Dashboard: Tools section
- Dashboard: highlights that provide context about the state of the LLM app
- Dashboard: LLM analysis
- Dashboard: traces
- Automated case with a high error rate and context for the SRE
- SLO: Tool Reliability 1
- Number of different cases encountered
- Monitor: high LLM response latency, which created an automated case
- Different SLOs used
- 4 custom monitors, all with cases
- Overview of the LLM app
- Spans with multiple calls for LLM requests, tools, and agents
- Description of a trace
- Failed tool call
- Architecture of the AI agents
- AI recommendations
- Itinerary created with detailed planned activities

## ✨ Inspiration

Planning trips is complex—you need to research destinations, check weather, find hidden gems, and balance budgets. We wanted to create an AI assistant that could handle this complexity through **conversation**.

The **Datadog Challenge** made us ask: "What if we could see exactly how our AI agents work?" Traditional monitoring shows whether an API is up or down, but it doesn't reveal **which agent is thinking too slowly** or **why token costs are spiking**. We built **Travel Genius** to demonstrate that even complex multi-agent AI systems can be fully observable. While building and integrating Datadog, I was able to figure out, as I went, what was actually happening behind the scenes, and I happened to spot my biggest …

## 🚀 What It Does

**Travel Genius** is a travel planner powered by Google Gemini. Users describe their trip (for example, "3-day cultural trip to Kyoto under $1500") along with a few additional inputs, and specialized AI agents work together to create a detailed itinerary.

### The Real Observability Story
1.  **Complete LLM Tracing**: Every user request generates a full trace showing agent orchestration, execution time, and token usage.
2.  **Professional Datadog Dashboard** with real-time success rates, token tracking, and latency percentiles.
3.  **Proactive Alerting System**: 3 Service Level Objectives (SLOs) and 4 intelligent monitors covering errors, latency, tool failures, and token usage, plus case management that gives the SRE the context needed to act.
4.  **Safety & Quality Monitoring**: Basic prompt analysis for toxicity and injection attempts via custom tagging.
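Below is a minimal sketch of the kind of rule-based prompt tagging described in item 4. The regex patterns, the placeholder word list, the `safety.*` tag names, and the `tag_prompt()` helper are all hypothetical illustrations, not the project's actual rules. In the real app, the returned dictionary would be attached to the active span as custom tags, for example through `LLMObs.annotate(tags=...)`.

```python
import re

# Hypothetical rules for illustration only; real rules would need careful tuning.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all |any )?(previous|prior) instructions", re.IGNORECASE),
    re.compile(r"reveal (your|the) system prompt", re.IGNORECASE),
    re.compile(r"\byou are now\b", re.IGNORECASE),
]
TOXIC_TERMS = {"idiot", "stupid", "hate"}  # placeholder word list (crude by design)


def tag_prompt(prompt: str) -> dict[str, str]:
    """Classify a user prompt with simple rules and return string tags for a span."""
    words = set(re.findall(r"[a-z']+", prompt.lower()))
    injection = any(p.search(prompt) for p in INJECTION_PATTERNS)
    toxic_hits = sorted(words & TOXIC_TERMS)

    # Injection outranks toxicity when both rules fire.
    if injection:
        verdict = "injection_suspected"
    elif toxic_hits:
        verdict = "toxic"
    else:
        verdict = "clean"

    return {
        "safety.verdict": verdict,
        "safety.injection": str(injection).lower(),
        "safety.toxic_terms": ",".join(toxic_hits) or "none",
        "prompt.length": str(len(prompt)),
    }
```

Keeping every tag value a short string makes the tags easy to facet and filter on in a dashboard. That is the main advantage of rule-based tagging over a model-based classifier when no native evaluation is available.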

## 🔧 How We Built It

### Architecture

User Request → FastAPI → Google ADK Agents → Gemini API → Response

Every stage in this flow reports to **Datadog LLM Observability**, which provides full trace capture and metrics.


### Tech Stack
*   **Backend & Agents**: Python, FastAPI, Google ADK (Agent Development Kit), Gemini 2.0 Flash via Vertex AI.
*   **Observability**: Datadog LLM Observability SDK (`ddtrace`), manually instrumented spans.

### Core Observability Implementation
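```python
# Datadog LLMObs initialization.
# Agentless mode sends LLM Observability data directly to Datadog (no local Agent needed).
import os

from ddtrace.llmobs import LLMObs

LLMObs.enable(
    ml_app="travel-genius-agents",    # LLM application name shown in Datadog
    api_key=os.getenv("DD_API_KEY"),
    site="us5.datadoghq.com",         # Datadog site the org lives on
    agentless_enabled=True,
)
```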

### What We Instrumented

*   End-to-end HTTP request/response cycles.
*   Google ADK sequential agent execution (`weather_agent` → `itinerary_generator`).
*   Custom tags for safety, user session, and prompt analysis.

## 💪 Challenges We Ran Into

*   **Multi-Agent Tracing**: Getting clear traces from Google ADK's sequential agents required careful configuration to move beyond a single "black box" span.
*   **The Gemini Integration Gap**: Datadog has no native integration for Gemini evaluations. We implemented a rule-based manual tagging system to classify prompts, which gave us more control.
*   **SLO Configuration**: Datadog's SLO interface initially rejected our queries. We discovered the correct metric names (`ml_obs.trace.ok.count`) through trial and error.
*   **Time Pressure**: With only days left to submit, we focused on core observability (SLOs, key monitors, and a clean dashboard) rather than every possible feature.

## 🏆 Accomplishments That We're Proud Of

### What Actually Works

*   **A Production-Grade Dashboard**: A single-pane view of LLM health that tracks success rate, token usage, tool calls, and errors.
*   **Meaningful SLOs**: Three SLOs that protect the real user experience:
    - `llm-latency`: 99% of requests under the target latency.
    - `tool-reliability`: 97% success rate for external APIs (such as weather).
    - `llm-availability`: 99.9% availability target.
*   **Intelligent, Actionable Alerting**: Four monitors that alert on:
    - LLM error rate > 10%
    - P95 latency > 10 seconds
    - Tool execution failures
    - Unusual token usage
*   **Safety Monitoring Foundation**: A functional system to tag and track problematic prompts.
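The SLO targets above translate directly into error budgets. The sketch below works through that arithmetic. The `SLO` class, the 30-day window, and the 100,000-event volume used in the example are assumptions for illustration only. The minutes figure simply re-expresses each budget as time at full outage, to show the scale of each target.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class SLO:
    name: str
    target: float  # fraction of good events, e.g. 0.99 for 99%

    def allowed_bad_events(self, total_events: int) -> int:
        """Largest number of bad events that still meets the target."""
        # round() absorbs float noise such as 1 - 0.9 == 0.09999999999999998
        return math.floor(round(total_events * (1 - self.target), 6))

    def budget_minutes(self, window_days: int = 30) -> float:
        """The same budget expressed as minutes of full outage in the window."""
        return round((1 - self.target) * window_days * 24 * 60, 2)


# Targets as stated in the write-up.
SLOS = [
    SLO("llm-latency", 0.99),
    SLO("tool-reliability", 0.97),
    SLO("llm-availability", 0.999),
]

if __name__ == "__main__":
    for slo in SLOS:
        print(slo.name, slo.allowed_bad_events(100_000), slo.budget_minutes())
```

Over a 30-day window, a 99.9% target leaves about 43 minutes of budget, while 99% leaves 7.2 hours. That tenfold gap is why choosing a realistic target matters.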

### Real Issues Caught During Testing

*   Identified the external weather API as the most likely failure point.
*   Spotted latency outliers that caused a poor user experience.
*   Verified that token consumption stayed within efficient ranges.
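Spotting latency outliers relies on the same statistic as the P95 latency monitor. The sketch below computes a nearest-rank 95th percentile over a window of requests. It then applies the two thresholds listed above: P95 latency > 10 seconds and LLM error rate > 10%. Datadog computes percentiles on its own side, so treat this as a local approximation. The `Request` type, the helper names, and the sample windows are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Request:
    latency_s: float
    ok: bool


def p95(values: list[float]) -> float:
    """Nearest-rank 95th percentile: smallest value with at least 95% of samples at or below it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = math.ceil(0.95 * len(ordered))  # 1-based rank
    return ordered[rank - 1]


def evaluate_monitors(window: list[Request],
                      p95_limit_s: float = 10.0,
                      error_rate_limit: float = 0.10) -> dict[str, bool]:
    """Report which of the two thresholds would fire for this window of requests."""
    if not window:
        raise ValueError("empty window")
    error_rate = sum(not r.ok for r in window) / len(window)
    return {
        "p95_latency_alert": p95([r.latency_s for r in window]) > p95_limit_s,
        "error_rate_alert": error_rate > error_rate_limit,
    }
```

With 20 requests, a single slow call is the top 5% and does not move the nearest-rank P95. A second slow call does. This is why P95 separates a one-off outlier from a pattern that users actually feel.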

## 📚 What We Learned

### Technical Insights

*   **Observability-First Is Faster**: Instrumenting from the start made debugging agent logic 10x easier.
*   **SLOs Require Pragmatism**: Setting realistic targets (99% rather than 99.9%) is more valuable than chasing perfection.
*   **Manual Instrumentation Has Value**: When auto-instrumentation falls short, manual spans provide superior clarity and control.

### Hackathon Realities

*   A focused, working system beats an ambitious, incomplete one.
*   Documentation and demos are as crucial as the code itself.
*   Judges appreciate honesty about scope, challenges, and practical solutions.

## 🔮 What's Next for Travel Genius

### Immediate Improvements

*   Integrate ElevenLabs for a truly voice-native conversational interface.
*   Expand destination coverage and trip personalization.
*   Enhance the safety classifier with a more sophisticated model.

### Observability Evolution

*   Connect frontend user actions to backend LLM traces using Datadog RUM.
*   Implement canary deployments validated by SLO performance.
*   Build automated regression testing with Datadog Synthetics.
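Below is one possible design for the planned SLO-validated canary deployments: a simple promotion gate. Before more traffic shifts, the gate compares the canary's success rate against both the SLO target and the current baseline. Everything here is hypothetical: the `Cohort` type, the `canary_gate()` function, the minimum sample size, and the tolerance. None of it is an existing Datadog feature or part of the current project.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cohort:
    good: int
    total: int

    @property
    def success_rate(self) -> float:
        return self.good / self.total if self.total else 0.0


def canary_gate(canary: Cohort, baseline: Cohort, slo_target: float,
                min_requests: int = 200, tolerance: float = 0.005) -> str:
    """Return 'wait', 'rollback', or 'promote' for a canary release (hypothetical policy)."""
    if canary.total < min_requests:
        return "wait"       # not enough traffic to judge yet
    if canary.success_rate < slo_target:
        return "rollback"   # the canary on its own would breach the SLO
    if canary.success_rate < baseline.success_rate - tolerance:
        return "rollback"   # meets the SLO but is clearly worse than the baseline
    return "promote"
```

Checking against the baseline as well as the SLO catches regressions that still fit within the error budget but would quietly consume it.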

**The Vision**: A fully observable, conversational AI travel assistant that users can trust, backed by an observability stack that makes every decision transparent.

## 📊 Submission Checklist

*   Working multi-agent travel planner (Google ADK + Gemini).
*   Comprehensive Datadog dashboard with 15+ metrics.
*   3 SLOs and 4 monitors with actionable alerts.
*   Safety monitoring via custom prompt tagging.
*   Clean code repository and deployment guide.
*   3-minute demo video showing real observability in action.

## Built With

fastapi, google-adk, google-cloud, llm, mcp, nextjs, postgresql, vertex, weatherapi

Updates

Numan Nayeem started this project — Dec 31, 2025 03:34 PM EST
