Inspiration
As developers building with Large Language Models (LLMs), we kept running into the same frustration: the "Black Box" problem.
We noticed that while modern LLM applications are powerful, they often operate in the dark. When an agent failed, we didn't know why. When costs spiked, we found out a month later. When users experienced latency, we had no idea if it was the model or our code.
We wanted to move beyond just "building a chatbot" to building a production-grade AI system. We were inspired to prove that AI agents can be reliable, transparent, and enterprise-ready. We wanted to build an application where every "thought" the AI has is traceable, every dollar spent is accounted for in real-time, and every error triggers an immediate alert.
What it does

AI Travel Agent is a dual-purpose platform:
For Travelers: It is an intelligent travel planning assistant. Users can ask complex natural language questions like "Find me flights and 5-star hotels in Tokyo for next March." The agent autonomously orchestrates real-time searches for flights and hotels, aggregates the data, and delivers a beautifully formatted HTML email summary with booking options.
For DevOps & Engineers: It is a showcase of Enterprise Observability. It provides a "Glass Box" view into the AI's operations. Using Datadog, we track:
Agent Reasoning: We trace the exact decision path the agent takes (e.g., "Why did it search for hotels before flights?").
Cost & Tokens: We monitor token usage per query to prevent budget overruns.
Performance: We track P95 latency and tool failure rates.
Reliability: We implemented automated detection rules that alert us before users even report a bug.
How we built it

We architected the solution using a modern, scalable stack deployed on Google Cloud Run:
The Brain (AI): We used Google's Gemini 2.5 Pro via LangGraph. LangGraph allowed us to build a stateful, cyclic graph where the agent can "loop" through reasoning steps, retry failed tool calls, and maintain conversation context.
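The loop-and-retry pattern described above can be sketched in plain Python. This is an illustrative sketch of the control flow, not the actual LangGraph API; the node names, the planning heuristic, and the retry limit are all assumptions for illustration.

```python
# Sketch of the agent's plan -> act -> observe cycle with tool retries.
# Node names and the retry limit are illustrative assumptions.

def plan(state):
    # Decide which tool to call next based on what is still missing.
    if "flights" not in state:
        return "search_flights"
    if "hotels" not in state:
        return "search_hotels"
    return "done"

def run_agent(tools, max_retries=2):
    state = {}
    while (step := plan(state)) != "done":
        for attempt in range(max_retries + 1):
            try:
                state[step.removeprefix("search_")] = tools[step](state)
                break  # tool succeeded; loop back to the planning step
            except RuntimeError:
                if attempt == max_retries:
                    # Record the failure so the planner does not loop forever.
                    state[step.removeprefix("search_")] = None
    return state
```

In the real graph, LangGraph's conditional edges play the role of `plan`, and the checkpointer persists `state` between turns.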
The Hands (Tools): We integrated SerpAPI to give the agent real-time access to Google Flights and Google Hotels data. We also built a custom SMTP node for email delivery.
The Eyes (Observability): This was the core of our challenge. We deeply instrumented the application using Datadog:
APM & Tracing: We wrapped our LangGraph nodes to visualize the entire distributed trace from the user click to the external API call.
Custom Metrics: We defined metrics for token_usage, estimated_cost, and tool_failure_rate.
Monitors: We set up 5 critical detection rules (e.g., High Latency, Cost Spikes) using Datadog Monitors.
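The `estimated_cost` metric above boils down to simple arithmetic over token counts. A minimal sketch, assuming placeholder per-million-token prices (these are not the published Gemini 2.5 Pro rates):

```python
# Estimate per-query cost from token counts so it can be emitted as a
# custom Datadog metric. The prices below are placeholder assumptions,
# not published Gemini 2.5 Pro pricing.
INPUT_PRICE_PER_M = 1.25    # USD per 1M input tokens (assumed)
OUTPUT_PRICE_PER_M = 10.00  # USD per 1M output tokens (assumed)

def estimated_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one model call."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000
```

The resulting value would then be shipped to Datadog alongside `token_usage`, e.g. via a StatsD distribution metric tagged with the query type.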
The Interface: We built a clean, responsive web app using Streamlit to ensure a smooth user experience.
Challenges we ran into

Tracing Non-Deterministic Logic: Unlike traditional apps, AI agents don't follow a linear path. They loop and jump based on reasoning. Mapping LangGraph's cyclic execution to Datadog's linear trace view required creative instrumentation to ensure the "parent-child" relationships of the spans made sense.
Real-Time Cost Calculation: Token counts vary wildly by query. Building an anomaly detection monitor that distinguishes between a "complex query" and a "cost spike" took fine-tuning of our Datadog alert thresholds.
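For a sense of what the tuned rule looks like, here is a sketch of a cost-spike monitor expressed as a payload for Datadog's create-monitor API. The metric name, window, and thresholds are illustrative assumptions, not our exact production settings.

```python
# Sketch of a cost-spike detection rule as a Datadog "metric alert"
# monitor payload. Metric name, window, and thresholds are assumptions.
cost_spike_monitor = {
    "name": "AI Travel Agent - Cost Spike",
    "type": "metric alert",
    "query": "sum(last_10m):sum:travel_agent.estimated_cost{env:prod} > 5",
    "message": "Estimated API spend exceeded $5 in 10 minutes. @oncall",
    "options": {
        # A warning fires first so a merely "complex query" gets a look
        # before the critical threshold pages anyone.
        "thresholds": {"critical": 5, "warning": 3},
        "notify_no_data": False,
    },
}
```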
Tool Coordination: Getting the agent to intelligently parse complex JSON data from flight APIs without hallucinating or running out of context tokens was difficult. We had to implement strict output parsing and error handling within the tool nodes.
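The strict parsing we settled on can be sketched as follows. The field names loosely follow SerpAPI's Google Flights response shape but are partly assumptions, as is the result cap.

```python
# Strict parsing of raw flight-search JSON before it reaches the model:
# keep only the fields the agent needs, drop malformed entries, and cap
# the result count to stay within the context window. Field names are
# loosely modeled on SerpAPI responses and are partly assumptions.

REQUIRED = ("price", "airline", "departure_time")

def parse_flights(raw: list, limit: int = 5) -> list:
    parsed = []
    for entry in raw:
        if not all(key in entry for key in REQUIRED):
            continue  # drop malformed results rather than let the model guess
        parsed.append({key: entry[key] for key in REQUIRED})
        if len(parsed) == limit:
            break  # cap results to limit token usage
    if not parsed:
        raise ValueError("no valid flight results to show the agent")
    return parsed
```

Raising on an empty result lets the tool node surface a clean error for the agent to retry, instead of feeding it an empty list it might hallucinate around.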
Accomplishments that we're proud of

The "Glass Box" Success: We achieved 100% visibility. We can pinpoint the exact moment an agent decided to call a tool and see the raw API response it received.
Metric-Driven Results: Our observability strategy reduced our Mean Time To Resolution (MTTR) from an estimated 30 minutes to just 9 minutes.
Proactive Safety: We built a system that catches issues before they affect users. Our "Cost Anomaly" monitor successfully detects if token usage spikes 200% above baseline, potentially saving hundreds of dollars in API fees.
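The core of that rule is comparing each query's token count to a rolling baseline, with "200% above baseline" meaning 3x the baseline mean. In production this logic lives inside a Datadog anomaly monitor; the sketch below is a minimal pure-Python equivalent, and the window size is an assumption.

```python
# Minimal sketch of the "Cost Anomaly" rule: flag a query whose token
# usage is 200% above the rolling baseline (i.e. more than 3x the mean).
# The window size is an illustrative assumption.
from collections import deque

class CostAnomalyDetector:
    def __init__(self, window: int = 50, spike_ratio: float = 3.0):
        self.history = deque(maxlen=window)  # recent token counts
        self.spike_ratio = spike_ratio       # 200% above baseline == 3x mean

    def observe(self, tokens: int) -> bool:
        """Record a query's token count; return True if it is a spike."""
        baseline = (sum(self.history) / len(self.history)
                    if self.history else None)
        self.history.append(tokens)
        return baseline is not None and tokens > self.spike_ratio * baseline
```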
Availability: We achieved a 99.3% success rate and validated it against our Datadog SLOs.
What we learned

Observability is UX: Performance metrics (latency) are directly tied to user trust. If the AI takes 20 seconds to "think," the user leaves. Visualizing this led us to optimize our prompts.
Prompt Engineering is Engineering: You can't improve what you don't measure. By tracking token usage and response quality, we turned prompt tweaking from an art into a data-driven science.
State Management matters: Using LangGraph's checkpointer was crucial for maintaining context, and tracing that state persistence gave us confidence in the agent's memory.
What's next for HackDog (The Bounty Winner)

Booking Integration: Currently, we provide search results. The next step is integrating transactional APIs to allow users to book flights directly within the chat interface.
User Personalization: We plan to implement a vector database to store user preferences (e.g., "I prefer aisle seats" or "I am vegan"), allowing the agent to filter results proactively.
Voice Interface: Adding a speech-to-text layer to allow users to plan trips hands-free while driving or walking.
Built With
- Languages & Core: Python, Poetry
- AI & Models: Google Gemini 2.5 Pro, Gemini API
- Frameworks: LangGraph, LangChain, Streamlit
- Cloud & Deployment: Google Cloud Platform (GCP), Google Cloud Run, Docker, Google Secret Manager
- Observability: Datadog (APM, dashboards, logs, metrics, monitors)
- APIs & Tools: SerpAPI (Google Flights & Hotels), SMTP