Inspiration
We've all been there. 2 AM, phone buzzing with PagerDuty alerts, "PRODUCTION INCIDENT: Checkout Service 500 Errors." You scramble out of bed, open Kibana for logs, Jaeger for traces, Grafana for metrics, Slack for context. Each tool shows a fragment of the truth, but nobody connects the dots for you. You're manually correlating timestamps across tabs, copy-pasting trace IDs, running the same queries over and over. Four hours later, you find the root cause: connection-pool exhaustion in the payment service that cascaded through three microservices. Four hours of manual detective work that could have been automated.
That's when it hit us: why isn't there an AI agent that actually investigates incidents for you? Not another dashboard. Not another chatbot that says "have you tried restarting the pod?" But a real agent that connects to your observability data, runs the right queries, follows the evidence trail, and tells you what's broken and why. We built SLEUTH because we were tired of being human correlation engines at 2 AM.
How We Built It
We started with a simple idea: an AI agent that can talk to Elasticsearch. But giving an AI agent real access to observability data turned out to be way more interesting than we expected.
The MCP Server. We built a custom Elastic MCP (Model Context Protocol) server and deployed it on Google Cloud Run. It exposes six tools: search_logs, search_traces, search_metrics, esql_query, index_management, and cluster_health. The game-changer is esql_query. ES|QL, Elastic's piped query language, lets a single query read from several indices at once (logs, traces, and metrics) and filter, transform, and aggregate the combined rows step by step. Getting the same answer from the classic Search API usually meant hand-writing nested Query DSL and stitching together the results of several round trips. We're convinced this is the most powerful way to give an AI agent access to observability data.
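To make the esql_query idea concrete, here is a minimal TypeScript sketch of what such a tool could look like. It is not our production code: only the tool name comes from our server, while the description, argument names, and the helpers `buildEsqlRequest` and `toToolResult` are illustrative assumptions. It leans on two real interfaces: MCP tool metadata (`name`, `description`, `inputSchema`) and Elasticsearch's ES|QL endpoint, `POST /_query`, whose default JSON response is columnar (`columns` plus `values`).

```typescript
// Hypothetical sketch of an `esql_query` MCP tool. The tool name comes from
// the write-up; the description, argument names, and helpers are assumptions.

// Tool metadata in the shape an MCP server returns from `tools/list`.
const esqlQueryTool = {
  name: "esql_query",
  description: "Run an ES|QL query across log, trace, and metric indices.",
  inputSchema: {
    type: "object",
    properties: {
      query: { type: "string", description: "ES|QL query text" },
      params: { type: "array", description: "Values bound to ? placeholders" },
    },
    required: ["query"],
  },
} as const;

// The Elasticsearch call the handler would make (POST /_query runs ES|QL).
interface EsRequest {
  method: "POST";
  path: "/_query";
  body: { query: string; params?: unknown[] };
}

// Validate raw tool arguments from the model and build the ES|QL request.
function buildEsqlRequest(args: Record<string, unknown>): EsRequest {
  const query = args["query"];
  if (typeof query !== "string" || query.trim() === "") {
    throw new Error("esql_query: 'query' must be a non-empty string");
  }
  const rawParams = args["params"];
  let params: unknown[] | undefined;
  if (rawParams === undefined) {
    params = undefined;
  } else if (Array.isArray(rawParams)) {
    params = rawParams;
  } else {
    throw new Error("esql_query: 'params' must be an array");
  }
  return {
    method: "POST",
    path: "/_query",
    body: params ? { query, params } : { query },
  };
}

// ES|QL's default JSON response: column metadata plus one array per row.
interface EsqlResponse {
  columns: { name: string; type: string }[];
  values: unknown[][];
}

// Flatten the columnar response into text content for the MCP tool result.
function toToolResult(res: EsqlResponse): { content: { type: "text"; text: string }[] } {
  const header = res.columns.map((c) => c.name).join(" | ");
  const rows = res.values.map((row) => row.map((v) => String(v)).join(" | "));
  return { content: [{ type: "text", text: [header, ...rows].join("\n") }] };
}
```

Flattening the columnar response into pipe-separated rows keeps the tool output compact enough for the model to reason over; a real server would also want to cap the number of rows it returns.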
The AI Reasoning. We use Gemini 2.5 Flash through Google Cloud Agent Builder. When an incident comes in, Gemini doesn't just search for keywords — it creates a structured investigation plan, decides which signals to query, interprets the results, and adapts its strategy based on what it finds. It's the difference between a search engine and an investigator.
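The plan, execute, adapt cycle can be sketched as a small loop. This is a hypothetical simplification: in SLEUTH the plan and the interpretation come from Gemini, whereas here `runQuery` and `interpret` are injected functions so the control flow runs on its own. The `Step` and `Interpretation` shapes and the `maxSteps` budget are assumptions for illustration.

```typescript
// Hypothetical sketch of a plan-execute-adapt investigation loop.
// The model's planning and interpretation are stubbed as injected functions.

type Signal = "logs" | "traces" | "metrics";

interface Step {
  signal: Signal;
  esql: string; // the query the agent decided to run for this step
}

interface Finding {
  step: Step;
  summary: string;
}

interface Interpretation {
  finding?: string;  // what this result revealed, if anything
  followUps: Step[]; // new steps suggested by the evidence
}

async function investigate(
  initialPlan: Step[],
  runQuery: (step: Step) => Promise<unknown>,
  interpret: (step: Step, result: unknown) => Promise<Interpretation>,
  maxSteps = 10, // hard budget so an ever-growing plan cannot loop forever
): Promise<Finding[]> {
  const queue = [...initialPlan];
  const findings: Finding[] = [];
  let executed = 0;
  while (queue.length > 0 && executed < maxSteps) {
    const step = queue.shift()!;
    const result = await runQuery(step);
    executed++;
    const { finding, followUps } = await interpret(step, result);
    if (finding) findings.push({ step, summary: finding });
    // Adapt: evidence from this step can extend the remaining plan.
    queue.push(...followUps);
  }
  return findings;
}
```

The step budget matters more than it looks: an agent that keeps proposing follow-ups would otherwise never hand back a conclusion.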
The Real-Time UI. We built the frontend with Next.js 16 and Tailwind CSS 4, using Server-Sent Events to stream the investigation as it happens. You watch the agent plan, execute queries, discover findings, and build conclusions, all in real time. It feels like watching a detective work a case, not like waiting for a loading spinner.
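A minimal sketch of the streaming side, assuming a Next.js App Router route handler that returns a standard `Response`. The event names and payload shapes are hypothetical; the framing (`id:`, `event:`, and `data:` lines, each event terminated by a blank line, served as `text/event-stream`) is the standard Server-Sent Events wire format.

```typescript
// Hypothetical investigation events; only the SSE framing below is standard.
type InvestigationEvent =
  | { type: "plan"; steps: string[] }
  | { type: "query"; esql: string }
  | { type: "finding"; summary: string; confidence: number }
  | { type: "conclusion"; rootCause: string };

// Frame one event: optional id, an event name, a single data line, blank line.
function encodeSse(ev: InvestigationEvent, id?: number): string {
  const lines: string[] = [];
  if (id !== undefined) lines.push(`id: ${id}`);
  lines.push(`event: ${ev.type}`);
  // JSON.stringify escapes newlines inside strings, so data stays on one line.
  lines.push(`data: ${JSON.stringify(ev)}`);
  return lines.join("\n") + "\n\n";
}

// Wrap an async source of events in a streaming Response, e.g. one returned
// from a route handler's GET function.
function sseResponse(events: AsyncIterable<InvestigationEvent>): Response {
  const encoder = new TextEncoder();
  const stream = new ReadableStream<Uint8Array>({
    async start(controller) {
      let id = 0;
      try {
        for await (const ev of events) {
          controller.enqueue(encoder.encode(encodeSse(ev, id++)));
        }
        controller.close();
      } catch (err) {
        controller.error(err); // surface producer failures to the client
      }
    },
  });
  return new Response(stream, {
    headers: {
      "Content-Type": "text/event-stream",
      "Cache-Control": "no-cache",
    },
  });
}
```

On the client, an `EventSource` can subscribe to each named event with `addEventListener`, which keeps plan, query, and finding updates in separate handlers.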
Graceful Degradation. This part saved us. We built four operational modes: full Agent Builder + Elasticsearch, Gemini API + Elasticsearch, heuristic planning + Elasticsearch, and a Gemini-only fallback. When Agent Builder went down during demos (which happened), we didn't freeze; we degraded gracefully and kept showing real results. And we never, ever serve mock data. If Elasticsearch is unreachable, the app tells you honestly instead of faking results.
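The mode choice can be expressed as a pure function over health checks. This is a hypothetical sketch: the mode names mirror the four modes above, but the `Health` fields, the order of the checks, and the `liveData` flag are assumptions. What it illustrates is the no-mock-data rule: losing Elasticsearch never routes to fabricated results; the response is either flagged as having no live data or the call fails loudly.

```typescript
// Hypothetical sketch of selecting one of four operating modes.

type Mode =
  | "agent-builder+es"
  | "gemini-api+es"
  | "heuristic+es"
  | "gemini-only";

interface Health {
  agentBuilder: boolean;  // Agent Builder reachable and responding
  geminiApi: boolean;     // direct Gemini API reachable
  elasticsearch: boolean; // cluster reachable and authenticated
}

interface ModeDecision {
  mode: Mode;
  liveData: boolean; // false: the UI must say no cluster data was used
}

function selectMode(h: Health): ModeDecision {
  if (h.elasticsearch) {
    if (h.agentBuilder) return { mode: "agent-builder+es", liveData: true };
    if (h.geminiApi) return { mode: "gemini-api+es", liveData: true };
    // No LLM available: rule-based planning still runs real queries.
    return { mode: "heuristic+es", liveData: true };
  }
  if (h.geminiApi) {
    // Elasticsearch is down: no substitute data, just an explicit flag.
    return { mode: "gemini-only", liveData: false };
  }
  throw new Error("Neither Elasticsearch nor Gemini is reachable");
}
```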
Challenges We Ran Into
Agent Builder + MCP integration. This was brutal. Google Cloud Agent Builder's Reasoning Engine would silently fail when we attached our MCP server — no error, no logs, just stuck in "deploying" forever. We spent two full days debugging this. The workaround was our degradation system: when Agent Builder can't connect to MCP, we fall back to calling Gemini directly and querying Elasticsearch ourselves. It works, but we really wanted the full Agent Builder + MCP integration to work natively.
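One way to turn a silent hang into a fallback is to put a deadline on the Agent Builder call. A sketch under assumptions: `viaAgentBuilder` and `viaDirectGemini` are hypothetical stand-ins for the two code paths, and the 30-second default deadline is illustrative, not a value we measured.

```typescript
// Hypothetical sketch: deadline on the Agent Builder path, direct path as fallback.

class TimeoutError extends Error {}

// Reject if `p` hasn't settled within `ms`; always clear the timer afterwards.
function withTimeout<T>(p: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new TimeoutError(`timed out after ${ms} ms`)), ms);
  });
  return Promise.race([p, timeout]).finally(() => clearTimeout(timer));
}

async function investigateWithFallback<T>(
  viaAgentBuilder: () => Promise<T>,
  viaDirectGemini: () => Promise<T>,
  deadlineMs = 30_000,
): Promise<{ result: T; path: "agent-builder" | "direct" }> {
  try {
    const result = await withTimeout(viaAgentBuilder(), deadlineMs);
    return { result, path: "agent-builder" };
  } catch {
    // Errors and silent hangs (now timeouts) both land here.
    return { result: await viaDirectGemini(), path: "direct" };
  }
}
```

`Promise.race` only stops waiting; it doesn't cancel the underlying request. If the client library accepts an `AbortSignal`, aborting it on timeout would release the connection as well.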
No mock data was a hard rule. Early on, we decided: no fake results, ever. This made everything harder. We had to solve real Elasticsearch authentication, figure out ES|QL syntax edge cases, and handle cross-signal correlation with actual data schemas. Mock data would have been so much easier. But it would also have made SLEUTH a demo, not a tool. We're proud that every number you see in SLEUTH comes from a real Elasticsearch cluster.
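Quoting is one of the ES|QL edge cases that bites when query text is assembled from model output. A minimal sketch, assuming the quoting rules of the ES|QL grammar as we understand them: string literals are double-quoted with backslash escapes for `\\`, `\"`, `\n`, `\r`, and `\t`, and identifiers can be wrapped in backticks with any inner backtick doubled. `errorsForService` and the `logs-*` pattern are hypothetical. Where the cluster supports it, binding values through the `params` array of `POST /_query` avoids hand-escaping values entirely.

```typescript
// Hypothetical ES|QL quoting helpers for text built from untrusted input.

// Double-quoted string literal with backslash escapes.
function esqlString(value: string): string {
  const escaped = value
    .replace(/\\/g, "\\\\") // backslashes first so later escapes aren't doubled
    .replace(/"/g, '\\"')
    .replace(/\n/g, "\\n")
    .replace(/\r/g, "\\r")
    .replace(/\t/g, "\\t");
  return `"${escaped}"`;
}

// Backtick-quoted identifier; an inner backtick is escaped by doubling it.
function esqlIdentifier(name: string): string {
  return "`" + name.replace(/`/g, "``") + "`";
}

// Example query builder using both helpers.
function errorsForService(service: string): string {
  return [
    "FROM logs-*",
    `| WHERE ${esqlIdentifier("service.name")} == ${esqlString(service)}`,
    "| LIMIT 100",
  ].join("\n");
}
```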
SSE streaming with concurrent queries. Streaming investigation updates via Server-Sent Events while the agent runs multiple Elasticsearch queries in parallel was tricky. Each investigation spawns 5 to 10 queries, and their results need to stream back incrementally, in a useful order, while the agent keeps thinking. We ended up building a custom event pipeline that queues findings by confidence score and streams them as they're discovered.
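The confidence-ordered pipeline can be sketched as a buffer that parallel queries push into and a timer drains. Everything here is hypothetical: `FindingQueue`, `streamFindings`, the 0-to-1 confidence scale, and the 250 ms flush interval are assumptions, not our production code. Within each flush, findings go out most-confident first, and discovery order breaks ties.

```typescript
// Hypothetical confidence-ordered findings buffer.

interface Finding {
  summary: string;
  confidence: number; // assumed 0..1, set by the agent when interpreting results
}

class FindingQueue {
  private items: { finding: Finding; seq: number }[] = [];
  private nextSeq = 0;

  push(finding: Finding): void {
    this.items.push({ finding, seq: this.nextSeq++ });
  }

  // Remove and return everything buffered: highest confidence first,
  // earlier discovery first among equal confidences.
  drain(): Finding[] {
    const out = this.items
      .sort((a, b) => b.finding.confidence - a.finding.confidence || a.seq - b.seq)
      .map((x) => x.finding);
    this.items = [];
    return out;
  }
}

// Collect findings from parallel queries and flush them on a fixed tick.
async function streamFindings(
  queries: Promise<Finding>[],
  send: (f: Finding) => void,
  flushEveryMs = 250,
): Promise<void> {
  const queue = new FindingQueue();
  const flush = () => queue.drain().forEach((f) => send(f));
  const timer = setInterval(flush, flushEveryMs);
  try {
    await Promise.allSettled(queries.map((q) => q.then((f) => queue.push(f))));
  } finally {
    clearInterval(timer);
    flush(); // emit whatever arrived after the last tick
  }
}
```

Using `Promise.allSettled` means one failed query doesn't stop the others' findings from streaming; failures would need their own event type so the UI can report them.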
What We Learned
MCP is early but powerful. The Model Context Protocol is still nascent, and integrating it with Agent Builder was painful. But the concept — giving AI agents structured, tooled access to real data sources — is the future. Once the integration kinks are worked out, this will be the standard way AI agents interact with enterprise systems.
Real data forces you to solve real problems. Every shortcut we didn't take came back as a real engineering challenge we had to solve. ES|QL syntax, Elasticsearch auth, cross-signal correlation — these are problems that matter in production. Mock data hides them. Real data forces you to confront them.
Always have a fallback. Our 4-mode degradation system was initially a workaround for Agent Builder issues. It ended up being one of the best architectural decisions we made. In a hackathon environment where services go down unexpectedly, being able to degrade gracefully means you always have something to demo.
ES|QL changes the game. Being able to query logs, traces, and metrics indices together in one piped query, filtering and aggregating as you go, felt genuinely different from the other observability tooling we had used. It's the reason our MCP server is more useful than a simple Elasticsearch wrapper.
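A sketch of what a cross-signal request might look like when the agent calls esql_query. The index patterns `logs-*` and `traces-*` and the field names (`service.name`, `log.level`, `event.outcome`, `trace.id`) are assumptions modeled on Elastic Common Schema naming; each referenced field has to exist in at least one of the listed indices, and a field mapped with conflicting types across them will cause an error. The service name is passed through the request's `params` array and bound to the `?` placeholder rather than spliced into the query text.

```typescript
// Hypothetical cross-signal ES|QL request body for POST /_query.

interface EsqlRequestBody {
  query: string;
  params: unknown[];
}

function errorTracesForService(service: string, windowMinutes: number): EsqlRequestBody {
  // The time span is inlined into the query text, so validate it strictly.
  if (!Number.isInteger(windowMinutes) || windowMinutes <= 0) {
    throw new Error("windowMinutes must be a positive integer");
  }
  const query = [
    // One FROM over both patterns; rows from an index lacking a field see null.
    "FROM logs-*,traces-*",
    `| WHERE service.name == ? AND @timestamp >= NOW() - ${windowMinutes} minutes`,
    // A null condition is not true, so CASE falls through to the default 0.
    '| EVAL failed = CASE(log.level == "error" OR event.outcome == "failure", 1, 0)',
    "| STATS failures = SUM(failed), events = COUNT(*) BY trace.id",
    "| WHERE failures > 0",
    "| SORT failures DESC",
    "| LIMIT 20",
  ].join("\n");
  return { query, params: [service] };
}
```

Grouping by `trace.id` is what ties the two signals together here: a trace that shows both error logs and failed spans rises to the top of the result.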
Streaming makes agents feel real. The difference between waiting 30 seconds for a response and watching an agent think, plan, and discover in real time is night and day. SSE streaming is what makes SLEUTH feel like a live investigator instead of a slow API call.
Built With
- elastic-mcp-server
- elasticsearch
- es|ql
- gemini-ai
- google-artifact-registry
- google-cloud-agent-builder
- google-cloud-build
- google-cloud-run
- javascript
- next.js
- node.js
- server-sent-events
- tailwind-css
- typescript