Inspiration
I was inspired by the rapid adoption of autonomous AI agents and a glaring problem: agents are often black boxes that make static decisions. When a tool fails or an endpoint slows down, standard agents keep retrying the same paths. I wanted to build an agentic system that learns from its own past mistakes and dynamically optimizes its routing for cost, latency, and reliability.
What it does
TracePilot is a self-optimizing multi-agent platform. Instead of relying on hardcoded routing logic, it introduces the concept of an Economic Memory. When a user submits a query, TracePilot routes it to the optimal tool based on historical confidence scores. The real work happens in the background: an Auditor Agent connects to the Arize Phoenix MCP Server, reads TracePilot's own observability traces, discovers hidden tool failures or high latencies, and updates the Economic Memory to penalize unreliable tools. Finally, an LLM Jury Agent evaluates the traces to check that the final output actually answered the user's question.
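To make the routing idea concrete, here is a minimal sketch of confidence-based routing with a penalty step. It is an illustration only, not the actual TracePilot code: the `EconomicMemory` class, the tool names, the default score of 0.5, and the multiplicative penalty are all assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class EconomicMemory:
    """Hypothetical in-memory stand-in for the Economic Memory."""

    # Maps tool name -> confidence score in [0, 1].
    scores: dict[str, float] = field(default_factory=dict)
    # Score assumed for tools that have no history yet.
    default_score: float = 0.5

    def route(self, candidates: list[str]) -> str:
        """Pick the candidate tool with the highest confidence score."""
        return max(candidates, key=lambda t: self.scores.get(t, self.default_score))

    def penalize(self, tool: str, factor: float = 0.5) -> None:
        """Shrink a tool's score after the auditor finds a failure."""
        current = self.scores.get(tool, self.default_score)
        self.scores[tool] = current * factor


memory = EconomicMemory({"search_api": 0.9, "scraper": 0.7})
first_choice = memory.route(["search_api", "scraper"])   # "search_api"
memory.penalize("search_api")                            # auditor saw a failure
second_choice = memory.route(["search_api", "scraper"])  # "scraper"
```

After one penalty the previously preferred tool drops from 0.9 to 0.45, so the next query is routed to the alternative.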
How we built it
I built TracePilot using a cutting-edge, async-first Python stack:
- Backend: FastAPI powered by Python 3.10 and anyio for high-performance async task execution.
- Agent Framework: Google Agent Development Kit (ADK) paired with Gemini 2.5 Flash for both the query routing and the background Auditor/Jury agents.
- Observability & MCP: I integrated OpenTelemetry and the Arize Phoenix MCP Server. The Auditor agent uses MCP to query the Phoenix SQLite backend and read its own traces.
- Storage: A lightweight SQLite engine powers the Economic Memory, recalculating weights dynamically via a token-bucket penalty algorithm.
- Deployment: Containerized with Docker and deployed entirely serverless on Google Cloud Run.
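The storage bullet above mentions a token-bucket penalty algorithm backed by SQLite. The sketch below shows one possible reading of that idea, under stated assumptions: each tool owns a bucket that failures drain and time refills, and the routing weight is the fill ratio. The table name `tool_bucket`, the constants, and the function names are hypothetical and do not come from the project.

```python
import sqlite3

# Assumed tuning constants for this sketch.
CAPACITY = 10.0        # full bucket means a fully trusted tool
REFILL_PER_SEC = 0.1   # trust regained per second without failures
FAILURE_COST = 4.0     # tokens drained per observed failure


def open_memory(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the SQLite store holding one bucket per tool."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tool_bucket ("
        "tool TEXT PRIMARY KEY, tokens REAL NOT NULL, updated_at REAL NOT NULL)"
    )
    return conn


def _refilled(tokens: float, updated_at: float, now: float) -> float:
    """Apply the time-based refill, capped at the bucket capacity."""
    return min(CAPACITY, tokens + (now - updated_at) * REFILL_PER_SEC)


def _load(conn: sqlite3.Connection, tool: str):
    return conn.execute(
        "SELECT tokens, updated_at FROM tool_bucket WHERE tool = ?", (tool,)
    ).fetchone()


def record_failure(conn: sqlite3.Connection, tool: str, now: float) -> None:
    """Drain tokens from a tool's bucket when a failure is observed."""
    row = _load(conn, tool)
    tokens = _refilled(row[0], row[1], now) if row else CAPACITY
    tokens = max(0.0, tokens - FAILURE_COST)
    conn.execute(
        "INSERT OR REPLACE INTO tool_bucket (tool, tokens, updated_at) "
        "VALUES (?, ?, ?)",
        (tool, tokens, now),
    )
    conn.commit()


def weight(conn: sqlite3.Connection, tool: str, now: float) -> float:
    """Return the routing weight in [0, 1]; unseen tools start fully trusted."""
    row = _load(conn, tool)
    if row is None:
        return 1.0
    return _refilled(row[0], row[1], now) / CAPACITY
```

Timestamps are passed in explicitly (for example from `time.time()`) so the refill behavior is easy to reason about and test. A tool that fails repeatedly loses weight quickly, then recovers gradually if no new failures are recorded.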
Challenges we ran into
Integrating MCP over stdio within a nested async FastAPI event loop was incredibly challenging. I faced complex anyio TaskGroup exceptions when the background Auditor agent attempted to spawn the Phoenix MCP subprocess while handling incoming HTTP requests. I had to carefully decouple the ADK session manager and silence internal logging conflicts to keep the JSON-RPC streams pristine.
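A common way to keep a stdio JSON-RPC channel clean is to send all log output to stderr, so that stdout carries only protocol messages. The sketch below shows that general pattern with the standard library; it is not the exact fix used in TracePilot, and the function names are made up for this example.

```python
import json
import logging
import sys


def configure_stdio_safe_logging(level: int = logging.WARNING) -> None:
    """Route every log record to stderr so stdout stays reserved for JSON-RPC."""
    root = logging.getLogger()
    # Drop handlers installed earlier (some may be writing to stdout).
    for handler in list(root.handlers):
        root.removeHandler(handler)
    handler = logging.StreamHandler(sys.stderr)
    handler.setFormatter(logging.Formatter("%(levelname)s %(name)s: %(message)s"))
    root.addHandler(handler)
    # Raising the level also quiets chatty libraries that log at INFO or DEBUG.
    root.setLevel(level)


def write_message(stream, payload: dict) -> None:
    """Write one newline-delimited JSON-RPC message to the given stream."""
    stream.write(json.dumps(payload) + "\n")
    stream.flush()
```

Calling `configure_stdio_safe_logging()` once at startup, before any agent or MCP client is created, means a stray log line cannot be interleaved with a JSON-RPC frame.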
Accomplishments that we're proud of
I successfully built a true meta-agent architecture. It’s incredibly satisfying to watch the Auditor Agent read its own Phoenix traces, realize a specific tool failed, and autonomously rewrite the routing rules so that the next query instantly chooses a better path.
What we learned
I learned the sheer power of the Model Context Protocol (MCP). By exposing observability data as an MCP tool, I have bridged the gap between passive logging and active agentic reasoning. I also deepened my understanding of building robust, multi-agent systems with Google ADK.
What's next for TracePilot
- Multi-Tenant Memory: Scaling the Economic Memory to learn tool reliability across isolated organizations.
- Advanced Trace Analytics: Upgrading the Auditor Agent to detect cost anomalies (e.g., recursive tool loops) directly from OpenTelemetry span trees.
- Agent-to-Agent Negotiation: Allowing the Jury Agent to automatically trigger a retry if the evaluation score drops below a certain threshold.
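The retry idea in the last item could look roughly like the sketch below. This is a speculative outline of planned work, not existing TracePilot code: `run_agent` and `judge` are hypothetical async callables standing in for the routing agent and the Jury Agent, and the threshold and attempt limit are placeholder values.

```python
import asyncio
from typing import Awaitable, Callable


async def answer_with_retry(
    query: str,
    run_agent: Callable[[str], Awaitable[str]],
    judge: Callable[[str, str], Awaitable[float]],
    threshold: float = 0.7,
    max_attempts: int = 3,
) -> tuple[str, float]:
    """Re-run the agent until the jury score reaches the threshold.

    Returns the best (answer, score) pair seen across all attempts.
    """
    best_answer, best_score = "", float("-inf")
    for _ in range(max_attempts):
        answer = await run_agent(query)
        score = await judge(query, answer)
        if score > best_score:
            best_answer, best_score = answer, score
        if score >= threshold:
            break  # good enough, stop spending tokens
    return best_answer, best_score
```

Capping the number of attempts keeps a consistently low-scoring query from turning into the kind of recursive loop the Auditor Agent is meant to detect.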