Inspiration

AI agents are becoming powerful enough to write code, use tools, inspect systems, and make multi-step decisions. But there is a serious problem: when an agent fails, leaks sensitive context, calls the wrong tool, or gives a vague result, it is often hard to prove what actually happened.

I built TracePilot because agent builders need more than a chatbot. They need observability, trust, debugging, and shareable proof. For hackathons, open-source work, and real development teams, it should be easy to answer:

  • What did the Gemini CLI agent do?
  • Which tools or steps were involved?
  • Was sensitive information protected?
  • Can I export a safe proof of the run?
  • Can the agent use its own trace data to become easier to debug?

TracePilot grew out of that need: Gemini CLI agents that are safer and easier to inspect, backed by trace data from Arize/Phoenix MCP.

What it does

TracePilot is a Windows-first add-on for the official Gemini CLI that makes agent workflows observable and easier to trust.

It adds:

  • Arize/Phoenix MCP integration, so Gemini CLI can connect to Phoenix trace and observability data.
  • Redacted local event logs for Gemini CLI sessions, model events, tool events, and agent events.
  • TracePilot commands such as /tracepilot:doctor, /tracepilot:status, /tracepilot:logs, and /tracepilot:export-proof.
  • Sanitized proof exports that can be shared without exposing API keys, bearer tokens, private keys, or other secrets.
  • A Cloud Run install site that serves the installer and Windows bundle; secrets never pass through it and stay on the user’s machine.
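A sanitized proof export could, for example, condense a session's redacted JSONL log into event counts and a time range before it is shared. The sketch below is a hypothetical Node.js illustration, not TracePilot's actual export format: the `ts` and `type` field names, the proof shape, and the `buildProof` and `redact` names are assumptions, with `redact` standing for any string-to-string redaction function.

```javascript
// Hypothetical sketch: condense redacted JSONL events into a shareable proof.
// The event fields (ts, type) and the proof shape are assumptions.
function buildProof(jsonlText, redact) {
  const countsByType = Object.create(null); // no inherited keys like "constructor"
  let firstEvent = null;
  let lastEvent = null;
  let skippedLines = 0;

  for (const line of jsonlText.split(/\r?\n/)) {
    if (line.trim() === "") continue; // ignore blank lines
    let event;
    try {
      event = JSON.parse(line);
    } catch {
      skippedLines += 1; // count malformed lines instead of aborting the export
      continue;
    }
    const type = typeof event?.type === "string" ? event.type : "unknown";
    countsByType[type] = (countsByType[type] ?? 0) + 1;
    // String comparison orders ISO 8601 UTC timestamps correctly only when
    // every event uses the same format.
    if (typeof event?.ts === "string") {
      if (firstEvent === null || event.ts < firstEvent) firstEvent = event.ts;
      if (lastEvent === null || event.ts > lastEvent) lastEvent = event.ts;
    }
  }

  const proof = { firstEvent, lastEvent, countsByType, skippedLines };
  // Final redaction pass over the serialized proof, as defense in depth.
  return redact(JSON.stringify(proof, null, 2));
}
```

Counting malformed lines rather than failing keeps the export usable even if a session log was cut off mid-write.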

The goal is not just to answer questions. The goal is to help developers run Gemini CLI agents with more confidence, inspect failures, verify behavior, and produce safe evidence of what happened.

How we built it

TracePilot was built as a lightweight developer tool around Gemini CLI, Node.js, PowerShell, Arize/Phoenix MCP, and Google Cloud Run.

The main parts are:

  • A PowerShell installer that installs TracePilot locally and merges Gemini CLI MCP/hook settings.
  • A Phoenix MCP wrapper that launches the Arize Phoenix MCP server through npx.
  • A Gemini CLI hook logger that listens to agent lifecycle events and writes redacted local JSONL logs.
  • A shared redaction layer that removes common secrets such as API keys, bearer tokens, GitHub tokens, database URLs, private keys, and sensitive environment values.
  • A command runner that powers /tracepilot:doctor, /tracepilot:status, /tracepilot:logs, and /tracepilot:export-proof.
  • A static website, hosted on Cloud Run, for installation and download.
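A shared redaction layer like the one listed above can be as simple as an ordered list of pattern/replacement pairs applied to every string before it is written. This is a minimal sketch assuming regex-based redaction; the specific patterns and placeholder strings are illustrative assumptions, not TracePilot's rule set.

```javascript
// Hypothetical sketch of a shared redaction layer: ordered [pattern, replacement]
// pairs applied to every string before it is logged or exported.
const REDACTION_RULES = [
  // PEM private key blocks (RSA, EC, OpenSSH, PKCS#8, ...), possibly multi-line.
  [/-----BEGIN [A-Z ]*PRIVATE KEY-----[\s\S]*?-----END [A-Z ]*PRIVATE KEY-----/g, "[REDACTED PRIVATE KEY]"],
  // NAME=value pairs whose uppercase name looks sensitive, e.g. GEMINI_API_KEY=...
  [/\b([A-Z0-9_]*(?:KEY|TOKEN|SECRET|PASSWORD)[A-Z0-9_]*)\s*=\s*[^\s'"]+/g, "$1=[REDACTED]"],
  // Bearer tokens in Authorization headers (the auth scheme is case-insensitive).
  [/\bBearer\s+[A-Za-z0-9._~+\/-]+=*/gi, "Bearer [REDACTED]"],
  // GitHub classic (ghp_, gho_, ghu_, ghs_, ghr_) and fine-grained tokens.
  [/\b(?:gh[pousr]_[A-Za-z0-9]{36,}|github_pat_[A-Za-z0-9_]{22,})/g, "[REDACTED GITHUB TOKEN]"],
  // Google API keys ("AIza" followed by 35 URL-safe characters).
  [/\bAIza[0-9A-Za-z_-]{35}/g, "[REDACTED API KEY]"],
  // Database connection URLs, which often embed a username and password.
  [/\b(postgres(?:ql)?|mysql|mongodb(?:\+srv)?|redis):\/\/[^\s'"]+/gi, "$1://[REDACTED]"],
];

function redact(text) {
  let out = String(text);
  for (const [pattern, replacement] of REDACTION_RULES) {
    out = out.replace(pattern, replacement);
  }
  return out;
}
```

Order matters: the `NAME=value` rule runs before the token-specific rules, so `GITHUB_TOKEN=ghp_...` collapses to a single clean placeholder. Pattern lists like this can both over-redact harmless text and miss secrets that have no recognizable shape, so they reduce risk rather than eliminate it.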

The architecture keeps secrets local by design. The website never collects Phoenix API keys: users enter Phoenix settings in a local PowerShell session, and TracePilot stores them under the user’s .tracepilot directory.
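Storing settings under the user's .tracepilot directory might look like the following Node.js sketch. The `config.json` file name, the `saveConfig`/`loadConfig` helpers, and the optional `baseDir` parameter (added so the code can be exercised without touching the real home directory) are hypothetical.

```javascript
import { mkdirSync, readFileSync, writeFileSync } from "node:fs";
import { homedir } from "node:os";
import { join } from "node:path";

// Hypothetical sketch: per-user settings in <home>/.tracepilot/config.json,
// e.g. C:\Users\<name>\.tracepilot\config.json on Windows.
function configDir(baseDir = homedir()) {
  return join(baseDir, ".tracepilot");
}

function saveConfig(config, baseDir = homedir()) {
  const dir = configDir(baseDir);
  mkdirSync(dir, { recursive: true }); // no-op if the directory already exists
  const file = join(dir, "config.json");
  // mode applies only when the file is created. It restricts access to the
  // owner on POSIX systems; Windows ignores most mode bits, so protection there
  // typically comes from the user profile's default ACLs.
  writeFileSync(file, JSON.stringify(config, null, 2), { encoding: "utf8", mode: 0o600 });
  return file;
}

function loadConfig(baseDir = homedir()) {
  try {
    return JSON.parse(readFileSync(join(configDir(baseDir), "config.json"), "utf8"));
  } catch (err) {
    if (err.code === "ENOENT") return null; // not configured yet
    throw err;
  }
}
```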

Challenges we ran into

One challenge was making the project useful without turning it into a heavy platform. TracePilot had to stay small enough for quick setup, but still prove meaningful integration with Gemini CLI and Phoenix MCP.

Another challenge was secret safety. Agent logs can accidentally contain sensitive data, so redaction had to be treated as a core feature, not an afterthought. The logging system was designed to summarize and hash inputs while removing common secret patterns.
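The summarize-and-hash approach can be sketched as a function that turns one hook event into one JSONL line. This is a hypothetical illustration: the field names, the 200-character summary limit, and the `toEventLine` name are assumptions, and `redact` stands for any string-to-string redaction function.

```javascript
import { createHash } from "node:crypto";

const SUMMARY_LIMIT = 200; // hypothetical maximum summary length, in characters

// Hypothetical sketch: one hook event -> one JSONL line holding a short redacted
// summary and a SHA-256 fingerprint rather than the full input.
function toEventLine(type, input, redact, now = new Date()) {
  const raw = typeof input === "string" ? input : JSON.stringify(input ?? null) ?? "";
  // Redact first, then truncate, so a cut cannot leave half a secret behind.
  const clean = redact(raw);
  const summary = clean.length > SUMMARY_LIMIT ? clean.slice(0, SUMMARY_LIMIT) + "..." : clean;
  // Hashing the redacted text means nothing derived from a secret is stored,
  // while identical (redacted) inputs still share a fingerprint for correlation.
  const inputSha256 = createHash("sha256").update(clean, "utf8").digest("hex");
  return JSON.stringify({ ts: now.toISOString(), type, summary, inputSha256 }) + "\n";
}
```

Appending each returned line to a per-session file gives a JSONL log that can be read back one event per line.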

Windows setup was also a challenge. Developer tools often assume Unix-like environments, but many builders use Windows. TracePilot focuses on a Windows-first install path with PowerShell commands, local config files, and Gemini CLI settings integration.
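The settings-merge step can be sketched in Node.js even though the installer itself is PowerShell. Gemini CLI reads MCP servers from an `mcpServers` map in its settings file (for user-level settings, `%USERPROFILE%\.gemini\settings.json` on Windows), where each entry has fields such as `command`, `args`, and `env`. The `mergeMcpSettings` helper and the server entries used with it are hypothetical, and reading or writing the file is omitted.

```javascript
// True for plain JSON objects (not null, not arrays).
function isPlainObject(value) {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Hypothetical sketch: add or replace one MCP server entry in a parsed Gemini
// CLI settings object without touching unrelated settings or other servers.
function mergeMcpSettings(existing, serverName, serverConfig) {
  const settings = isPlainObject(existing) ? existing : {};
  const servers = isPlainObject(settings.mcpServers) ? settings.mcpServers : {};
  // Shallow copies leave the caller's object unmodified.
  return {
    ...settings,
    mcpServers: { ...servers, [serverName]: serverConfig },
  };
}
```

Because the entry is keyed by name, re-running the installer replaces only TracePilot's own entry instead of duplicating it or clobbering servers the user already configured.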

The final challenge was positioning. TracePilot is not a normal chatbot app. It is an observability and proof layer for Gemini CLI agents, so the demo needed to clearly show how it helps real agent builders debug, verify, and trust agent behavior.

Accomplishments that we're proud of

I am proud that TracePilot connects several important pieces into a practical workflow:

  • Gemini CLI as the agent runtime.
  • Arize/Phoenix MCP as the partner MCP integration.
  • Local redacted logs for safer debugging.
  • Proof exports for sharing evidence of agent runs.
  • Cloud Run hosting for a simple install experience.
  • A security-conscious design where secrets stay on the user’s machine.

I am also proud that the project is small, inspectable, and open-source. Judges and developers can look at the implementation and understand how the installer, MCP wrapper, hooks, logs, and proof export work together.

Most importantly, TracePilot solves a real problem for agent builders: it makes agent behavior easier to inspect instead of treating the agent as a black box.

What we learned

This project taught me that building useful agents is not only about prompts and models. Real agent systems need observability, safety, proof, and recovery paths.

I learned that MCP is powerful because it lets agents access specialized capabilities at runtime. In this project, Phoenix MCP gives the agent a path toward trace introspection and better debugging.

I also learned that security and developer experience must be designed together. A tool can be powerful, but if it asks users to paste secrets into the wrong place or produces unsafe logs, it becomes risky. TracePilot tries to avoid that by keeping credentials local and redacting logs by default.

Finally, I learned that tooling for agents is itself a real-world problem worth solving. Before teams can trust agents with important work, they need tools that help them see what the agent did and why.

What's next for TracePilot

Next, I want to make TracePilot more useful as a self-improvement loop for Gemini CLI agents.

Planned improvements include:

  • Deeper Phoenix trace querying from inside Gemini CLI.
  • Better summaries of failed or risky agent runs.
  • Evaluation commands that score trace quality and agent reliability.
  • More structured proof exports for hackathon judging and team reviews.
  • A stronger dashboard for browsing local sessions.
  • Support for more operating systems beyond the Windows-first path.
  • Optional GitHub issue or pull request summaries based on exported trace evidence.
  • More automated tests around installer behavior, redaction, and MCP readiness.

The long-term vision is for TracePilot to become a trust layer for local agent development: a simple way to run Gemini CLI agents, observe what happened, protect secrets, and improve the next run.

Built With

  • apache-2.0
  • arize/phoenix-mcp
  • css
  • docker
  • gemini-cli
  • google-cloud-build
  • google-cloud-run
  • html
  • javascript-es-modules
  • local-json-config
  • node.js
  • npm/npx
  • powershell
  • redacted-jsonl-logs