[Gallery: configure the agent and its state (prompt, conversation history), inspect per-turn tool-call probabilities, and view the attention map behind each tool-call choice; example runs and heatmaps with Llama 3.1 on macOS]
Inspiration
AI agents are revolutionizing how we interact with software, but their decision-making process remains frustratingly opaque. When an agent chooses which tool to call at each step, developers are left guessing: Why did it pick that function? What would it have done differently? This black-box problem makes debugging nearly impossible and prevents us from truly understanding or improving agent behavior.
I was inspired by the need for transparency in AI systems, especially as agents become more autonomous and handle critical tasks. I envisioned a tool that opens up the decision-making process, showing not just what an agent does, but the entire landscape of possibilities at each turn and what influenced those choices.
What it does
GadgetInspector is a visualization tool that reveals how AI agents make decisions when selecting tools. It transforms opaque agent behavior into interactive, explorable probability trees.
Core Features:
OpenAI Playground-Style Configuration: Set up agents with custom system prompts, tools (File Search, Code Interpreter, custom functions), and model parameters. Configurations persist in local storage for easy reuse.
Conditional Probability Trees: Visualize tool selection probabilities across multiple conversation turns. The key innovation is that these are conditional probabilities: expanding a node answers "if the agent calls this tool, what's likely to happen next?" Each path through the tree reveals a different decision pattern.
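A minimal sketch of how such a tree could be represented (the class and field names here are illustrative, not the project's actual schema): each node stores a probability conditional on the path above it, so the joint probability of a path is the product of the conditionals.

```python
from dataclasses import dataclass, field

@dataclass
class ToolCallNode:
    """One node in the conditional probability tree."""
    tool_name: str
    probability: float  # P(this tool | the calls on the path above it)
    children: list["ToolCallNode"] = field(default_factory=list)

def path_probability(path: list[ToolCallNode]) -> float:
    """Joint probability of a root-to-node path: product of conditionals."""
    p = 1.0
    for node in path:
        p *= node.probability
    return p

# Example: the agent picks search_files 70% of the time; given that call,
# read_file follows 60% of the time, so the joint path probability is 0.42.
read_file = ToolCallNode("read_file", 0.6)
search_files = ToolCallNode("search_files", 0.7, [read_file])
print(path_probability([search_files, read_file]))  # 0.42
```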
Attention Heatmaps: Click any tool call to see which input tokens the model focused on when making that decision.
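The write-up doesn't include the extraction code, but the idea behind the heatmap can be sketched. This illustration uses Hugging Face transformers (which exposes attention weights directly) rather than the project's MLX stack, and the checkpoint name is only a placeholder:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder checkpoint: any causal LM works; this is not necessarily
# the model the project actually runs.
name = "meta-llama/Llama-3.2-1B"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name, attn_implementation="eager")

prompt = "Tools: search_files, run_code. User: find the config file."
inputs = tok(prompt, return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, output_attentions=True)

# out.attentions is one (batch, heads, seq, seq) tensor per layer.
# Average the last layer over heads and read the final row: how much
# each input token is attended to while the next token (the start of
# the tool call) is being predicted.
attn = out.attentions[-1].mean(dim=1)[0, -1]
for token, w in zip(tok.convert_ids_to_tokens(inputs["input_ids"][0]), attn.tolist()):
    print(f"{token:>12} {w:.3f}")
```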
How I built it
Frontend Architecture:
- TanStack Start with React 19 as the full-stack framework
- React Flow for the interactive probability tree, with zoom, pan, and expandable nodes
- shadcn/ui with a custom petrol blue theme for a polished, professional interface
- Tailwind CSS 4 with CSS variables for consistent theming throughout
Inference:
- MLX for local LLM inference on Apple Silicon
Challenges I ran into
The limited hardware at my disposal forced me to use a small model, so I stuck with Llama 3.1 3B. Limited time also constrained me to a basic sampling implementation for estimating tool-call probabilities.
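That sampling approach is simple to sketch. Assuming a `generate(prompt)` helper that returns one sampled completion and a `first_tool_call(text)` parser (both hypothetical stand-ins for whatever the project actually uses), the idea is to sample repeatedly and count:

```python
from collections import Counter

def estimate_tool_probabilities(generate, first_tool_call, prompt, n=50):
    """Monte Carlo estimate of P(tool is called first | prompt).

    `generate` samples one completion with temperature > 0;
    `first_tool_call` parses out the first tool name (or None).
    Both are hypothetical hooks, not the project's real API.
    """
    counts = Counter()
    for _ in range(n):
        tool = first_tool_call(generate(prompt))
        counts[tool or "<no tool call>"] += 1
    return {tool: c / n for tool, c in counts.items()}
```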
Accomplishments that I am proud of
I am happy to have first results that reveal the specific wording that can bias the model into choosing the wrong tool. The inspector is practical and can easily be enhanced in the future. Being able to get insights from this prototype is great!
What we learned
Claude Sonnet 4.5 is more capable than I thought; it was a great help for the model part. I am especially happy to have discovered MLX for Apple Silicon inference.
What's next for GadgetInspector
- Use logprobs to compute tool-call probabilities exactly instead of sampling (see the sketch after this list)
- Use a stronger model to analyze performance, and compare across models
- Use a model compatible with the OpenAI agent format in context
- Test on multi-turn agents (so far only tested on a single turn)
- Improve the UI and capture the full agent state for complete reproducibility
- Turn it into a CLI so an agent can use the tool for automatic prompt improvement
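For the logprobs idea, a minimal sketch under stated assumptions: `model` and `tok` are any causal LM and tokenizer (e.g., from Hugging Face transformers, standing in for the project's MLX stack), and the tool names are scored as forced continuations of the prompt:

```python
import math
import torch

def tool_call_probabilities(model, tok, prompt, tool_names):
    """Score each candidate tool name as a forced continuation of the
    prompt, then normalize over the candidates. Note this normalizes
    over the listed tools only, ignoring every other continuation."""
    scores = {}
    for name in tool_names:
        prompt_ids = tok(prompt, return_tensors="pt").input_ids
        cont_ids = tok(name, add_special_tokens=False, return_tensors="pt").input_ids
        ids = torch.cat([prompt_ids, cont_ids], dim=1)
        with torch.no_grad():
            logits = model(ids).logits
        # Row i of the logits predicts token i+1, so the continuation
        # token at position t is scored by the log-softmax of row t-1.
        logprobs = torch.log_softmax(logits[0, :-1], dim=-1)
        scores[name] = sum(
            logprobs[t - 1, ids[0, t]].item()
            for t in range(prompt_ids.shape[1], ids.shape[1])
        )
    log_total = torch.logsumexp(torch.tensor(list(scores.values())), dim=0)
    return {n: math.exp(s - log_total.item()) for n, s in scores.items()}
```

Unlike sampling, this gives an exact (up to the normalization caveat) distribution in one forward pass per candidate, with no sampling variance.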