[Gallery: configure the agent and its state (prompt, conversation history), inspect per-turn tool-call probabilities, and view the attention map behind each tool-call choice; example runs and heatmaps with Llama 3.1 on macOS]
Inspiration
AI agents are revolutionizing how we interact with software, but their decision-making process remains frustratingly opaque. When an agent chooses which tool to call at each step, developers are left guessing: Why did it pick that function? What would it have done differently? This black-box problem makes debugging nearly impossible and prevents us from truly understanding or improving agent behavior.
I was inspired by the need for transparency in AI systems, especially as agents become more autonomous and handle critical tasks. I envisioned a tool that opens up the decision-making process, showing not just what an agent does, but the entire landscape of possibilities at each turn and what influenced those choices.
What it does
GadgetInspector is a visualization tool that reveals how AI agents make decisions when selecting tools. It transforms opaque agent behavior into interactive, explorable probability trees.
Core Features:
OpenAI Playground-Style Configuration: Set up agents with custom system prompts, tools (File Search, Code Interpreter, custom functions), and model parameters. Configurations persist in local storage for easy reuse.
Conditional Probability Trees: Visualize tool selection probabilities across multiple conversation turns. The key innovation is that these are conditional probabilities: expanding a node answers "if the agent calls this tool, what's likely to happen next?" Each path through the tree reveals a different decision pattern.
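A minimal sketch of how such a tree could be represented (the class and field names here are illustrative, not the project's actual schema): each node stores a probability conditional on the path above it, so the joint probability of a path is the product of the conditionals.

```python
from dataclasses import dataclass, field

@dataclass
class ToolCallNode:
    """One node in the conditional probability tree."""
    tool_name: str
    probability: float  # P(this tool | the calls on the path above it)
    children: list["ToolCallNode"] = field(default_factory=list)

def path_probability(path: list[ToolCallNode]) -> float:
    """Joint probability of a root-to-node path: product of conditionals."""
    p = 1.0
    for node in path:
        p *= node.probability
    return p

# Example: the agent picks search_files 70% of the time; given that call,
# read_file follows 60% of the time, so the joint path probability is 0.42.
read_file = ToolCallNode("read_file", 0.6)
search_files = ToolCallNode("search_files", 0.7, [read_file])
print(path_probability([search_files, read_file]))  # 0.42
```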
Attention Heatmaps: Click any tool call to see which input tokens the model focused on when making that decision.
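The write-up doesn't include the extraction code, but the idea behind the heatmap can be sketched. This illustration uses Hugging Face transformers (which exposes attention weights directly) rather than the project's MLX stack, and the checkpoint name is only a placeholder:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder checkpoint: any causal LM works; this is not necessarily
# the model the project actually runs.
name = "meta-llama/Llama-3.2-1B"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name, attn_implementation="eager")

prompt = "Tools: search_files, run_code. User: find the config file."
inputs = tok(prompt, return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, output_attentions=True)

# out.attentions is one (batch, heads, seq, seq) tensor per layer.
# Average the last layer over heads and read the final row: how much
# each input token is attended to while the next token (the start of
# the tool call) is being predicted.
attn = out.attentions[-1].mean(dim=1)[0, -1]
for token, w in zip(tok.convert_ids_to_tokens(inputs["input_ids"][0]), attn.tolist()):
    print(f"{token:>12} {w:.3f}")
```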
How I built it
Frontend Architecture:
- TanStack Start with React 19 as the full-stack framework
- React Flow for the interactive probability tree, with zoom, pan, and expandable nodes
- shadcn/ui with a custom petrol blue theme for a polished, professional interface
- Tailwind CSS 4 with CSS variables for consistent theming throughout
Inference:
- MLX for local LLM inference on Apple Silicon
Challenges I ran into
The limited hardware at my disposal forced me to use a small model, so I stuck with Llama 3.1 3B. Limited time also constrained me to a basic sampling implementation for estimating tool-call probabilities.
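That sampling approach is simple to sketch. Assuming a `generate(prompt)` helper that returns one sampled completion and a `first_tool_call(text)` parser (both hypothetical stand-ins for whatever the project actually uses), the idea is to sample repeatedly and count:

```python
from collections import Counter

def estimate_tool_probabilities(generate, first_tool_call, prompt, n=50):
    """Monte Carlo estimate of P(tool is called first | prompt).

    `generate` samples one completion with temperature > 0;
    `first_tool_call` parses out the first tool name (or None).
    Both are hypothetical hooks, not the project's real API.
    """
    counts = Counter()
    for _ in range(n):
        tool = first_tool_call(generate(prompt))
        counts[tool or "<no tool call>"] += 1
    return {tool: c / n for tool, c in counts.items()}
```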
Accomplishments that I am proud of
I am happy to have first results that reveal the specific wording that can bias the model into choosing the wrong tool. The inspector is practical and can easily be enhanced in the future. Being able to get insights from this prototype is great!
What we learned
Claude Sonnet 4.5 is more capable than I thought; it was a great help for the model part. I am especially happy to have discovered MLX for Apple Silicon inference.
What's next for GadgetInspector
- Use logprobs to compute tool-call probabilities exactly instead of sampling (see the sketch after this list)
- Use a stronger model to analyze performance, and compare across models
- Use a model compatible with the OpenAI agent format in context
- Test on multi-turn agents (so far only tested on a single turn)
- Improve the UI and capture the full agent state for complete reproducibility
- Turn it into a CLI so an agent can use the tool for automatic prompt improvement
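For the logprobs idea, a minimal sketch under stated assumptions: `model` and `tok` are any causal LM and tokenizer (e.g., from Hugging Face transformers, standing in for the project's MLX stack), and the tool names are scored as forced continuations of the prompt:

```python
import math
import torch

def tool_call_probabilities(model, tok, prompt, tool_names):
    """Score each candidate tool name as a forced continuation of the
    prompt, then normalize over the candidates. Note this normalizes
    over the listed tools only, ignoring every other continuation."""
    scores = {}
    for name in tool_names:
        prompt_ids = tok(prompt, return_tensors="pt").input_ids
        cont_ids = tok(name, add_special_tokens=False, return_tensors="pt").input_ids
        ids = torch.cat([prompt_ids, cont_ids], dim=1)
        with torch.no_grad():
            logits = model(ids).logits
        # Row i of the logits predicts token i+1, so the continuation
        # token at position t is scored by the log-softmax of row t-1.
        logprobs = torch.log_softmax(logits[0, :-1], dim=-1)
        scores[name] = sum(
            logprobs[t - 1, ids[0, t]].item()
            for t in range(prompt_ids.shape[1], ids.shape[1])
        )
    log_total = torch.logsumexp(torch.tensor(list(scores.values())), dim=0)
    return {n: math.exp(s - log_total.item()) for n, s in scores.items()}
```

Unlike sampling, this gives an exact (up to the normalization caveat) distribution in one forward pass per candidate, with no sampling variance.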