Realtime Agent Analysis Dashboard (R.A.A.D)

Here's our Raad CLI in action!
Here's a snapshot of our dashboard
Red teaming capabilities
Architectural Diagram

Inspiration

Some of the largest challenges facing AI development today are traceability and security. As software engineers, we felt that enabling other engineers to put traceability and security top of mind was a meaningful challenge we wanted to tackle. We saw how easy it was to build agents with frameworks like LangChain and Semantic Kernel, but wondered..what if it were that easy to trace and test agents? So inspired by tools like Azure's Application Insights, and other tools like Storybook and Swagger, we came up with the idea for RAAD! Which stands for Realtime Agentic Analytics Dashboard

What it does

The main idea behind RAAD is to give developers a drop-in in easy-to-use to use way to monitor and evaluate the AI agents they build on their machines. RAAD provides developers with a CLI tool that will spin up all local servers needed to facilitate MCP, A2A, and our evaluation agent powered by Nvidia's nemotron mini model.

Our evaluation agent is responsible for automating red teaming, or sending adversarial prompts to developer agents in order to ensure the agents respond to the bad prompts the correct way. This agent leverages A2A to connect and prompt test agents, aggregates and analyzes the responses, and finally sends that to the users' dashboard, all running locally on their machine!

How we built it

Shell script to be able to run 'raad start'
Python decorator to provide discovery details to the front end so users can see locally running agents (Future)
Locally running agents connect to the evaluator agent powered by Nemotron mini via the A2A protocol
Nemotron mini model locally hosted using ollama
The evaluator agent leverages MCP to call the sentiment analysis tool to evaluate agent responses
The evaluator agent also has tools to write responses back to the dashboard to be reviewed by the developer

Challenges we ran into

A2A and MCP were particularly tricky to pin down. After spending some time with both protocols, we definitely understand them more; however, there are some parts that are definitely still a bit murky.

In addition to that, I think the biggest issue we had was with our model performance. The mini model runs a bit slow, and we believe it's due to the volume and sequential nature of the prompts and responses we're sending to the test agent. A plan of ours is to leverage concurrent prompting as well as multi-agent architectures to improve performance in the future

Accomplishments that we're proud of

Even though it's slow, getting a function MCP, A2A setup to run locally was a huge accomplishment. There were times when trying to figure out how all these pieces were going to come together was difficult, but in the end, we're very happy with what we built and what we learned.

We're also proud of the name; it's pretty radical!

What we learned

Building it is half the battle; we spent lots of time trying to optimize for performance. In the future, that's something we want to think about earlier. Especially if working with local models.
A2A and MCP aren't so scary!
Learned more about Nvidia's model offerings and some of their capabilities

What's next for Realtime Agent Analysis Dashboard (R.A.A.D)

We would like to open-source this project at some point in the future because we believe other developers would benefit
Make setup and the overall dev experience smoother
Add more MCP tools and experiment with mult-agent architectures
Encrich the logging and look into a cloud offering

Built With

a2a
fastapi
langchain
mcp
nvidia-nemotron
ollama

Updates

Jason Hutchinson II started this project — Nov 08, 2025 11:42 PM EST

Leave feedback in the comments!

Log in or sign up for Devpost to join the conversation.