Inspiration
Some of the largest challenges facing AI development today are traceability and security. As software engineers, we felt that enabling other engineers to put traceability and security top of mind was a meaningful challenge we wanted to tackle. We saw how easy it was to build agents with frameworks like LangChain and Semantic Kernel, but wondered..what if it were that easy to trace and test agents? So inspired by tools like Azure's Application Insights, and other tools like Storybook and Swagger, we came up with the idea for RAAD! Which stands for Realtime Agentic Analytics Dashboard
What it does
The main idea behind RAAD is to give developers a drop-in in easy-to-use to use way to monitor and evaluate the AI agents they build on their machines. RAAD provides developers with a CLI tool that will spin up all local servers needed to facilitate MCP, A2A, and our evaluation agent powered by Nvidia's nemotron mini model.
Our evaluation agent is responsible for automating red teaming, or sending adversarial prompts to developer agents in order to ensure the agents respond to the bad prompts the correct way. This agent leverages A2A to connect and prompt test agents, aggregates and analyzes the responses, and finally sends that to the users' dashboard, all running locally on their machine!
How we built it
- Shell script to be able to run 'raad start'
- Python decorator to provide discovery details to the front end so users can see locally running agents (Future)
- Locally running agents connect to the evaluator agent powered by Nemotron mini via the A2A protocol
- Nemotron mini model locally hosted using ollama
- The evaluator agent leverages MCP to call the sentiment analysis tool to evaluate agent responses
- The evaluator agent also has tools to write responses back to the dashboard to be reviewed by the developer
Challenges we ran into
A2A and MCP were particularly tricky to pin down. After spending some time with both protocols, we definitely understand them more; however, there are some parts that are definitely still a bit murky.
In addition to that, I think the biggest issue we had was with our model performance. The mini model runs a bit slow, and we believe it's due to the volume and sequential nature of the prompts and responses we're sending to the test agent. A plan of ours is to leverage concurrent prompting as well as multi-agent architectures to improve performance in the future
Accomplishments that we're proud of
Even though it's slow, getting a function MCP, A2A setup to run locally was a huge accomplishment. There were times when trying to figure out how all these pieces were going to come together was difficult, but in the end, we're very happy with what we built and what we learned.
We're also proud of the name; it's pretty radical!
What we learned
- Building it is half the battle; we spent lots of time trying to optimize for performance. In the future, that's something we want to think about earlier. Especially if working with local models.
- A2A and MCP aren't so scary!
- Learned more about Nvidia's model offerings and some of their capabilities
What's next for Realtime Agent Analysis Dashboard (R.A.A.D)
- We would like to open-source this project at some point in the future because we believe other developers would benefit
- Make setup and the overall dev experience smoother
- Add more MCP tools and experiment with mult-agent architectures
- Encrich the logging and look into a cloud offering
Built With
- a2a
- fastapi
- langchain
- mcp
- nvidia-nemotron
- ollama

Log in or sign up for Devpost to join the conversation.