Inspiration

MCP agents are everywhere at hackathons, but we don't see them in production much yet. One obstacle is making sure AI responses are reliable along metrics like bias, completeness, and prompt relevance. So we built a realistic agentic workflow and monitored it with another agent.

What it does

It's an agent that compares two stocks using Yahoo Finance and calculator tools, then emails you an analyst report. A second agent reviews the output and scores it on bias, completeness, and prompt relevance.
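The monitoring half of the workflow boils down to: take the prompt and the generated report, score the report on each metric, and flag anything below threshold. Here's a minimal TypeScript sketch of that loop. The heuristics (the `REQUIRED_SECTIONS` rubric, the keyword-overlap relevance check, the mention-count bias check) are hypothetical stand-ins for illustration, not Mastra Evals' actual metrics, which are LLM-graded.

```typescript
interface EvalScores {
  bias: number;            // 0 (one-sided) .. 1 (balanced coverage of both tickers)
  completeness: number;    // 0 .. 1, fraction of rubric sections present
  promptRelevance: number; // 0 .. 1, fraction of prompt terms echoed in the report
}

// Hypothetical rubric: sections an analyst report should cover.
const REQUIRED_SECTIONS = ["price", "valuation", "recommendation"];

function evaluateReport(prompt: string, report: string, tickers: [string, string]): EvalScores {
  const text = report.toLowerCase();

  // Completeness: how many rubric sections does the report mention?
  const covered = REQUIRED_SECTIONS.filter((s) => text.includes(s)).length;
  const completeness = covered / REQUIRED_SECTIONS.length;

  // Prompt relevance: crude keyword overlap between prompt and report.
  const terms = prompt.toLowerCase().split(/\W+/).filter((t) => t.length > 3);
  const hits = terms.filter((t) => text.includes(t)).length;
  const promptRelevance = terms.length ? hits / terms.length : 0;

  // Bias: do both stocks get comparable attention (mention counts)?
  const count = (t: string) => (text.match(new RegExp(t.toLowerCase(), "g")) ?? []).length;
  const [a, b] = [count(tickers[0]), count(tickers[1])];
  const bias = a + b === 0 ? 0 : Math.min(a, b) / Math.max(a, b);

  return { bias, completeness, promptRelevance };
}

const scores = evaluateReport(
  "Compare AAPL and MSFT stocks",
  "AAPL price is up; MSFT price is flat. Valuation favors MSFT. Recommendation: hold both AAPL and MSFT.",
  ["AAPL", "MSFT"]
);
console.log(scores);
```

In the real system, each of these heuristics is replaced by an LLM judge call, but the shape is the same: the evaluator agent is a pure function from (prompt, output) to scores, so it can sit beside any worker agent without touching its logic.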

How we built it

Bright Data for web scraping, Google for the models, and Mastra plus Mastra Evals for the agents and evaluations.

Challenges we ran into

Mastra is super flexible, and there are a lot of ways to integrate it with Next.js (standalone service or monolith), so settling on the right setup took some trial and error.

Accomplishments that we're proud of

It works end to end, and the eval scores look sick (thanks, Claude).

What we learned

Multi-agent monitoring systems are probably necessary for scalable production deployments of agents.

What's next for STEVE

More evaluation metrics, for a start.
