Inspiration
MCP agents are everywhere in hackathons, but we rarely see them in production yet. One problem is making sure AI responses are reliable along metrics like bias, completeness, and prompt relevancy. So we built a realistic agentic workflow and monitored it with another agent.
What it does
It's an agent that compares two stocks using Yahoo Finance and calculator tools, then emails you an analyst report. A second agent reviews the output and scores it on bias, completeness, and prompt relevance.
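The monitor pattern above can be sketched in a few lines. Everything here is illustrative, not STEVE's actual code: the real reviewer is an LLM agent, so we stand in cheap keyword heuristics to keep the sketch runnable, and all names (`reviewReport`, `passes`, the section list) are hypothetical.

```typescript
// Hypothetical sketch of the monitor pattern: a reviewer scores an
// analyst report on three metrics and flags it if any score is low.

type EvalScores = {
  bias: number;            // 0 = heavily biased, 1 = neutral
  completeness: number;    // 0 = missing key sections, 1 = complete
  promptRelevance: number; // 0 = off-topic, 1 = on-topic
};

// Stand-in for the reviewer agent's LLM call: simple keyword
// heuristics so the sketch runs without an API key.
function reviewReport(prompt: string, report: string): EvalScores {
  const lower = report.toLowerCase();
  const requiredSections = ["price", "revenue", "recommendation"];
  const covered = requiredSections.filter((s) => lower.includes(s)).length;
  const hypeWords = ["guaranteed", "moon", "can't lose"];
  const hypeHits = hypeWords.filter((w) => lower.includes(w)).length;
  const promptTerms = prompt.toLowerCase().split(/\s+/);
  const relevant = promptTerms.filter((t) => lower.includes(t)).length;
  return {
    bias: 1 - hypeHits / hypeWords.length,
    completeness: covered / requiredSections.length,
    promptRelevance: relevant / promptTerms.length,
  };
}

// Gate the workflow: only send the email if every score clears a bar.
function passes(scores: EvalScores, threshold = 0.5): boolean {
  return Object.values(scores).every((s) => s >= threshold);
}

const scores = reviewReport(
  "compare AAPL MSFT",
  "AAPL price and revenue grew faster than MSFT; recommendation: hold both.",
);
console.log(scores, passes(scores));
```

The key design choice is that the reviewer sits outside the main workflow and gates the final step, so a bad report gets caught before it reaches your inbox.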
How we built it
Bright Data for web scraping, Google for the models, and Mastra (plus Mastra Evals) for the agent framework and scoring.
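As a rough sketch of what the calculator tool does once the Yahoo Finance data is scraped (names and shapes here are hypothetical, not Mastra's actual tool API):

```typescript
// Illustrative stand-in for the comparison agent's calculator tool;
// in the real project the inputs come from Yahoo Finance via Bright Data.

type StockSnapshot = { ticker: string; price: number; eps: number };

// Compute simple comparison ratios (P/E per ticker and the price
// ratio between the two stocks) that the analyst report can cite.
function compareStocks(a: StockSnapshot, b: StockSnapshot) {
  return {
    peRatio: { [a.ticker]: a.price / a.eps, [b.ticker]: b.price / b.eps },
    priceRatio: a.price / b.price,
  };
}

const result = compareStocks(
  { ticker: "AAPL", price: 200, eps: 6.4 },
  { ticker: "MSFT", price: 420, eps: 11.8 },
);
console.log(result.peRatio);
```

Exposing the arithmetic as a tool (rather than asking the model to do it) keeps the numbers in the report deterministic.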
Challenges we ran into
Mastra is super flexible, and there are a lot of ways to integrate it with Next.js (as a standalone service or inside the monolith), so picking a setup took some deliberation.
Accomplishments that we're proud of
It works end to end, and the evals look sick (thanks, Claude).
What we learned
Multi-agent monitoring systems are probably necessary for scalable production deployments of agents.
What's next for STEVE
More eval metrics, and monitoring coverage for more of the workflow.