Wall Street Bench

Screenshots: frontend, UI with CopilotKit, Mastra playground, Mastra agents, leaderboard, high-level architecture

Inspiration

I'm a big fan of LMArena and their benchmarks for Design, Chat, Coding, etc. Unfortunately, they don't have one for losing money as fast as possible, so I thought I'd make one for AI investing workflows.

What it does

It's a frontend UI for comparing models on investment analysis tasks. Models from Anthropic and OpenAI face off in epic battles, make a bunch of tool calls to Yahoo Finance, Tavily, and the SEC, and the user chooses which one is better.

How we built it

The backend is a Mastra server running on an EC2 medium instance with 3 MCP servers (Yahoo Finance, Tavily, SEC EDGAR). The frontend is a Next.js app with MastraClient and CopilotKit components. Logs and votes are saved to an RDS Postgres instance.

Challenges we ran into

- Connecting the frontend to the Mastra server (the AI SDK from Vercel didn't work, so I used CopilotKit instead)
- Setting up networking and deploying to EC2
- CopilotChat observability: passing the messages to parent components was tricky
- It's not possible to switch an agent's model from the frontend, so I had to make a bunch of separate Mastra agents
- Serializing the message and tool call objects to Postgres jsonb was hard
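To illustrate the jsonb point, here is a minimal sketch of one way to flatten message and tool call objects into a JSON string before writing them to Postgres. It is not the project's actual code: the `Message` and `ToolCall` shapes, their field names, and the helper names are all assumptions made up for this example, and it is written in Python for brevity rather than in the project's own stack.

```python
import json
from dataclasses import asdict, dataclass, field, is_dataclass
from datetime import datetime, timezone
from typing import Any


@dataclass
class ToolCall:
    # Hypothetical shape; the real tool call objects are not shown in this write-up.
    tool_name: str
    arguments: dict[str, Any]
    result: Any = None


@dataclass
class Message:
    # Hypothetical shape of one chat message in a battle log.
    role: str
    content: str
    created_at: datetime
    tool_calls: list[ToolCall] = field(default_factory=list)


def to_json_safe(value: Any) -> Any:
    """Fallback for json.dumps: convert values the encoder cannot handle itself."""
    if is_dataclass(value) and not isinstance(value, type):
        return asdict(value)  # nested dataclasses become plain dicts
    if isinstance(value, datetime):
        return value.isoformat()
    if isinstance(value, (set, frozenset)):
        return sorted(value, key=str)
    if isinstance(value, bytes):
        return value.decode("utf-8", errors="replace")
    return repr(value)  # last resort so logging never fails on an odd object


def serialize_messages(messages: list[Message]) -> str:
    """Return a JSON string suitable for a jsonb column."""
    return json.dumps(messages, default=to_json_safe)


if __name__ == "__main__":
    message = Message(
        role="assistant",
        content="Pulling the latest quote.",
        created_at=datetime(2025, 10, 12, 16, 14, tzinfo=timezone.utc),
        tool_calls=[ToolCall("get_quote", {"ticker": "AAPL"})],
    )
    print(serialize_messages([message]))
```

The resulting string can then be passed as a query parameter and cast to jsonb on the database side. Falling back to `repr` trades fidelity for robustness, which seemed like a reasonable choice for logs.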

Accomplishments that we're proud of

- It works!

What we learned

- Integration is hard
- Deployment is harder
- MCPs for Claude Code docs are not perfect

What's next for Wall Street Bench

- Add more models through OpenRouter
- Launch publicly
- Make it more stable
- Make the leaderboard real
- Calculate stats
- Clean up the code
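For the leaderboard and stats items, one possible approach is an Elo-style rating computed from the pairwise votes. The sketch below is only an illustration of that idea, not something the project implements today: the vote format (a list of winner/loser pairs), the model names, the starting rating of 1000, and the K-factor of 32 are all assumptions.

```python
from collections import defaultdict


def expected_score(rating_a: float, rating_b: float) -> float:
    """Standard Elo expectation that A beats B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def rate_models(
    votes: list[tuple[str, str]],
    k: float = 32.0,
    initial: float = 1000.0,
) -> dict[str, float]:
    """Replay (winner, loser) votes in order and return a rating per model."""
    ratings: dict[str, float] = defaultdict(lambda: initial)
    for winner, loser in votes:
        win_probability = expected_score(ratings[winner], ratings[loser])
        delta = k * (1.0 - win_probability)  # bigger swing for an upset
        ratings[winner] += delta
        ratings[loser] -= delta
    return dict(ratings)


if __name__ == "__main__":
    # Hypothetical votes; real ones would come from the votes table.
    sample_votes = [("model-a", "model-b"), ("model-a", "model-c"), ("model-c", "model-b")]
    ranked = sorted(rate_models(sample_votes).items(), key=lambda item: -item[1])
    for name, rating in ranked:
        print(f"{name}: {rating:.1f}")
```

Elo depends on the order in which votes are replayed, so a plain win rate per model may be a simpler starting point if that turns out to matter.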

Built With

claude
copilotkit
mastra
nextjs
python
tavily

Updates

Michael Yu started this project — Oct 12, 2025 04:14 PM EDT

Leave feedback in the comments!
