Inspiration
I'm a big fan of LMArena and their benchmarks for Design, Chat, Coding, etc. Unfortunately, they don't have one for losing money as fast as possible, so I thought I'd make one for AI investing workflows.
What it does
It's a frontend UI for people to compare models on investment analysis tasks. Models from Anthropic and OpenAI face off in epic battles, make a bunch of tool calls to Yahoo Finance, Tavily, and the SEC, and the user chooses which one did better.
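A battle needs two different models on the same prompt. Here is a minimal sketch of how a pairing could be drawn, not the project's actual code: the `Contestant` IDs, the `Battle` shape, and `pickBattle` are all hypothetical names made up for illustration.

```typescript
// Hypothetical contestant IDs; the real model list is not given in this writeup.
type Contestant = string;

interface Battle {
  left: Contestant;
  right: Contestant;
}

// Picks two distinct contestants. `rand` returns a number in [0, 1) and is
// injectable so the pairing can be made deterministic in tests.
function pickBattle(
  contestants: Contestant[],
  rand: () => number = Math.random
): Battle {
  if (contestants.length < 2) {
    throw new Error("need at least two contestants");
  }
  const i = Math.floor(rand() * contestants.length);
  // Draw the second index from the remaining n - 1 slots, then skip over i
  // so the same contestant is never picked twice.
  let j = Math.floor(rand() * (contestants.length - 1));
  if (j >= i) j += 1;
  return { left: contestants[i], right: contestants[j] };
}
```

Randomizing which model lands on the left also keeps voters from learning that one side is always the same provider.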
How we built it
The backend is a Mastra server running on an EC2 medium instance with 3 MCP servers (Yahoo Finance, Tavily, SEC EDGAR). The frontend is a Next.js app with MastraClient and CopilotKit components. Logs and votes are saved to an RDS Postgres instance.
Challenges we ran into
- Connecting the frontend to the Mastra server (the AI SDK from Vercel didn't work, so we used CopilotKit instead)
- Setting up networking and deploying to EC2
- CopilotChat observability, passing the messages to parent components was tricky
- It's not possible to switch models for agents from frontend, had to make a bunch of separate Mastra agents
- Serializing the message and tool call objects to Postgres jsonb was hard
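On the jsonb point: this writeup doesn't say what exactly broke, so the sketch below only covers a few usual suspects. `JSON.stringify` throws on circular references, `Error` objects serialize to `{}`, and Postgres jsonb rejects the null character in strings. The `toJsonbText` helper is a hypothetical name, not something from Mastra or CopilotKit.

```typescript
// Converts an arbitrary message or tool-call object into JSON text that is
// safe to store in a jsonb column. Hypothetical helper, for illustration only.
function toJsonbText(value: unknown): string {
  const seen = new WeakSet<object>();
  const text: string | undefined = JSON.stringify(value, (_key, v) => {
    // jsonb cannot store the null character, so strip it from string values.
    if (typeof v === "string") return v.replace(/\u0000/g, "");
    // Error fields are non-enumerable and would otherwise serialize as {}.
    if (v instanceof Error) return { name: v.name, message: v.message };
    if (typeof v === "object" && v !== null) {
      // Replace any object already visited. This breaks cycles, and it also
      // collapses harmless repeated references to the same object.
      if (seen.has(v)) return "[Circular]";
      seen.add(v);
    }
    return v;
  });
  // JSON.stringify returns undefined for top-level undefined or functions.
  return text === undefined ? "null" : text;
}
```

The resulting string can be passed as a query parameter and cast with `::jsonb` in the insert statement. Dates come out as ISO strings because `JSON.stringify` calls their `toJSON` before the replacer sees them.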
Accomplishments that we're proud of
- It works!
What we learned
- Integration is hard
- Deployment is harder
- MCPs for Claude Code docs are not perfect
What's next for Wall Street Bench
- Add more models through OpenRouter
- Launch publicly
- Make it more stable
- Make the leaderboard real
- Calculate stats
- Clean up the code
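For the leaderboard and stats items, the simplest starting point is probably a raw win rate per model. This is only a sketch under assumptions: the `Vote` shape with `winner` and `loser` fields is hypothetical (the real votes table isn't described here), and a rating system such as Elo would be a different technique from what is shown.

```typescript
// Hypothetical vote record: one row per battle the user decided.
interface Vote {
  winner: string;
  loser: string;
}

interface Standing {
  model: string;
  wins: number;
  battles: number;
  winRate: number;
}

// Tallies wins and battles per model, then sorts by win rate (ties broken by
// number of battles, so better-tested models rank first).
function leaderboard(votes: Vote[]): Standing[] {
  const tally = new Map<string, { wins: number; battles: number }>();
  const bump = (model: string, won: boolean): void => {
    const row = tally.get(model) || { wins: 0, battles: 0 };
    row.battles += 1;
    if (won) row.wins += 1;
    tally.set(model, row);
  };
  votes.forEach((v) => {
    bump(v.winner, true);
    bump(v.loser, false);
  });
  const rows: Standing[] = [];
  tally.forEach((r, model) => {
    rows.push({ model, wins: r.wins, battles: r.battles, winRate: r.wins / r.battles });
  });
  return rows.sort((a, b) => b.winRate - a.winRate || b.battles - a.battles);
}
```

Raw win rate ignores who each model was matched against, so it is only fair when pairings are roughly uniform.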
Built With
- claude
- copilotkit
- mastra
- nextjs
- python
- tavily