Evaluate LLMs in real time with Street Fighter III
Make LLMs fight each other in real time in Street Fighter III.
Which LLM will be the best fighter?
Our criteria 🔥
They need to be:
- Fast: It is a real time game, fast decisions are key
- Smart: A good fighter thinks 50 moves ahead
- Out of the box thinking: Outsmart your opponent with unexpected moves
- Adaptable: Learn from your mistakes and adapt your strategy
- Resilient: Keep your RPS high for an entire game
Let the fight begin 🥷
Explanation
Each player is controlled by an LLM. We send the LLM a text description of the screen, and the LLM decides on the next moves its character will make. The next moves depend on its previous moves, the moves of its opponent, and both power and health bars.
- Agent based
- Multithreading
- Real time
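
The loop described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the `PlayerState` fields, the move vocabulary in `VALID_MOVES`, the fallback move, and the `stub_llm` function (which stands in for a real API call) are all assumptions made for the example.

```python
import threading
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical move vocabulary; the real project defines its own.
VALID_MOVES = {"move closer", "move away", "jump", "low punch", "high kick", "fireball"}


@dataclass
class PlayerState:
    name: str
    health: int
    power: int
    last_moves: List[str] = field(default_factory=list)


def describe_state(me: PlayerState, opponent: PlayerState) -> str:
    """Turn the game state into the text prompt sent to the LLM."""
    return (
        f"Your health: {me.health}, your power: {me.power}. "
        f"Opponent health: {opponent.health}, opponent power: {opponent.power}. "
        f"Your previous moves: {', '.join(me.last_moves) or 'none'}. "
        f"Opponent's previous moves: {', '.join(opponent.last_moves) or 'none'}. "
        f"Reply with one move per line, chosen from: {', '.join(sorted(VALID_MOVES))}."
    )


def parse_moves(reply: str) -> List[str]:
    """Keep only recognised moves; LLM replies often contain extra text."""
    parts = reply.replace(",", "\n").splitlines()
    cleaned = [part.strip(" -*.\t").lower() for part in parts]
    return [move for move in cleaned if move in VALID_MOVES]


def next_moves(me: PlayerState, opponent: PlayerState,
               llm: Callable[[str], str]) -> List[str]:
    """Ask the LLM for moves; fall back to an assumed default if none parse."""
    moves = parse_moves(llm(describe_state(me, opponent)))
    return moves or ["move closer"]


def decide_concurrently(players: List[PlayerState],
                        llms: List[Callable[[str], str]]) -> Dict[str, List[str]]:
    """Run one decision per player, each in its own thread."""
    results: Dict[str, List[str]] = {}

    def worker(i: int) -> None:
        me, opponent = players[i], players[1 - i]
        results[me.name] = next_moves(me, opponent, llms[i])

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


def stub_llm(prompt: str) -> str:
    """Stand-in for a real model call; returns a canned reply."""
    return "- Move closer\n- Fireball\n- Taunt the crowd"


if __name__ == "__main__":
    ken = PlayerState("Ken", health=120, power=30, last_moves=["jump"])
    ryu = PlayerState("Ryu", health=95, power=10)
    print(decide_concurrently([ken, ryu], [stub_llm, stub_llm]))
```

Running each player in its own thread means a slow model only delays its own character, which is why response speed matters in this setup.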
A new kind of benchmark?
Street Fighter III assesses the ability of LLMs to understand their environment and take actions based on a specific context. As opposed to RL models, which blindly take actions based on the reward function, LLMs are fully aware of the context and act accordingly.
Results
Elo ranking
| Model | Rating |
|---|---|
| 🥇openai:gpt-3.5-turbo-0125 | 1776.11 |
| 🥈mistral:mistral-small-latest | 1586.16 |
| 🥉openai:gpt-4-1106-preview | 1584.78 |
| openai:gpt-4 | 1517.2 |
| openai:gpt-4-turbo-preview | 1509.28 |
| openai:gpt-4-0125-preview | 1438.92 |
| mistral:mistral-medium-latest | 1356.19 |
| mistral:mistral-large-latest | 1231.36 |
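
For readers unfamiliar with Elo ratings, the standard update rule after a single match can be sketched as below. The K-factor of 32 is an assumption chosen for illustration; the parameters actually used to produce the table above are not stated here.

```python
from typing import Tuple


def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


def update_elo(rating_a: float, rating_b: float, score_a: float,
               k: float = 32.0) -> Tuple[float, float]:
    """Return new ratings for A and B.

    score_a is 1.0 if A won, 0.0 if A lost, 0.5 for a draw.
    k is an assumed K-factor, not the project's confirmed value.
    """
    expected_a = expected_score(rating_a, rating_b)
    new_a = rating_a + k * (score_a - expected_a)
    new_b = rating_b + k * ((1.0 - score_a) - (1.0 - expected_a))
    return new_a, new_b


if __name__ == "__main__":
    # Two equally rated models; the first one wins.
    print(update_elo(1500.0, 1500.0, score_a=1.0))
```

The winner gains exactly what the loser gives up, so the total rating across all models stays constant as more matches are played.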
Go to GitHub to discover the results!
Needless to say, they are pretty unexpected.