Make LLMs fight each other in real time!

Evaluate LLMs in real time with Street Fighter III

Make LLMs fight each other in real time in Street Fighter III.

Which LLM will be the best fighter?

Our criteria 🔥

They need to be:

Fast: It is a real-time game, so fast decisions are key
Smart: A good fighter thinks 50 moves ahead
Out-of-the-box thinking: Outsmart your opponent with unexpected moves
Adaptable: Learn from your mistakes and adapt your strategy
Resilient: Keep your RPS high for an entire game

Let the fight begin 🥷

Explanation

Each player is controlled by an LLM. We send the LLM a text description of the screen, and the LLM decides on the next moves its character will make. These moves depend on its previous moves, the moves of its opponent, and both its power and health bars.

Agent based
Multithreading
Real time
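
The loop described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the `PlayerState` fields, the `MOVES` list, the prompt wording, and the `fake_llm` stand-in (which picks random moves instead of calling a real model) are all assumptions made for the example. Each player runs in its own thread, so a slow model simply acts less often.

```python
import random
import threading
from dataclasses import dataclass, field

# Hypothetical move vocabulary; the real move list may differ.
MOVES = ["Move Closer", "Move Away", "Low Punch", "High Kick", "Fireball"]


@dataclass
class PlayerState:
    name: str
    health: int = 100
    power: int = 0
    previous_moves: list = field(default_factory=list)


def recent(moves, n=3):
    """Format the last n moves for the prompt."""
    return ", ".join(moves[-n:]) if moves else "none"


def describe_screen(me, opponent):
    """Build the text description of the screen sent to the LLM."""
    return (
        f"You are {me.name}. Health: {me.health}. Power: {me.power}.\n"
        f"Your last moves: {recent(me.previous_moves)}.\n"
        f"Opponent health: {opponent.health}. "
        f"Opponent last moves: {recent(opponent.previous_moves)}.\n"
        f"Reply with your next moves, one per line, from: {', '.join(MOVES)}."
    )


def fake_llm(prompt):
    """Stand-in for a real LLM call (assumption): two random moves."""
    return "\n".join(random.choices(MOVES, k=2))


def parse_moves(reply):
    """Keep only reply lines that name a known move."""
    lines = (line.strip() for line in reply.splitlines())
    return [line for line in lines if line in MOVES]


def run_player(me, opponent, lock, turns):
    """One agent: observe, ask the model, record the chosen moves."""
    for _ in range(turns):
        with lock:  # read a consistent snapshot of both players
            prompt = describe_screen(me, opponent)
        reply = fake_llm(prompt)  # the slow call happens outside the lock
        with lock:
            me.previous_moves.extend(parse_moves(reply))


def run_match(turns=5):
    """Run both agents concurrently, one thread per player."""
    p1, p2 = PlayerState("Player 1"), PlayerState("Player 2")
    lock = threading.Lock()
    threads = [
        threading.Thread(target=run_player, args=(p1, p2, lock, turns)),
        threading.Thread(target=run_player, args=(p2, p1, lock, turns)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return p1, p2
```

Calling `run_match(turns=3)` returns both player states with their recorded moves. In a real setup, `fake_llm` would be replaced by a call to the model under test and the moves would be forwarded to the game rather than only stored.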

A new kind of benchmark?

Street Fighter III assesses the ability of LLMs to understand their environment and take actions based on a specific context. As opposed to RL models, which blindly take actions based on the reward function, LLMs are fully aware of the context and act accordingly.

Results

Elo ranking

Model	Rating
🥇openai:gpt-3.5-turbo-0125	1776.11
🥈mistral:mistral-small-latest	1586.16
🥉openai:gpt-4-1106-preview	1584.78
openai:gpt-4	1517.2
openai:gpt-4-turbo-preview	1509.28
openai:gpt-4-0125-preview	1438.92
mistral:mistral-medium-latest	1356.19
mistral:mistral-large-latest	1231.36
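
For readers unfamiliar with Elo ratings, the sketch below shows the standard Elo update after a single match. It is only an illustration of the general formula: the K-factor of 32 and the function names are assumptions, and the exact procedure used to produce the table above is not described here.

```python
def expected_score(rating_a, rating_b):
    """Expected score of player A against player B (standard Elo formula)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_ratings(rating_a, rating_b, score_a, k=32.0):
    """Return the new ratings after one match.

    score_a is 1.0 if A won, 0.0 if A lost, 0.5 for a draw.
    k is the K-factor (assumed value; it controls how fast ratings move).
    """
    expected_a = expected_score(rating_a, rating_b)
    delta = k * (score_a - expected_a)
    # Elo is zero-sum: whatever A gains, B loses.
    return rating_a + delta, rating_b - delta


# Two equally rated models: the winner gains 16 points, the loser drops 16.
new_a, new_b = update_ratings(1500.0, 1500.0, score_a=1.0)
```

Under this formula, a rating gap as large as the one between the first and last rows of the table (1776.11 vs 1231.36) corresponds to an expected score of roughly 0.96 for the higher-rated model.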