Inspiration
Devin and SWE-agent treat LLMs somewhat like reinforcement learning agents. It is interesting to see how SWE-agent makes an observation, decides on an action, and evaluates the result (state, action, reward?). I would have loved to build something similar to SWE-agent with Gemini, but that was probably out of scope for a short hackathon. Instead, I thought it would be interesting to try Twitch Plays Pokemon, but with Gemini, and see whether it starts prompting itself to navigate the game. PyBoy can be used to benchmark RL agents, so maybe it could also be used to benchmark LLMs?
What it does
A Python script collects messages from my Twitch channel, pyboy_gemini, and passes them to Gemini via the API, along with information about the Pokemon game's state (read from its RAM using PyBoy). I have tried to prompt Gemini to constrain its replies to a set of moves plus a reply message. Asking it to repeatedly press Start or A results in Gemini "pressing" Start or A to start the game. This is not surprising, but it is better than some of the initial outputs, where it rebelled.
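A minimal sketch of the prompt-building step, assuming a hypothetical two-line `MOVES:`/`MESSAGE:` reply format (the actual prompt in our repo may differ). It reads a few bytes of game state from any object indexable by address, which PyBoy 2.x's `pyboy.memory` is. The addresses are assumed Pokemon Red/Blue RAM locations (map ID, player Y, player X) from community RAM maps and would be different for other games.

```python
from typing import Protocol, Sequence

# Assumed button names; PyBoy 2.x identifies buttons with these strings.
ALLOWED_BUTTONS = ("a", "b", "start", "select", "up", "down", "left", "right")

# Assumed Pokemon Red/Blue RAM addresses (community RAM maps); other games differ.
MAP_ID_ADDR = 0xD35E
PLAYER_Y_ADDR = 0xD361
PLAYER_X_ADDR = 0xD362


class Memory(Protocol):
    """Anything that supports memory[address] -> byte, e.g. pyboy.memory in PyBoy 2.x."""

    def __getitem__(self, address: int) -> int: ...


def read_game_state(memory: Memory) -> dict:
    """Read a few bytes of game state straight out of emulator RAM."""
    return {
        "map_id": memory[MAP_ID_ADDR],
        "x": memory[PLAYER_X_ADDR],
        "y": memory[PLAYER_Y_ADDR],
    }


def build_prompt(chat_messages: Sequence[str], state: dict) -> str:
    """Combine recent Twitch chat and the RAM-derived state into one prompt."""
    chat = "\n".join(f"- {m}" for m in chat_messages) or "- (no messages)"
    return (
        "You are playing Pokemon on a Game Boy emulator.\n"
        f"Current state: map {state['map_id']}, position ({state['x']}, {state['y']}).\n"
        f"Recent chat messages:\n{chat}\n"
        "Reply with exactly two lines:\n"
        f"MOVES: a comma-separated list chosen from {', '.join(ALLOWED_BUTTONS)}\n"
        "MESSAGE: a short reply to chat"
    )
```

With PyBoy 2.x this might be called as `build_prompt(recent_messages, read_game_state(pyboy.memory))`, and the resulting string sent to Gemini with the `google.generativeai` package's `GenerativeModel(...).generate_content(prompt)`.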
How we built it
A link to our GitHub repository has been provided; it shows how we used the twitchio, pyboy and Google Generative AI Python packages to build this bot. The README includes instructions for getting a Twitch access token and a Google API key.
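Turning Gemini's chosen moves into emulator input could look like the sketch below. It assumes the PyBoy 2.x API (`button_press`, `button_release`, `tick`) and hypothetical hold/wait frame counts; our repo may press buttons differently. Anything outside the Game Boy's eight buttons is skipped, since the model may invent moves.

```python
from typing import Iterable, Protocol

VALID_BUTTONS = {"a", "b", "start", "select", "up", "down", "left", "right"}


class Emulator(Protocol):
    """The subset of the PyBoy 2.x API this sketch relies on."""

    def button_press(self, input: str) -> None: ...
    def button_release(self, input: str) -> None: ...
    def tick(self, count: int = 1, render: bool = True) -> bool: ...


def press_buttons(
    emulator: Emulator,
    buttons: Iterable[str],
    hold_frames: int = 8,   # hypothetical: frames to hold each button down
    wait_frames: int = 30,  # hypothetical: frames to let the game react afterwards
) -> list[str]:
    """Press each valid button in turn and return the ones actually pressed."""
    pressed = []
    for button in buttons:
        if button not in VALID_BUTTONS:
            continue  # skip moves the model made up
        emulator.button_press(button)
        emulator.tick(hold_frames)
        emulator.button_release(button)
        emulator.tick(wait_frames)
        pressed.append(button)
    return pressed
```

For example, `press_buttons(PyBoy("game.gb"), ["start", "a"])`, where `game.gb` is a placeholder ROM path. Holding each button for several frames, rather than a single tick, makes it more likely the game registers the input.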
Challenges we ran into
Unfortunately, the Twitch API went down right before we filmed the bot in action, so there isn't much of a demo. However, the code is available on GitHub and works (provided the Twitch API is up).
Accomplishments that we're proud of
It kind of works, and we are thinking about keeping the Twitch stream running indefinitely, although this could become costly!
What we learned
Getting LLM output to match exact criteria, such as replying only with moves from a fixed set, is harder than it looks.
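A minimal sketch of one common way to cope with this, assuming a hypothetical two-line `MOVES:`/`MESSAGE:` reply format (not necessarily the one in our repo): parse each reply strictly, and re-prompt with a correction note when it does not match. The `generate` argument stands in for whatever function sends a prompt to Gemini and returns its text.

```python
import re
from typing import Callable, Optional

ALLOWED = {"a", "b", "start", "select", "up", "down", "left", "right"}
# [ \t]* rather than \s* so a match never spills onto the next line.
MOVES_RE = re.compile(r"^MOVES:[ \t]*(.*)$", re.IGNORECASE | re.MULTILINE)
MESSAGE_RE = re.compile(r"^MESSAGE:[ \t]*(.*)$", re.IGNORECASE | re.MULTILINE)


def parse_reply(text: str) -> Optional[tuple[list[str], str]]:
    """Return (moves, message) if the reply matches the expected format, else None."""
    moves_match = MOVES_RE.search(text)
    message_match = MESSAGE_RE.search(text)
    if not moves_match or not message_match:
        return None
    moves = [m.strip().lower() for m in moves_match.group(1).split(",") if m.strip()]
    if not moves or any(m not in ALLOWED for m in moves):
        return None  # reject the whole reply if any move is invalid
    return moves, message_match.group(1).strip()


def ask_with_retries(
    generate: Callable[[str], str], prompt: str, max_attempts: int = 3
) -> Optional[tuple[list[str], str]]:
    """Call the model, re-prompting with a correction note when the format is wrong."""
    current = prompt
    for _ in range(max_attempts):
        parsed = parse_reply(generate(current))
        if parsed is not None:
            return parsed
        current = (
            prompt
            + "\n\nYour previous reply did not follow the format. "
            + "Use exactly 'MOVES: ...' and 'MESSAGE: ...', with moves from: "
            + ", ".join(sorted(ALLOWED))
        )
    return None  # caller decides what to do, e.g. skip this turn
```

Rejecting the whole reply on a single bad move is a deliberate, conservative choice; a looser variant could drop invalid moves and keep the rest.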
What's next for Gemini Plays Pokemon
Giving Gemini screenshots (images) as input and enabling more complex reasoning, so that it can beat Pokemon on its own. We would also like to try other games that run on PyBoy.
