Inspiration

I've been working with GPT-4 and have been startled by its resilience with respect to logic and reasoning challenges. I wanted to experiment and push this further -- I didn't really know if it would work!

What it does

"LLM game engine" is a bit of a stretch, but in a sense that's what's happening. I set up a conversational pattern in the GPT-4 32k chat interface that supports the logic and state tracking required to run a simple turn-based strategy game.

How we built it

I started by defining the concepts of a turn-based strategy game with the model and posed the challenge of running a game like that; essentially, I pitched the idea to the model and got its feedback and sense of confidence. We then went back and forth, building the logic up in layers: first the concept of a 2D map that could contain units and structures, then a second layer of unit and structure properties that needed to be tracked separately. From there we hashed out a ruleset, and so on.

When the prompt contained a representational structure and reasoning chain that felt like it could adequately take the player's input and synthesize it into the resulting changes in board position and asset stats -- and do the same for the AI side of the game -- we started playtesting. The game evolved considerably from there, for reasons explored below. Before that, a shout-out and many thanks to Maja Todorovic for representing the gameplay visually in the video, without which it would be very dry.
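To make the layering concrete, here is a minimal Python sketch of the kind of two-layer state described above: a 2D map grid as one layer, and unit properties tracked separately as another. All class and field names here are illustrative assumptions, not taken from the actual prompt; in the real project this state lived as text inside the chat, re-emitted by the model each turn rather than held in code.

```python
from dataclasses import dataclass, field

@dataclass
class Unit:
    # Properties tracked separately from the map layer (hypothetical fields).
    name: str
    owner: str  # "player" or "ai"
    hp: int
    attack: int
    x: int
    y: int

@dataclass
class GameState:
    width: int
    height: int
    units: list = field(default_factory=list)

    def render_map(self) -> str:
        """Serialize the board as the kind of ASCII grid a model could
        re-emit each turn: '.' for empty, a unit's first letter otherwise."""
        grid = [["." for _ in range(self.width)] for _ in range(self.height)]
        for u in self.units:
            grid[u.y][u.x] = u.name[0].upper()
        return "\n".join("".join(row) for row in grid)

state = GameState(width=5, height=3, units=[
    Unit("warrior", "player", hp=10, attack=3, x=1, y=0),
    Unit("archer", "ai", hp=6, attack=4, x=3, y=2),
])
print(state.render_map())
```

Keeping the grid and the per-unit stats as separate layers mirrors the prompt structure: the map answers "where is everything?" while the unit list answers "what are its properties?", and the model updates both after each move.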

Challenges we ran into

- We started by trying to do something in the vein of Dwarf Fortress, but it proved far too complex for me to feel confident that we could effectively debug it in the time available, so I scaled back the complexity.

- The model started out, I would say, 70% effective and accurate at tracking and representing the game state. I did some work on the prompt, reordering parts of it to make it more straightforward, which improved that to about 90%, where it stayed. I have some ideas on how to push it higher but didn't have time to implement them.
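As a rough illustration of the reordering idea, here is a sketch of one way such a turn prompt could be laid out: restating the full game state immediately before the rules and the move, then asking the model to re-emit the state afterwards. The section names and ordering here are my assumptions for illustration, not the actual prompt used in the project.

```python
def build_turn_prompt(state_block: str, rules_block: str, player_move: str) -> str:
    """Assemble a turn prompt: state first, then rules, then the move.
    Putting the state closest to the top (and asking for a full re-emit
    at the end) is one plausible way to make tracking more reliable."""
    return "\n\n".join([
        "CURRENT GAME STATE:\n" + state_block,
        "RULES (apply in order):\n" + rules_block,
        "PLAYER MOVE:\n" + player_move,
        "Resolve the player move, then take the AI's turn. "
        "Re-emit the full game state when done.",
    ])

prompt = build_turn_prompt(
    state_block=".W...\n.....\n...A.",
    rules_block="1. Units move one tile per turn.\n2. Adjacent enemies may attack.",
    player_move="Move warrior east.",
)
print(prompt)
```

The key property is that every turn carries a complete, freshly serialized state, so the model never has to reconstruct the board from scattered earlier messages.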

- Once I got the game running, the model played terribly. It was still impressive that it failed realistically, but it wasn't much fun! So I gave it some very general coaching and we played again. The difference was dramatic. Although I still felt I had the edge, I didn't have time to finish the game. I think the agent behaviour could be improved further with more work.

Accomplishments that we're proud of

It was fun to take a risk on something that could very easily have failed, and even better to have it work out better than I had expected.

What we learned

It was amazing to get insight into the model up close. It's going to be really interesting to discover the approaches required to take what is such a staggering performance and adapt it into something legitimately production-ready. The use case is obviously an essential ingredient, and this experience definitely gives me a better sense of the model's capabilities, limited as my view may be by my prompting ability. I look forward to learning how to improve my skills.

Also, from a process perspective, I discovered something quite incredible, though perhaps less obvious: the model's ability to track a complex user exchange. In this case, I could address the model both within the framework of the game (as a player, with my move selection) and as a co-creator (commenting on how things were going, identifying bugs, requesting changes), all within the same flow. For instance, one input from me might be a move selection, and the next might be a query as to why the AI made the move it did, or a note pointing out an error. The model parsed all of this flawlessly and never got confused. This ability of the AI to be both "inside" and "outside" the process under development is very exciting with respect to future workflows, where (as happened here) we will be able to test and improve something as it is being built, in a single integrated process.

What's next for LLM Game Engine

I'm not sure! It's obviously not really practical in itself, but I am very interested in applying some of these techniques to agent design for interactive 3D games. I feel the approaches I tested here will be great starting places for harnessing the model's reasoning capabilities to make interesting and realistic choices for interactive characters.

Built With

  • gpt