Final Report
GitHub Repo
Checkoff 3 reflection
Slides
Introduction
In this project, we built an LLM-based video game: an instruction-following system for text-based fantasy adventures. By LoRA fine-tuning state-of-the-art open-source instruction-following LLMs (Llama 3) on diverse data distilled from GPT-4 via OpenAI's API, we built a system that lets users adventure freely in fantasy worlds constructed by LLMs. The game presents the player with an initial scenario and description, along with the final goal required for victory. During the game, the player can enter actions as free text or follow system-provided hints when stuck. Based on the current scenario, each action produces a result and a player-status update. The player must achieve the goal within a designated number of rounds.
Who
Cynthia Xing (yxing12), Jessie Zhang (jzhan628), Winnie Zhang (wzhan184)
Data
Data distillation
We first prompted GPT-4 to generate 10 random fantasy adventure topics along with their corresponding backgrounds. We then distilled approximately 800 actions and their corresponding outcomes (the result of the action, the player's status afterward, and the new scenario the action leads to), each conditioned on the full game history. Starting from 10 seed action-outcome pairs, we ran and distilled each game for up to 10 rounds.
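Below is a minimal sketch of how one distillation call might look, assuming the OpenAI Python SDK (v1) with an API key in the environment; the function name, prompt wording, and temperature are illustrative, not the exact prompts we used.

```python
# Hypothetical sketch of a single distillation step: given the game so far and
# a candidate action, ask GPT-4 for the outcome. The history is assumed to be
# pre-serialized into a string.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def distill_turn(title: str, background: str, history: str, action: str) -> str:
    """Ask GPT-4 for the outcome of a player action given the game history."""
    prompt = (
        f"Title: {title}\nBackground: {background}\n"
        f"History: {history}\n"
        f"Action: {action}\n"
        "Return the result of the action, the player's status afterward, "
        "and the new scenario this action leads to."
    )
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": "You are a fantasy adventure game master."},
            {"role": "user", "content": prompt},
        ],
        temperature=1.0,  # higher temperature encourages more diverse distilled data
    )
    return response.choices[0].message.content
```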
Methodology
We train three models and apply LoRA fine-tuning to two of them:
Action-Prompter (LoRA Llama3-Instruct 8B) -- When players cannot think of a good action to take, they can call the Action-Prompter to generate a feasible action to feed into the Game Controller (i.e., the HINT function found in most games).
Status Evaluator (RoBERTa) -- An encoder-only model trained to classify the change in player status given the result of an action: a. Win (the player has achieved the goal set in the background) b. Dead (the player died in the adventure) c. Wounded (the player became wounded) d. Healed (the player recovered from a wound) e. Nothing happened.
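As a rough illustration, the Status Evaluator can be framed as a five-way sequence classifier, assuming the Hugging Face transformers library; the label names mirror the classes above, and the classification head shown here is randomly initialized until trained on our distilled data.

```python
# Sketch of the Status Evaluator: a RoBERTa sequence classifier over the text
# describing an action's result. Label set mirrors the five classes above.
import torch
from transformers import RobertaForSequenceClassification, RobertaTokenizerFast

LABELS = ["win", "dead", "wounded", "healed", "nothing"]

tokenizer = RobertaTokenizerFast.from_pretrained("roberta-base")
model = RobertaForSequenceClassification.from_pretrained(
    "roberta-base", num_labels=len(LABELS)
)

def classify_status(action_result: str) -> str:
    """Map the text describing an action's result to a status label."""
    inputs = tokenizer(action_result, truncation=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    return LABELS[logits.argmax(dim=-1).item()]
```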
Game Controller (LoRA Llama3-Instruct 13B) -- A game controller that takes in the game history plus the player's new action, formatted as {title, background, turn1 action, turn1 result, turn1 status, turn1 new scenario, turn2 action, …}, and returns the result of the action plus the new game scenario. The Status Evaluator is then called to evaluate the result of the action.
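The two Llama models are adapted with LoRA rather than fully fine-tuned. Below is a minimal sketch of the adapter setup, assuming the Hugging Face transformers and peft libraries; the checkpoint name (shown here as the 8B Instruct model for illustration) and the rank/alpha/dropout values are assumptions, not our tuned settings.

```python
# Sketch of a LoRA adapter setup for one of the Llama models. Only the small
# low-rank adapter matrices are trained; the base weights stay frozen.
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, AutoTokenizer

base = "meta-llama/Meta-Llama-3-8B-Instruct"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

lora_config = LoraConfig(
    r=16,                                 # low-rank dimension (illustrative)
    lora_alpha=32,                        # scaling factor (illustrative)
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the adapter weights are trainable
```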
Metrics
To evaluate performance, we assess training and validation losses, as well as the ROUGE-2 F1 score on the validation set.
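As a rough illustration, ROUGE-2 F1 between a model output and a reference outcome can be computed with the rouge_score package; the strings below are toy examples, not data from our validation set.

```python
# Sketch of the ROUGE-2 F1 computation on a single validation pair.
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rouge2"], use_stemmer=True)

prediction = "You strike the dragon and it retreats into the cave."
reference = "You strike the dragon, and it flees deeper into the cave."

score = scorer.score(reference, prediction)["rouge2"]
print(f"ROUGE-2 F1: {score.fmeasure:.3f}")
```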
Base goals
- Successfully preprocess the data as intended.
- Train at least two different LLMs (with up to 13B parameters).
- Evaluate model performance and ensure the game runs smoothly.
Target goals
- Ensure the LoRA fine-tuned models accurately interpret entirely new player actions and produce reasonable responses.
- Tackle the long-context inference issue.
Stretch goals
- Constrain the game within its intended framework and prevent users from circumventing safety measures through creative problem-solving.
- Implement a RAG-based LLM for the Game Controller model.
Ethics
Our dataset may not accurately reflect real-world gameplay: GPT-4 tends to generate game results that steer toward a happy ending. Moreover, instruction-tuning datasets derived in part from GPT-4 raise privacy concerns regarding the use of such generated data.
Division of labor
Cynthia will be responsible for preprocessing and sampling the data using the method described above. Each of us will then be responsible for training one of the models. At the end, all three of us will evaluate the performance of the models together.