Project Idea: Short Description (100-500 words)
We are creating a deep reinforcement learning agent to play one of our favorite games, Unrailed 2, a local multiplayer resource management game. The goal of the game is to reach a station by gathering resources and building railroad tracks ahead of a continuously moving train, preventing it from derailing in a scrolling grid world.
To sidestep some technical feasibility concerns, we plan on implementing a custom Gymnasium-compatible simulation of the game rather than interacting with the actual game. This lets us focus on the deep learning elements, such as state representation and network architecture, rather than on computer vision, which is outside the scope of this class.
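To make the interface concrete, here is a minimal sketch of what the simulation could look like. It follows the Gymnasium reset/step signatures but does not depend on the library; the class name, grid dimensions, action count, and observation layout are all illustrative assumptions, not a final design.

```python
import random

class UnrailedEnvSketch:
    """Illustrative environment sketch following the Gymnasium API:
    reset() -> (obs, info), step() -> (obs, reward, terminated,
    truncated, info). All sizes and names are assumptions."""

    N, M = 10, 16        # assumed grid dimensions
    NUM_ACTIONS = 6      # e.g. 4 movement directions, interact, place track

    def __init__(self, seed=None):
        self.rng = random.Random(seed)

    def reset(self, seed=None):
        if seed is not None:
            self.rng.seed(seed)
        self.steps = 0
        # Grid of terrain codes (0 = empty); the real version would
        # procedurally place trees, rocks, water, the train, and the station.
        self.grid = [[0] * self.M for _ in range(self.N)]
        obs = {"grid": self.grid, "agent_state": [0.0, 0.0]}
        return obs, {}

    def step(self, action):
        assert 0 <= action < self.NUM_ACTIONS
        self.steps += 1
        reward = 0.0         # shaped rewards would be computed here
        terminated = False   # train crashed or reached the station
        truncated = self.steps >= 500  # assumed episode cap
        obs = {"grid": self.grid, "agent_state": [0.0, 0.0]}
        return obs, reward, terminated, truncated, {}
```

Keeping the observation as a grid plus a small vector of agent attributes lines up with the two-stream network described below.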
We plan on using Proximal Policy Optimization (PPO) as our learning algorithm because it is stable in discrete control environments. To process the game state, we will implement a hybrid model with two streams: a visual stream and a state stream. The visual stream will use a convolutional neural network (CNN) to extract spatial features from the N x M game grid, capturing things such as the terrain, obstacles, and the train. The state stream will use an MLP to process the agent's attributes, such as held items and timers. The features from the two streams will be fused to help the agent make a decision.
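The two-stream architecture above can be sketched in PyTorch as follows. The layer sizes, channel counts, the 3-channel grid encoding, and the 8-dimensional agent-state vector are all placeholder assumptions; only the overall CNN + MLP + fusion structure and the PPO actor/critic heads reflect the plan.

```python
import torch
import torch.nn as nn

class TwoStreamPolicy(nn.Module):
    """Sketch of the planned hybrid policy network: a CNN visual
    stream over the grid, an MLP state stream over agent attributes,
    fused into shared features for PPO's actor and critic heads."""

    def __init__(self, grid_channels=3, n=10, m=16,
                 state_dim=8, num_actions=6):
        super().__init__()
        # Visual stream: spatial features from the N x M grid
        # (terrain, obstacles, train position).
        self.cnn = nn.Sequential(
            nn.Conv2d(grid_channels, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Flatten(),
        )
        cnn_out = 32 * n * m
        # State stream: agent attributes (held item, timers, ...).
        self.mlp = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU())
        # Fusion: concatenate both streams, then actor/critic heads.
        self.fused = nn.Sequential(nn.Linear(cnn_out + 64, 128), nn.ReLU())
        self.policy_head = nn.Linear(128, num_actions)  # action logits
        self.value_head = nn.Linear(128, 1)             # state value

    def forward(self, grid, state):
        z = torch.cat([self.cnn(grid), self.mlp(state)], dim=-1)
        h = self.fused(z)
        return self.policy_head(h), self.value_head(h)
```

Sharing the fused trunk between the policy and value heads is a common PPO default; separate trunks are a possible variation if training proves unstable.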
We plan on using a dense reward function so that the agent is not focused purely on the end goal. These are some sample rewards we plan on implementing:
- +100 points when the train reaches the final station
- +10 points when the train travels over a newly placed track
Some smaller shaping rewards would be:
- +0.1 points for chopping down trees or mining rocks
- +0.1 points for picking up resources
- +0.2 points for crafting railroad tracks
- +0.5 points for picking up railroad tracks
- +1 point for connecting a railroad track
In addition to the rewards, we need penalties to discourage unwanted actions:
- -10 points for the train crashing
We might also introduce a living penalty to encourage the agent to complete a map faster.
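The reward schedule above can be captured as a simple lookup summed over the events fired in a step. The numeric values come from the proposal; the event names and the per-step living-penalty value are illustrative assumptions to be tuned later.

```python
# Dense reward schedule from the proposal; event names are
# illustrative assumptions, values match the plan above.
REWARDS = {
    "reach_station": 100.0,
    "train_over_new_track": 10.0,
    "chop_or_mine": 0.1,
    "pick_up_resource": 0.1,
    "craft_track": 0.2,
    "pick_up_track": 0.5,
    "connect_track": 1.0,
    "train_crash": -10.0,
}

LIVING_PENALTY = -0.01  # assumed per-step value, subject to tuning

def step_reward(events):
    """Sum the shaped rewards for all events fired this step,
    plus the optional living penalty."""
    return sum(REWARDS[e] for e in events) + LIVING_PENALTY
```

Centralizing the schedule in one table makes it easy to ablate individual shaping terms when checking whether the agent exploits them (e.g., endlessly chopping trees for reward).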
Key Limitations
A large technical challenge we anticipate is our model overfitting to specific maps. Reinforcement learning agents tend to memorize trajectories that are optimal for the maps they train on rather than learning generalizable behavior. The agent might perform perfectly on one map but fail badly on randomly generated maps.
Data ideas
This is an RL-based project, so there is no pre-existing dataset. Our Gymnasium environment will act as our data creator, generating training experience through the agent's own interaction.