- Title: Graph Convolutional Reinforcement Learning
- Who: Luke Primis (lprimis1) and Avi Trost (atrost)
- Introduction: What problem are you trying to solve and why?
- If you are implementing an existing paper, describe the paper’s objectives and why you chose this paper.
- The paper we are reimplementing studies graph convolutional reinforcement learning for multi-agent cooperation. The architecture captures the dynamic relations between agents by applying multi-head attention convolutions over the graph of agents, then passes the resulting feature vectors into a Q-network for reinforcement learning. This lets the agents cooperate effectively with one another. We picked this paper because it has applications in real-world scenarios, such as avoiding crashes in autonomous driving.
- This is a reinforcement learning problem
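The architecture described above can be illustrated with a minimal NumPy sketch: one multi-head dot-product attention layer restricted to each agent's neighbors, followed by a linear Q-value head. This is a simplified illustration and not the paper's exact layer. All names, sizes, the ring-shaped neighbor graph, and the random weights are assumptions made for the example.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def graph_attention_layer(features, adjacency, w_q, w_k, w_v):
    """One multi-head attention convolution over the agent graph.

    features:      (n_agents, d_in) per-agent feature vectors
    adjacency:     (n_agents, n_agents) mask, nonzero where agent j is a
                   neighbor of agent i (each agent counts as its own neighbor)
    w_q, w_k, w_v: (n_heads, d_in, d_head) projection weights
    returns:       (n_agents, n_heads * d_head) concatenated head outputs
    """
    q = np.einsum("nd,hde->hne", features, w_q)
    k = np.einsum("nd,hde->hne", features, w_k)
    v = np.einsum("nd,hde->hne", features, w_v)
    # Scaled dot-product scores between every pair of agents, per head.
    scores = np.einsum("hie,hje->hij", q, k) / np.sqrt(q.shape[-1])
    # Mask out non-neighbors so each agent attends only within its neighborhood.
    scores = np.where(adjacency[None] > 0, scores, -1e9)
    weights = softmax(scores, axis=-1)
    out = np.einsum("hij,hje->hie", weights, v)
    # Concatenate the heads for each agent.
    return out.transpose(1, 0, 2).reshape(features.shape[0], -1)


# Hypothetical sizes and random weights, for illustration only.
rng = np.random.default_rng(0)
n_agents, d_in, n_heads, d_head, n_actions = 4, 8, 2, 4, 5
features = rng.normal(size=(n_agents, d_in))

# Assumed neighbor graph: each agent sees itself and its two ring neighbors.
adjacency = np.eye(n_agents)
for i in range(n_agents):
    adjacency[i, (i - 1) % n_agents] = 1
    adjacency[i, (i + 1) % n_agents] = 1

w_q = rng.normal(size=(n_heads, d_in, d_head))
w_k = rng.normal(size=(n_heads, d_in, d_head))
w_v = rng.normal(size=(n_heads, d_in, d_head))
w_out = rng.normal(size=(n_heads * d_head, n_actions))  # linear Q-value head

h = graph_attention_layer(features, adjacency, w_q, w_k, w_v)
q_values = h @ w_out                # (n_agents, n_actions)
actions = q_values.argmax(axis=1)   # greedy action for each agent
```

Because the attention weights are recomputed from the current features and neighbor mask at every step, the same layer can follow relations between agents as they change during a game.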
- Related Work: Are you aware of any, or is there any prior work that you drew on to do your project?
- https://arxiv.org/abs/1908.03963
- This paper reviews existing MARL (multi-agent reinforcement learning) solutions. It highlights the differences between single-agent solutions and current multi-agent solutions, and goes over the pros and cons of the existing approaches. In particular, it explains that the central-controller approach (multiple single agents that report their actions to a central controller, which coordinates them) may seem appealing but has a number of scalability issues.
- Data: What data are you using (if any)?
- No dataset; the model trains by playing the games.
- Methodology: What is the architecture of your model?
- How are you training the model?
- We will train the model by having it play games involving many agents.
- If you are implementing an existing paper, detail what you think will be the hardest part about implementing the model here.
- The hardest part will be implementing new games to test the model and interfacing them with the model. For games with multiple types of agents, we would need to learn a different policy for each type.
- If you are doing something new, justify your design. Also note some backup ideas you may have to experiment with if you run into issues.
- Metrics: What constitutes “success?”
- For experiments, we plan to run three kinds of models on the paper's games: a baseline model that is not designed or optimized for cooperation, a model that is the same as the one in the paper, and models that are architecturally the same as or similar to the paper's but with tweaked hyperparameters or possibly added layers.
- The best metric here is score/reward rather than accuracy, since the model will be playing mini-games.
- The original paper's authors were attempting to develop an RL model that could cooperate with other agents to achieve better results.
- What are your base, target, and stretch goals?
- Base: get a baseline model and reimplemented version of their model running and playing the games they outlined in their paper
- Target: design new mini-games that challenge the model's ability to cooperate in different ways
- Stretch: elaborate on the model and try to optimize it for different games by adjusting hyperparameters, adding or tweaking layers, etc.
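The reward-based comparison described above could be computed along these lines. This is a minimal sketch: the function names are hypothetical, and the reward lists are made-up placeholders that only show the expected input shape, not results from any experiment.

```python
from statistics import mean


def mean_episode_return(episode_rewards):
    """Average total reward per episode.

    episode_rewards is a list of episodes, each a list of per-step rewards.
    """
    return mean(sum(rewards) for rewards in episode_rewards)


def compare_models(rewards_by_model):
    """Rank models by mean episode return, best first."""
    scores = {
        name: mean_episode_return(episodes)
        for name, episodes in rewards_by_model.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Placeholder rewards for two hypothetical models (not real results).
rewards_by_model = {
    "baseline": [[0, 1, 0], [1, 0, 0]],
    "dgn": [[1, 1, 0], [1, 1, 1]],
}
ranking = compare_models(rewards_by_model)
```

Averaging over several episodes matters here because a single game can be dominated by chance.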
- Ethics: Choose 2 of the following bullet points to discuss; not all questions will be relevant to all projects so try to pick questions where there’s interesting engagement with your project. (Remember that there’s not necessarily an ethical/unethical binary; rather, we want to encourage you to think critically about your problem setup.)
- Why is Deep Learning a good approach to this problem?
- There are so many different scenarios and states that it would be difficult to encode all the relevant information with a cookie-cutter algorithm, especially since an agent may want to change its decision based on the actions of other agents.
- Reinforcement learning is useful here
- Who are the major “stakeholders” in this problem, and what are the consequences of mistakes made by your algorithm?
- In the case of self-driving cars as agents in the problem of autonomous driving
- Traffic lights
- Check in #2 Update
- Note: this is written in retrospect due to Dean's note
- At this point in the project, we found a useful new API called PettingZoo, which, similar to OpenAI's Gym, provides multi-agent game environments for reinforcement learning. Once we have rewritten the model (DGN) in TensorFlow, this will let us easily interact with the game environments and train our model.
- The hardest challenge we've encountered so far is figuring out exactly how the PettingZoo API works. The documentation for the games doesn't specify the return types of the data, so we have to work that out by printing the results.
- At this point, we do not have any results yet, as the model is still in progress. We are on track to finish the project in time, but we may not have enough time to train the model multiple times.
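When return types are undocumented, a small helper that summarizes the structure of whatever an environment hands back can be easier to read than raw printouts. The sketch below is plain Python and makes no PettingZoo calls. The `describe` helper and the `sample` value are hypothetical, and `sample` is not PettingZoo's actual return format.

```python
def describe(value, depth=0):
    """Return an indented summary of a value's type and nested structure."""
    indent = "  " * depth
    shape = getattr(value, "shape", None)
    if shape is not None:
        # Array-like objects (e.g. NumPy arrays): report shape and dtype.
        dtype = getattr(value, "dtype", None)
        return f"{indent}{type(value).__name__} shape={tuple(shape)} dtype={dtype}"
    if isinstance(value, dict):
        lines = [f"{indent}dict with {len(value)} keys"]
        for key, item in value.items():
            lines.append(f"{indent}  {key!r}:")
            lines.append(describe(item, depth + 2))
        return "\n".join(lines)
    if isinstance(value, (list, tuple)):
        lines = [f"{indent}{type(value).__name__} of length {len(value)}"]
        for item in value:
            lines.append(describe(item, depth + 1))
        return "\n".join(lines)
    return f"{indent}{type(value).__name__}: {value!r}"


# Hypothetical stand-in for a value returned by an environment call.
sample = {"agent_0": ([0.1, 0.2], 1.0, False)}
summary = describe(sample)
print(summary)
```

The same helper can be pointed at the result of any environment call to see at a glance which parts are dicts keyed by agent, which are arrays, and what their shapes are.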
- Final Writeup
Built With
- python
- tensorflow