Introduction

We hope to replicate and potentially extend the findings of Chen et al. in the paper Decision Transformer: Reinforcement Learning via Sequence Modeling [1] from UC Berkeley. The paper details the use of a multi-layer transformer architecture in traditional reinforcement learning tasks, and it introduced the notion of leveraging sequence modeling techniques, specifically the GPT architecture, for offline reinforcement learning. We chose this paper because of the growing relevance and deployment of GPTs for a range of highly consequential purposes. The possibility of bringing the benefits of transformers to another domain of machine learning is exciting!
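To make the sequence-modeling framing concrete, here is a minimal sketch of how an offline trajectory can be turned into the kind of token sequence the primary paper feeds to a GPT-style model: each timestep contributes a return-to-go, a state, and an action. The function names (`returns_to_go`, `interleave`, `last_k_timesteps`) and the toy data are our own illustrative assumptions, not code from the reference implementation.

```python
from typing import Any, List, Sequence, Tuple

Token = Tuple[str, Any]


def returns_to_go(rewards: Sequence[float]) -> List[float]:
    """Undiscounted sum of rewards from each timestep to the end of the episode."""
    rtg = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each entry accumulates all future rewards.
    for t in range(len(rewards) - 1, -1, -1):
        running += rewards[t]
        rtg[t] = running
    return rtg


def interleave(rtg: Sequence[float], states: Sequence[Any], actions: Sequence[Any]) -> List[Token]:
    """Flatten a trajectory into (return-to-go, state, action) tokens, one triple per timestep."""
    tokens: List[Token] = []
    for r, s, a in zip(rtg, states, actions):
        tokens.extend([("return_to_go", r), ("state", s), ("action", a)])
    return tokens


def last_k_timesteps(tokens: Sequence[Token], k: int) -> List[Token]:
    """Keep only the most recent k timesteps (3 tokens each) as the model's context."""
    return list(tokens[-3 * k:])


# Toy episode (hypothetical values, for illustration only).
rewards = [1.0, 0.0, 2.0]
states = ["s0", "s1", "s2"]
actions = ["a0", "a1", "a2"]

rtg = returns_to_go(rewards)
tokens = interleave(rtg, states, actions)
context = last_k_timesteps(tokens, k=2)
```

At evaluation time the first return-to-go is set to a desired target return rather than computed from data, and it is decremented by each reward the agent actually receives.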

Related Work

In the paper Transformers are Meta-Reinforcement Learners, Melo shows that the transformer architecture is well suited for reinforcement learning, due to its capacity to mimic episodic memory over a very large context window via self-attention. Melo then addresses one of the biggest problems with transformer models: their instability during training and their sensitivity to learning rate and weight initialization. Melo proposes and tests (on a locomotion/robotic manipulation task) a specific transformer reinforcement learning agent, as well as a weight initialization and optimization scheme that substantially improves training stability, in turn eliminating the need for complex, dynamic learning-rate schedules. While the architecture implemented there is similar to the architecture in our primary paper, several hyper-parameters and input shapes differ. Melo also explicitly tests and unpacks the transformer agent’s performance on meta-RL benchmarks, which our primary paper did not. Our primary paper from Chen et al., on the other hand, demonstrated that transformer models could perform comparably to, or better than, temporal-difference and behavior-cloning (action-copying) models on Atari games, OpenAI Gym tasks, and Key-to-Door problems. Both papers, however, detail the use of a multi-layer transformer architecture in traditional reinforcement learning tasks, and both expand on the notion of leveraging sequence modeling techniques, specifically the GPT architecture, for reinforcement learning.

Primary Paper: https://arxiv.org/abs/2106.01345

Secondary Paper: https://proceedings.mlr.press/v162/melo22a.html

https://huggingface.co/docs/transformers/en/index (one of the libraries used in our primary paper)

https://github.com/karpathy/minGPT (another library used in our primary paper)

Data Sources

Our data (as well as the data used in our primary reference) come from the Atari and OpenAI Gym environments, which allow for the simulation of common RL tasks. Our benchmarks will be the same as those used in our primary paper.

https://arxiv.org/abs/2004.07219

https://github.com/Farama-Foundation/D4RL

Metrics

The benchmarks from our reference paper will serve as our primary metric of success. These effectively measure how well our agent performs on a number of different tasks (Atari games and OpenAI Gym tasks). We plan to experiment with hyper-parameter sweeps, sensitivity to random seed and initialization, and potentially performance in novel environments or expanded data sets from OpenAI Gym. Our reference paper showed agent performance that matched or exceeded more traditional temporal-difference and behavior-cloning baselines on these tasks; we hope to expand on these performance gains.
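As a rough sketch of how we might report these metrics, the snippet below normalizes a raw episode return against reference scores, in the style D4RL uses (0 corresponds to a random policy, 100 to an expert policy), and summarizes results across random seeds. The function names and every number in the example are hypothetical placeholders, not results from any experiment.

```python
import statistics
from typing import Dict, Sequence


def normalized_score(raw: float, random_ref: float, expert_ref: float) -> float:
    """Rescale a raw return so that random_ref maps to 0 and expert_ref maps to 100."""
    if expert_ref == random_ref:
        raise ValueError("expert_ref and random_ref must differ")
    return 100.0 * (raw - random_ref) / (expert_ref - random_ref)


def summarize_seeds(scores: Sequence[float]) -> Dict[str, float]:
    """Mean and sample standard deviation of per-seed scores (needs at least two seeds)."""
    return {
        "mean": statistics.mean(scores),
        "std": statistics.stdev(scores),
        "n_seeds": float(len(scores)),
    }


# Hypothetical raw returns from three seeds of one configuration.
raw_returns = [1800.0, 2000.0, 2200.0]
# Hypothetical reference returns for the environment.
RANDOM_REF, EXPERT_REF = 0.0, 4000.0

per_seed = [normalized_score(r, RANDOM_REF, EXPERT_REF) for r in raw_returns]
summary = summarize_seeds(per_seed)
print(summary)
```

Reporting the spread across seeds alongside the mean should make it easier to tell whether a difference between two hyper-parameter settings is larger than seed-to-seed noise.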

Base goal: duplicate the paper's results, run a quantitative ablation, and draw qualitative conclusions from the quantitative results.

Target goal: extend to new environments.

Stretch goal: apply Decision Transformer to a warehouse task and show online decision making.

Ethics

This project centers on creating an agent that makes the best decision it can at each step, leveraging everything it has seen (that is, everything in its potentially very large context window) to do so. Applied to Atari games, this is not of great consequence, but it is easy to imagine an agent with a similar architecture, drawing on the same fundamental methods as this project, making decisions about matters of great importance in people’s lives. This is problematic because transformers are known to be prone to both hallucination and bias, which could result in unfair or harmful decisions being made when better alternatives are available.

There is no denying, however, that transformers are in many ways a fantastic alternative to other RL techniques: they can be more efficiently trained, more effective during deployment, and have a much richer contextual understanding of their current environment. Deep learning is not just an ideal solution for developing an agent to play complex and unpredictable games; in some cases, it is the only option that is not cost-prohibitive.

Division of Labor

Trey

Performed a preliminary review and update of the reference implementation; contributed to the outline for, and edited, the Devpost and proposal.

Quinn

Wrote the Devpost and proposal, reference implementation analysis, literature review, and implementation analysis.
