Deep Reinforcement Learning

Machine learning is a subset of AI, focusing on learning from data to improve the accuracy of an application. A machine learning algorithm is made up of a dataset and an algorithm. The dataset is split into a training and testing set; where the training set is used to train the algorithm to find trends within the data. Next, the algorithm is exposed to the testing set, where its accuracy can be assessed. Machine learning algorithms can be either supervised or unsupervised. Supervised learning is only possible when the dataset is labeled, else unsupervised learning takes place. One form of unsupervised learning (used in deep learning) is neural networks.

Deep learning is a form of machine learning where computers can learn and understand from earlier experiences. This form of unsupervised learning allows the computer to learn complex concepts by building upon simpler models. These models combined create a graph with multiple layers, known as a deep neural network. A neural network is an algorithm that is made up of a layered network, featuring an input layer, hidden layers, and an output layer. The neural network uses the input data to perform calculations. These calculations form conclusions, which are converted to probabilities. A deep neural network takes this information and continues to refine the results of earlier layers.

Reinforcement learning (RL) is a form of machine learning, where the algorithm is exposed to the raw dataset. The model learns through trial and error. When outcomes and always correct, it is reinforced into the artificial neural network. Reinforcement learning is reward-based, supplying the best solution as an output. A reinforcement learning environment is made up of agents, environments, states, actions, and rewards.

However, one difficulty of reinforcement learning is that the process can repeat overtime for hundreds of steps. Also, the reward can be very sparse. Typically, a reward is received at the end of the game, so the algorithm may take more time to find the best solution.

Environments transform the current action to the next action state and a reward, agents are functions that transform the new state and reward to the next action. The key features of reinforcement learning are that the agent learns good behavior alongside trial and error. As the agent has no control over the environment, information is gathered through interaction. Reinforcement learning shows the agents try to create the environment's function to gain the biggest possible reward. At each time step, the agent must take an action. The result of this action will cause the agent to receive a reward, a state transition, or an observation.

Markov Decision Process (MDP) is commonly used to describe reinforcement learning algorithms. It is a discrete-time stochastic control process, used to fully describe an environment within a reinforcement learning algorithm. In an MDP, the current state analyses its environment and takes any relevant information from the history. Then, once the state is known, the history is discarded, and the next action can take place. For each decision made, there is a reward (positive or negative), where a large positive reward is desired.

Built With

Share this project: