Project

This is the first project of Udacity's Deep Reinforcement Learning Nanodegree. It consists of training a Deep Q-Learning network using state-of-the-art techniques such as Experience Replay, Double Q-Learning, and others.

The DQN agent runs inside a Unity Environment and is trained to maximize the expected future reward. It learns to explore a square world and collect bananas under a simple reward scheme: +1 for picking up a yellow banana and -1 for picking up a blue one. The agent therefore learns to collect yellow bananas while avoiding the blue ones.
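
A minimal sketch of how these pieces fit together is shown below. The network sizes, variable names, and hyperparameters here are assumptions for illustration only (the Banana environment used by the Nanodegree exposes a 37-dimensional state and 4 discrete actions); the actual implementation lives in the repository's code.

```python
import random
from collections import deque

import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical sizes and architecture; the real network in this repo may differ.
STATE_SIZE, ACTION_SIZE = 37, 4
q_local = nn.Sequential(nn.Linear(STATE_SIZE, 64), nn.ReLU(), nn.Linear(64, ACTION_SIZE))
q_target = nn.Sequential(nn.Linear(STATE_SIZE, 64), nn.ReLU(), nn.Linear(64, ACTION_SIZE))
optimizer = torch.optim.Adam(q_local.parameters(), lr=5e-4)

# Experience Replay memory of (state, action, reward, next_state, done) tuples.
buffer = deque(maxlen=100_000)

def learn(batch_size=64, gamma=0.99):
    """One learning step on a mini-batch sampled from the replay buffer."""
    states, actions, rewards, next_states, dones = zip(*random.sample(buffer, batch_size))
    states = torch.tensor(states, dtype=torch.float32)
    next_states = torch.tensor(next_states, dtype=torch.float32)
    actions = torch.tensor(actions, dtype=torch.int64).unsqueeze(1)
    rewards = torch.tensor(rewards, dtype=torch.float32)
    dones = torch.tensor(dones, dtype=torch.float32)

    # Double Q-Learning: the online network chooses the next action,
    # the (periodically updated) target network evaluates it.
    with torch.no_grad():
        next_actions = q_local(next_states).argmax(dim=1, keepdim=True)
        q_next = q_target(next_states).gather(1, next_actions).squeeze(1)
        targets = rewards + gamma * q_next * (1.0 - dones)

    # Move the current Q estimates towards the bootstrapped targets.
    q_expected = q_local(states).gather(1, actions).squeeze(1)
    loss = F.mse_loss(q_expected, targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```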

Environment

Unity Environment

The first step to replicate this project is to download the Unity Environment that the Deep Q-Learning network will be trained on. Select the download that matches your OS:

Training

The training is thoroughly explained in the dqn_navigation.ipynb notebook, but it is also possible to train using the train_dqn.py file, where you can tweak the hyperparameters by setting the following arguments:

  • env: The path of the Unity Environment file.
  • model: The folder where the model will be saved. It is not necessary to explicitly provide the model file name; it is generated automatically from time information. Defaults to the model folder.
  • episodes: The number of episodes the model should be trained for.
  • steps: The maximum number of steps the training should perform in a single episode.
  • eps_start: The starting epsilon value for the epsilon-greedy technique.
  • eps_decay: The factor the epsilon value is multiplied by after each episode.
  • eps_min: The minimum epsilon value for the epsilon-greedy technique, preventing the agent from effectively ceasing exploration once epsilon becomes very small (after the training has already run for many episodes). See the sketch after this list.
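
A minimal sketch of how these three values typically interact, assuming the common per-episode multiplicative decay (the actual default values live in train_dqn.py and may differ):

```python
# Hypothetical defaults, for illustration only.
eps_start, eps_decay, eps_min = 1.0, 0.995, 0.01

eps = eps_start
for episode in range(2000):
    # ... run one episode, acting greedily with probability (1 - eps) ...
    # Decay epsilon after each episode, but never let it fall below eps_min.
    eps = max(eps_min, eps * eps_decay)
```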

The agent can be trained on either CPU or GPU without any change to the code: it already tries to use any CUDA-compatible GPU through PyTorch.
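
For reference, the standard PyTorch idiom for this is shown below (a sketch, not necessarily the exact line used in the repo):

```python
import torch

# Falls back to the CPU automatically when no CUDA-capable GPU is available.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
```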

Running

It is possible to check the performance of your trained agent by running the run_dqn.py file with the following arguments:

  • env: The path of the Unity Environment file.
  • model: The full path to the saved model file, including the file name (see the loading sketch after this list).
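
Restoring the saved weights into a network is standard PyTorch. A minimal sketch follows, assuming a hypothetical checkpoint path and a network layout matching the one used for training; the real run_dqn.py handles this for you:

```python
import torch
import torch.nn as nn

# Hypothetical architecture; it must match the network that produced the checkpoint.
STATE_SIZE, ACTION_SIZE = 37, 4
q_network = nn.Sequential(nn.Linear(STATE_SIZE, 64), nn.ReLU(), nn.Linear(64, ACTION_SIZE))

# "model/checkpoint.pth" is a placeholder for the path passed via the model argument.
q_network.load_state_dict(torch.load("model/checkpoint.pth", map_location="cpu"))
q_network.eval()

# Greedy (epsilon = 0) action selection during evaluation.
with torch.no_grad():
    state = torch.zeros(1, STATE_SIZE)  # replace with the observation from the environment
    action = q_network(state).argmax(dim=1).item()
```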

A trained model is saved in the model folder; it achieves a mean score of 15 over 100 episodes, 2 points more than Udacity requires to consider the task solved. It was trained for only 2,000 episodes using a very simple multi-layer perceptron.