Structure - mateuslevisf/xrl-pucrio GitHub Wiki

Each section of this page refers to a folder in the project; section titles link to the corresponding section in the GitHub Code page.

Images

Non-test classes:

[class diagram image]

Test classes:

[class diagram image]

See the packages image for the package diagram.

Outer files

This section pertains to files that are outside of the project folders.

  • .gitignore, LICENSE, README.md are all self-explanatory.
  • conda_env.yml is the Conda/Miniconda environment file, needed to install the project.
  • hvalues_arg.json and vipers_args.json serve as template input files that can be used to quickly run and configure the project as an alternative to using the CLI.
  • xrlpucrio.py is the main file of the project.

agents

An Agent, as defined in the agent.py file, is a class that holds base functions for an agent in a Reinforcement Learning context. The agent outputs an action for an input observation and is then updated with the resulting reward and next observed state. All agents should inherit from the base Agent class; since each function can behave very differently depending on the policy the agent uses, most functions raise a NotImplementedError by default.
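A minimal sketch of this base-class pattern (the method names here are illustrative, not the project's exact API):

```python
class Agent:
    """Base class for RL agents; subclasses override each method."""

    def act(self, observation):
        # Choose an action for the given observation.
        raise NotImplementedError

    def update(self, observation, action, reward, next_observation, done):
        # Incorporate the reward and next observed state into the policy.
        raise NotImplementedError


class RandomAgent(Agent):
    """Example subclass: ignores observations and picks random actions."""

    def __init__(self, n_actions, seed=0):
        import random
        self._rng = random.Random(seed)
        self.n_actions = n_actions

    def act(self, observation):
        return self._rng.randrange(self.n_actions)

    def update(self, *args):
        pass  # a random policy never learns
```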

The q_agent.py and dqn_agent.py files hold, respectively, an agent that uses Q-Learning and an agent based on a DQN (Deep Q-Learning Network). Both function very similarly. Note that the QAgent class can generate H-Values if that technique is being used, but the DQNAgent cannot.
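For reference, a minimal tabular Q-learning agent looks like the sketch below; this is an illustration of the standard algorithm, not the project's QAgent implementation:

```python
import random
from collections import defaultdict

class TinyQAgent:
    """Tabular Q-learning with epsilon-greedy exploration."""

    def __init__(self, n_actions, alpha=0.1, gamma=0.99, epsilon=0.1, seed=0):
        self._rng = random.Random(seed)
        self.q = defaultdict(lambda: [0.0] * n_actions)  # Q-table: state -> action values
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.n_actions = n_actions

    def act(self, state):
        # Epsilon-greedy: explore with probability epsilon, else pick the best action.
        if self._rng.random() < self.epsilon:
            return self._rng.randrange(self.n_actions)
        values = self.q[state]
        return values.index(max(values))

    def update(self, state, action, reward, next_state, done):
        # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
        target = reward if done else reward + self.gamma * max(self.q[next_state])
        self.q[state][action] += self.alpha * (target - self.q[state][action])
```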

The decision_tree.py file holds an agent that uses a Decision Tree for action selection. It was designed for the VIPER technique, though with a little adaptation it could be used as a standalone agent (this is not implemented by default in the project).
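The general idea behind a VIPER-style tree policy is to fit a decision tree on (observation, teacher action) pairs and then use it for action selection. A hedged sketch using scikit-learn (names and API here are illustrative, not those of decision_tree.py):

```python
from sklearn.tree import DecisionTreeClassifier

class TreeAgent:
    """Decision-tree policy distilled from a teacher (oracle) policy."""

    def __init__(self, max_depth=3):
        self.tree = DecisionTreeClassifier(max_depth=max_depth)

    def fit(self, observations, oracle_actions):
        # observations: 2D array-like of states visited by the teacher;
        # oracle_actions: actions the teacher policy chose in those states.
        self.tree.fit(observations, oracle_actions)
        return self

    def act(self, observation):
        # Predict the teacher's action for a single observation.
        return int(self.tree.predict([observation])[0])
```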

environments

An Environment, as defined in the env_instance.py file, is a class that holds base functions for an environment (more specifically a Gymnasium one) in a Reinforcement Learning context. The most important class function is loop, which holds the code for a basic Reinforcement Learning training loop, with evaluation rounds every evaluation_interval episodes. Another important function is evaluate, which runs a similar loop without training, serving to quickly summarize an agent's performance. The generate_plots function generates a simple line plot showing the agent's performance during the evaluation periods of training.
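Schematically, the loop/evaluate pair described above has this structure (a sketch assuming the standard Gymnasium step/reset API; the function names mirror the description, not the file's exact code):

```python
def evaluate(env, agent, episodes):
    """Roll out episodes without training and return the mean return."""
    total = 0.0
    for _ in range(episodes):
        obs, _ = env.reset()
        done = False
        while not done:
            obs, reward, terminated, truncated, _ = env.step(agent.act(obs))
            done = terminated or truncated
            total += reward
    return total / episodes

def loop(env, agent, episodes, evaluation_interval=10, eval_episodes=5):
    """Train for `episodes` episodes, evaluating every evaluation_interval."""
    eval_returns = []
    for episode in range(1, episodes + 1):
        obs, _ = env.reset()
        done = False
        while not done:
            action = agent.act(obs)
            next_obs, reward, terminated, truncated, _ = env.step(action)
            done = terminated or truncated
            agent.update(obs, action, reward, next_obs, done)
            obs = next_obs
        if episode % evaluation_interval == 0:
            eval_returns.append(evaluate(env, agent, eval_episodes))
    return eval_returns
```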

The blackjack.py and cartpole.py files hold classes that inherit from the base Environment class described above but expand the functions to allow usage of the Blackjack and Cart Pole Gymnasium environments. In particular, the Cartpole class wraps the base Environment class in order to discretize the observation space and allow the use of the DQN and Q-Learning RL methods.
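The discretization idea is to map each continuous observation dimension to an integer bin index. A minimal sketch (the bounds and bin counts are made-up values, not the project's actual configuration):

```python
def discretize(observation, lows, highs, bins):
    """Map a continuous observation to a tuple of integer bin indices."""
    indices = []
    for value, low, high, n in zip(observation, lows, highs, bins):
        clipped = min(max(value, low), high)       # clamp to the known range
        fraction = (clipped - low) / (high - low)  # normalize to [0, 1]
        indices.append(min(int(fraction * n), n - 1))
    return tuple(indices)
```

A tuple of bin indices is hashable, so it can serve directly as a key in a Q-table.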

networks

The networks folder currently holds the single dqn.py file which holds a simple implementation of a Deep Q-Learning Network, which currently is only used in the DQNAgent class.
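The shape of the computation a Q-network performs can be sketched as below. This NumPy version only illustrates the idea of mapping an observation to one Q-value per action; the real dqn.py presumably uses a deep-learning framework with backpropagation:

```python
import numpy as np

class TinyQNetwork:
    """Two-layer feed-forward network: observation -> Q-value per action."""

    def __init__(self, obs_dim, hidden_dim, n_actions, seed=0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.normal(0, 0.1, (obs_dim, hidden_dim))
        self.b1 = np.zeros(hidden_dim)
        self.w2 = rng.normal(0, 0.1, (hidden_dim, n_actions))
        self.b2 = np.zeros(n_actions)

    def forward(self, obs):
        hidden = np.maximum(obs @ self.w1 + self.b1, 0.0)  # ReLU hidden layer
        return hidden @ self.w2 + self.b2                  # one Q-value per action

    def best_action(self, obs):
        return int(np.argmax(self.forward(obs)))
```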

results

The results folder is empty by default: it holds all artifacts generated by project execution, such as plots and model files. It is VERY IMPORTANT to remember that all results are erased at the START of program execution. To keep results generated by a previous run, move the desired artifacts to another folder beforehand, for example the saved folder.
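Since the folder is wiped on startup, a small helper like the following can back artifacts up before the next run (an illustrative snippet, not part of the project):

```python
import shutil
from pathlib import Path

def archive_results(results_dir="results", target_dir="saved/run_backup"):
    """Copy everything under results_dir into target_dir before the next run."""
    src, dst = Path(results_dir), Path(target_dir)
    if src.exists():
        shutil.copytree(src, dst, dirs_exist_ok=True)
    return dst
```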

saved

The saved folder is simply a place for users to save model results manually; the project never interacts with this folder programmatically. Currently it only holds the optimal results for Q-Learning applied to the Blackjack environment.

tests

The tests folder holds unit tests for the project. Each subfolder corresponds to tests for the files in the project folder of the same name. The files subfolder holds JSON files used by the argument-parsing tests. The files directly inside the tests folder test the main execution loop of the program, which can take a while to run (around 10 minutes).

utils

The utils folder holds general utility functions separated by file.

  • arguments_parser.py - holds all functions related to argument parsing, including parameter defaults and functions that fill in missing parameters when needed (when the user omits parameters from an input file or uses only some of the available CLI options).
  • log.py - holds general logging functions.
  • memory.py - holds the ReplayMemory class and the Transition enum, which are used for batch-saving in the DQNAgent class training loop.
  • plots.py - holds plotting functions.
  • viper.py - holds all functions (except Decision Tree implementation and related functions) used by the VIPER technique.
  • wrappers.py - holds the DiscretizedObservationWrapper used to discretize the Cart Pole environment (and, in the future, other continuous observation space environments).
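The replay-buffer idea behind memory.py can be sketched as follows; the field names and API here are illustrative (the project defines its own Transition and ReplayMemory):

```python
import random
from collections import deque, namedtuple

# Hypothetical field layout; memory.py defines its own Transition.
Transition = namedtuple("Transition", ("state", "action", "reward", "next_state", "done"))

class ReplayMemory:
    """Fixed-capacity buffer of transitions for batched DQN training."""

    def __init__(self, capacity, seed=0):
        self.buffer = deque(maxlen=capacity)  # oldest entries are evicted
        self._rng = random.Random(seed)

    def push(self, *args):
        self.buffer.append(Transition(*args))

    def sample(self, batch_size):
        # Uniformly sample a batch of stored transitions.
        return self._rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)
```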