Introductory Tutorial
Introductory Model Overview
To get started, first download the tutorial files here.
The introductory model features a state chart in a simple stochastic, or random, environment. Within the chart is a final goal, as well as Start and Intermediate states.

Each second, the agent must choose between two decisions: move or do nothing. It can freely cycle between Start and Intermediate, but can only reach the goal if it remains in the Intermediate state for between two and five seconds. The randomness of the model comes from a timeout drawn at random between two and five seconds. The objective is to train the agent to wait, doing nothing, long enough to reach the final goal.
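To make the dynamics concrete, here is a minimal standalone Java sketch of the same logic outside AnyLogic. The class, field, and method names are illustrative only and are not part of the tutorial model.

```java
// Standalone sketch of the state-chart logic described above, outside AnyLogic.
import java.util.Random;

public class IntroEnvironmentSketch {
    enum State { START, INTERMEDIATE, GOAL }

    private State state = State.START;
    private double enteredIntermediateAt = 0;   // sim time when Intermediate was entered
    private double timeout;                     // random threshold between 2 and 5 seconds
    private final Random rng = new Random();

    /** action 0 = do nothing, action 1 = move */
    void step(int action, double now) {
        if (action == 0 || state == State.GOAL) return;
        switch (state) {
            case START:
                state = State.INTERMEDIATE;
                enteredIntermediateAt = now;
                timeout = 2 + 3 * rng.nextDouble();   // uniform in [2, 5)
                break;
            case INTERMEDIATE:
                // Moving before the timeout bounces the agent back to Start;
                // moving after the timeout has elapsed reaches the goal.
                state = (now - enteredIntermediateAt >= timeout) ? State.GOAL : State.START;
                break;
        }
    }
}
```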

To explore the simulation, open the model in AnyLogic. In the Pathmind Helper properties, uncheck the Enabled checkbox.

Now run the simulation. Use the Move button to transition the agent from the Start to Intermediate state. Notice that immediately clicking Move again will send the agent back to Start. Now try transitioning to Intermediate and waiting between two and five seconds before clicking Move. Since the timeout is reached, the agent can now move to the goal.

Deciding how long to wait before acting may seem like a simple problem in such a small model, but it illustrates the core idea. Real models, and the optimizations behind them, grow far more complicated as they take on additional parts and complexity.
Pathmind Helper Properties
Open the PathmindHelper properties and re-check the Enabled checkbox. Observe the reinforcement learning elements.

Number of Agents - This indicates the number of "controlled" agents (i.e. decision points) in your model. In this tutorial, there is only one decision point.

Observations - Observations serve as the eyes and ears of a simulation and include any information about the current state of the environment. In this model, observations are a one-hot array expressing the current location of the agent.
[Start, Intermediate, goal]
- [1.0, 0.0, 0.0] - Agent in "Start" state.
- [0.0, 1.0, 0.0] - Agent in "Intermediate" state.
- [0.0, 0.0, 1.0] - Agent in "goal" state.
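As a rough illustration, a Java observation function for this layout might build the one-hot array as follows; the `State` enum and method name are assumptions for the sketch, not PathmindHelper fields.

```java
// Minimal sketch of building the one-hot observation array shown above.
public class ObservationSketch {
    enum State { START, INTERMEDIATE, GOAL }

    static double[] observations(State state) {
        double[] obs = new double[3];        // [Start, Intermediate, goal]
        obs[state.ordinal()] = 1.0;          // one-hot encode the agent's current location
        return obs;
    }
}
```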

Metrics - Metrics are the building blocks of the reward function and are used to determine whether an action was good or bad. They often embody important KPIs such as revenue and cost. They are combined within the reward function to teach the algorithm which actions are best as it optimizes, usually for several metrics at once. Each action results in points, or a reward, being granted.
This model grants a reward of 1 when the agent reaches the goal.
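An illustrative sketch of that reward logic (not Pathmind's exact reward-function syntax) is shown below: one point is granted only on the step where the agent first reaches the goal.

```java
// Sketch of the reward described above: reward the transition into the goal state,
// not every step spent there. Names are illustrative.
public class RewardSketch {
    static double reward(boolean goalReachedBefore, boolean goalReachedAfter) {
        return (!goalReachedBefore && goalReachedAfter) ? 1.0 : 0.0;
    }
}
```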

Actions - Actions define what agents are allowed to perform. In this case, there are two discrete choices: do nothing (0) or move (1).

This action (0 or 1) is passed as an argument to doAction(action), which is then executed by the AnyLogic model.
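A standalone sketch of what such a doAction body could look like for this model is below; in the actual AnyLogic model, the move would fire the state-chart transition. The constants and the move() placeholder are assumptions for illustration.

```java
// Sketch of a doAction(int) body for this model's two discrete choices.
public class ActionSketch {
    static final int DO_NOTHING = 0;
    static final int MOVE = 1;

    void doAction(int action) {
        if (action == MOVE) {
            move();   // e.g. fire the "Move" transition on the state chart
        }
        // action == DO_NOTHING: simply wait for the next decision point
    }

    void move() { /* placeholder for the state-chart transition */ }
}
```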

Done - Every simulation needs an endpoint. Some simulations conclude after a defined period of time, while others reach their end when certain conditions become true. In this case, the simulation ends when the goal is reached.
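For this model, the done check reduces to a single condition, sketched below with the same illustrative `State` enum used earlier.

```java
// Sketch of a "done" condition: the episode ends once the agent is in the goal state.
public class DoneSketch {
    enum State { START, INTERMEDIATE, GOAL }

    static boolean isDone(State state) {
        return state == State.GOAL;
    }
}
```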

Event Trigger - The event trigger tells Pathmind when to trigger the next action. Some models use time-based event triggers, while others rely on conditional triggers. This model performs one action each second.
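In AnyLogic this is typically configured as a recurring Event that fires every second. The sketch below imitates that timing with a plain Java scheduler; `triggerNextAction()` here is only a stand-in for whatever call requests the next action in your model, not a documented PathmindHelper method.

```java
// Illustrative timing sketch only: request one action per simulated second.
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class EventTriggerSketch {
    public static void main(String[] args) {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        // Fire once per second until the process is stopped.
        timer.scheduleAtFixedRate(EventTriggerSketch::triggerNextAction, 1, 1, TimeUnit.SECONDS);
    }

    static void triggerNextAction() {
        // Placeholder: the policy would choose the next action here.
        System.out.println("decision point reached");
    }
}
```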

Testing The Model
It is good practice to perform a test run before exporting a model. Doing so confirms that the model is functional and that the PathmindHelper elements are working correctly.
In the Pathmind Helper properties, select the Debug Mode checkbox. Now run the simulation.

Once the simulation is running, open the Developer Panel. If set up correctly, data will be printed for each action an agent performs.

Exporting Model For RL Training
Please see https://github.com/PathmindAI/nativerl/wiki/Export-AnyLogic-Simulation-For-Training.
Writing the Reward Function and Training
Please see the following articles:
- Run Commands - https://github.com/PathmindAI/nativerl/wiki/Run-Commands
- Setting Reward Function - https://github.com/PathmindAI/nativerl/wiki/Setting-Reward-Function
Validate Policy
Back in AnyLogic, select the policy file that you just exported from Pathmind and run the included Monte Carlo experiment.

With the policy in place, the agent moves to the goal in as few steps as possible.

Compared to random actions, or even human trial and error, the trained policy is consistently more efficient. In a real-world application, that improved performance could translate into increased revenue or more efficient processes.