Introductory Tutorial
Introductory Model Overview
To get started, first download the tutorial files here.
The introductory model features a state chart in a simple stochastic, or random, environment. Within the chart is a final goal, as well as Start and Intermediate states.

Each second, the agent must choose between two decisions: move or do nothing. It can freely cycle between Start and Intermediate, but can only reach the goal if it remains in the Intermediate state for between two and five seconds. The randomness of the model comes from a timeout drawn at random between two and five seconds. The objective is to train the agent to wait, doing nothing, long enough to reach the final goal.
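To make the dynamics concrete, here is a minimal standalone Java sketch of the same logic outside AnyLogic. The class, field, and method names are illustrative only and are not part of the tutorial model.

```java
// Standalone sketch of the state-chart logic described above, outside AnyLogic.
import java.util.Random;

public class IntroEnvironmentSketch {
    enum State { START, INTERMEDIATE, GOAL }

    private State state = State.START;
    private double enteredIntermediateAt = 0;   // sim time when Intermediate was entered
    private double timeout;                     // random threshold between 2 and 5 seconds
    private final Random rng = new Random();

    /** action 0 = do nothing, action 1 = move */
    void step(int action, double now) {
        if (action == 0 || state == State.GOAL) return;
        switch (state) {
            case START:
                state = State.INTERMEDIATE;
                enteredIntermediateAt = now;
                timeout = 2 + 3 * rng.nextDouble();   // uniform in [2, 5)
                break;
            case INTERMEDIATE:
                // Moving before the timeout bounces the agent back to Start;
                // moving after the timeout has elapsed reaches the goal.
                state = (now - enteredIntermediateAt >= timeout) ? State.GOAL : State.START;
                break;
        }
    }
}
```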

To explore the simulation, open the model in AnyLogic. In the Pathmind Helper properties, uncheck the Enabled checkbox.

Now run the simulation. Use the Move button to transition the agent from the Start to Intermediate state. Notice that immediately clicking Move again will send the agent back to Start. Now try transitioning to Intermediate and waiting between two and five seconds before clicking Move. Since the timeout is reached, the agent can now move to the goal.

Deciding how long to wait before acting may seem like a simple problem in such a small model, but it illustrates the core idea. Real models, and the optimizations behind them, grow far more complicated as they take on additional parts and complexity.
Pathmind Helper Properties
Open the PathmindHelper properties and re-check the Enabled checkbox. Observe the reinforcement learning elements.

Number of Agents - This indicates the number of "controlled" agents (i.e. decision points) in your model. In this tutorial, there is only one decision point.

Observations - Observations serve as the eyes and ears of a simulation and include any information about the current state of the environment. In this model, observations are a one-hot array expressing the current location of the agent.
[Start, Intermediate, goal]
- [1.0, 0.0, 0.0] - Agent in "Start" state.
- [0.0, 1.0, 0.0] - Agent in "Intermediate" state.
- [0.0, 0.0, 1.0] - Agent in "goal" state.
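As a rough illustration, a Java observation function for this layout might build the one-hot array as follows; the `State` enum and method name are assumptions for the sketch, not PathmindHelper fields.

```java
// Minimal sketch of building the one-hot observation array shown above.
public class ObservationSketch {
    enum State { START, INTERMEDIATE, GOAL }

    static double[] observations(State state) {
        double[] obs = new double[3];        // [Start, Intermediate, goal]
        obs[state.ordinal()] = 1.0;          // one-hot encode the agent's current location
        return obs;
    }
}
```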

Metrics - Metrics are the building blocks of the reward function and are used to determine whether an action was good or bad. They often embody important KPIs such as revenue and cost. They are combined within the reward function to teach the algorithm which actions are best as it optimizes, usually for several metrics at once. Each action results in points, or a reward, being granted.
This model grants a reward of 1 when the agent reaches the goal.
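An illustrative sketch of that reward logic (not Pathmind's exact reward-function syntax) is shown below: one point is granted only on the step where the agent first reaches the goal.

```java
// Sketch of the reward described above: reward the transition into the goal state,
// not every step spent there. Names are illustrative.
public class RewardSketch {
    static double reward(boolean goalReachedBefore, boolean goalReachedAfter) {
        return (!goalReachedBefore && goalReachedAfter) ? 1.0 : 0.0;
    }
}
```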

Actions - Actions define what agents are allowed to perform. In this case, there are two discrete choices: do nothing (0) or move (1).

This action (0 or 1) is passed as an argument to doAction(action), which is then executed by the AnyLogic model.
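A standalone sketch of what such a doAction body could look like for this model is below; in the actual AnyLogic model, the move would fire the state-chart transition. The constants and the move() placeholder are assumptions for illustration.

```java
// Sketch of a doAction(int) body for this model's two discrete choices.
public class ActionSketch {
    static final int DO_NOTHING = 0;
    static final int MOVE = 1;

    void doAction(int action) {
        if (action == MOVE) {
            move();   // e.g. fire the "Move" transition on the state chart
        }
        // action == DO_NOTHING: simply wait for the next decision point
    }

    void move() { /* placeholder for the state-chart transition */ }
}
```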

Done - Every simulation needs an endpoint. Some simulations conclude after a defined period of time, while others reach their end when certain conditions become true. In this case, the simulation ends when the goal is reached.
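For this model, the done check reduces to a single condition, sketched below with the same illustrative `State` enum used earlier.

```java
// Sketch of a "done" condition: the episode ends once the agent is in the goal state.
public class DoneSketch {
    enum State { START, INTERMEDIATE, GOAL }

    static boolean isDone(State state) {
        return state == State.GOAL;
    }
}
```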

Event Trigger - The event trigger tells Pathmind when to trigger the next action. Some models use time-based event triggers, while others rely on conditional triggers. This model performs one action each second.
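In AnyLogic this is typically configured as a recurring Event that fires every second. The sketch below imitates that timing with a plain Java scheduler; `triggerNextAction()` here is only a stand-in for whatever call requests the next action in your model, not a documented PathmindHelper method.

```java
// Illustrative timing sketch only: request one action per simulated second.
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class EventTriggerSketch {
    public static void main(String[] args) {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        // Fire once per second until the process is stopped.
        timer.scheduleAtFixedRate(EventTriggerSketch::triggerNextAction, 1, 1, TimeUnit.SECONDS);
    }

    static void triggerNextAction() {
        // Placeholder: the policy would choose the next action here.
        System.out.println("decision point reached");
    }
}
```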

Testing The Model
It is good practice to perform a test run before exporting a model. Doing so confirms that the model is functional and that the PathmindHelper elements are working correctly.
In the Pathmind Helper properties, select the Debug Mode checkbox. Now run the simulation.

Once the simulation is running, open the Developer Panel. If set up correctly, data will be printed for each action an agent performs.

Exporting Model For RL Training
Please see https://github.com/PathmindAI/nativerl/wiki/Export-AnyLogic-Simulation-For-Training.
Writing the Reward Function and Training
Please see the following articles:
- Run Commands - https://github.com/PathmindAI/nativerl/wiki/Run-Commands
- Setting Reward Function - https://github.com/PathmindAI/nativerl/wiki/Setting-Reward-Function
Validate Policy
Back in AnyLogic, select the policy file that you just exported from Pathmind and run the included Monte Carlo experiment.

With the policy in place, the agent moves to the goal in as few steps as possible.

Compared to random actions, or even human trial and error, the trained policy is consistently more efficient. In a real-world application, that improved performance could translate into increased revenue or more efficient processes.