Action Masking - PathmindAI/nativerl GitHub Wiki

In many situations, an agent is allowed to select actions that are impossible at particular moments in time. For example, on a manufacturing line, a reinforcement learning policy can direct a machine to begin processing the next product even though the machine is currently busy. Over time, a policy should learn to avoid these "invalid" actions, but having to learn this wastes training time and makes learning less efficient.

To avoid this issue, Pathmind allows you to apply an action mask, which tells the policy whether each action is allowed or disallowed at any given moment. Because invalid actions are excluded, the policy can filter out noisy and useless information. Take a look at this article for more on the motivation behind action masks.

Note: Action masking only supports discrete action spaces.

Step 1

Open the Pathmind Helper properties and locate the Action Masks field.

[Screenshot: the Action Masks field in the Pathmind Helper properties]

Note: The value {true, false} in the Action Masks field is a static placeholder for demonstration purposes. You must replace it with a function that constructs the mask each time Pathmind Helper is triggered, as explained in Step 2.

Step 2

Construct your action mask.

Whenever Pathmind Helper is triggered, you must return a boolean array (boolean[]) in which each element corresponds to one action: the element at index i is the mask value for action i.

True means do not mask because the action is valid.
False means mask the action because the action is invalid.

For example, you can supply the boolean array by calling a function that builds it; the function then runs each time Pathmind Helper is triggered.

[Screenshot: the Action Masks field calling a function that returns the mask]
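To make the true/false convention concrete, here is a minimal sketch of such a mask function for the manufacturing example above. It assumes a hypothetical two-action policy where action 0 means "wait" (always valid) and action 1 means "start the next product" (valid only when the machine is idle). The Machine class, its isBusy() method, and the actionMask name are illustrative only and are not part of Pathmind or AnyLogic.

```java
import java.util.Arrays;

// Hypothetical sketch: Machine, isBusy() and actionMask() are illustrative
// names, not part of the Pathmind or AnyLogic APIs.
public class ActionMaskExample {

    static class Machine {
        private final boolean busy;

        Machine(boolean busy) {
            this.busy = busy;
        }

        boolean isBusy() {
            return busy;
        }
    }

    // Returns one element per action: true = valid (not masked),
    // false = invalid (masked). Index i corresponds to action i.
    static boolean[] actionMask(Machine machine) {
        boolean[] mask = new boolean[2];
        mask[0] = true;                // action 0 ("wait") is always allowed
        mask[1] = !machine.isBusy();   // action 1 ("start next product") only when idle
        return mask;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(actionMask(new Machine(true))));  // prints [true, false]
        System.out.println(Arrays.toString(actionMask(new Machine(false)))); // prints [true, true]
    }
}
```

In an AnyLogic model, the same logic would typically live in a function that returns boolean[], and the Action Masks field would contain a call to that function so the mask is rebuilt from the current model state on every trigger.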

Step 3

Audit your action mask.

Turn on debug mode and inspect the console output to confirm that the action mask is working as intended.

[Screenshot: debug-mode console output showing the action mask]

In the example above, the policy has two possible actions: 0 and 1. Within the action mask array, index 0 corresponds to action 0 and index 1 corresponds to action 1.
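If you want a quick manual check alongside debug mode, a small helper can print each index next to whether its action is allowed or masked, which makes the index-to-action mapping easy to verify. This is a hypothetical sketch, not Pathmind's own debug output format. It also includes an assumed sanity check: a mask in which every element is false leaves the policy no valid action to choose.

```java
// Hypothetical audit helpers (not part of Pathmind): describe() labels each
// index of the mask, hasValidAction() flags masks that block every action.
public class MaskAudit {

    static String describe(boolean[] mask) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < mask.length; i++) {
            sb.append("action ").append(i).append(": ")
              .append(mask[i] ? "allowed" : "masked");
            if (i < mask.length - 1) {
                sb.append('\n');
            }
        }
        return sb.toString();
    }

    // Returns true if at least one action is left unmasked.
    static boolean hasValidAction(boolean[] mask) {
        for (boolean allowed : mask) {
            if (allowed) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Two-action layout: action 0 allowed, action 1 masked.
        boolean[] mask = {true, false};
        System.out.println(describe(mask));
        // prints:
        // action 0: allowed
        // action 1: masked
        System.out.println(hasValidAction(mask)); // prints true
    }
}
```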

Step 4

Query the trained policy. Once you have obtained a policy from Pathmind, run it in your AnyLogic model to inspect the results.

Masking Tuple Actions

To mask tuple actions, append the mask for each action in the tuple one after another to form a single boolean array. An example is shown below.

[Screenshot: example of a tuple action mask]

In Pathmind Helper, this can be configured as shown below.

[Screenshot: tuple action mask configured in Pathmind Helper]
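Appending the per-action masks amounts to concatenating arrays. The sketch below assumes a hypothetical tuple of two discrete actions, A with three choices and B with two; the sizes and the concat helper are illustrative, not part of Pathmind. The combined array holds A's mask followed by B's. Treat that ordering as an assumption and confirm it against the debug output described in Step 3.

```java
import java.util.Arrays;

// Hypothetical sketch: concat() is an illustrative helper, not a Pathmind API.
public class TupleMask {

    // Joins per-action masks into one flat boolean[], in the order given.
    static boolean[] concat(boolean[]... masks) {
        int total = 0;
        for (boolean[] m : masks) {
            total += m.length;
        }
        boolean[] result = new boolean[total];
        int offset = 0;
        for (boolean[] m : masks) {
            System.arraycopy(m, 0, result, offset, m.length);
            offset += m.length;
        }
        return result;
    }

    public static void main(String[] args) {
        boolean[] maskA = {true, false, true}; // action A: choice 1 is invalid
        boolean[] maskB = {false, true};       // action B: choice 0 is invalid
        boolean[] tupleMask = concat(maskA, maskB);
        System.out.println(Arrays.toString(tupleMask));
        // prints [true, false, true, false, true]
    }
}
```

Keeping the concatenation in one helper means the per-action masks can each be built by their own function, while the Action Masks field only needs a single call that returns the combined array.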