Validate Trained Policy

To validate the performance of a trained Pathmind policy, you will need to run a Monte Carlo experiment in AnyLogic. A Monte Carlo experiment automatically executes hundreds of simulation runs, each with a different random seed, giving you a distribution of outcomes against which to validate the policy.
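To build intuition for what the experiment does, here is a minimal plain-Java sketch of the idea (illustrative only, not AnyLogic's API): each iteration runs the model with its own seed, and the per-run metrics are aggregated afterwards. The `runOnce` function and its numbers are hypothetical stand-ins for a real simulation run.

```java
import java.util.Random;

public class MonteCarloSketch {
    // Stand-in for one simulation run; returns a hypothetical metric
    // (e.g., number of balked customers) that varies with the seed.
    static double runOnce(long seed) {
        Random rng = new Random(seed);
        return 200 + rng.nextGaussian() * 25;
    }

    public static void main(String[] args) {
        int iterations = 100;  // the minimum number of runs recommended below
        double sum = 0;
        for (int i = 0; i < iterations; i++) {
            sum += runOnce(i);  // each iteration uses its own seed
        }
        System.out.printf("Mean metric over %d runs: %.1f%n", iterations, sum / iterations);
    }
}
```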

Step 1

Determine which metrics to track. In your AnyLogic model, note which metrics you'd like to measure. These can be specific variables, AnyLogic histogram data, or any other output your model produces.
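As a concrete example, a common metric is a counter incremented whenever an event of interest occurs. The sketch below is plain Java and purely illustrative; in an actual AnyLogic model, `numBalked` would be a variable on your agent, incremented from the process flow.

```java
public class BalkMetric {
    private int numBalked = 0;  // the metric the Monte Carlo experiment will chart

    // In an AnyLogic model, you would call this from the process flow
    // at the point where a customer balks.
    void onCustomerBalked() {
        numBalked++;
    }

    public static void main(String[] args) {
        BalkMetric metric = new BalkMetric();
        metric.onCustomerBalked();
        System.out.println("numBalked = " + metric.numBalked);
    }
}
```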

Step 2

Create a new Monte Carlo experiment.


Step 3

Configure your Monte Carlo. Select a name for the experiment.


Set the Number of Iterations (i.e., the number of simulation runs) to 100, which is the minimum we recommend.


Define the metrics you want to track. Each field is described below; a short sketch of how the interval settings define the chart's bins follows the list.


  • Title - The graph label; it can be anything.
  • Expression - The metric you would like to track. Typically, this is a variable in your AnyLogic simulation.
  • Number of Intervals - The number of bins (bars) the metric's range is divided into.
  • Initial Interval Size - The starting width of each bar in the bar chart.
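If it helps to picture how the two interval settings interact, here is a small, hypothetical Java sketch of the binning they describe. AnyLogic performs this bookkeeping for you; the sample values, bounds, and sizes below are made up.

```java
public class IntervalSketch {
    public static void main(String[] args) {
        int numIntervals = 10;       // "Number of Intervals": how many bars
        double intervalSize = 25.0;  // "Initial Interval Size": width of each bar
        double start = 0.0;          // assumed lower bound of the metric
        int[] counts = new int[numIntervals];

        // Hypothetical per-run metric values collected by the Monte Carlo.
        double[] samples = {75, 80, 62, 110, 95, 70, 88, 73};
        for (double v : samples) {
            int bin = (int) ((v - start) / intervalSize);
            if (bin >= 0 && bin < numIntervals) counts[bin]++;  // ignore out-of-range values
        }

        for (int i = 0; i < numIntervals; i++) {
            System.out.printf("[%5.1f, %5.1f): %d%n",
                    start + i * intervalSize, start + (i + 1) * intervalSize, counts[i]);
        }
    }
}
```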

Step 4

Make sure the Monte Carlo experiment's Model Time and Randomness settings match those of your Simulation experiment. If they do not match, the Monte Carlo results will be invalid!
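The Randomness setting matters because a fixed seed makes every iteration identical, collapsing 100 runs into a single data point. This toy Java snippet (not AnyLogic code) illustrates the difference between fixed and random seeds:

```java
import java.util.Random;

public class SeedCheck {
    public static void main(String[] args) {
        // With a fixed seed, two "runs" produce identical results.
        Random fixedA = new Random(42);
        Random fixedB = new Random(42);
        System.out.println(fixedA.nextDouble() == fixedB.nextDouble());  // true

        // With random seeds, runs differ, which is what Monte Carlo needs.
        Random randomA = new Random();
        Random randomB = new Random();
        System.out.println(randomA.nextDouble() == randomB.nextDouble()); // almost surely false
    }
}
```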


Step 5

Run your Monte Carlo. First, change the Pathmind Helper "Mode" to "Use Policy" and point it to the policy zip file you exported from Pathmind.


Then run the Monte Carlo experiment. This can take several hours, depending on the length and complexity of your simulation.


Comparing Results

When your Monte Carlo concludes, you should see a distribution for each metric you defined. The next step is to compare these results against a baseline. Typical baselines include:

  • Random Actions (see the sketch after this list)
  • FIFO
  • Heuristics
  • Optimizers such as OptQuest
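As a concrete illustration of the first baseline, a random-actions run simply replaces the policy's decision with a uniformly random choice each time an action is required. A minimal sketch, assuming a discrete action space whose size (`numActions`) is hypothetical:

```java
import java.util.Random;

public class RandomActionsBaseline {
    public static void main(String[] args) {
        Random rng = new Random();
        int numActions = 4;  // assumed size of the model's action space

        // Instead of querying the trained policy, pick an action at random
        // each time a decision is required.
        for (int step = 0; step < 5; step++) {
            int action = rng.nextInt(numActions);
            System.out.println("step " + step + " -> random action " + action);
        }
    }
}
```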

Example Comparison

Monte Carlo Using Pathmind Policy

Using the trained Pathmind policy, the number of balked customers averages about 75. Lower is better in this case.


Monte Carlo Using Random Actions

In comparison, the average number of balked customers is about 225 using random actions.


The trained policy therefore cuts the number of balked customers by roughly two thirds ((225 − 75) / 225 ≈ 67%), drastically outperforming the random-actions baseline.