2020 09 07 Threshold offset plots - WojciechMigda/TruRL GitHub Wiki

Threshold offset plot for `max_episode_steps=200`

In case of RL initial response values that the regressor is trained with are close to 0. With high Threshold values the gradient is very small and the stochastic training employed by Tsetlin Regressor fails to pick up sufficient learning speed.

One of the solutions is to employ an offset to the discrete response values that are directly fed to the model and which are mapped from continuous response space. For instance, instead of mapping continous range of [0, 200] to discrete [0, T] one can map the target into [offset, T + offset]. With that the initial gradient will be enlarged and the initial learning rate will be boosted.

Run parameters:

Episodes: 1000
max_episode_steps: 200
KBinsDiscretizer({
    {34, -4.800000, 4.800000},
    {34, -2.600000, 2.600000},
    {34, -0.418000, 0.418000},
    {34, -3.000000, 3.000000},})
TsetliniClassifierBitwise({
        "threshold": <#####>,
        "s": 4.000000,
        "number_of_regressor_clauses": 3200,
        "number_of_states": 127,
        "boost_true_positive_feedback": 1,
        "random_state": 1,
        "n_jobs": 6,
        "clause_output_tile_size": 16,
        "weighted": true,
        "max_weight": 2147483647,
        "verbose": false
    })

Four Threshold offset values were examined: 0, 400, 800, and 1600.

Plot AUC Plot AUC Plot (zoomed)

Conclusions:

Threshold offset did not lead to a degraded learning performance. While in this setting it might not be of direct importance it might be a factor in future experiments with larger max_episode_steps values,
there does not seem to be obvious ordering of performance. The top two learners used Threshold offset of 0 and 1600.
using Threshold offset has an impact on initial learning speed. All non-zero values lead to the model attaining higher rewards earlier than the one with zero offset. The differences are small but they confirm initial assumption.

Plotting scripts

steps_200_Toff.py

#!/usr/bin/python3
# -*- coding: utf-8 -*-

import plac
import numpy as np
import pandas as pd


def main():
    df = pd.read_csv('steps_200_Toff.csv', header=None, names=['t0', 't400', 't800', 't1600'])

    import matplotlib.pyplot as plt

    plt.figure()

    lw = 2

    plt.plot(df.index + 1, df['t0'], lw=lw, color='orange', alpha=0.7, label='t=0')
    plt.plot(df.index + 1, df['t400'], lw=lw, color='red', alpha=0.7, label='t=400')
    plt.plot(df.index + 1, df['t800'], lw=lw, color='purple', alpha=0.7, label='t=800')
    plt.plot(df.index + 1, df['t1600'], lw=lw, color='navy', alpha=0.7, label='t=1600')

    plt.xlabel("Episode")
    plt.ylabel("Avg. reward")

    plt.xlim(1, 1000)
    plt.ylim(0, 200)

    plt.title("Averaged 10x runs, 200 steps\n[3200 clauses, T=32000, s=4.0]")
    plt.legend(loc='lower right')

    plt.show()
    return 0


if __name__ == '__main__':
    plac.call(main)

steps_200_Toff_AUC.py

#!/usr/bin/python3
# -*- coding: utf-8 -*-

import plac
import numpy as np
import pandas as pd


def main():
    df = pd.read_csv('steps_200_Toff.csv', header=None, names=['t0', 't400', 't800', 't1600'])

    df = df.cumsum(axis=0)

    import matplotlib.pyplot as plt

    plt.figure()

    lw = 2

    plt.plot(df.index + 1, df['t0'], lw=lw, color='orange', alpha=0.7, label='t=0')
    plt.plot(df.index + 1, df['t400'], lw=lw, color='red', alpha=0.7, label='t=400')
    plt.plot(df.index + 1, df['t800'], lw=lw, color='purple', alpha=0.7, label='t=800')
    plt.plot(df.index + 1, df['t1600'], lw=lw, color='navy', alpha=0.7, label='t=1600')

    plt.xlabel("Episode")
    plt.ylabel("Avg. cumulative reward")

    plt.xlim(1, 1000)
    plt.ylim(0, 180000)

    plt.title("Averaged 10x runs, 200 steps\n[3200 clauses, T=32000, s=4.0]")
    plt.legend(loc='lower right')

    plt.show()
    return 0


if __name__ == '__main__':
    plac.call(main)

Data

Location: /experiments/2020-09-07_step200_Toff

Logs were created by running /experiments/run.sh script (invocation parameters hardcoded inside). Logs were transformed into CSV file with averaged runs by executing /experiments/run_csv.py:

../run_csv.py steps_200_Toff.csv run_test4.log run_test5.log run_test6.log run_test7.log

Commit

2bef692f683c204cced4677091556164532710cb

2020 09 07 Threshold offset plots - WojciechMigda/TruRL GitHub Wiki

Threshold offset plot for max_episode_steps=200

Plotting scripts

Data

Commit

⚠️ **GitHub.com Fallback** ⚠️

Threshold offset plot for `max_episode_steps=200`

⚠️ GitHub.com Fallback ⚠️