2020 09 09 Threshold offset plots - WojciechMigda/TruRL GitHub Wiki
Threshold offset plot for max_episode_steps=500
Run parameters:
Episodes: 1000
max_episode_steps: 500
KBinsDiscretizer({
{34, -4.800000, 4.800000},
{34, -2.600000, 2.600000},
{34, -0.418000, 0.418000},
{34, -3.000000, 3.000000},})
TsetliniClassifierBitwise({
"threshold": 32000,
"s": 4.000000,
"number_of_regressor_clauses": 3200,
"number_of_states": 127,
"boost_true_positive_feedback": 1,
"random_state": 1,
"n_jobs": 6,
"clause_output_tile_size": 16,
"weighted": true,
"max_weight": 2147483647,
"verbose": false
})
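The KBinsDiscretizer entries above read like one triple per observation component. The sketch below shows how such a configuration could map a continuous observation to bin indices. It assumes each triple means {number of bins, lower bound, upper bound}, that bins have uniform width, and that out-of-range values are clipped; the function name `discretize` is illustrative and not taken from the TruRL sources.

```python
# (number of bins, lower bound, upper bound) per observation component,
# copied from the run parameters above. The meaning of the triple is an assumption.
BINS = [
    (34, -4.8, 4.8),
    (34, -2.6, 2.6),
    (34, -0.418, 0.418),
    (34, -3.0, 3.0),
]


def discretize(observation, bins=BINS):
    """Map each component to a bin index in [0, n_bins - 1], assuming uniform bin widths."""
    indices = []
    for value, (n_bins, lo, hi) in zip(observation, bins):
        # Clip out-of-range values to the configured bounds (assumed behaviour).
        clipped = min(max(value, lo), hi)
        # Scale to [0, n_bins] and fold the upper edge into the last bin.
        idx = int((clipped - lo) / (hi - lo) * n_bins)
        indices.append(min(idx, n_bins - 1))
    return indices


print(discretize([0.0, 0.0, 0.0, 0.0]))
```

With 34 bins per component, a mid-range value falls into bin 17 and values at or beyond the bounds fall into bins 0 and 33.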
Four Threshold offset values were examined: 0, 400, 800, and 1600.

Conclusions:
- with the number of episode steps increased 2.5x (500 vs. 200) but the Threshold value kept at 32000, the model learns very slowly,
- using Threshold offsets boosts the initial learning speed, but the models then lose performance and eventually fall behind the non-offset model.
Plotting scripts
steps_200_Toff.py
#!/usr/bin/python3
# -*- coding: utf-8 -*-

import plac
import numpy as np
import pandas as pd


def main():
    df = pd.read_csv('steps_500_Toff.csv', header=None, names=['t0', 't400', 't800', 't1600'])

    import matplotlib.pyplot as plt

    plt.figure()
    lw = 2
    plt.plot(df.index + 1, df['t0'], lw=lw, color='orange', alpha=0.7, label='t=0')
    plt.plot(df.index + 1, df['t400'], lw=lw, color='red', alpha=0.7, label='t=400')
    plt.plot(df.index + 1, df['t800'], lw=lw, color='purple', alpha=0.7, label='t=800')
    plt.plot(df.index + 1, df['t1600'], lw=lw, color='navy', alpha=0.7, label='t=1600')
    plt.xlabel("Episode")
    plt.ylabel("Avg. reward")
    plt.xlim(1, 1000)
    plt.ylim(0, 500)
    plt.title("Averaged 10x runs, 500 steps\n[3200 clauses, T=32000, s=4.0]")
    plt.legend(loc='upper left')
    plt.show()

    return 0


if __name__ == '__main__':
    plac.call(main)
steps_200_Toff_AUC.py
#!/usr/bin/python3
# -*- coding: utf-8 -*-

import plac
import numpy as np
import pandas as pd


def main():
    df = pd.read_csv('steps_500_Toff.csv', header=None, names=['t0', 't400', 't800', 't1600'])
    df = df.cumsum(axis=0)

    import matplotlib.pyplot as plt

    plt.figure()
    lw = 2
    plt.plot(df.index + 1, df['t0'], lw=lw, color='orange', alpha=0.7, label='t=0')
    plt.plot(df.index + 1, df['t400'], lw=lw, color='red', alpha=0.7, label='t=400')
    plt.plot(df.index + 1, df['t800'], lw=lw, color='purple', alpha=0.7, label='t=800')
    plt.plot(df.index + 1, df['t1600'], lw=lw, color='navy', alpha=0.7, label='t=1600')
    plt.xlabel("Episode")
    plt.ylabel("Avg. cumulative reward")
    plt.xlim(1, 1000)
    plt.ylim(0, 180000)
    plt.title("Averaged 10x runs, 500 steps\n[3200 clauses, T=32000, s=4.0]")
    plt.legend(loc='lower right')
    plt.show()

    return 0


if __name__ == '__main__':
    plac.call(main)
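The second script differs from the first only in the `df.cumsum(axis=0)` step, which turns each per-episode average into a running total, so the height of a curve at a given episode is the total reward collected up to that episode (a discrete area under the learning curve). A small sketch on made-up numbers, not on the experiment's data; the column names are hypothetical:

```python
import pandas as pd

# Made-up per-episode average rewards for two hypothetical configurations.
df = pd.DataFrame({
    'fast_start': [30, 40, 40, 40],
    'slow_start': [10, 20, 60, 90],
})

# Running total down the rows, one column at a time, as in the script above.
cumulative = df.cumsum(axis=0)
print(cumulative)

# The last row of the running total equals the plain column sums.
assert (cumulative.iloc[-1] == df.sum(axis=0)).all()
```

In this toy input the `fast_start` column leads the running total early on, while `slow_start` overtakes it by the last episode, which is the kind of crossover the cumulative plot is meant to expose.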
Data
Location: /experiments/2020-09-09_step500_T32k_Toff
Logs were created by running the /experiments/run.sh script (invocation parameters are hardcoded inside). The logs were then transformed into a CSV file with averaged runs by executing /experiments/run_csv.py:
../run_csv.py steps_500_Toff.csv run_test2.log run_test3.log run_test4.log run_test5.log
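The contents of run_csv.py and the log format are not shown here, so the following is only a hypothetical sketch of the averaging step. It assumes the logs have already been parsed into one array of shape (runs, episodes) per configuration and that each configuration becomes one CSV column; the name `average_runs` and the input layout are illustrative.

```python
import numpy as np
import pandas as pd


def average_runs(rewards_by_config):
    """Average per-episode rewards over runs; one output column per configuration.

    rewards_by_config: list of 2-D arrays shaped (runs, episodes), a hypothetical
    in-memory form of the parsed logs.
    """
    columns = [np.asarray(r, dtype=float).mean(axis=0) for r in rewards_by_config]
    return pd.DataFrame(np.column_stack(columns))


# Tiny made-up input: 2 configurations, 2 runs each, 3 episodes.
parsed = [
    [[10, 20, 30], [30, 40, 50]],
    [[0, 10, 20], [20, 30, 40]],
]
averaged = average_runs(parsed)

# Header-less, index-less CSV, matching pd.read_csv(..., header=None) in the plotting scripts.
print(averaged.to_csv(header=False, index=False))
```

Writing the file without a header row is what lets the plotting scripts assign the column names themselves through the `names=` argument.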
Commit
cbcdc24fadb4be4bb39c73907205d0d57fda49a3