
Supervised Training

  • CrazyAra 0.1 used a training schedule similar to AlphaZero's:

A constant learning rate of 0.1, dropped by a factor of 10 whenever no improvement was made on the validation dataset for a given period.

  • CrazyAra 0.2 uses a one-cycle policy learning rate schedule combined with a momentum schedule (a schedule sketch follows below). The learning rate was determined using an lr-range test, and the model was trained for seven epochs with a mini-batch size of 1024.
(Figures: learning rate schedule and momentum schedule)
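
The sketch below shows how such a combined learning rate and momentum schedule can be computed per training step: the learning rate ramps up to a peak and back down while the momentum moves in the opposite direction, followed by a short final annealing phase. It is a minimal, framework-agnostic illustration; the peak learning rate, momentum bounds, and batch counts used here are placeholders, not the values reported on this page.

```python
# Illustrative one-cycle schedule. All numeric defaults are placeholders,
# not the values used to train CrazyAra 0.2.

def one_cycle(step, total_steps, lr_max=0.35, lr_min_factor=25.0,
              mom_max=0.95, mom_min=0.85, final_fraction=0.1):
    """Return (learning_rate, momentum) for the given training step."""
    cycle_steps = int(total_steps * (1.0 - final_fraction))
    half = cycle_steps // 2
    lr_min = lr_max / lr_min_factor

    if step < half:                      # phase 1: lr up, momentum down
        t = step / half
        lr = lr_min + t * (lr_max - lr_min)
        mom = mom_max - t * (mom_max - mom_min)
    elif step < cycle_steps:             # phase 2: lr down, momentum up
        t = (step - half) / (cycle_steps - half)
        lr = lr_max - t * (lr_max - lr_min)
        mom = mom_min + t * (mom_max - mom_min)
    else:                                # final phase: anneal lr towards zero
        t = (step - cycle_steps) / max(1, total_steps - cycle_steps)
        lr = lr_min * (1.0 - t)
        mom = mom_max
    return lr, mom

# Example: seven epochs with a mini-batch size of 1024; the number of
# batches per epoch is only a placeholder here.
batches_per_epoch = 10_000
total = 7 * batches_per_epoch
for step in (0, total // 2, total - 1):
    lr, mom = one_cycle(step, total)
    print(f"step {step}: lr={lr:.4f}, momentum={mom:.3f}")
```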


Training Data

The deeper model, consisting of 7 standard residual blocks and 12 bottleneck residual blocks, was trained purely supervised on the same training and validation dataset (an illustrative sketch of the two block types is given below):

  • 569,537 human games played by lichess.org users from January 2016 to June 2018 (database.lichess.org/) in which both players had an Elo rating >= 2000

(Figure: training data statistics)
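
As an illustration of the two block types named above, here is a compact PyTorch-style sketch of a standard residual block and a bottleneck residual block. This is a generic ResNet-style formulation for orientation only; the channel counts, reduction factor, and exact layer ordering in CrazyAra's network may differ, and the project's own implementation is not quoted here.

```python
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Standard residual block: two 3x3 convolutions with a skip connection."""
    def __init__(self, channels=256):  # channel count is a placeholder
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(x + self.body(x))

class BottleneckBlock(nn.Module):
    """Bottleneck residual block: 1x1 reduce -> 3x3 -> 1x1 expand, plus skip."""
    def __init__(self, channels=256, reduction=2):  # placeholders
        super().__init__()
        mid = channels // reduction
        self.body = nn.Sequential(
            nn.Conv2d(channels, mid, 1, bias=False),
            nn.BatchNorm2d(mid),
            nn.ReLU(inplace=True),
            nn.Conv2d(mid, mid, 3, padding=1, bias=False),
            nn.BatchNorm2d(mid),
            nn.ReLU(inplace=True),
            nn.Conv2d(mid, channels, 1, bias=False),
            nn.BatchNorm2d(channels),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(x + self.body(x))
```

The bottleneck variant keeps the input and output width the same while doing its 3x3 convolution on a reduced number of channels, which is why stacking 12 of them is cheaper than stacking 12 standard blocks.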

Training Results

(Figures: training curves)

As can be seen in the graphs, the deeper model converged more quickly. Despite using half the batch size and a deeper model, the full training time was reduced from previously ~40 hours to ~36.5 hours.

Overview of evaluation metrics

Current overall best network, trained on the dataset of games in which both players were rated >= 2000 Elo:

| Metric | CrazyAra 0.1 | CrazyAra 0.2 |
| --- | --- | --- |
| val_policy_loss | 1.2680 | 1.2647 |
| val_value_loss | 0.7817 | 0.7386 |
| val_policy_acc | 0.5930 | 0.5895 |
| val_value_acc_sign | 0.6818 | 0.7010 |
| mate_in_one_policy_loss | 0.5859 | 0.5514 |
| mate_in_one_value_loss | 0.0769 | 0.0534 |
| mate_in_one_acc | 0.939 | 0.939 |
| mate_in_one_top_5_acc | 0.997 | 0.998 |
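
As a rough illustration of how a sign-based value accuracy such as val_value_acc_sign can be computed, the sketch below counts a prediction as correct when the predicted value has the same sign as the game outcome, ignoring drawn games. This is an assumed definition for orientation only; the exact metric implementation used by the project may differ.

```python
import numpy as np

def value_acc_sign(predicted_values, game_outcomes):
    """Fraction of positions where the predicted value has the same sign as
    the game outcome (+1 win, -1 loss); drawn games (0) are ignored.

    Illustrative definition, not necessarily CrazyAra's exact metric."""
    predicted_values = np.asarray(predicted_values, dtype=float)
    game_outcomes = np.asarray(game_outcomes, dtype=float)
    decisive = game_outcomes != 0
    if not decisive.any():
        return 0.0
    correct = np.sign(predicted_values[decisive]) == np.sign(game_outcomes[decisive])
    return float(correct.mean())

# Example usage with made-up numbers:
print(value_acc_sign([0.3, -0.1, 0.8, 0.05], [1, 1, 1, -1]))  # -> 0.5
```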