Model architecture - QueensGambit/CrazyAra GitHub Wiki

Model architecture

RISEv3.3

ClassicAra 0.9.2 introduced the RISEv3.3 architecture, an improvement over the RISEv2 architecture (#104).

The development process was influenced by the following papers. However, most of their proposals turned out either not to be beneficial for chess neural networks or to be suboptimal for GPU inference.

The following changes were incorporated in RISEv3.3:

The architecture resulted in a ~150 Elo improvement when trained on the same data set, here Kingbase2019lite. The only other difference was changing the value loss ratio from 0.01 to 0.1.

TimeControl "7+0.1"
Score of ClassicAra 0.9.1 - Risev3.3 vs ClassicAra 0.9.1 - Risev2: 81 - 15 - 64 [0.706]
Elo difference: 152.4 +/- 42.8, LOS: 100.0 %, DrawRatio: 40.0 %

160 of 1000 games finished.
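
The Elo difference and draw ratio in the result above can be reproduced from the win/loss/draw counts. The following is a minimal sketch, assuming the standard logistic Elo model; the function names are illustrative and are not taken from the match-runner or from the CrazyAra code base.

```python
import math


def score_fraction(wins: int, losses: int, draws: int) -> float:
    """Fraction of points scored by the first engine (a draw counts as half a point)."""
    games = wins + losses + draws
    return (wins + 0.5 * draws) / games


def elo_difference(score: float) -> float:
    """Elo difference implied by a score fraction under the logistic Elo model.

    Only defined for 0 < score < 1; a score of exactly 0 or 1 has no finite Elo.
    """
    return -400.0 * math.log10(1.0 / score - 1.0)


# Counts taken from the match log above: 81 wins, 15 losses, 64 draws.
wins, losses, draws = 81, 15, 64

score = score_fraction(wins, losses, draws)
elo = elo_difference(score)
draw_ratio = draws / (wins + losses + draws)

print(f"score={score:.3f} elo={elo:+.1f} draw_ratio={draw_ratio:.1%}")
```

With the counts from the log this gives a score of 0.706, an Elo difference of about 152.4 and a draw ratio of 40.0 %, matching the reported values. The error margin (+/- 42.8) additionally depends on the per-game score variance and is not computed here.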

RISEv2

CrazyAra v0.2.0 uses a newly designed architecture, called RISE for short.

It incorporates new ideas and techniques described in recent papers for Deep Learning in Computer Vision.

ResNeXt:
He et al. - 2015 - Deep Residual Learning for Image Recognition - https://arxiv.org/pdf/1512.03385.pdf
Xie et al. - 2016 - Aggregated Residual Transformations for Deep Neural Networks - http://arxiv.org/abs/1611.05431
Inception:
Szegedy et al. - 2015 - Rethinking the Inception Architecture for Computer Vision - https://arxiv.org/pdf/1512.00567.pdf
Szegedy et al. - 2016 - Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning - https://arxiv.org/pdf/1602.07261.pdf
Squeeze-Excitation:
Hu et al. - 2017 - Squeeze-and-Excitation Networks - https://arxiv.org/pdf/1709.01507.pdf

The proposed model architecture has fewer parameters and shorter inference and training times, while keeping the same depth as the architecture proposed by DeepMind (19 residual layers with 256 filters). On our benchmark dataset of 10,000 games it achieved a lower validation error using the same learning rate and optimizer settings.
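
To give a feel for where parameter savings of this kind can come from, the sketch below counts convolution weights for the baseline tower (19 residual blocks with 256 filters, assuming two 3x3 convolutions per block) and for a hypothetical ResNeXt-style bottleneck block with grouped convolutions. The bottleneck width (128) and group count (32) are illustrative assumptions and are not the actual RISE configuration; biases, batch normalization and the input/output heads are ignored.

```python
def conv_weights(in_channels: int, out_channels: int, kernel: int, groups: int = 1) -> int:
    """Number of weights in a 2D convolution (bias excluded).

    With grouped convolution each output filter only sees in_channels / groups inputs.
    """
    assert in_channels % groups == 0 and out_channels % groups == 0
    return kernel * kernel * (in_channels // groups) * out_channels


FILTERS = 256   # from the text: 256 filters
BLOCKS = 19     # from the text: 19 residual layers

# Baseline residual block: two 3x3 convolutions, 256 -> 256 channels each (assumption).
vanilla_block = 2 * conv_weights(FILTERS, FILTERS, kernel=3)
vanilla_tower = BLOCKS * vanilla_block

# Hypothetical ResNeXt-style bottleneck (NOT the real RISE block):
# 1x1 reduce -> grouped 3x3 -> 1x1 expand.
BOTTLENECK = 128  # assumed bottleneck width
GROUPS = 32       # assumed number of groups (cardinality)
bottleneck_block = (
    conv_weights(FILTERS, BOTTLENECK, kernel=1)
    + conv_weights(BOTTLENECK, BOTTLENECK, kernel=3, groups=GROUPS)
    + conv_weights(BOTTLENECK, FILTERS, kernel=1)
)

print(f"vanilla block:    {vanilla_block:,} weights")
print(f"vanilla tower:    {vanilla_tower:,} weights")
print(f"bottleneck block: {bottleneck_block:,} weights")
```

Under these assumptions a baseline block has 1,179,648 convolution weights (22,413,312 for the whole tower), while the illustrative bottleneck block has 70,144. The point is only that grouped and bottleneck convolutions can keep depth constant while sharply reducing the weight count per block.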

[Figures: lr-schedule for the RISE-Architecture (CrazyAra v0.2) and lr-schedule for the Vanilla-Resnet Architecture (CrazyAra v0.1)]