# Model architecture

## RISEv3.3
In ClassicAra 0.9.2, the RISEv3.3 architecture was introduced as an improvement over the RISEv2 architecture (#104).
The development process was influenced by the following papers. However, most of their proposals turned out not to be beneficial for chess neural networks, or to be suboptimal for GPU inference.
- MixConv: Mixed Depthwise Convolutional Kernels, Mingxing Tan, Quoc V. Le, https://arxiv.org/abs/1907.09595
- ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware, Han Cai, Ligeng Zhu, Song Han, https://arxiv.org/abs/1812.00332
- MnasNet: Platform-Aware Neural Architecture Search for Mobile, Mingxing Tan, Bo Chen, Ruoming Pang, Vijay Vasudevan, Mark Sandler, Andrew Howard, Quoc V. Le, http://openaccess.thecvf.com/content_CVPR_2019/html/Tan_MnasNet_Platform-Aware_Neural_Architecture_Search_for_Mobile_CVPR_2019_paper.html
- FBNet: Hardware-Aware Efficient ConvNet Design via Differentiable Neural Architecture Search, Bichen Wu, Xiaoliang Dai, Peizhao Zhang, Yanghan Wang, Fei Sun, Yiming Wu, Yuandong Tian, Peter Vajda, Yangqing Jia, Kurt Keutzer, http://openaccess.thecvf.com/content_CVPR_2019/html/Wu_FBNet_Hardware-Aware_Efficient_ConvNet_Design_via_Differentiable_Neural_Architecture_Search_CVPR_2019_paper.html
- Searching for MobileNetV3, Andrew Howard, Mark Sandler, Grace Chu, Liang-Chieh Chen, Bo Chen, Mingxing Tan, Weijun Wang, Yukun Zhu, Ruoming Pang, Vijay Vasudevan, Quoc V. Le, Hartwig Adam, https://arxiv.org/abs/1905.02244
- CBAM: Convolutional Block Attention Module, Sanghyun Woo, Jongchan Park, Joon-Young Lee, In So Kweon, https://arxiv.org/pdf/1807.06521.pdf
- ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks (ecaSE), Wang et al., https://arxiv.org/abs/1910.03151
- Rethinking Bottleneck Structure for Efficient Mobile Network Design, D. Zhou, Q. Hou et al., https://link.springer.com/chapter/10.1007/978-3-030-58580-8_40
The changes that were incorporated in RISEv3.3 were the following:
- Replacing squeeze-excitation modules by efficient squeeze-excitation (ecaSE) modules as proposed by Wang et al. (https://arxiv.org/abs/1910.03151); a sketch of how the first three points fit together follows this list
- Replacing sigmoid by hard-sigmoid as recommended in Searching for MobileNetV3 by Howard et al. (https://arxiv.org/abs/1905.02244)
- Making use of 5x5 convolutions in deeper layers as recommended in MnasNet: Platform-Aware Neural Architecture Search for Mobile by Tan et al. (http://openaccess.thecvf.com/content_CVPR_2019/html/Tan_MnasNet_Platform-Aware_Neural_Architecture_Search_for_Mobile_CVPR_2019_paper.html)
- Using the boolean flag `global` for average pooling layers
- Using a higher initial channel size and more residual blocks, but a smaller increase in the number of channels per layer (32 instead of 64)
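Below is a minimal PyTorch sketch of how the first three points can be combined in a single residual block: an ECA-style channel attention gate, a hard sigmoid, and a 5x5 convolution. It is only an illustration of the ideas; the class names and layer layout are hypothetical and do not correspond to the actual RISEv3.3 implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EfficientChannelAttention(nn.Module):
    """ECA-style gate: a 1D convolution across channels instead of the
    fully connected bottleneck of a classical squeeze-excitation module."""
    def __init__(self, kernel_size: int = 3):
        super().__init__()
        self.conv = nn.Conv1d(1, 1, kernel_size, padding=kernel_size // 2, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, height, width)
        y = x.mean(dim=(2, 3))                    # global average pooling -> (B, C)
        y = self.conv(y.unsqueeze(1)).squeeze(1)  # 1D convolution over the channel axis
        y = F.hardsigmoid(y)                      # piecewise-linear gate instead of sigmoid
        return x * y.unsqueeze(-1).unsqueeze(-1)  # rescale each channel map

class ResidualBlock5x5(nn.Module):
    """Residual block that uses a 5x5 convolution and the ECA-style gate."""
    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 5, padding=2, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.attention = EfficientChannelAttention()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.attention(self.bn2(self.conv2(out)))
        return F.relu(out + x)
```

The 1D convolution avoids the two fully connected layers of a classical squeeze-excitation block, and the hard sigmoid replaces the exponential of the sigmoid with a cheap piecewise-linear function, which is friendlier for inference.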
The architecture resulted in an improvement of ~150 Elo when trained on the same data set, here Kingbase2019lite. The only other difference was changing the value loss ratio from 0.01 to 0.1.
```
TimeControl "7+0.1"
Score of ClassicAra 0.9.1 - Risev3.3 vs ClassicAra 0.9.1 - Risev2: 81 - 15 - 64 [0.706]
Elo difference: 152.4 +/- 42.8, LOS: 100.0 %, DrawRatio: 40.0 %
160 of 1000 games finished.
```
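For reference, the value loss ratio weights the loss of the value head against the loss of the policy head in the combined training objective. Below is a minimal sketch of such a weighting, assuming an AlphaZero-style objective with cross-entropy for the policy and mean squared error for the value; the function and argument names are illustrative and not taken from CrazyAra's training code.

```python
import torch.nn.functional as F

def combined_loss(policy_logits, policy_target, value_pred, value_target,
                  value_loss_ratio=0.1):
    """Policy loss plus a weighted value loss (0.1 is the ratio used for RISEv3.3)."""
    policy_loss = F.cross_entropy(policy_logits, policy_target)
    value_loss = F.mse_loss(value_pred, value_target)
    return policy_loss + value_loss_ratio * value_loss
```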
## RISEv2
In CrazyAra v0.2.0, a newly designed architecture was used, which is called RISE for short.
It incorporates new ideas and techniques described in recent papers on deep learning for computer vision.
| Technique | Papers |
|---|---|
| ResNeXt | He et al. - 2015 - Deep Residual Learning for Image Recognition - https://arxiv.org/pdf/1512.03385.pdf <br> Xie et al. - 2016 - Aggregated Residual Transformations for Deep Neural Networks - http://arxiv.org/abs/1611.05431 |
| Inception | Szegedy et al. - 2015 - Rethinking the Inception Architecture for Computer Vision - https://arxiv.org/pdf/1512.00567.pdf <br> Szegedy et al. - 2016 - Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning - https://arxiv.org/pdf/1602.07261.pdf |
| Squeeze Excitation | Hu et al. - 2017 - Squeeze-and-Excitation Networks - https://arxiv.org/pdf/1709.01507.pdf |
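For comparison with the ecaSE variant used in RISEv3.3, a minimal sketch of a standard squeeze-and-excitation module in the spirit of Hu et al. is shown below; the class name and the reduction factor are illustrative and not taken from the CrazyAra code base.

```python
import torch
import torch.nn as nn

class SqueezeExcitation(nn.Module):
    """Standard squeeze-and-excitation: global pooling, bottleneck MLP, sigmoid gate."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        scale = self.fc(x.mean(dim=(2, 3)))           # squeeze: (B, C)
        return x * scale.unsqueeze(-1).unsqueeze(-1)  # excite: rescale each channel
```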
The proposed model architecture has fewer parameters and faster inference and training times, while maintaining the same depth as the architecture proposed by DeepMind (19 residual layers with 256 filters). On our benchmark data set of 10,000 games, it achieved a lower validation error using the same learning rate and optimizer settings.
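For a sense of scale, the convolutional weights of the DeepMind-style baseline tower can be estimated with back-of-the-envelope arithmetic; the numbers below describe only the plain 19-block, 256-filter residual tower and are not measured RISE figures.

```python
# Each baseline residual block holds two 3x3 convolutions with 256 input and
# 256 output channels; biases and batch-norm parameters are ignored here.
weights_per_conv = 3 * 3 * 256 * 256        # 589,824
weights_per_block = 2 * weights_per_conv    # 1,179,648
baseline_tower = 19 * weights_per_block     # 22,413,312 weights in the tower alone
print(f"{baseline_tower:,}")
```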
| RISE architecture (CrazyAra v0.2) | Vanilla ResNet architecture (CrazyAra v0.1) |
|---|---|