Simple masking and shifting operations convert BF16 to FP32, and vice versa (see the sketch after this list).
Potential for a single format for both training and inference.
No need for scaling and quantization.
Avoid expensive retraining and redesign of network architecture.
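The conversion is simple because BF16 keeps FP32's sign bit and 8-bit exponent and merely drops the low 16 mantissa bits. A minimal sketch in C, using hypothetical helper names fp32_to_bf16 and bf16_to_fp32 and truncation for the narrowing direction:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helpers illustrating BF16 <-> FP32 conversion by
 * shifting; this sketch truncates rather than rounds. */

static uint16_t fp32_to_bf16(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);   /* reinterpret float as raw bits            */
    return (uint16_t)(bits >> 16);    /* keep sign, exponent, top 7 mantissa bits */
}

static float bf16_to_fp32(uint16_t h) {
    uint32_t bits = (uint32_t)h << 16; /* restore the bits to the upper half      */
    float f;
    memcpy(&f, &bits, sizeof f);       /* reinterpret raw bits as float           */
    return f;
}
```

Because both formats share the same exponent width, no range rescaling is involved: widening BF16 to FP32 is exact, and narrowing loses only mantissa precision.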
Standard
No IEEE standard
Different architectures, accelerators, and software libraries have adopted slightly different aspects of the IEEE 754 floating-point standard (such as rounding modes and the handling of subnormals) to govern the numeric behavior of arithmetic on BF16 values.
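Rounding on the FP32-to-BF16 conversion is one such point of divergence: truncation (as sketched above) is the cheapest option, while other implementations round to nearest with ties to even. A hedged sketch of the round-to-nearest-even variant, under the hypothetical name fp32_to_bf16_rne:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical round-to-nearest-even FP32 -> BF16 conversion.
 * NaN is passed through with a forced quiet bit; all other values are
 * rounded by adding half of the discarded range plus a tie-break bit. */
static uint16_t fp32_to_bf16_rne(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);

    if ((bits & 0x7FFFFFFFu) > 0x7F800000u) {  /* NaN: keep sign, force quiet NaN */
        return (uint16_t)((bits >> 16) | 0x0040u);
    }

    uint32_t lsb      = (bits >> 16) & 1u;     /* lowest bit that survives        */
    uint32_t rounding = 0x7FFFu + lsb;         /* ties round toward even          */
    return (uint16_t)((bits + rounding) >> 16);
}
```

Truncation biases magnitudes toward zero, whereas round-to-nearest-even is unbiased; with no standard mandating either, hardware and libraries are free to differ.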