# bfloat16 inference

About bfloat16's impact on inference.

## Different formats for inference and training

### Case #1: Same precision

* Best practice: use the same precision for training and inference.

### Case #2: Mismatched precision

* It is possible to train using fp32 for activations and then run inference with bfloat16 (or vice versa).
* Verify converged accuracy using the precision that will be used for inference (see the sketch below).

## See also

* [[bfloat16]]
* [[bfloat16 training]]
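## Example: verifying Case #2

A minimal sketch of the Case #2 check, assuming PyTorch: a model trained in fp32 is evaluated once in fp32 and once under bfloat16 autocast, and the outputs are compared before trusting the bfloat16 accuracy numbers. The model, shapes, and data below are hypothetical stand-ins, not from this wiki.

```python
import torch
import torch.nn as nn

# Hypothetical fp32-trained model; stands in for any converged network.
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))
model.eval()

x = torch.randn(8, 16)  # fp32 input batch

# Reference inference in fp32 (the training precision).
with torch.no_grad():
    ref = model(x)

# Mismatched-precision inference: weights stay fp32, ops run in bfloat16.
with torch.no_grad(), torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    out_bf16 = model(x)

# Gauge the precision gap; in practice, re-run the full accuracy
# evaluation (e.g. top-1) in bfloat16 rather than relying on fp32 numbers.
print((ref - out_bf16.float()).abs().max())
```

In a real workflow the final line would be replaced by the model's full validation metric computed under bfloat16, which is what "verify converged accuracy using the inference precision" means in practice.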