# bfloat16 inference

About bfloat16's impact on inference.

## Different formats for inference and training

### Case #1: Same precision

* Best practice: use the same precision for training and inference.

### Case #2: Mismatched precision

* It is possible to train using fp32 for activations and then run inference with bfloat16 (or vice versa).
* Verify converged accuracy using the precision that will be used for inference (see the sketch below).

## See also

* [[bfloat16]]
* [[bfloat16 training]]
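## Example: verifying Case #2

A minimal sketch of the Case #2 check, assuming PyTorch: a model trained in fp32 is evaluated once in fp32 and once under bfloat16 autocast, and the outputs are compared before trusting the bfloat16 accuracy numbers. The model, shapes, and data below are hypothetical stand-ins, not from this wiki.

```python
import torch
import torch.nn as nn

# Hypothetical fp32-trained model; stands in for any converged network.
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))
model.eval()

x = torch.randn(8, 16)  # fp32 input batch

# Reference inference in fp32 (the training precision).
with torch.no_grad():
    ref = model(x)

# Mismatched-precision inference: weights stay fp32, ops run in bfloat16.
with torch.no_grad(), torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    out_bf16 = model(x)

# Gauge the precision gap; in practice, re-run the full accuracy
# evaluation (e.g. top-1) in bfloat16 rather than relying on fp32 numbers.
print((ref - out_bf16.float()).abs().max())
```

In a real workflow the final line would be replaced by the model's full validation metric computed under bfloat16, which is what "verify converged accuracy using the inference precision" means in practice.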