# BFloat16

16-bit floating-point format.

- Format: 1 sign bit, 8 exponent bits, 7 mantissa bits
- Better suited for deep learning than FP16: it keeps FP32's 8-bit exponent, so it covers the same dynamic range as FP32, whereas FP16's 5-bit exponent overflows and underflows more easily
- Supported by Intel, Arm, NVIDIA, and Google TPU

## See Also

FP16 | FP32 | INT8
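Because BF16 is simply the top 16 bits of an IEEE-754 FP32 value, conversion amounts to dropping the low 16 mantissa bits. Below is a minimal sketch in C illustrating this, assuming round-to-nearest-even on conversion and ignoring NaN edge cases; the function names are hypothetical, not from any particular library.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Convert FP32 -> BF16 by keeping the upper 16 bits
   (sign + 8-bit exponent + top 7 mantissa bits).
   Applies round-to-nearest-even before truncating.
   Sketch only: NaN payloads are not handled specially. */
static uint16_t float_to_bfloat16(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);                     /* reinterpret float as uint32 */
    uint32_t rounding = 0x7FFFu + ((bits >> 16) & 1u);  /* nearest-even rounding bias */
    return (uint16_t)((bits + rounding) >> 16);
}

/* Convert BF16 -> FP32 by zero-filling the discarded 16 mantissa bits. */
static float bfloat16_to_float(uint16_t h) {
    uint32_t bits = (uint32_t)h << 16;
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

int main(void) {
    float x = 3.14159f;
    uint16_t b = float_to_bfloat16(x);
    /* Prints: 3.141590 -> 0x4049 -> 3.140625 */
    printf("%f -> 0x%04X -> %f\n", x, (unsigned)b, bfloat16_to_float(b));
    return 0;
}
```

Note how the round trip loses mantissa precision (3.14159 becomes 3.140625) but the exponent, and hence the magnitude, is preserved exactly; this is the trade-off that makes BF16 attractive for deep learning.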