float - Serbipunk/notes GitHub Wiki

structure

bits

  1. sign, 2. exponent, 3. fraction

https://en.wikipedia.org/wiki/Bfloat16_floating-point_format
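
The three fields can be read off with a few shifts and masks. The sketch below is an illustration rather than a reference implementation. It assumes the standard widths (float32: 1 sign, 8 exponent, 23 fraction bits; bfloat16: 1, 8, 7). The function names `float32_fields` and `bfloat16_fields` are made up for this note. The bfloat16 version simply keeps the top 16 bits of the float32 encoding, i.e. it truncates instead of rounding.

```python
import struct


def float32_bits(x):
    """Return the 32-bit IEEE 754 binary32 encoding of x as an int."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    return bits


def float32_fields(x):
    """Split x into (sign, exponent, fraction) using the 1/8/23 layout."""
    bits = float32_bits(x)
    sign = bits >> 31                # 1 bit
    exponent = (bits >> 23) & 0xFF   # 8 bits, biased by 127
    fraction = bits & 0x7FFFFF       # 23 bits
    return sign, exponent, fraction


def bfloat16_fields(x):
    """Split x into (sign, exponent, fraction) using the 1/8/7 layout.

    Assumption: bfloat16 is obtained by truncating float32 to its
    top 16 bits (no rounding), which is a simplification.
    """
    top = float32_bits(x) >> 16
    sign = top >> 15                 # 1 bit
    exponent = (top >> 7) & 0xFF     # 8 bits, same bias as float32
    fraction = top & 0x7F            # 7 bits
    return sign, exponent, fraction


if __name__ == "__main__":
    for value in (1.0, -2.5):
        print(value, float32_fields(value), bfloat16_fields(value))
```

Both formats share the same 8-bit exponent, so bfloat16 covers the same range as float32 and only gives up fraction bits.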


precision

Data-Type | Precision (significant decimal digits)
--- | ---
float16 | 3 to 4
float32 | 6 to 9
float64 | 15 to 17
float128 | 18 to 34

https://stackoverflow.com/q/56514892/22550824
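
A quick way to see the difference in precision is to round-trip one value through each width and look at what comes back. This is a minimal sketch using only the standard library `struct` module, whose `e`, `f` and `d` format codes correspond to float16, float32 and float64. The helper name `round_trip` is made up for this note, and float128 is left out because `struct` has no format code for it.

```python
import struct


def round_trip(x, fmt):
    """Pack x with the given struct format code and unpack it again."""
    return struct.unpack(fmt, struct.pack(fmt, x))[0]


# struct format codes: e = float16, f = float32, d = float64
FORMATS = {"float16": "<e", "float32": "<f", "float64": "<d"}


def compare(x):
    """Return {type name: value of x after storing it in that type}."""
    return {name: round_trip(x, fmt) for name, fmt in FORMATS.items()}


if __name__ == "__main__":
    for name, stored in compare(0.1).items():
        # repr() shows enough digits to expose the rounding error
        print(f"{name}: {stored!r}  (error {abs(stored - 0.1):.3e})")
```

The narrower the type, the earlier the printed digits diverge from `0.1`. The float64 line shows no error only because a Python float is already a float64.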