quantization - AshokBhat/ml GitHub Wiki
- Reduce the precision of the numbers used to represent a model's parameters, which are stored as FP32 by default
- Pros - Smaller model size, faster computation, and lower memory bandwidth.
- Cons - Possible accuracy loss; applying it without degrading the model is not trivial.
- Post-training FP16 quantization
- Post-training dynamic range quantization
- Post-training integer quantization
- Quantization-aware training
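The core idea behind the integer modes above can be sketched with a simple affine (asymmetric) mapping from FP32 to int8. This is a minimal illustration, not tied to any particular framework; the function names are illustrative:

```python
import numpy as np

def quantize(x, num_bits=8):
    # Affine quantization: map the FP32 range [min, max] onto signed ints.
    qmin, qmax = -(2 ** (num_bits - 1)), 2 ** (num_bits - 1) - 1
    scale = (x.max() - x.min()) / (qmax - qmin)
    zero_point = int(round(qmin - x.min() / scale))
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.int8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    # Recover approximate FP32 values from the int8 representation.
    return (q.astype(np.float32) - zero_point) * scale

weights = np.array([-1.0, -0.5, 0.0, 0.5, 1.0], dtype=np.float32)
q, scale, zero_point = quantize(weights)
approx = dequantize(q, scale, zero_point)
```

Each int8 value takes 4x less storage than FP32, and the round-trip error per element is bounded by roughly one quantization step (`scale`), which is the accuracy cost noted under "Cons" above.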
- What is quantization?
- What are the downsides?
- When is it used?
- What support do various frameworks have for quantization?