quantization - AshokBhat/ml GitHub Wiki

Description

  • Reduce the precision of the numbers used to represent a model's parameters, which are 32-bit floats (FP32) by default, to a lower-precision format such as FP16 or INT8
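
The basic mechanism can be sketched as affine (asymmetric) quantization: each float is mapped to an 8-bit integer via a scale and a zero point, and mapped back at inference time. A minimal pure-Python sketch; the function names are illustrative, not from any framework:

```python
def quantize(values, num_bits=8):
    """Affine-quantize floats to unsigned ints using a scale and zero point."""
    qmin, qmax = 0, 2 ** num_bits - 1
    lo, hi = min(values), max(values)
    scale = (hi - lo) / (qmax - qmin) or 1.0  # avoid zero scale for constant input
    zero_point = round(qmin - lo / scale)
    q = [min(qmax, max(qmin, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Map quantized ints back to approximate floats."""
    return [(qi - zero_point) * scale for qi in q]

weights = [-0.5, 0.0, 0.25, 1.0]
q, scale, zp = quantize(weights)
approx = dequantize(q, scale, zp)
# Per-element round-trip error is bounded by scale / 2.
```

The round-trip is lossy: each value moves by at most half the quantization step, which is the source of the accuracy loss noted under Cons.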

Pros and Cons

  • Pros - Smaller model size, faster computation, and lower memory bandwidth.
  • Cons - Possible accuracy loss; conversion and calibration are not trivial.

Types

  • Post-training FP16 quantization
  • Post-training dynamic range quantization
  • Post-training integer quantization
  • Quantization-aware training
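
Post-training dynamic range quantization, for example, typically stores weights symmetrically in int8 with a per-tensor scale derived from the maximum absolute weight, while activations remain floating point. A hedged sketch of the weight-side math in plain Python, not any framework's actual implementation:

```python
def dynamic_range_quantize(weights):
    """Symmetric int8 quantization: scale maps max|w| to 127, no zero point."""
    scale = max(abs(w) for w in weights) / 127 or 1.0  # guard all-zero tensors
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

weights = [-1.2, 0.3, 0.0, 0.9]
q, scale = dynamic_range_quantize(weights)
# At inference, weights are dequantized on the fly: w ≈ q * scale
recovered = [qi * scale for qi in q]
```

Keeping the scheme symmetric (no zero point) makes the integer matrix multiply cheaper, which is why it is a common choice for weight-only quantization.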

FAQ

  • What is quantization?
  • What are the downsides?
  • When is it used?
  • What support do various frameworks have for quantization?

See also
