FBGEMM - AshokBhat/ml GitHub Wiki
About
- A low-precision, high-performance matrix-matrix multiplication and convolution library for server-side inference.
- FB (Facebook) + GEMM (General Matrix-Matrix Multiplication)
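The core operation behind a low-precision GEMM library can be illustrated with a small pure-Python sketch. It assumes affine quantization (real value = scale * (q - zero_point)), uint8 activations, int8 weights, and integer accumulation, which is the usual setup for this kind of library. The names `quantize` and `quantized_gemm` are hypothetical and are not FBGEMM's C++ API, and the triple loop stands in for the vectorized kernels a real library would generate.

```python
# Hypothetical sketch (not FBGEMM's API): low-precision GEMM with
# uint8 activations, int8 weights, and integer accumulation.
# Affine quantization is assumed: real = scale * (q - zero_point).


def quantize(x, scale, zero_point, qmin, qmax):
    """Map a float to the nearest integer code in [qmin, qmax]."""
    q = round(x / scale) + zero_point
    return max(qmin, min(qmax, q))


def quantized_gemm(qa, za, qb, zb):
    """Return C[i][j] = sum_k (qa[i][k] - za) * (qb[k][j] - zb).

    qa is an M x K matrix of uint8 codes, qb is a K x N matrix of int8 codes.
    Python ints do not overflow; a real kernel would accumulate in int32.
    """
    m, k, n = len(qa), len(qb), len(qb[0])
    c = [[0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            acc = 0
            for p in range(k):
                acc += (qa[i][p] - za) * (qb[p][j] - zb)
            c[i][j] = acc
    return c


# Usage: quantize two small float matrices, multiply, then dequantize.
a = [[0.5, 1.0], [1.5, 2.0]]     # activations, assumed range [0, 2]
b = [[0.25, -0.5], [0.75, 1.0]]  # weights, assumed range [-1, 1]
sa, za = 2.0 / 255, 0            # uint8 codes 0..255
sb, zb = 1.0 / 127, 0            # int8 codes -127..127
qa = [[quantize(x, sa, za, 0, 255) for x in row] for row in a]
qb = [[quantize(x, sb, zb, -128, 127) for x in row] for row in b]
c_int = quantized_gemm(qa, za, qb, zb)
c = [[sa * sb * v for v in row] for row in c_int]  # back to float
```

The scale product `sa * sb` is applied once per output element after the integer dot product, which is what lets the inner loop stay in cheap integer arithmetic.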
Features
- Efficient low-precision general matrix multiplication for small batch sizes
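- Techniques that minimize accuracy loss, such as row-wise quantization and outlier-aware quantization
- Exploits fusion opportunities, e.g., folding post-GEMM steps such as bias addition, ReLU, and requantization into the GEMM output pipeline to reduce memory traffic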
- Generates high-performance shape- and size-specific kernels at runtime
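Row-wise quantization and output fusion can be sketched together in pure Python. The following are illustrative assumptions rather than FBGEMM's actual interfaces: symmetric int8 weights where each row of the weight matrix is one output channel with its own scale, uint8 activations, and an output stage that applies bias, ReLU, and requantization to each integer accumulator before anything is written back. All function names are hypothetical, outlier-aware quantization is not covered, and the runtime-generated, vectorized kernels are not modeled.

```python
# Hypothetical sketch (not FBGEMM's API): row-wise int8 weight quantization
# plus a GEMM whose output stage fuses bias, ReLU, and requantization.


def quantize_symmetric(w, per_row=True, qmax=127):
    """Symmetric int8 quantization of w (a list of rows).

    per_row=True gives every row its own scale (row-wise quantization);
    per_row=False uses one scale for the whole matrix.
    Returns (codes, per-row scales).
    """
    if per_row:
        ranges = [max(abs(x) for x in row) for row in w]
    else:
        ranges = [max(abs(x) for row in w for x in row)] * len(w)
    scales = [(r or 1.0) / qmax for r in ranges]  # avoid a zero scale
    codes = [[max(-qmax, min(qmax, round(x / s))) for x in row]
             for row, s in zip(w, scales)]
    return codes, scales


def fused_gemm_bias_relu_requant(qx, sx, zx, qw, w_scales, bias, sy, zy):
    """y = requantize(relu(x @ w.T + bias)), computed element by element.

    qx: M x K uint8 activation codes (scale sx, zero point zx)
    qw: N x K int8 weight codes, one scale per row (output channel)
    Each integer accumulator goes straight to a uint8 output code, so no
    float intermediate matrix is written out.
    """
    out = []
    for xrow in qx:
        yrow = []
        for j, wrow in enumerate(qw):
            acc = sum((xv - zx) * wv for xv, wv in zip(xrow, wrow))
            real = sx * w_scales[j] * acc + bias[j]  # per-row weight scale
            real = max(0.0, real)                    # fused ReLU
            yrow.append(max(0, min(255, round(real / sy) + zy)))  # requantize
        out.append(yrow)
    return out


# Usage: the two weight rows have very different magnitudes.
w = [[0.01, -0.02, 0.015], [5.0, -3.0, 4.0]]
qw_row, s_row = quantize_symmetric(w, per_row=True)
qw_ten, s_ten = quantize_symmetric(w, per_row=False)

x = [[0.5, 1.0, 0.25]]
sx, zx = 1.0 / 255, 0                 # activations assumed in [0, 1]
qx = [[max(0, min(255, round(v / sx) + zx)) for v in row] for row in x]
bias = [0.1, 1.0]
sy, zy = 2.0 / 255, 0                 # outputs assumed in [0, 2]
qy = fused_gemm_bias_relu_requant(qx, sx, zx, qw_row, s_row, bias, sy, zy)
y = [[sy * (v - zy) for v in row] for row in qy]
```

With one tensor-wide scale, the small first row collapses to the codes [0, -1, 0], while its own row scale keeps the reconstruction error below 1e-4; this is the accuracy argument for row-wise quantization.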
Performance
- 2x performance gain over Facebook's production baseline
Deployment
Integration
- Backend for the quantized operators of Caffe2 and PyTorch on x86 machines
Resources
Architecture support
- x86 - Optimized
- AArch64 - No support
See also