FBGEMM

About

A low-precision, high-performance matrix-matrix multiplication and convolution library for server-side inference.
The name combines FB (Facebook) and GEMM (General Matrix-Matrix Multiplication).

Features

Efficient low-precision general matrix multiplication for small batch sizes
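Techniques that minimize accuracy loss, such as row-wise quantization and outlier-aware quantization
Exploits fusion opportunities, folding bandwidth-bound pre- and post-GEMM operations (such as bias addition and requantization) into the GEMM itself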
Generates high-performance shape- and size-specific kernels at runtime
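
To illustrate why row-wise quantization helps accuracy, here is a minimal pure-Python sketch. It is not FBGEMM's implementation or API: the function names (`quantize_rowwise`, `dequantize`, `max_abs_error`) and the example weights are hypothetical, and the scheme assumed is plain asymmetric 8-bit quantization. It compares one scale per row against one scale for the whole matrix when the rows have very different value ranges.

```python
def quantize_rowwise(matrix, num_bits=8):
    """Quantize each row with its own scale and zero point (illustrative only)."""
    qmax = (1 << num_bits) - 1
    q_rows, scales, zero_points = [], [], []
    for row in matrix:
        # Extend the range to include 0 so that zero is exactly representable.
        lo, hi = min(min(row), 0.0), max(max(row), 0.0)
        scale = (hi - lo) / qmax or 1.0  # fall back to 1.0 for an all-zero row
        zero_point = round(-lo / scale)
        q_rows.append(
            [min(qmax, max(0, round(x / scale) + zero_point)) for x in row]
        )
        scales.append(scale)
        zero_points.append(zero_point)
    return q_rows, scales, zero_points


def dequantize(q_rows, scales, zero_points):
    """Map quantized integers back to floats, row by row."""
    return [
        [(q - zp) * s for q in row]
        for row, s, zp in zip(q_rows, scales, zero_points)
    ]


def max_abs_error(original, restored):
    """Largest element-wise absolute difference between two matrices."""
    return max(
        abs(a - b)
        for row_a, row_b in zip(original, restored)
        for a, b in zip(row_a, row_b)
    )


# Hypothetical weights: the two rows differ in magnitude by a factor of 1000.
weights = [[0.01, -0.02, 0.03], [10.0, -20.0, 30.0]]

# Row-wise: each row gets its own scale and zero point.
row_restored = dequantize(*quantize_rowwise(weights))

# Per-tensor: treat the whole matrix as a single row, so one scale covers all.
flat = [x for row in weights for x in row]
flat_restored = dequantize(*quantize_rowwise([flat]))[0]
cols = len(weights[0])
tensor_restored = [
    flat_restored[i:i + cols] for i in range(0, len(flat_restored), cols)
]

err_rowwise = max_abs_error(weights, row_restored)
err_per_tensor = max_abs_error(weights, tensor_restored)
print(f"row-wise max error:   {err_rowwise:.6f}")
print(f"per-tensor max error: {err_per_tensor:.6f}")
```

With a single scale, the small-magnitude row collapses onto the zero point and loses all of its information. With per-row scales, each row uses the full 8-bit range.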

Performance

2x performance gain over Facebook's production baseline at the time of launch, as reported in the launch blog.

Deployment

Deployed in production at Meta (formerly Facebook)

Integration

Serves as the backend for Caffe2 and PyTorch quantized operators on x86 machines
- Caffe2: https://github.com/pytorch/pytorch/tree/master/caffe2/quantization/server
- PyTorch: https://github.com/pytorch/pytorch/tree/master/aten/src/ATen/native/quantized/cpu
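
On the PyTorch side, the quantized engine can be selected at runtime. The sketch below assumes a reasonably recent PyTorch build with `torch.ao.quantization` available. The helper name `use_fbgemm_if_available` and the toy model are made up for illustration. The code checks whether this build lists `fbgemm` as a supported engine before dynamically quantizing a small linear layer.

```python
import torch


def use_fbgemm_if_available() -> bool:
    """Select FBGEMM as the quantized engine if this PyTorch build supports it."""
    if "fbgemm" in torch.backends.quantized.supported_engines:
        torch.backends.quantized.engine = "fbgemm"
        return True
    return False


if use_fbgemm_if_available():
    # Toy model, for illustration only.
    model = torch.nn.Sequential(torch.nn.Linear(64, 32), torch.nn.ReLU())
    # Dynamic quantization stores Linear weights as int8; activations are
    # quantized on the fly at inference time.
    qmodel = torch.ao.quantization.quantize_dynamic(
        model, {torch.nn.Linear}, dtype=torch.qint8
    )
    out = qmodel(torch.randn(4, 64))
    print(out.shape)
else:
    print("fbgemm engine not available in this build")
```

On non-x86 builds the `fbgemm` engine is typically absent from `supported_engines`, which is why the check comes first.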

Resources

GitHub - https://github.com/pytorch/FBGEMM
Launch blog - https://engineering.fb.com/ml-applications/fbgemm/

Architecture support

x86 - Optimized
AArch64 - No support

See also