FBGEMM - AshokBhat/ml GitHub Wiki

About

  • A low-precision, high-performance library of matrix-matrix multiplication and convolution kernels for server-side inference
  • FB (Facebook) + GEMM (General Matrix-Matrix Multiplication)
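
To make "low-precision GEMM" concrete, here is a minimal pure-Python sketch of the general idea: quantize both float matrices to 8-bit integers, multiply them using integer arithmetic only, then rescale the integer result back to floats. This is not FBGEMM's API. The function names (`quantize_symmetric`, `matmul`, `quantized_matmul`) and the choice of symmetric per-tensor quantization are assumptions made for illustration.

```python
def quantize_symmetric(matrix, qmax=127):
    """Map floats to integers in [-qmax, qmax] using one shared scale."""
    peak = max(abs(x) for row in matrix for x in row)
    scale = peak / qmax if peak else 1.0  # all-zero input: any scale works
    quantized = [
        [max(-qmax, min(qmax, round(x / scale))) for x in row]
        for row in matrix
    ]
    return quantized, scale


def matmul(a, b):
    """Plain triple-loop matrix product; works for ints and floats."""
    columns = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in columns] for row in a]


def quantized_matmul(a, b):
    """Approximate a @ b with an integer product followed by one rescale."""
    qa, scale_a = quantize_symmetric(a)
    qb, scale_b = quantize_symmetric(b)
    accumulators = matmul(qa, qb)  # integer-only inner loop
    return [[scale_a * scale_b * v for v in row] for row in accumulators]


a = [[0.5, -1.0], [0.25, 0.75]]
b = [[1.0, 0.5], [-0.5, 2.0]]
exact = matmul(a, b)
approx = quantized_matmul(a, b)
max_abs_error = max(
    abs(e - q) for e_row, q_row in zip(exact, approx) for e, q in zip(e_row, q_row)
)
```

In an optimized library the integer products would be accumulated in a wider integer type (for example 32-bit) by vectorized kernels; Python's arbitrary-precision integers hide that detail here.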

Features

  • Efficient low-precision general matrix multiplication for small batch sizes
  • Techniques that minimize accuracy loss, such as row-wise quantization and outlier-aware quantization
  • Fusion of bandwidth-bound operations into the matrix multiplication where possible
  • Generates high-performance shape- and size-specific kernels at runtime
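
Row-wise quantization, mentioned in the list above, can be illustrated with a short pure-Python sketch. It gives every row of a matrix its own scale and zero point instead of sharing one pair across the whole tensor, which helps when rows have very different value ranges. The helper names (`choose_params`, `quantize`, `dequantize`, `row_errors`) and the 8-bit asymmetric scheme are assumptions for illustration, not FBGEMM's implementation.

```python
def choose_params(values, qmax=255):
    """Pick a scale and zero point so that [min, max] (including 0) maps to [0, qmax]."""
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    scale = (hi - lo) / qmax
    if scale == 0.0:  # all-zero input
        scale = 1.0
    zero_point = round(-lo / scale)
    return scale, zero_point


def quantize(values, scale, zero_point, qmax=255):
    return [min(qmax, max(0, round(x / scale) + zero_point)) for x in values]


def dequantize(q_values, scale, zero_point):
    return [scale * (q - zero_point) for q in q_values]


def row_errors(matrix, per_row):
    """Largest round-trip error in each row, with per-row or shared parameters."""
    shared = choose_params([x for row in matrix for x in row])
    errors = []
    for row in matrix:
        scale, zero_point = choose_params(row) if per_row else shared
        restored = dequantize(quantize(row, scale, zero_point), scale, zero_point)
        errors.append(max(abs(a - b) for a, b in zip(row, restored)))
    return errors


# One row of small values next to one row of large values.
weights = [
    [0.01, -0.02, 0.03, -0.015],
    [10.0, -8.0, 6.5, -9.5],
]
per_tensor_errors = row_errors(weights, per_row=False)
row_wise_errors = row_errors(weights, per_row=True)
```

With shared parameters the large row dictates the scale, so every value in the small row collapses to the same quantized level. With per-row parameters the small row keeps its own, much finer scale, at the cost of storing one scale and zero point per row.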

Performance

  • Reported 2x performance gains over Facebook's previous production baseline

Deployment

  • Deployed at Meta (formerly Facebook)

Integration

Resources

Architecture support

  • x86 - Optimized
  • AArch64 - No support

See also