[20220524] Compression Roadmap 2022 - microsoft/nni GitHub Wiki

Design Note

Compression effects are simulated by replacing some nodes in the graph with wrapped versions of those nodes. Note, however, that wrapping only the current node is sometimes not equivalent to actually compressing it.
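A minimal pure-Python sketch of why wrapping only the current node can diverge from real compression. `Linear` and `MaskedWrapper` are hypothetical stand-ins, not NNI classes. The wrapper masks the weight of an output channel, but that channel's bias still reaches the next layer; physically removing the channel does not.

```python
from typing import List

Matrix = List[List[float]]


class Linear:
    """A tiny dense layer: y[o] = sum_i w[o][i] * x[i] + b[o]."""

    def __init__(self, weight: Matrix, bias: List[float]):
        self.weight = weight
        self.bias = bias

    def forward(self, x: List[float]) -> List[float]:
        return [sum(w * xi for w, xi in zip(row, x)) + b
                for row, b in zip(self.weight, self.bias)]


class MaskedWrapper:
    """Hypothetical wrapper: simulates pruning by masking the wrapped layer's weight only."""

    def __init__(self, module: Linear, weight_mask: Matrix):
        self.module = module
        self.weight_mask = weight_mask

    def forward(self, x: List[float]) -> List[float]:
        masked = [[w * m for w, m in zip(row, mrow)]
                  for row, mrow in zip(self.module.weight, self.weight_mask)]
        # The bias is left untouched, exactly as a weight-only wrapper would do.
        return Linear(masked, self.module.bias).forward(x)


# fc1 (2 -> 2) feeds fc2 (2 -> 1).
fc1 = Linear([[1.0, 2.0], [3.0, 4.0]], [0.5, 1.0])
fc2 = Linear([[1.0, 1.0]], [0.0])
x = [1.0, 1.0]

# Simulated: prune output channel 1 of fc1 by wrapping only fc1.
wrapped_fc1 = MaskedWrapper(fc1, [[1.0, 1.0], [0.0, 0.0]])
simulated = fc2.forward(wrapped_fc1.forward(x))

# Actual: channel 1 is removed from fc1 (weight row and bias),
# and the matching input column is removed from fc2.
pruned_fc1 = Linear([[1.0, 2.0]], [0.5])
pruned_fc2 = Linear([[1.0]], [0.0])
actual = pruned_fc2.forward(pruned_fc1.forward(x))

print(simulated, actual)  # [4.5] [3.5]: the leftover bias 1.0 leaks into fc2
```

Masking the bias (or the layer output) as well would make the two agree in this toy case, which is one plausible reason to let the compression target (input, output, weight) be configurable.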

base
- evaluator (handles training, validation, hooks, patching, ...)
  - API design
- config list refactor
  1. specify the compression target (input, output, weight, ...)
  2. specify the compression algorithm (including related parameters such as sparse pattern and quantization bit width)
- support for variable compression targets
  1. refactor the compressor & wrapper to provide a unified interface for parsing the config list
  2. basic pruner refactor & quantizer design
- super compressor? Can most existing basic pruners/quantizers be implemented by configuring a super compressor?
  1. universal wrapper
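A sketch of how a refactored config list might state both the compression target and the algorithm (with its parameters), plus a parser that flattens it into per-op, per-target settings that a unified compressor/wrapper could consume. The schema (`op_names`, `target`, `algo`, `sparse_pattern`, `quant_bits`) and the `TargetSetting` type are hypothetical illustrations, not NNI's actual config format.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List

# Hypothetical config list: each entry names the ops, the compression
# targets (input/output/weight) and the algorithm with its parameters.
config_list = [
    {"op_names": ["fc1", "fc2"], "target": ["weight"],
     "algo": {"name": "prune", "sparse_pattern": "channel", "sparsity": 0.5}},
    {"op_names": ["fc2"], "target": ["input", "weight"],
     "algo": {"name": "quantize", "quant_bits": 8}},
]


@dataclass
class TargetSetting:
    op_name: str
    target: str
    algo: str
    params: Dict[str, Any] = field(default_factory=dict)


def parse_config_list(configs: List[Dict[str, Any]]) -> Dict[str, List[TargetSetting]]:
    """Flatten the config list into settings grouped by op name."""
    settings: Dict[str, List[TargetSetting]] = {}
    for cfg in configs:
        params = dict(cfg["algo"])  # copy so the input config is not mutated
        algo_name = params.pop("name")
        for op_name in cfg["op_names"]:
            for target in cfg["target"]:
                settings.setdefault(op_name, []).append(
                    TargetSetting(op_name, target, algo_name, dict(params)))
    return settings


parsed = parse_config_list(config_list)
for op, items in parsed.items():
    for s in items:
        print(op, s.target, s.algo, s.params)
```

With this shape, a single ("universal") wrapper only needs to look up the settings for its op and apply each algorithm to the named target.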
pruning
- refactor sparse pattern
  - metric calculator & sparsity allocator
- migrate to evaluator
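One possible split between a metric calculator and a sparsity allocator, sketched in pure Python. The function names and the L1-norm channel metric are illustrative assumptions, not NNI's implementation. The calculator scores each output channel, and the allocator turns scores plus a target sparsity into a 0/1 keep mask.

```python
from typing import Dict, List

Matrix = List[List[float]]


def l1_channel_metric(weight: Matrix) -> List[float]:
    """Metric calculator: score each output channel by the L1 norm of its weight row."""
    return [sum(abs(w) for w in row) for row in weight]


def allocate_by_ranking(metric: List[float], sparsity: float) -> List[int]:
    """Sparsity allocator: mask the lowest-scoring channels (1 = keep, 0 = prune)."""
    n_prune = int(len(metric) * sparsity)
    order = sorted(range(len(metric)), key=lambda i: metric[i])
    pruned = set(order[:n_prune])
    return [0 if i in pruned else 1 for i in range(len(metric))]


def generate_masks(weights: Dict[str, Matrix], sparsity: float) -> Dict[str, List[int]]:
    """Pipeline: compute metrics per layer, then let the allocator produce masks."""
    return {name: allocate_by_ranking(l1_channel_metric(w), sparsity)
            for name, w in weights.items()}


weights = {"fc1": [[0.1, -0.2], [1.0, 0.5], [-0.05, 0.0], [0.3, 0.3]]}
print(generate_masks(weights, 0.5))  # {'fc1': [0, 1, 0, 1]}
```

Keeping the two pieces separate means a new sparse pattern can reuse an existing metric with a different allocator, or vice versa.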
quantization
- refactor design (key considerations: experiment, evaluator, wrapper, conv-bn fusion)
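Conv-BN fusion matters for quantization because at deployment BN is usually folded into the preceding convolution, so the fused weight is what actually gets quantized. A pure-Python sketch of the standard inference-time fold, with s = gamma / sqrt(var + eps), W' = W * s and b' = (b - mean) * s + beta, using a 1x1 convolution at a single position (equivalent to a dense layer) for brevity. The helper names are hypothetical.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def conv1x1(x: List[float], weight: Matrix, bias: List[float]) -> List[float]:
    """1x1 convolution at a single spatial position (i.e. a dense layer)."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


def batch_norm(y, gamma, beta, mean, var, eps=1e-5) -> List[float]:
    """Inference-mode BatchNorm using running statistics."""
    return [g * (yi - m) / math.sqrt(v + eps) + bt
            for yi, g, bt, m, v in zip(y, gamma, beta, mean, var)]


def fuse_conv_bn(weight, bias, gamma, beta, mean, var, eps=1e-5) -> Tuple[Matrix, List[float]]:
    """Fold BN into the conv, per output channel."""
    fused_w, fused_b = [], []
    for row, b, g, bt, m, v in zip(weight, bias, gamma, beta, mean, var):
        s = g / math.sqrt(v + eps)
        fused_w.append([w * s for w in row])
        fused_b.append((b - m) * s + bt)
    return fused_w, fused_b


weight = [[0.5, -1.0], [2.0, 0.25]]
bias = [0.1, -0.3]
gamma, beta = [1.5, 0.8], [0.2, -0.1]
mean, var = [0.05, 0.4], [0.9, 2.0]
x = [1.0, 2.0]

reference = batch_norm(conv1x1(x, weight, bias), gamma, beta, mean, var)
fw, fb = fuse_conv_bn(weight, bias, gamma, beta, mean, var)
fused = conv1x1(x, fw, fb)
print(all(math.isclose(a, b, rel_tol=1e-9, abs_tol=1e-12) for a, b in zip(reference, fused)))  # True
```

A quantization wrapper that sees only the unfused conv weight would quantize `weight` rather than `fw`, which is another case of the node-wrapping caveat in the design note.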
experiment
- wrap tuner as strategy
  - search space generator
- support more pruners & quantizers
- a good strategy (how to search the search space)
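A sketch of wrapping a tuner as an experiment strategy, assuming a simple propose/receive tuner interface. `generate_search_space`, `RandomSearchTuner`, `TunerStrategy` and the toy objective are all hypothetical, not NNI's tuner API. The generator builds a per-op sparsity search space, and the strategy loops propose, evaluate, report, keeping the best config.

```python
import random
from typing import Callable, Dict, List, Optional, Tuple

Config = Dict[str, float]


def generate_search_space(op_names: List[str], choices: List[float]) -> Dict[str, List[float]]:
    """Search space generator: candidate sparsities per op (hypothetical format)."""
    return {name: list(choices) for name in op_names}


class RandomSearchTuner:
    """Stand-in tuner with a propose/receive loop; random search ignores feedback."""

    def __init__(self, space: Dict[str, List[float]], seed: int = 0):
        self.space = space
        self.rng = random.Random(seed)

    def propose(self) -> Config:
        return {name: self.rng.choice(opts) for name, opts in self.space.items()}

    def receive(self, config: Config, score: float) -> None:
        pass  # a smarter tuner would update its model here


class TunerStrategy:
    """Wraps a tuner as a strategy: propose, evaluate, report back, keep the best."""

    def __init__(self, tuner: RandomSearchTuner, evaluate: Callable[[Config], float]):
        self.tuner = tuner
        self.evaluate = evaluate

    def run(self, trials: int) -> Tuple[Optional[Config], float]:
        best_cfg: Optional[Config] = None
        best_score = float("-inf")
        for _ in range(trials):
            cfg = self.tuner.propose()
            score = self.evaluate(cfg)
            self.tuner.receive(cfg, score)
            if score > best_score:
                best_cfg, best_score = cfg, score
        return best_cfg, best_score


def toy_evaluate(cfg: Config) -> float:
    """Toy objective: reward total sparsity, penalise pruning fc1 beyond 50%."""
    return sum(cfg.values()) - 2.0 * max(0.0, cfg["fc1"] - 0.5)


space = generate_search_space(["fc1", "fc2"], [0.25, 0.5, 0.75])
strategy = TunerStrategy(RandomSearchTuner(space, seed=42), toy_evaluate)
best_cfg, best_score = strategy.run(trials=20)
print(best_cfg, best_score)
```

In a real experiment `evaluate` would compress the model with the proposed config and score it through the evaluator; "a good strategy" then amounts to replacing random search with a tuner that uses the feedback passed to `receive`.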
speedup
- make mask propagation a standalone module
- support more backends for quantization speedup
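A minimal sketch of mask propagation followed by speedup, assuming dense layers and 0/1 channel keep masks (all names hypothetical). Masks must be reconciled at ops that join branches: a channel of `a + b` can only be removed if it is pruned in both inputs, so a residual add keeps the union of its inputs' channels. Both producers are then shrunk to that propagated mask, and the consumer drops the matching input columns.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def propagate_through_add(keep_a: List[int], keep_b: List[int]) -> List[int]:
    """Residual add: keep a channel if either input keeps it (elementwise OR)."""
    return [a | b for a, b in zip(keep_a, keep_b)]


def shrink_outputs(weight: Matrix, bias: List[float], keep: List[int]) -> Tuple[Matrix, List[float]]:
    """Speedup for a producer: physically drop output channels (rows) not kept."""
    kept = [(row, b) for row, b, k in zip(weight, bias, keep) if k]
    return [row for row, _ in kept], [b for _, b in kept]


def shrink_inputs(weight: Matrix, keep: List[int]) -> Matrix:
    """Speedup for the consumer: drop input columns its producers no longer emit."""
    return [[w for w, k in zip(row, keep) if k] for row in weight]


# Branch A keeps channel 0, branch B keeps channel 2; both feed an add (3 channels).
keep_a = [1, 0, 0]
keep_b = [0, 0, 1]
keep_sum = propagate_through_add(keep_a, keep_b)  # [1, 0, 1]

# Branch A must still emit channel 2 (as zeros) because branch B uses it.
branch_a_w = [[1.0, 2.0], [0.0, 0.0], [0.0, 0.0]]
branch_a_b = [0.1, 0.0, 0.0]
small_a_w, small_a_b = shrink_outputs(branch_a_w, branch_a_b, keep_sum)

consumer = [[0.2, 0.4, 0.6]]  # layer after the add (3 -> 1)
small_consumer = shrink_inputs(consumer, keep_sum)
print(keep_sum, small_a_w, small_a_b, small_consumer)
```

Because rules like this depend on the op type (add, concat, reshape, ...), keeping them in a standalone module lets pruning speedup and other consumers share one implementation.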
benchmark
visualization