oneDNN knobs - AshokBhat/ml GitHub Wiki
oneDNN performance improvement knobs
KMP_AFFINITY
- Applicable only when Intel libomp runtime is used
- Recommended settings with Hyperthreading -
export KMP_AFFINITY=granularity=fine,compact,1,0
granularity=fine binds each OpenMP thread to a single thread context.
verbose (optional; not part of the setting above) prints runtime messages about the supported affinity.
compact is the affinity type: it assigns OpenMP thread n+1 to a free thread context as close as possible to the thread context where OpenMP thread n was placed. The trailing 1,0 are the permute and offset values.
- Recommended settings without Hyperthreading -
export KMP_AFFINITY=granularity=fine,compact
- Illustration of usage: https://cvw.cac.cornell.edu/Hybrid/kmpaffinity
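The choice between the two recommended settings above can be scripted. The following is a minimal sketch, assuming a Linux host where lscpu is available; the helper name pick_kmp_affinity is hypothetical, and the script falls back to the non-Hyperthreading setting when the thread count cannot be detected.

```shell
# Hypothetical helper: choose KMP_AFFINITY from the number of
# hardware threads per core (greater than 1 means Hyperthreading is on).
pick_kmp_affinity() {
    if [ "$1" -gt 1 ]; then
        echo "granularity=fine,compact,1,0"
    else
        echo "granularity=fine,compact"
    fi
}

# Assumption: lscpu reports a "Thread(s) per core" line (English locale).
tpc=$(lscpu 2>/dev/null | awk -F: '/^Thread\(s\) per core/ {gsub(/ /, "", $2); print $2}')

# Fall back to 1 if the value is missing or not a plain number.
case "$tpc" in
    ''|*[!0-9]*) tpc=1 ;;
esac

export KMP_AFFINITY="$(pick_kmp_affinity "$tpc")"
echo "KMP_AFFINITY=$KMP_AFFINITY"
```

Adding verbose to the value (for example granularity=fine,verbose,compact,1,0) makes the runtime print the resulting binding, which is a simple way to check the outcome.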
KMP_BLOCKTIME
- Applicable only when Intel libomp runtime is used
- Recommended settings for CNN:
export KMP_BLOCKTIME=0
- Recommended settings for non-CNN:
export KMP_BLOCKTIME=1
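KMP_BLOCKTIME is the time, in milliseconds, that a thread waits after finishing a parallel region before going to sleep. A launcher script could select the value from the workload type, as in this sketch; the WORKLOAD variable and the pick_kmp_blocktime helper are hypothetical names used only for illustration.

```shell
# Hypothetical helper: map a workload label to the KMP_BLOCKTIME
# value recommended above (0 for CNN, 1 for everything else).
pick_kmp_blocktime() {
    case "$1" in
        cnn) echo 0 ;;
        *)   echo 1 ;;
    esac
}

# Assumption: the caller sets WORKLOAD; default to "cnn" if unset.
WORKLOAD="${WORKLOAD:-cnn}"
export KMP_BLOCKTIME="$(pick_kmp_blocktime "$WORKLOAD")"
echo "KMP_BLOCKTIME=$KMP_BLOCKTIME"
```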
OMP_NUM_THREADS
- Recommended settings for CNN:
export OMP_NUM_THREADS=<num physical cores>
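One way to fill in the number of physical cores is to count unique (core, socket) pairs reported by lscpu. This is a sketch under the assumption of a Linux host; the count_physical_cores helper is a hypothetical name, and the fallback returns the number of logical CPUs, which overcounts when Hyperthreading is enabled.

```shell
# Hypothetical helper: count physical cores.
count_physical_cores() {
    # lscpu -p prints one CSV row per logical CPU; each unique
    # (core, socket) pair corresponds to one physical core.
    n=$(lscpu -p=Core,Socket 2>/dev/null | grep -v '^#' | sort -u | wc -l | tr -d ' ')
    if [ "${n:-0}" -gt 0 ]; then
        echo "$n"
    else
        # Fallback (assumption): logical CPU count, or 1 if unknown.
        getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1
    fi
}

export OMP_NUM_THREADS="$(count_physical_cores)"
echo "OMP_NUM_THREADS=$OMP_NUM_THREADS"
```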
KMP_SETTINGS
- Applicable only when Intel libomp runtime is used
- Recommended setting for verbose output:
export KMP_SETTINGS=TRUE
- Enables (TRUE) or disables (FALSE) the printing of OpenMP run-time library environment variables during program execution
Source: Intel's page
MKL_NUM_THREADS=N
- Libraries involved: MKL
- Sets the number of threads MKL uses. Enable MKL threading only when you are sure there are enough resources (physical cores) for the MKL threads in addition to your own threads.
MKL_DYNAMIC
- Libraries involved: Intel's libomp and MKL
- Enables MKL to dynamically change the number of threads
- When MKL_DYNAMIC is FALSE, MKL uses the suggested number of OpenMP threads whenever the underlying algorithms permit
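To keep MKL's threads and the application's own threads within the physical core count, the cores can be divided evenly among the application threads. The sketch below illustrates that arithmetic; PHYSICAL_CORES, APP_THREADS, and mkl_threads_per_worker are hypothetical names, and the default values of 16 and 4 are assumptions rather than recommendations.

```shell
# Hypothetical helper: MKL threads per application thread such that
# app_threads * mkl_threads does not exceed the physical cores
# (never returns less than 1).
mkl_threads_per_worker() {
    cores="$1"
    workers="$2"
    t=$((cores / workers))
    [ "$t" -ge 1 ] || t=1
    echo "$t"
}

# Assumed example values; override them from the environment.
PHYSICAL_CORES="${PHYSICAL_CORES:-16}"
APP_THREADS="${APP_THREADS:-4}"

export MKL_NUM_THREADS="$(mkl_threads_per_worker "$PHYSICAL_CORES" "$APP_THREADS")"
# With MKL_DYNAMIC=FALSE, MKL uses the suggested thread count
# whenever the underlying algorithm permits.
export MKL_DYNAMIC=FALSE
echo "MKL_NUM_THREADS=$MKL_NUM_THREADS MKL_DYNAMIC=$MKL_DYNAMIC"
```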