oneDNN knobs - AshokBhat/ml GitHub Wiki

oneDNN performance improvement knobs

KMP_AFFINITY

  • Applicable only when the Intel libomp runtime is used
  • Recommended setting with Hyper-Threading: export KMP_AFFINITY=granularity=fine,compact,1,0
    • granularity=fine binds each OpenMP thread to a single thread context.
    • verbose is an optional modifier, not included in the setting above, that prints runtime messages about the supported affinity.
    • compact is the value of the type argument: it assigns OpenMP thread n+1 to a free thread context as close as possible to the thread context where OpenMP thread n was placed.
    • The trailing 1,0 are the permute and offset arguments that follow the type.
  • Recommended setting without Hyper-Threading: export KMP_AFFINITY=granularity=fine,compact
  • Illustration of usage: https://cvw.cac.cornell.edu/Hybrid/kmpaffinity
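
The two recommendations above can be folded into a small shell helper. This is only a sketch: `kmp_affinity_for` and its threads-per-core argument are hypothetical names, not part of oneDNN or the OpenMP runtime, and the caller is assumed to know how many hardware threads each core exposes (on Linux, `lscpu` reports this as "Thread(s) per core").

```shell
# Hypothetical helper: choose a KMP_AFFINITY value from the number of
# hardware threads per core (1 = no Hyper-Threading, 2 or more = Hyper-Threading).
kmp_affinity_for() {
    threads_per_core="$1"
    if [ "$threads_per_core" -gt 1 ]; then
        # Hyper-Threading on: setting recommended on this page
        echo "granularity=fine,compact,1,0"
    else
        # Hyper-Threading off
        echo "granularity=fine,compact"
    fi
}

# Example: a machine assumed to have 2 hardware threads per core
export KMP_AFFINITY="$(kmp_affinity_for 2)"
```

Appending `,verbose` to the modifier list is an optional way to check at startup which binding the runtime actually applied.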

KMP_BLOCKTIME

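  • Applicable only when the Intel libomp runtime is used
  • Sets the time, in milliseconds, that a thread waits after finishing a parallel region before going to sleep
  • Recommended setting for CNN models: export KMP_BLOCKTIME=0
  • Recommended setting for non-CNN models: export KMP_BLOCKTIME=1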
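
A launcher script could pick between the two values with a small helper. The function name `blocktime_for` and the model-kind labels are hypothetical, chosen only for this sketch; the two values are the ones recommended above and are a starting point for tuning rather than a guarantee.

```shell
# Hypothetical helper: map a model kind to the KMP_BLOCKTIME value
# recommended on this page.
blocktime_for() {
    case "$1" in
        cnn) echo 0 ;;   # CNN models
        *)   echo 1 ;;   # anything else is treated as non-CNN
    esac
}

# Example: configure the runtime for a CNN workload
export KMP_BLOCKTIME="$(blocktime_for cnn)"
```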

OMP_NUM_THREADS

  • Recommended setting for CNN models: export OMP_NUM_THREADS=<number of physical cores>
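
One possible way to fill in the physical core count on Linux is to count unique (core, socket) pairs reported by `lscpu`. The helper name `physical_cores` is hypothetical, and the fallback to the online logical CPU count is an assumption of this sketch; it overcounts when Hyper-Threading is enabled.

```shell
# Hypothetical helper: estimate the number of physical cores.
physical_cores() {
    n=""
    if command -v lscpu >/dev/null 2>&1; then
        # One line per logical CPU; unique "core,socket" pairs are physical cores.
        n="$(lscpu -p=Core,Socket 2>/dev/null | grep -v '^#' | sort -u | wc -l | tr -d ' ')" || n=""
    fi
    if [ -z "$n" ] || [ "$n" -lt 1 ]; then
        # Fallback: online logical CPUs (may include Hyper-Threading siblings).
        n="$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)"
    fi
    echo "$n"
}

OMP_NUM_THREADS="$(physical_cores)"
export OMP_NUM_THREADS
```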

KMP_SETTINGS

  • Applicable only when the Intel libomp runtime is used
  • Enables (TRUE) or disables (FALSE) printing of the OpenMP runtime library environment variables during program execution
  • Recommended setting for verbose output: export KMP_SETTINGS=TRUE
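
Because the printed settings are mostly useful while debugging, it can be convenient to enable the variable for a single command instead of exporting it for the whole session. The wrapper name `with_kmp_settings` is hypothetical; the `sh -c` command below only stands in for a real application linked against the OpenMP runtime.

```shell
# Hypothetical wrapper: run one command with KMP_SETTINGS=TRUE
# without changing the calling shell's environment.
with_kmp_settings() {
    env KMP_SETTINGS=TRUE "$@"
}

# Stand-in for a real application: show that the child process sees the variable.
with_kmp_settings sh -c 'echo "KMP_SETTINGS=$KMP_SETTINGS"'   # prints KMP_SETTINGS=TRUE
```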

Source: Intel's page

MKL_NUM_THREADS=N

  • Libraries involved: MKL
  • Sets the number of threads MKL uses. Enable MKL threading only when you are sure there are enough physical cores for MKL's threads in addition to your own threads.

MKL_DYNAMIC

  • Libraries involved: Intel's libomp and MKL
  • When TRUE (the default), allows MKL to dynamically change the number of threads it uses
  • When MKL_DYNAMIC is FALSE, MKL uses the suggested number of OpenMP threads whenever the underlying algorithms permit
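
The two MKL variables can be combined when an application already runs its own threads and each of them calls into MKL. The sketch below shows one way to budget cores under that assumption: the helper name `mkl_threads_per_app_thread` and the figures (16 physical cores, 4 application threads) are made up for illustration, and an even split is only one possible policy.

```shell
# Hypothetical helper: divide physical cores evenly among application threads,
# never going below one MKL thread per application thread.
mkl_threads_per_app_thread() {
    cores="$1"
    app_threads="$2"
    n=$(( cores / app_threads ))
    if [ "$n" -lt 1 ]; then
        n=1
    fi
    echo "$n"
}

# Assumed example machine: 16 physical cores, 4 application threads
MKL_NUM_THREADS="$(mkl_threads_per_app_thread 16 4)"
export MKL_NUM_THREADS

# Ask MKL to use the suggested thread count rather than adjusting it dynamically
export MKL_DYNAMIC=FALSE
```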

See also