Math libraries on AMD Rome (20201130) - easybuilders/easybuild GitHub Wiki

  • talk by Sebastian Achilles (JSC): performance of math libraries on AMD Rome (Zen2)
  • significant performance benefits for BLIS compared to Intel MKL (2020.2) and OpenBLAS
    • BLIS: version 2.2 (AMD fork), but same performance as stock BLIS
      • AMD-specific kernels have been backported to upstream
    • full node tests (128 cores @ JUSUF system at JSC)
      • large performance gaps for several BLAS functions (dgemm, zgemm, etc.)
    • single-threaded performance difference is smaller, but still in favor of BLIS
  • switch from OpenBLAS to BLIS in foss toolchain?
    • needs testing on Intel systems as well, to compare BLIS with OpenBLAS there
    • Sebastian will share his benchmark scripts so others can test as well
  • also compared FFTW 3.3.8 vs patched FFTW 3.3.8 by AMD
    • both significantly faster than Intel MKL 2020.2 on AMD Rome
    • patched FFTW shows even better performance
  • should we use AMD-patched FFTW in foss toolchain?
    • AMD patches introduce an --enable-amd configuration option, so they may be safe to apply on Intel systems as well
    • makes providing optimized FFTW for AMD easier in foss toolchains
    • if needed we can pick which FFTW installation to use on AMD systems at installation time
  • notes:
    • Intel MKL 2020.2 has some Zen2-specific kernels, but still falls back to Intel Pentium 4 code paths
    • some performance improvements in Intel MKL 2020.4, but BLIS is still significantly better
    • Intel MKL can be "convinced" to use AVX2-optimized code paths
      • easy to do in imkl 2020.0, just use export MKL_DEBUG_CPU_TYPE=5
      • harder in more recent imkl 2020 versions, requires patching of binaries/libraries (see https://danieldk.eu/Posts/2020-08-31-MKL-Zen.html)
        • and that's against the EULA, and you don't really know what you're getting (or if it works correctly)
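
For the imkl 2020.0 case in the notes above, the variable does not have to be exported for the whole shell session. The following is a minimal sketch, not something shown in the talk, of setting MKL_DEBUG_CPU_TYPE=5 for a single child process so that runs with and without it can be compared side by side. The helper names mkl_avx2_env and run_with_mkl_avx2 are hypothetical, and the child command is only a stand-in for a real benchmark binary.

```python
import os
import subprocess
import sys


def mkl_avx2_env(base_env=None):
    """Return a copy of the environment with MKL_DEBUG_CPU_TYPE=5 added.

    Hypothetical helper: per the notes, the variable is only honoured by
    imkl 2020.0; more recent imkl 2020 versions ignore it.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["MKL_DEBUG_CPU_TYPE"] = "5"
    return env


def run_with_mkl_avx2(cmd):
    """Run cmd (a list of arguments) with the variable set for the child only."""
    return subprocess.run(cmd, env=mkl_avx2_env(), check=True,
                          capture_output=True, text=True)


if __name__ == "__main__":
    # Stand-in for a benchmark: the child reports the value it sees.
    child = [sys.executable, "-c",
             "import os; print(os.environ.get('MKL_DEBUG_CPU_TYPE'))"]
    print(run_with_mkl_avx2(child).stdout.strip())
```

Because the environment is copied rather than modified in place, the parent process and any other benchmark runs started from it are unaffected, which keeps a baseline run without the variable easy to reproduce.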