Conference call notes 20220202 - easybuilders/easybuild GitHub Wiki

Notes on the 190th EasyBuild conference call, Wednesday 2 Feb 2022 (09:00 UTC)

Attendees

Alphabetical list of attendees (12):

  • Kenneth Hoste (HPC-UGent, Belgium)
  • Kurt Lust (Univ. of Antwerpen - LUMI)
  • Åke Sandgren (Umeå University, Sweden)
  • Jurij Pečar (EMBL, Germany)
  • Jörg Saßmannshausen
  • Terje Kvernes (University of Oslo, Norway)
  • Jorge Guerra (Universidad Politécnica de Madrid, Spain)
  • Sam Moors (Vrije Universiteit Brussel, Belgium)
  • Mikael Öhman (Chalmers University of Technology, Sweden)
  • Thomas Röblitz (Univ. of Bergen, Norway)
  • Adam Huffman (Big Data Institute, Oxford, UK)
  • Alex Domingo (Vrije Universiteit Brussel, Belgium)

Agenda

  • overview of recent developments
  • OpenMPI + CUDA, OpenMPI 5.x
  • Q&A

Recent developments

  • release timeline
    • latest release: EasyBuild v4.5.2 (24 Jan 2022)
    • ETA next release: end of Feb'22?
  • recent changes
    • framework
      • bug fixes
        • only run GitHub tests when testing with Lua module syntax, to avoid hitting GitHub rate limit when running tests (PR #3938)
        • fix get_os_name and get_os_version to avoid reporting UNKNOWN in output of eb --show-system-info (PR #3942)
        • take into account that patch files can also be zipped when checking filename extension for patches (PR #3936)
      • enhancements
        • ...
      • changes
        • ...
    • easyblocks
      • bug fixes
        • convert version numbers to strictly numerical in Siesta easyblock (PR #2553)
      • enhancements
        • update NAMD easyblock to allow non-system csh (PR #2654)
        • enhance CUDA easyblock to create version independent pkgconfig files (PR #2656)
        • also run easyblocks test suite with Python 3.8-3.10 (PR #2664)
      • changes
        • ...
      • new software
        • ...
    • easyconfigs
      • ~50 easyconfig PRs merged since last conf call!
      • bug fixes
        • add patch for hard-coded checksum value of downloaded source file in the source code of RDKit 2021.03.4 (PR #14743)
        • fix CVE-2021-23437 in Pillow (PR #14765) + Pillow-SIMD (PR #14792)
        • add libXfont2 patch to fix build when libbsd is present (PR #14821)
        • add missing UCX-CUDA dep to GROMACS for foss-2021a-CUDA-11.3.1 (PR #14859)
        • add alternative checksum for MASS, class, nnet, spatial extensions in R v4.1.0 + v4.1.2 easyconfigs (PR #14873 + PR #14880)
        • add patch to fix Kraken2 ncbi ftp/https check in rsync_from_ncbi.pl for versions 2.0.9-2.1.1 (PR #14889)
      • enhancements
        • ...
      • (noteworthy) new software
      • noteworthy software updates
        • ...
      • changes
        • trim test configurations for easyconfigs test suite: only test with Python 2.7 + 3.6 and Lmod 7.x + 8.x (PR #14857 + PR #14881)
    • framework
      • reported bugs / bug fixes
        • switch to using pip3 for installing EasyBuild in Singularity definition file generated by EasyBuild (PR #3945)
      • enhancements
        • tolerate pre-existing edges in depgraph (PR #2784)
        • extend framework to enable modules to ship RPATH wrappers (issue #3918)
        • allow setting extension-specific envars in module file (PR #3948)
        • add a "clone_into" field to git_config source specification (PR #3949)
        • add support for optional comment parameter for extensions (issue #3946)
        • drop into shell with full environment in case of failure (issue #3950)
      • changes
        • more meaningful error output, including the actual location of the log file on the line about log files (issue #3915)
    • easyblocks
      • bug fixes
        • set CUDA target architecture(s) for GROMACS based on cuda_cc_semicolon_sep template value (PR #2655)
        • update Siesta EasyBlock to use serial FFTW (PR #2662)
        • enhance Geant4 easyblock: add support for optional build options (PR #2659)
      • enhancements
        • add support to NAMD easyblock to opt out of building with CUDA support even if CUDA is included as dependency (PR #2666)
      • updates
        • remove vulnerable binaries from sanity check of HDF5, no longer installed by default with HDF5 1.10.8 (PR #2670)
        • update sanity check in CUDA EasyBlock: CUDA 11.6 no longer includes samples (PR #2669)
      • new software
      • changes
    • easyconfigs
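
Illustration for the framework fix on compressed patch files (PR #3936): a minimal sketch of an extension check that also accepts compressed patches. This is not the code from the PR; the function name is_patch_filename and the set of compression suffixes are assumptions made for this example.

```python
import os

# Assumption for illustration: compression suffixes that may follow ".patch".
COMPRESSED_EXTS = ('.gz', '.bz2', '.xz')


def is_patch_filename(filename):
    """Return True if filename looks like a (possibly compressed) patch file."""
    root, ext = os.path.splitext(filename)
    if ext.lower() in COMPRESSED_EXTS:
        # Strip the compression suffix and inspect the extension underneath.
        root, ext = os.path.splitext(root)
    return ext.lower() == '.patch'


if __name__ == '__main__':
    for name in ('fix-build.patch', 'fix-build.patch.gz', 'sources.tar.gz'):
        print(name, is_patch_filename(name))
```

Checking only the last extension would miss fix-build.patch.gz, which is presumably the kind of case the fix addresses.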

OpenMPI + CUDA, OpenMPI 5.x

  • foss + CUDA + UCX-CUDA apparently doesn't fully work as expected...
  • can we avoid having to go back to fosscuda?
  • see easyconfigs issue #14801
  • see upstream issue "Specific OSU benchmarks segfault when non-CUDA aware OpenMPI 4.1.1 compiled with CUDA-aware UCX" @ https://github.com/open-mpi/ompi/issues/9906
  • Mikael: OpenMPI's OPAL has some custom code when built on top of CUDA
    • custom opal_cuda_memcpy (via MEMCPY_CUDA macro) with additional check whether it's on a GPU
  • things to change:
    • enable OPAL CUDA support in OpenMPI (but without requiring CUDA)
      • Åke: requires change to --with-cuda to add support for enabling CUDA-awareness by using --with-cuda=yes
    • add OpenMPI-CUDA for CUDA mca bits (only install specific *.so libraries)
  • OpenMPI 5.x
  • can we build a full OpenMPI with CUDA dep and use that to shadow OpenMPI without CUDA?
    • may cause trouble, even depending on order in which things are loaded...
  • foss + CUDA is a problem when using certain MPI collectives with CUDA buffers
    • which is a quite specific use case
    • we should try to fix this for the foss/2021a and foss/2021b toolchains too, not just going forward for foss/2022a
  • Åke: same CUDA-aware problems will pop up with UCC library (see https://ucfconsortium.org/projects/ucc/)
  • relevant: Alex' FOSDEM'22 talk: https://fosdem.org/2022/schedule/event/exascale_pmi
  • is there an OpenMPI community call we can join to explain our problems and how we plan to fix them?
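
For reference when debugging the setups discussed above: a minimal sketch of checking whether an OpenMPI installation itself was built with CUDA support. It assumes the mpi_built_with_cuda_support parameter line that "ompi_info --parsable --all" prints, as described in the Open MPI FAQ. The helper names are hypothetical. The check only covers OpenMPI's own CUDA support, not CUDA support provided through UCX.

```python
import subprocess

# Parameter name as printed by "ompi_info --parsable --all" (per the Open MPI FAQ).
CUDA_PARAM = 'mpi_built_with_cuda_support:value:'


def parse_cuda_support(ompi_info_output):
    """Return True/False if the CUDA-support parameter is found, else None."""
    for line in ompi_info_output.splitlines():
        if CUDA_PARAM in line:
            # The value is the last colon-separated field on the line.
            return line.strip().rsplit(':', 1)[-1].lower() == 'true'
    return None


def openmpi_is_cuda_aware():
    """Run ompi_info (must be in $PATH) and report OpenMPI's own CUDA support."""
    result = subprocess.run(['ompi_info', '--parsable', '--all'],
                            capture_output=True, text=True, check=True)
    return parse_cuda_support(result.stdout)
```

With a non-CUDA-aware OpenMPI on top of a CUDA-aware UCX (the combination in the upstream issue), this check would presumably report False even though GPU buffers may still be handled by UCX for some operations.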

Q&A

  • ...