meeting 2024 03 07 - EESSI/meetings GitHub Wiki
Notes for 2024-03-07 meeting
- date & time: Thu 7 Mar 2024 - 14:00 CET (13:00 UTC)
- (every first Thursday of the month)
- venue: (online, see mail for meeting link, or ask in Slack)
- agenda:
- Quick introduction by new people
- EESSI-related meetings and events in last month
- Progress update per EESSI layer
- Update on EESSI production repository software.eessi.io
- Update on EESSI test suite + build-and-deploy bot
- EESSI support portal
- AWS/Azure sponsorship update
- Update on MultiXscale EuroHPC project
- Upcoming/recent events: EuroHPC Summit + EasyBuild User Meeting 2024 + ISC’24
- Q&A
Slides
Meeting notes
(by Bob)
Quick introduction by new people
EESSI-related meetings in last month
(see slides)
- Quite a lot of work went into (preparing) the MultiXscale project review meeting
- Kenneth had a very positive meeting with the MeluXina team
- EESSI will be part of the Vega poster at the EuroHPC summit
- We are also trying to prepare a demo with Jupyter Notebooks, but initializing the EESSI environment in a notebook in Vega's Open OnDemand setup turns out to be a bit tricky.
Progress update per EESSI layer
Filesystem layer
(see slides)
- Should we make a separate repo for every new CPU target (e.g. zen4)?
- Everything still needs to be rebuilt in the end, in order to install it to the right path. For zen4 we don't expect that many issues, so a separate repository makes less sense.
- How do we properly do the testing when packages first get ingested into the dev repo? It means you still need to rebuild the package later on for the production repo installation path, and that means you're not testing the real/final version of the package.
- Some mapping / bind mounting needs to be done, so we probably need a container to do it. Maybe bwrap can also help here.
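The bind-mount idea above could be sketched roughly as follows. This is only an illustration under assumptions: the dev repository path `/cvmfs/dev.eessi.io`, the example package path, and the helper name `bwrap_command` are hypothetical, and the sketch assumes the target directory can serve as a mount point. It only builds a `bwrap` command line and does not run it.

```python
import shlex


def bwrap_command(dev_prefix, prod_prefix, test_cmd):
    """Build a bwrap argv that exposes dev_prefix at prod_prefix (hypothetical helper)."""
    return [
        "bwrap",
        "--dev-bind", "/", "/",                # start from the host filesystem
        "--ro-bind", dev_prefix, prod_prefix,  # show the dev install at the production path
        *test_cmd,
    ]


# Hypothetical paths, for illustration only
DEV_PREFIX = "/cvmfs/dev.eessi.io/software/example/1.0"
PROD_PREFIX = "/cvmfs/software.eessi.io/software/example/1.0"

cmd = bwrap_command(DEV_PREFIX, PROD_PREFIX, ["ls", PROD_PREFIX])
print(shlex.join(cmd))
```

Inside such a sandbox, tests would see the package at its final production path, which is the property the question above is after. Whether this works on top of a read-only CernVM-FS mount would still need to be verified.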
Compatibility layer
(see slides)
- We should check why `libxml2` is in our compatibility layer, i.e. which package depends on it. If it can be removed, we should remove it.
- As we're now using the pilot repo for testing certain packages/fixes, does it mean that the new dev repository would need to include its own compatibility layer as well (i.e. it needs to be a full copy)?
- Perhaps, though we could probably use the variant symlink approach that we were planning to use for the compat layer anyway. Then the test version of the compat layer could either go into the production repo or into the dev repo, and the variant symlink should allow you to pick up this version from whatever location.
Software layer
(see slides)
- The Lmod hook to work around the problem with the OpenMPI `smcuda` component is only relevant for the Neoverse V1 CPU target.
- The CUDA load hook is not actually working right now: Lmod supports only one hook per type, we now have two load hooks in place, and only the last one is actually used. Caspar has a short-term fix and also made an Lmod PR to allow for multiple hooks.
- We need to start running some Lmod user experience tests to make sure we don't break anything when for example updating to a newer Lmod version in the compat layer.
- Rebuilding software is not possible at the moment, as the installation directories are set read-only by EasyBuild, and it's hard to undo this in the build container (due to specifics of `fuse-overlayfs`). The latest approach, using `--fakeroot` to first remove the existing installation directory, seems to work. We still need a solution for the ingestion as well, as the old installation needs to be wiped from the repository before/while the new one gets ingested. Maybe the bot can play a role here.
- We're now sort of abusing the `$LMOD_RC` file for our hooks. We should do the hook registration in `SitePackage.lua`.
- The `--from-commit` feature being implemented in EasyBuild should reduce the number of GitHub API calls (we're now often hitting a rate limit) and make things reproducible (`--from-pr` is not reproducible since it takes things from the `develop` branch for merged PRs).
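The load-hook problem noted above (one hook per type, so the last registration wins) can be illustrated with a small model. This is a Python sketch of the behaviour as described in these notes, not Lmod's actual Lua API. `HookRegistry`, `cuda_hook`, `openmpi_hook`, and the module name are hypothetical. The combined hook shown is one possible workaround and is not necessarily the short-term fix mentioned above.

```python
class HookRegistry:
    """Toy registry that keeps only one hook per type (later registrations overwrite)."""

    def __init__(self):
        self._hooks = {}

    def register(self, hook_type, func):
        self._hooks[hook_type] = func  # overwrites any earlier hook of this type

    def run(self, hook_type, module):
        hook = self._hooks.get(hook_type)
        if hook is not None:
            hook(module)


calls = []


def cuda_hook(module):
    calls.append(("cuda", module))


def openmpi_hook(module):
    calls.append(("openmpi", module))


# Registering two load hooks: only the one registered last runs.
registry = HookRegistry()
registry.register("load", cuda_hook)
registry.register("load", openmpi_hook)
registry.run("load", "CUDA/x.y.z")
overwritten_calls = list(calls)


def combined_load_hook(module):
    """Single hook that dispatches to every individual load hook."""
    for hook in (cuda_hook, openmpi_hook):
        hook(module)


# Workaround: register one combined hook instead of two separate ones.
calls.clear()
registry.register("load", combined_load_hook)
registry.run("load", "CUDA/x.y.z")
combined_calls = list(calls)

print(overwritten_calls)
print(combined_calls)
```

With two separate registrations only `openmpi_hook` is called. With the combined hook both are called, in the order listed in the dispatcher.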
RISC-V
- Prebuilt CVMFS client packages are not available for RISC-V (yet), so we need to build from source for now.
- Bootstrapping Gentoo Prefix for `riscv64` now works!
- Also installing additional packages for the EESSI compat layer works (when testing manually).
- The location of `bits/libc-header-start.h` differs per Linux distro. Debian apparently already had a fix but didn't contribute it upstream to GCC; it is now finally included in GCC itself as well.
Build-and-deploy bot
(see slides)
software.eessi.io repository
(see slides)
EESSI documentation
(see slides)
- https://status.eessi-infra.org is being (or will be) removed and replaced by https://status.eessi.io/pilot; the documentation still needs to be updated accordingly.
EESSI test suite
(see slides)
- Planning to make a dashboard for the test results.
- For SURF something similar is already being done, and it's using ReFrame's output report file.
- Davide will be working on Quantum ESPRESSO tests.
- Documentation about writing portable tests is not really available yet, but the GROMACS test serves as a nice example to get started with writing tests.
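As a rough idea of how a dashboard could consume a ReFrame report file, the sketch below counts test cases per result. The field names `runs`, `testcases`, `name`, and `result` are assumptions about the JSON report layout and should be checked against the ReFrame version in use. The sample data is made up purely for illustration, and `summarize` is a hypothetical helper.

```python
import json
from collections import Counter

# Made-up sample in the assumed report layout (not real test results)
SAMPLE_REPORT = json.dumps({
    "runs": [
        {
            "testcases": [
                {"name": "test_a", "result": "success"},
                {"name": "test_b", "result": "failure"},
                {"name": "test_c", "result": "success"},
            ]
        }
    ]
})


def summarize(report_text):
    """Count test cases per result across all runs in a report."""
    report = json.loads(report_text)
    counts = Counter()
    for run in report.get("runs", []):
        for case in run.get("testcases", []):
            counts[case.get("result", "unknown")] += 1
    return dict(counts)


summary = summarize(SAMPLE_REPORT)
print(summary)
```

A real dashboard would read the report from disk and likely break results down per system and per test, but the same traversal would apply.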
Support for EESSI
(see slides)
AWS/Azure sponsored credits
(see slides)
- A new Slurm cluster has been spun up on Azure today, and we will start using it for doing Zen 4 builds.
MultiXscale EU project
(see slides)
Events
(see slides)
- EuroHPC Summit: https://www.eurohpcsummit.eu
- EasyBuild User Meeting: https://easybuild.io/eum24
- ISC'24: https://isc-hpc.com
- 3 tutorial proposals related to EESSI got rejected
- EESSI Community BoF accepted!
- Working on paper submission for RISC-V workshop
- We will probably also do some kind of social event for EESSI community.
Q&A
- Åke recreated all EESSI module files for a hierarchical module naming scheme. This worked really well, with only one failure, for CFITSIO, due to a sanity check trying to write to the installation directory (see easyconfigs issue #19970).
- Jure is also interested in using EESSI in Jupyter Notebooks; will there be a write-up of the work done for Vega?
- Vega uses Open OnDemand, and their Jupyter Notebook app does not really allow you to customize the environment.
- Normally, making EESSI available in a notebook should be straightforward as long as the environment in which the notebook is started can be controlled.
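Where the notebook's start-up environment can be controlled, one common pattern is a Jupyter kernel whose launch command first sources the EESSI init script. The sketch below only builds such a kernel spec as a dictionary. The init script path (including the version), the display name, and the helper name `eessi_kernel_spec` are assumptions for illustration. It also assumes `ipykernel` is available to the `python` found after initialization, and it is not the setup used on Vega.

```python
import json
import shlex

# Assumed init script location; adjust to the EESSI version actually in use
EESSI_INIT = "/cvmfs/software.eessi.io/versions/2023.06/init/bash"


def eessi_kernel_spec(init_script=EESSI_INIT, display_name="Python (EESSI)"):
    """Return a kernel.json-style dict that sources EESSI before starting the kernel."""
    launch = (
        f"source {shlex.quote(init_script)} && "
        'exec python -m ipykernel_launcher -f "$1"'
    )
    return {
        # Jupyter replaces {connection_file}; bash receives it as $1
        "argv": ["bash", "-c", launch, "bash", "{connection_file}"],
        "display_name": display_name,
        "language": "python",
    }


spec = eessi_kernel_spec()
print(json.dumps(spec, indent=2))
```

The printed JSON could be saved as `kernel.json` in a kernel directory that Jupyter searches. This only helps on sites where users are allowed to add their own kernels.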
- Next meeting: Thu 4 April 2024 at 14:00 CEST (12:00 UTC)