Sync meeting 2025 05 13 - multixscale/meetings GitHub Wiki
MultiXscale WP1+WP5 sync meetings
- Monthly, every 2nd Tuesday of the month at 09:30 CE(S)T
- Notes of previous meetings at https://github.com/multixscale/meetings/wiki
Next meetings
- Tue 10 June 2025 09:30 CEST
- Tue 1 July 2025 09:30 CEST
- rescheduled from 8 July
- Tue 12 August 2025 09:30 CEST
- Tue 9 September 2025 09:30 CEST
- 3 days before review
Agenda/notes 2025-05-13
attending:
- Petra (NIC)
- Richard, Thomas (UiB)
- Caspar (SURF)
- Lara (UGent)
- Helena, Susana, Arturo (Do IT Now)
- Pedro (RUG)
- Julián (BSC)
General
- Petra has started as Project Manager for MultiXscale
- overview of MultiXscale planning
- upcoming project review
- final date: Fri 12 Sept 2025 (confirmed)
- Agenda & list of participants is on OneDrive
- also some emails sent during last week
- Is it confirmed that WP1/WP5 leads can attend remotely?
- Caspar is on holiday :D
- confirmed (?) by PM, focus will be on scientific WPs
- amendment update
- status?
- was signed by NIC
- waiting for response (25/45 days)
Deliverables due M30
- Upcoming deliverables (M30 - June 2025):
- Discussed D1.4, 1.5 and 5.3 on 09-05-2025
- see notes @ https://hackmd.io/WHVWZR3ITu20oJkLZZqd0w
- all deliverables in very good shape
- next: review + update
- goal: have them (camera-)ready by end of May for handover for final review to Steering Committee
- next meeting May 28
- D6.3 Interim report on Community outreach, Education, and Training => NIC (Petra)
- mostly written (by Neja)
- review by Alan ?
WP status updates
- [SURF] WP1 Developing a Central Platform for Scientific Software on Emerging Exascale Technologies
- [UGent] T1.1 Stable (EESSI) - due M12+M24
- [RUG] T1.2 Extending support (starts M9, due M30)
- NVIDIA Grace: 100% - epsilon done
- Lara & Richard are working on LAMMPS
- minor issue: we don't have a service account yet
- Do we have a non-personal bot that can be used to keep installations for NVIDIA Grace in sync?
- Not yet, only personal bots for Thomas/Richard
- Service account would be needed, should ask JSC? => Thomas will ask
- AMD ROCm: (Pedro, Bob)
- Status: close to having a ROCm-LLVM easyconfig
- builds were picking up something from the host, but this should be fixed now
- Next few weeks, try to build rest of the toolchain
- likely not before end of webinar series (June 2nd)
- (?) Alan will look into exposing AMD drivers, similar to what we do for NVIDIA drivers
- (?) Alan will look into EasyBuild toolchain (is it needed, what would be in there, can it share stuff with the regular LLVM toolchain?)
- Not started yet or no results yet:
- Then in following weeks, look into higher level software (GROMACS etc)
- Goal: ROCm available by end of June 2025, probably in `dev.eessi.io`
- Notes at https://github.com/EESSI/meetings/wiki/Sync-meeting-on-EESSI-ROCm-support-(2025-04-04)
- Tiger team meeting Friday (4 April): ? maybe one more since 4 April?
- Bob met with AMD ROCm developer April 22
- someone from AMD reached out to EasyBuild (via Slack)
- Caspar and Bob are building stacks for Cascadelake and Icelake (> 75 %)
- Resumed work on A64FX stack (building on Deucalion) (50 %)
- [SURF] T1.3 Test suite - due M12+M24
- v0.6.0 released: Breaking change in constants => Required ReFrame configs to be updated
- Caspar gave a talk on EESSI test suite at EUM'25 (https://easybuild.io/eum25/#program)
- Hands-on in 20 minutes was tight, but feedback showed at least a handful of people tried (and some managed) to run the test suite
- [BSC] T1.4 RISC-V (starts M13)
- Working on LLVM in `riscv.eessi.io`; the build currently fails in the test step: 346 tests fail (0.25% of all tests) => Maybe check with Davide, he might be able to help
- Some of the test failures are not specific to RISC-V
- still working on LLVM (some issues even on non-RISC-V)
- in contact with Davide (CECAM), work on Intel/AMD CPUs
- objective now is a minimal installation
- (same as last month) Bob has a build bot running on RISC-V, but it has some issues (smee client crashes)
- Currently, tarballs passed manually to Bob for ingestion
- Bob's plan is to push things to `software.eessi.io`, but not sure if we will still do that for 2023.06
- [SURF] T1.5 Consolidation (starts M25)
- Improve NVIDIA GPU support
- CUDA sanity check (PR to EasyBuild framework by Caspar, Kenneth/Caspar/Alan reviewing)
- Formalize supported CUDA Compute Capability & CPU architecture combinations
- for CC70, CC80, CC90 it's formalized
- building of software for CPU+GPU combinations has started
- first PR covering CUDA, UCX, UCC, OSU benchmarks built, ingested and merged
- these also include builds for Icelake and Cascadelake stack
- there was a problem building CUDA-Samples version 12.1 for CC70
- don't build that for now for CC70
- but build it for CC80 and CC90
- look into building a newer version of CUDA-Samples for all CCxx
- once that is in place, add a custom module or Lmod hook for CC70 so that CUDA-Samples 12.1 points to the newer version
- not decided yet, how to deal with minor CCyy versions
- (?) Log / document which combinations are supported natively, and which are cross-compiled
- see also https://gitlab.com/eessi/support/-/issues/142
- we should have a strategy in case something doesn't work
- some strategy emerging, for example, CUDA-Samples issue with cc70
- some efforts to automate building a stack for a new CPU microarchitecture
- see work on building stack for Icelake and Cascadelake
- question is how to ensure that stacks are built as close as possible to how the other stacks were built
- when building a new version of EESSI we want to make such efforts easier
- [UGent] WP5 Building, Supporting and Maintaining a Central Shared Stack of Optimized Scientific Software Installations
- [UGent] T5.1 Support portal - due M12
- [SURF] T5.2 Monitoring/testing (starts M9)
- finalising dashboard work and working on deliverable 5.3
- working on identity matrix
- shows systems on X, packages on Y
- quick overview of which modules work on which system
- idea is to integrate this into EESSI documentation
- [UiB] T5.3 community contributions (bot) - due M12
- [UGent] T5.4 support/maintenance (starts M13)
- New rotation proposal for July+August has been sent out
- some issue in July
- Lara follows up on this
- continuing work on improvements for the bot
- reducing chattiness of the bot
- bundling staging PRs (very important for reducing load when building full matrix of CPU+GPU combinations)
- reversing filter match logic for bot build commands
- idea to add a `bot: deploy` command
- [UB] WP6 Community outreach, education, and training
- webinar series on EESSI in May-June 2025
- see https://www.eessi.io/docs/training/2025/webinar-series-2025Q2
- first two were done successfully with about 60 participants
- slides, video recordings available
- will be promoted via CASTIEL2
- Susana will share another post to promote webinars
- EESSI-related talk @ HPCKP'25?
- could fit into industry training => Susana will talk to Alan about this
- not yet decided
- VSC (Austria) is considering switching to EESSI
- help them, open support issue, eventually blog post? => Progress?
- Kenneth had a chat with them during EuroHPC Summit, needs to reply to their email for follow-up
- for next sync: update on progress and decide about promoting this (blog or other)
- Code-of-the-month: April 30
- ESPReSo
- CECAM workshop, April 8-11
- Modelling
- [HPCNow] WP7 Dissemination, Exploitation & Communication
- podcast with EuroCC Belgium about MultiXscale and EESSI
- press release with EuroHPC JU -> blog post on EuroHPC JU website
- EuroHPC Summit
- still interested in blog post?
- TODO: blog post to highlight some things? => Kenneth will write this up for EESSI-blog, has some pictures, link to the official press release (but blog post for EB user meeting needs to happen first)
- EESSI @ EasyBuild User Meeting
- slides available via https://easybuild.io/eum25/#program
- recordings available via YouTube playlist
- WIP: blog post on (new) EasyBuild blog => Helena is taking care of this
- blog post was done, maybe promote via social media
- ISC'25
- proposals for tutorial/BoF on EESSI were not accepted :(
- Helena will be able to present in POP3 workshop: extreme scale application stuff
- Lara: `dev.eessi.io` could be valuable/relevant in this context
- Helena: event page seems to show a lot of talks, but maybe they can squeeze us in
- talk @ EuroHPC booth: can be handled by Do IT Now
- there will be a talk, same poster as in EuroHPC summit will be used
- Do IT Now will have a booth
- raffle with RPi starter kit
- Going: Helena & others from Do IT Now
- GOOD conference (Open OnDemand)
- blog post done, https://www.eessi.io/docs/blog/2025/04/03/eessi-at-good-conf/
- European researchers night, September 26
- ask researchers working in MultiXscale
- similar to what was done for International Women's Day campaign
- editing some banners
- "press release" on EESSI award => EuroHPC success story
- It finally happened: https://eurohpc-ju.europa.eu/eessi-does-it-award-winning-software-story-2025-04-07_en
- still TODO: also short blog post on this on EESSI blog? => will be tackled in the blog post on EuroHPC summit
- Lara will be on a panel at the digital humanities alumni day (10-year anniversary)
- May 26th
- promoting EESSI, MultiXscale, ...
- Susana can promote it via social media
- Update on the page of available software (@petra) => Still under review
- related to adding NVIDIA Grace
- some improvements on order, ...
- Pedro takes a look into the PR
- event AHM EuroCC, NCCs, CoEs
- Estonia, Tallinn
- 23-25 September
- discussing who can go: Alan?, Helena?
- seems to be possible to attend online
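As an illustration of the T5.2 overview matrix mentioned above (systems on X, packages on Y), here is a minimal sketch of how such a table might be generated from test outcomes. The system names, the input format, and the function names are all hypothetical; this is not how the actual dashboard is implemented.

```python
# Hypothetical test outcomes: (system, module, passed). Names are illustrative only.
results = [
    ("system-a", "GROMACS", True),
    ("system-a", "LAMMPS", False),
    ("system-b", "GROMACS", True),
]


def build_matrix(results):
    """Collect sorted system and module names plus a (module, system) -> passed map."""
    systems = sorted({system for system, _, _ in results})
    modules = sorted({module for _, module, _ in results})
    cell = {(module, system): passed for system, module, passed in results}
    return systems, modules, cell


def render(systems, modules, cell):
    """Render a Markdown table: systems as columns (X), modules as rows (Y)."""
    lines = [
        "| module | " + " | ".join(systems) + " |",
        "|---" * (len(systems) + 1) + "|",
    ]
    for module in modules:
        marks = []
        for system in systems:
            status = cell.get((module, system))
            # A missing combination means no test result was recorded for it.
            marks.append("n/a" if status is None else ("ok" if status else "FAIL"))
        lines.append("| " + module + " | " + " | ".join(marks) + " |")
    return "\n".join(lines)


systems, modules, cell = build_matrix(results)
print(render(systems, modules, cell))
```

A Markdown table is used here only because the notes mention integrating the overview into the EESSI documentation; any other output format would work equally well.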
Other topics
- Thomas: blog post Grace stack?