Sync meeting 2025 04 08 - multixscale/meetings GitHub Wiki
MultiXscale WP1+WP5 sync meetings
- Monthly, every 2nd Tuesday of the month at 09:30 CE(S)T
- Notes of previous meetings at https://github.com/multixscale/meetings/wiki
Next meetings
- Tue 13 May 2025 09:30 CEST
- Tue 10 June 2025 09:30 CEST
- Tue 8 July 2025 09:30 CEST (without Kenneth)
- will be rescheduled to 1 July
Agenda/notes 2025-04-08
attending:
- Petra (NIC)
- Thomas (UiB)
- Caspar (SURF)
- Kenneth, Lara (UGent)
- Helena, Susana, Arturo (Do IT Now)
- Bob, Pedro (RUG)
General
- Petra replacing Neja as Project Manager for MultiXscale
- overview of MultiXscale planning
- upcoming project review
- final date: Fri 12 Sept 2025 (confirmed)
- confirmed by Neja
- Agenda & list of participants is on OneDrive
- Is it confirmed that WP1/WP5 leads can be remote?
- Caspar is on holiday :D
- confirmed (?) by PM, focus will be on scientific WPs
- date + remote attendance confirmed by Neja
- amendment update
- still waiting for feedback from PO
Deliverables due M30
- Upcoming deliverables (M30 - June 2025):
- Discussed D1.4, 1.5 and 5.3 on 07-04-2025
- see notes @ https://hackmd.io/WHVWZR3ITu20oJkLZZqd0w
- D1.4 Support for emerging system architectures => RIJKSUNI (Pedro)
- writing effort by Bob + Pedro (due end of April), review of draft by Thomas (mid May)
- https://github.com/multixscale/planning/issues/104
- link to Overleaf project
- first draft already there
- Further content as proposed in meeting notes
- Responsible / writers: Pedro (Bob?)
- Review: Thomas
- D1.5 Portable test suite for shared software stack => SURF (Caspar)
- writing effort by Caspar (due end of April), review of draft by Kenneth (mid May)
- Focus on Test suite itself (i.e. code)
- https://github.com/multixscale/planning/issues/105
- link to Overleaf project
- Responsible / writers: Caspar, Satish
- Reviewers: Kenneth
- D5.3 Report on testing provided software => UGent (Kenneth)
- writing effort by Lara/Kenneth/Satish/Maksim (due end of April), review of draft by ??? (Caspar?) (mid May)
- Focus on Dashboard (Maksim) and periodic tests (Lara/Satish)
- Will include analysis of selected tests, to show what information we can get (e.g. we can spot OS upgrades, changes in tests themselves, etc)
- https://github.com/multixscale/planning/issues/153
- Overleaf
- Responsible / writers: Maksim (dashboard), Satish + Lara (analysis of periodic runs)
- Review: ???
- keep deliverables short => ~15 pages max.
- set early internal deadline to get these fully done: 1st week of June?
- Timeline for D1.4, 1.5, 5.3:
- Complete draft version: 30 April 2025 (meeting: 10:30-12:00 CEST to discuss)
- Review of draft by mid May
- Camera-ready version: 28 May 2025 (meeting: 10:30-12:00 CEST to discuss)
- Assess whether deliverables are ready to go to MultiXscale steering committee
- D6.3 Interim report on Community outreach, Education, and Training => NIC (Petra)
- once outline is there, we should sync up? Has this been done?
- https://github.com/multixscale/planning/issues/89
- Summary of activity is in the Overleaf project
- WIP by Neja
- should ask Susana/Alan for help?
- should take into account contents D7.2 (already delivered)
- review report mentions not enough outreach to general public
- mostly focused on scientific aspect of MultiXscale?
- 11 Feb 2025: International Day of Women + Girls in Science
- any MultiXscale partners doing activities for this?
- 8 March 2025: International Women's Day
- featuring Women working for MultiXscale
- Lara + Celine were selected for SC'24 women's profiles
- interview with Matej to explain why work done in MultiXscale is relevant to society
- maybe involve other people?
- press release was prepared for MultiXscale General Assembly
- wasn't really picked up
- (Alan) we should poke CASTIEL2 on this, ask them for help
- EuroHPC podcast interview (Lara + Kenneth with NCC Belgium)
- recording scheduled for 7 March 2025
- improve balance between EESSI/scientific work
- pull in Tilen remotely?
- short YouTube videos explaining what MultiXscale does?
WP status updates
- WP status updates
- [SURF] WP1 Developing a Central Platform for Scientific Software on Emerging Exascale Technologies
- [UGent] T1.1 Stable (EESSI) - due M12+M24
- [RUG] T1.2 Extending support (starts M9, due M30)
- NVIDIA Grace: lots of progress, ~80% done
- zen2+CC80 (A100) = 984 modules
- Grace+CC90 (H100) = 748 modules, so getting close
- 2023b (CPU only) is mostly done, except Siesta
- 2023a: very close to finishing that
- 2022b: just started
- Failing tests in the GROMACS w/ CUDA test suite; these were not seen on the Grace node @ SURF ETP when tried interactively before, WIP
- Do we have a non-personal bot that can be used to keep installations for NVIDIA Grace in sync?
- Not yet, only personal bots for Thomas/Ricard
- Service account would be needed, should ask JSC? => Thomas will ask
- AMD ROCm: (Bob)
- Status: close to having a ROCm-LLVM easyconfig
- Next few weeks, try to build rest of the toolchain
- Alan will look into exposing AMD drivers, similar to what we do for NVIDIA drivers
- Alan will look into EasyBuild toolchain (is it needed, what would be in there, can it share stuff with the regular LLVM toolchain?)
- Then in following weeks, look into higher level software (GROMACS etc)
- Goal: ROCm available by end of May 2025, probably in dev.eessi.io
- Notes at https://github.com/EESSI/meetings/wiki/Sync-meeting-on-EESSI-ROCm-support-(2025-04-04)
- Tiger team meeting last Friday (4 April)
- [SURF] T1.3 Test suite - due M12+M24
- v0.6.0 released: Breaking change in constants => Required ReFrame configs to be updated
- Caspar gave a talk on EESSI test suite at EUM'25 (https://easybuild.io/eum25/#program)
- Hands-on in 20 minutes was tight, but feedback showed at least a handful of people tried (and some managed) to run the test suite
- [BSC] T1.4 RISC-V (starts M13)
- Working on LLVM in riscv.eessi.io, currently fails in the test step: 346 tests fail (0.25% of the total number of tests) => Maybe check with Davide, he might be able to help
- Some of the test failures are not specific to RISC-V
- Added a bunch more software :)
- Bob has a build bot running on RISC-V, but it has some issues (smee client crashes)
- Currently, tarballs passed manually to Bob for ingestion
- Bob's plan is to push things to software.eessi.io, but not sure if we will still do that for 2023.06
- [SURF] T1.5 Consolidation (starts M25)
- Improve NVIDIA GPU support
- CUDA sanity check
- Formalize supported CUDA Compute Capability & CPU architecture combinations
- Log / document which combinations are supported natively, and which are cross-compiled
- see also https://gitlab.com/eessi/support/-/issues/142
- we should have a strategy in case something doesn't work
- [UGent] WP5 Building, Supporting and Maintaining a Central Shared Stack of Optimized Scientific Software Installations
- [UGent] T5.1 Support portal - due M12
- [SURF] T5.2 Monitoring/testing (starts M9)
- ~10 systems now in dashboard @ https://dashboard.eessi.io
- Richard added more systems
- Total 100k tests stored
- Scroll bars are finally there, so dashboard also works on smaller screens
- now also via HTTPS
- We have set up an alias for this: https://dashboard.eessi.io (via Alan)
- no confirmation yet from IT4I whether we can expose performance data from Karolina
- Last meeting: prepare document to request approval from sites to publish data publicly
- Caspar: I don't think we should go this way, a formal document would only require more formal approval, which slows things down. Proposal: read the usage agreement carefully and if it doesn't restrict publishing performance data... just do it? Alternative: remove Karolina, then hope that peer pressure pushes sites to want to be added?
- [UiB] T5.3 community contributions (bot) - due M12
- [UGent] T5.4 support/maintenance (starts M13)
- New rotation proposal has been sent out, everyone agreed, so new invitations will be sent out end of this week
- All going fine
- [UB] WP6 Community outreach, education, and training
- webinar series on EESSI in May-June 2025
- see https://www.eessi.io/docs/training/2025/webinar-series-2025Q2
- will be promoted via CASTIEL2
- TODO
- announce + get registrations?
- EESSI-related talk @ HPCKP'25?
- could fit into industry training => Susana will talk to Alan about this
- VSC (Austria) is considering switching to EESSI
- help them, open support issue, eventually blog post? => Progress?
- Kenneth had a chat with them during EuroHPC Summit, needs to reply to their email for follow-up
- [HPCNow] WP7 Dissemination, Exploitation & Communication
- EuroHPC Summit
- poster + sticker handout
- EESSI mentioned during EuroHPC Federation Platform session
- TODO: blog post to highlight some things? => Kenneth will write this up for EESSI-blog, has some pictures, link to the official press release (but blog post for EB user meeting needs to happen first)
- EESSI @ EasyBuild User Meeting
- slides available via https://easybuild.io/eum25/#program
- recordings available via YouTube playlist
- WIP: blog post on (new) EasyBuild blog => Helena is taking care of this
- ISC'25
- proposals for tutorial/BoF on EESSI were not accepted :(
- Eli & Helena will be able to present in POP3 workshop: extreme scale application stuff
- Lara: dev.eessi.io could be valuable/relevant in this context - Helena: event page seems to show a lot of talks, but maybe they can squeeze us in
- TODO: Helena will reach out to POP3 to see if there is still space. If so, Lara & Helena will draft an abstract on dev.eessi.io
- talk @ EuroHPC booth: can be handled by Do IT Now
- raffle with RPi starter kit
- We can display the same poster as EuroHPC Summit, or we can send a new version => Kenneth: they can use the same one, number of software packages could be updated, but is probably not worth the effort.
- Going: Helena, Eli (& others from Do IT Now)
- GOOD conference (Open OnDemand)
- March 17-21, see https://openondemand.org/good
- booth from Do IT Now
- talk on connecting EESSI with Open OnDemand: https://cfp.openondemand.org/2025/talk/BX7BCF/
- also blog post on this? => Status?
- PR: https://github.com/EESSI/docs/pull/427
- WIP, but nearly done
- International Women's Day campaign
- "press release" on EESSI award => EuroHPC success story
- It finally happened: https://eurohpc-ju.europa.eu/eessi-does-it-award-winning-software-story-2025-04-07_en
- TODO: also short blog post on this on EESSI blog? => will be tackled in the blog post on EuroHPC summit
- EuroHPC Podcast: recording was done in Brussels with Kenneth & Lara
- Draft edit, Kenneth & Lara need to give their ok
- Update on the page of available software (@petra) => Still under review
Other topics
- Thomas: When Grace stack is finished: how about making a blog post? => Kenneth: we can make a nice spin and state that the software stack for Jupiter is 'ready' before the system is there :)
- Could try to redo a plot we made for x86 with GROMACS, but then for aarch64, to show the benefit of optimizing for the CPU architecture of Grace