Sync meeting 2025 04 08 - multixscale/meetings GitHub Wiki

MultiXscale WP1+WP5 sync meetings


Next meetings

  • Tue 13 May 2025 09:30 CEST
  • Tue 10 June 2025 09:30 CEST
  • Tue 8 July 2025 09:30 CEST (without Kenneth)
    • will be rescheduled to 1 July

Agenda/notes 2025-04-08

attending:

  • Petra (NIC)
  • Thomas (UiB)
  • Caspar (SURF)
  • Kenneth, Lara (UGent)
  • Helena, Susana, Arturo (Do IT Now)
  • Bob, Pedro (RUG)

General

  • Petra replacing Neja as Project Manager for MultiXscale
  • overview of MultiXscale planning
  • upcoming project review
    • final date: Fri 12 Sept 2025 (confirmed)
      • confirmed by Neja
      • Agenda & list of participants is on OneDrive
        • Is it confirmed that WP1/WP5 leads can be remote?
        • Caspar is on holiday :D
        • confirmed (?) by PM, focus will be on scientific WPs
        • date + remote attendance confirmed by Neja
  • amendment update
    • still waiting for feedback from PO

Deliverables due M30

  • Upcoming deliverables (M30 - June 2025):
    • Discussed D1.4, D1.5 and D5.3 on 7 April 2025
    • D1.4 Support for emerging system architectures => RIJKSUNI (Pedro)
    • D1.5 Portable test suite for shared software stack => SURF (Caspar)
    • D5.3 Report on testing provided software => UGent (Kenneth)
      • writing effort by Lara/Kenneth/Satish/Maksim (due end of April), review of draft by ??? (Caspar?) (mid May)
      • Focus on Dashboard (Maksim) and periodic tests (Lara/Satish)
      • Will include analysis of selected tests, to show what information we can get (e.g. we can spot OS upgrades, changes in tests themselves, etc)
      • https://github.com/multixscale/planning/issues/153
      • Overleaf
      • Responsible / writers: Maksim (dashboard), Satish + Lara (analysis of periodic runs)
      • Review: ???
    • keep deliverables short => ~15 pages max.
    • set early internal deadline to get these fully done: 1st week of June?
      • Timeline for D1.4, D1.5, D5.3:
        • Complete draft version: 30 April 2025 (meeting: 10:30-12:00 CEST to discuss)
        • Review of draft by mid May
        • Camera-ready version: 28 May 2025 (meeting: 10:30-12:00 CEST to discuss)
          • Assess whether deliverables are ready to go to MultiXscale steering committee
    • D6.3 Interim report on Community outreach, Education, and Training => NIC (Petra)
      • once outline is there, we should sync up? Has this been done?
      • https://github.com/multixscale/planning/issues/89
      • Summary of activity is in the Overleaf project
      • WIP by Neja
      • should ask Susana/Alan for help?
      • should take into account contents D7.2 (already delivered)
      • review report mentions not enough outreach to general public
        • mostly focused on scientific aspect of MultiXscale?
        • 11 Feb 2025: International Day of Women and Girls in Science
          • any MultiXscale partners doing activities for this?
        • 8 March 2025: International Women's Day
          • featuring Women working for MultiXscale
          • Lara + Celine were selected for SC'24 women's profiles
        • interview with Matej to explain why work done in MultiXscale is relevant to society
          • maybe involve other people?
        • press release was prepared for MultiXscale General Assembly
          • wasn't really picked up
        • (Alan) we should poke CASTIEL2 on this, ask them for help
        • EuroHPC podcast interview (Lara + Kenneth with NCC Belgium)
          • recording scheduled for 7 March 2025
          • improve balance between EESSI/scientific work
          • pull in Tilen remotely?
        • short YouTube videos explaining what MultiXscale does?

WP status updates

  • WP status updates
    • [SURF] WP1 Developing a Central Platform for Scientific Software on Emerging Exascale Technologies
      • [UGent] T1.1 Stable (EESSI) - due M12+M24
      • [RUG] T1.2 Extending support (starts M9, due M30)
        • NVIDIA Grace: lots of progress, ~80% done
          • zen2+CC80 (A100) = 984 modules
          • Grace+CC90 (H100) = 748 modules, so getting close
          • 2023b (CPU only) is mostly done, except Siesta
          • 2023a: very close to finishing that
          • 2022b: just started
          • Test suite of GROMACS w/ CUDA shows failing tests that were not seen earlier when tried interactively on the Grace node @ SURF ETP, WIP
          • Do we have a non-personal bot that can be used to keep installations for NVIDIA Grace in sync?
            • Not yet, only personal bots for Thomas/Ricard
            • Service account would be needed, should ask JSC? => Thomas will ask
        • AMD ROCm: (Bob)
          • Status: close to having a ROCm-LLVM easyconfig
          • Next few weeks, try to build rest of the toolchain
          • Alan will look into exposing AMD drivers, similar to what we do for NVIDIA drivers
          • Alan will look into EasyBuild toolchain (is it needed, what would be in there, can it share stuff with the regular LLVM toolchain?)
          • Then in following weeks, look into higher level software (GROMACS etc)
          • Goal: ROCm available by end of May 2025, probably in dev.eessi.io
          • Notes at https://github.com/EESSI/meetings/wiki/Sync-meeting-on-EESSI-ROCm-support-(2025-04-04)
          • Tiger team meeting last Friday (4 April)
      • [SURF] T1.3 Test suite - due M12+M24
        • v0.6.0 released: Breaking change in constants => Required ReFrame configs to be updated
        • Caspar gave a talk on EESSI test suite at EUM'25 (https://easybuild.io/eum25/#program)
          • Hands-on in 20 minutes was tight, but feedback showed at least a handful of people tried (and some managed) to run the test suite
      • [BSC] T1.4 RISC-V (starts M13)
        • Working on LLVM in riscv.eessi.io, currently fails in the testing step: 346 failing tests (0.25% of total tests) => Maybe check with Davide, he might be able to help
          • Some of the test failures are not specific to RISC-V
        • Added a bunch more software :)
        • Bob has a build bot running on RISC-V, but it has some issues (smee client crashes)
          • Currently, tarballs passed manually to Bob for ingestion
          • Bob's plan is to push things to software.eessi.io, but not sure if we will still do that for 2023.06
      • [SURF] T1.5 Consolidation (starts M25)
        • Improve NVIDIA GPU support
          • CUDA sanity check
          • Formalize supported CUDA compute capability & CPU architecture combinations
          • Log / document which combinations are supported natively, and which are cross-compiled
          • see also https://gitlab.com/eessi/support/-/issues/142
            • we should have a strategy in case something doesn't work
    • [UGent] WP5 Building, Supporting and Maintaining a Central Shared Stack of Optimized Scientific Software Installations
      • [UGent] T5.1 Support portal - due M12
      • [SURF] T5.2 Monitoring/testing (starts M9)
        • ~10 systems now in dashboard @ https://dashboard.eessi.io
          • Richard added more systems
          • Total 100k tests stored
          • Scroll bars are finally there, so dashboard also works on smaller screens
          • now also via HTTPS
          • We have set up an alias for this: https://dashboard.eessi.io (via Alan)
        • no confirmation yet from IT4I whether we can expose performance data from Karolina
        • Last meeting: prepare document to request approval from sites to publish data publicly
          • Caspar: I don't think we should go this way, a formal document would only require more formal approval, which slows things down. Proposal: read the usage agreement carefully and if it doesn't restrict publishing performance data... just do it? Alternative: remove Karolina, then hope that peer pressure pushes sites to want to be added?
      • [UiB] T5.3 community contributions (bot) - due M12
      • [UGent] T5.4 support/maintenance (starts M13)
        • New rotation proposal has been sent out, everyone agreed, so new invitations will be sent out end of this week
        • All going fine
    • [UB] WP6 Community outreach, education, and training
      • webinar series on EESSI in May-June 2025
      • EESSI-related talk @ HPCKP'25?
        • could fit into industry training => Susana will talk to Alan about this
      • VSC (Austria) considering switching to EESSI
        • help them, open support issue, eventually blog post? => Progress?
          • Kenneth had a chat with them during EuroHPC Summit, needs to reply to their email for follow-up
    • [HPCNow] WP7 Dissemination, Exploitation & Communication
      • EuroHPC Summit
        • poster + sticker handout
        • EESSI mentioned during EuroHPC Federation Platform session
        • TODO: blog post to highlight some things? => Kenneth will write this up for EESSI-blog, has some pictures, link to the official press release (but blog post for EB user meeting needs to happen first)
      • EESSI @ EasyBuild User Meeting
      • ISC'25
        • proposals for tutorial/BoF on EESSI were not accepted :(
        • Eli & Helena will be able to present in POP3 workshop: extreme scale application stuff
          • Lara: dev.eessi.io could be valuable/relevant in this context
          • Helena: event page seems to show a lot of talks, but maybe they can squeeze us in
          • TODO: Helena will reach out to POP3 to see if there is still space. If so, Lara & Helena will draft up an abstract on dev.eessi.io
        • talk @ EuroHPC booth: can be handled by Do IT Now
        • raffle with RPi starter kit
        • We can display the same poster as EuroHPC Summit, or we can send a new version => Kenneth: they can use the same one, number of software packages could be updated, but is probably not worth the effort.
        • Going: Helena, Eli (& others from Do IT Now)
      • GOOD conference (Open OnDemand)
      • International Women's Day campaign
      • "press release" on EESSI award => EuroHPC success story
      • EuroHPC Podcast: recording was done in Brussels with Kenneth & Lara
        • Draft edit, Kenneth & Lara need to give their ok
      • Update on the page of available software (@petra) => Still under review

Other topics

  • Thomas: When Grace stack is finished: how about making a blog post? => Kenneth: we can make a nice spin and state that the software stack for Jupiter is 'ready' before the system is there :)
    • Could try to redo a plot we made for x86 with GROMACS, but then for aarch64, to show the benefit of optimizing for the CPU architecture of Grace

Notes of previous meetings

see https://github.com/multixscale/meetings/wiki