Developer Workshop 2025 - parthenon-hpc-lab/parthenon GitHub Wiki

Event

  • 16-20 Jun in Ann Arbor, MI
  • 4th floor conference room of 3520 Green Ct.

Presentation Notes

Day 1

  • AthenaPK

    • RKL2 algorithm in parthenon?
    • Flux-based pushing of tracers — use Riemann fluxes to push.
    • Legacy packs with single vector for cons and prim.
    • Inline C2P is seemingly faster? Need to carry both cons and prim pack?
    • Will remove legacy packs this week?
    • Compile time: ~10min on 16cores
    • `pack_size` -> `npartitions` -> `npacks_per_rank`?
    • Where should if vs if constexpr go? Breaking up kernels seems to pay dividends…
      • Craziness on Frontier (compilers failing, or even failing silently)
    • Reading/writing from 40TB outputs
    • STS with 26 ghost zones to prevent comms in between sub-cycles
    • Coalesced comms
    • HDF5 seemingly “unsolvable”
    • Embedded performance monitoring (permit aborting if not getting expected performance)
    • Balancing where you don’t lose allocation --> just make bad node go idle
    • Watchdog monitors in parthenon?
    • Parallel yt doesn’t seem to work at scale... Select sub-volume, go from there
    • Ascent is no longer part of @pgrete’s workflow. Working for Wibking
    • Swarms and load balancing... diverging flows
    • Monitor different timestep constraints
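On the `if` vs `if constexpr` question above: when the branch condition is known at compile time (e.g., a reconstruction method chosen as a template parameter), `if constexpr` discards the untaken branch entirely, so each kernel specialization carries no dead code or extra register pressure, whereas a runtime `if` keeps both paths in the kernel. A minimal C++ sketch with hypothetical tag types, not AthenaPK's actual interface:

```cpp
#include <cmath>
#include <type_traits>

// Hypothetical reconstruction tags, not AthenaPK's real ones.
struct DonorCell {};
struct PLM {};

// minmod slope limiter used by the PLM branch
inline double minmod(double a, double b) {
  if (a * b <= 0.0) return 0.0;
  return std::fabs(a) < std::fabs(b) ? a : b;
}

// With `if constexpr`, only the branch matching Recon is compiled into
// each instantiation; a plain runtime `if` would compile both paths
// into every kernel.
template <typename Recon>
double ReconstructLeft(double qm, double q0, double qp) {
  if constexpr (std::is_same_v<Recon, PLM>) {
    return q0 + 0.5 * minmod(q0 - qm, qp - q0);  // linear, limited
  } else {
    return q0;  // donor cell: piecewise constant
  }
}
```

Splitting a big kernel into per-option instantiations this way is also what the "breaking up kernels seems to pay dividends" bullet gestures at, at the cost of longer compile times.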
  • Artemis

    • When adding something to a package, can we add some sort of Metadata info so that it can be absorbed into documentation
    • `gravity_pkg->template Param("gx1")` <— can the `template` keyword go away?
    • `VI(n, DIR)` <— can we remove it?
    • Should we adopt Parthenon coordinates?
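The `gravity_pkg->template Param("gx1")` question above is a C++ dependent-name issue: when the package's type depends on a template parameter, the compiler cannot know that `Param` is a member template, so `template` is needed to parse the `<` as an angle bracket. A stripped-down sketch with a hypothetical `Params` class (not Parthenon's actual one):

```cpp
#include <any>
#include <map>
#include <string>

// Hypothetical stand-in for a package's parameter store; not the
// real Parthenon Params class.
class Params {
 public:
  template <typename T>
  void Add(const std::string &key, T val) { data_[key] = val; }
  template <typename T>
  T Param(const std::string &key) const { return std::any_cast<T>(data_.at(key)); }
 private:
  std::map<std::string, std::any> data_;
};

// Inside a template, pkg's type is dependent, so the `template`
// disambiguator is required by the language:
template <typename Pkg>
double GetGx1(const Pkg &pkg) {
  return pkg.template Param<double>("gx1");
}

// With a concrete, non-dependent type, the keyword is unnecessary:
double GetGx1Concrete(const Params &pkg) { return pkg.Param<double>("gx1"); }
```

So one answer to "can `template` go away?": only where the package's static type is known at the call site; in generic code the keyword is demanded by the language, not by the library.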
  • Jaybenne

    • Ryan reports overall positive experience with Parthenon
    • Reports our documentation is good (!?!?!)
    • Showed first demonstration of coupling two Parthenon-based codes (i.e., Artemis + Jaybenne)
  • Phoebus

    • Code “theft” is quite straightforward given we all speak the same language
  • Riot

    • Arbitrary variables, arbitrary blocks —> Efficient pack
    • Coalesced comms can buy back lost MPI performance
    • Fused loops i,j for CPU performance… but how about GPUs?
    • Sparse physics <— only include subsets of blocks that are “active”
    • How about implicit with sparse physics?
    • Can ignore dt constraint for inactive blocks?
    • Timer-based load balancing works ~perfectly on CPUs... but performance portability is challenging
      • Need to get per-block timings, not an average
    • Load balancing for implicit?
    • Domain decomposition itself should be interrogated... sweeps
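The per-block-timing point above implies a cost-aware partitioner rather than splitting purely by block count. A toy sketch of greedy longest-processing-time assignment over measured per-block costs; this is an illustration, not Parthenon's actual load balancer:

```cpp
#include <algorithm>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Greedy LPT assignment: hand each block (heaviest first) to the
// currently least-loaded rank. Inputs are measured per-block timings,
// not an average, per the note above.
std::vector<int> AssignBlocks(std::vector<double> costs, int nranks) {
  std::vector<std::pair<double, int>> order;  // (cost, block id)
  for (int b = 0; b < static_cast<int>(costs.size()); ++b)
    order.push_back({costs[b], b});
  std::sort(order.rbegin(), order.rend());  // heaviest first

  // min-heap of (accumulated load, rank)
  using Load = std::pair<double, int>;
  std::priority_queue<Load, std::vector<Load>, std::greater<Load>> ranks;
  for (int r = 0; r < nranks; ++r) ranks.push({0.0, r});

  std::vector<int> rank_of(costs.size());
  for (const auto &[c, b] : order) {
    auto [load, r] = ranks.top();
    ranks.pop();
    rank_of[b] = r;
    ranks.push({load + c, r});
  }
  return rank_of;
}
```

This ignores everything that makes the real problem hard (locality of communication, space-filling-curve constraints, GPU timer noise), which is roughly where the "performance portability is challenging" caveat comes in.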
  • Multiscale Hybrid MHD-runaway

    • Parthenon mesh is used to discretize momentum space, not physical space
    • Particles are attached to blocks, but blocks are associated with momentum space, not physical space
    • Needs particle AMR
  • Flash with Parthenon

    • Path to Parthenon: OpenACC was a no-go
    • Flash already employed PARAMESH… one-to-one mapping to Parthenon
    • Template dispatcher to avoid gross nested `if`s associated with functions templated on, e.g., reconstruction and Riemann solver
    • Generate documentation in CI / at compile time via `rps->Add<std::string>("parthenon/mesh", "refinement")`
    • Problem with pin->GetOrAddReal()... many downstream codes extract the same input from multiple packages
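The template-dispatcher bullet above (avoiding nested `if`s over runtime options) is commonly done by mapping each option string to a tag type through a generic lambda, so the option axes compose instead of multiplying into branches. A minimal sketch with made-up option names, not Flash's actual dispatcher:

```cpp
#include <stdexcept>
#include <string>

// Hypothetical option tags for reconstruction and Riemann solvers.
struct PLM  { static constexpr const char *name = "plm"; };
struct PPM  { static constexpr const char *name = "ppm"; };
struct HLLE { static constexpr const char *name = "hlle"; };
struct HLLC { static constexpr const char *name = "hllc"; };

// Stand-in for the fully templated kernel launch.
template <typename Recon, typename Riemann>
std::string RunHydro() {
  return std::string(Recon::name) + "+" + Riemann::name;
}

// Each dispatcher maps one runtime string to a type, then forwards to a
// generic callable; adding an option touches one function, not every
// nested branch.
template <typename F>
auto DispatchRecon(const std::string &s, F &&f) {
  if (s == "plm") return f(PLM{});
  if (s == "ppm") return f(PPM{});
  throw std::invalid_argument("unknown reconstruction: " + s);
}

template <typename F>
auto DispatchRiemann(const std::string &s, F &&f) {
  if (s == "hlle") return f(HLLE{});
  if (s == "hllc") return f(HLLC{});
  throw std::invalid_argument("unknown Riemann solver: " + s);
}

// Two axes of options, but no 2x2 nest of `if`s at the call site.
std::string RunHydroFromInput(const std::string &recon, const std::string &riemann) {
  return DispatchRecon(recon, [&](auto r) {
    return DispatchRiemann(riemann, [&](auto rs) {
      return RunHydro<decltype(r), decltype(rs)>();
    });
  });
}
```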
  • Assail

    • Use of Metadata::Fine
    • Box mesh has weird connectivity
    • Remapping between old and new meshes is a computational geometry nightmare
    • Lots of thoughts about Adam’s input parsing
    • Need multiple fields inside AMR routines
    • Need mesh generators

Day 2

  • Parthenon in 2025

    • Non-cell-centered fields
    • Sparse/swarm packs
    • Forest of octrees
    • Ownership model is complicated
    • We do not share edge fluxes across same-level boundaries
    • Added new prolongation/restriction ops for new non-CC fields
    • Metadata::CellMemAligned is set by default for fluxes
    • Flux divergence updates do not exist (in parts) for non-CC fields
    • Boundary conditions need work with non-CC fields, particularly with linear solvers
    • Need to put the macro type-based stuff in parthenon, accessible to downstream codes
    • MG boundary conditions are WIP... e.g., multipole BCs for e.g., self-gravity may not be immediately straightforward
    • Filling memory space that doesn’t exist for 3-valent node positions seems ill-posed… what is the right decision?
    • In 3D, allowing 3-valent, 4-valent, and 5-valent edges leads to a zoo of craziness
    • OpenPMD is the standard; ADIOS2 is the framework (supporting different file types)
      • Permits derived variables (e.g., not actually store, but computed from e.g., a “cons” vector)
      • No XDMF required anymore with OpenPMD/ADIOS2
      • GPU-aware IO (no channeling through Host)
    • Histograms in Ascent were so slow that they were manually added to Parthenon
  • Singularity Ecosystem

    • How do we prevent having four different copies of Kokkos, ports-of-call, etc. among different submodules?
    • Multigroup opacities (i.e., accesses passing a group index) are not available yet
    • Interest in more complicated mixture models
    • Interest in rate equations… does it belong in singularity ecosystem?
    • Submodule vs. fetch: Miller favors submodules; Grete asks if Jonah’s concerns over CMake fetch are alleviated by shipping our own tar files
    • Logs are slow… replacing them with other transcendental functions is a performant path forward (3x on Intel?)
    • On ARM systems, transcendental replacements for log are not performant, because there is dedicated hardware for log

Idea Board

  • small meshblock performance
  • big meshblock performance/hierarchical parallelism
  • Params/Pinput/Auto-capturing inputs for documentation
  • Number of packs per rank
  • Synchronize coordinates implementations
  • Expose type-based variable macros
  • Clean up examples and regression tests
  • Improve documentation, and/or make a tutorial
  • Pack interfaces
  • Upstreaming features:
    • Update utilities for CT
    • High-order utilities
      • Maybe some discussion to be had about curating downstream reconstructions
      • settling on an interface
        • Adam to walk us through how subpacks work at next parthenon sync
        • Ben to walk us through method in KHARMA at an upcoming sync
      • where does this all go in repository?
        • Separate concerns a bit more:
          • think about a name for what this might be
          • shared packages?
          • should also refactor/re-organize interface folder
    • Additional indexing utilities
    • Clean up type-based variable naming to better intermix sparse and tensor indices
    • RKL2/IMEX
  • CI:
    • more automated checks
    • downstream code tests
    • performance benchmarking
    • ParameterInput auto doc building
    • clang tidy
  • Warning free compilation
  • Are we happy with our current IO and vis toolchain?
    • how much more could/should parthenon own in vis world
    • Ray tracing
    • slices
  • Code-sharing database
  • Timestep related stuff
    • Two dts -> 1 dt
    • Different packages report dt automatically
      • Report which package is controlling the timestep (Sam)
      • Report cell that controls the timestep
    • Trial and restart timestep
  • Boundary conditions
    • shearing boxes
    • forest boundaries
  • Combine reductions across reduction calls
  • More realistic benchmarking
    • More benchmarks in CI like our real codes
    • Parthenon-VIBE needs to be updated
  • User-defined domain decomposition
  • Single/mixed precision
  • Frankenblocks
  • Autotuning of kernels
  • Additional data types for fields?
    • refactor which things are templated on underlying variable type
  • Downstream help

Things worked on

  • Unified par dispatch (Forrest + Adam R.)
  • Embedded performance monitoring/timeouts
  • OpenPMD/Adios2 (Philipp, Patrick)
  • Update parthenon-hydro (Carolyn)
  • Swarm AMR (Oleksii)
  • Time-dependent boundary conditions in a backwards compatible way (Brandon, Adam R., Ben P.)
  • Type-based scratch (Luke, Adam R., Ben P.)
  • Remeshing on load (Jonah, Ben P.)
    • Two meshes
    • Load and interpolate arbitrary data
  • Automatically capture/document Parameter Input (Jonah, Adam R.)

Things we're aspiring to do

  • Not much progress
    • Coalesced comms
    • Load balancing (Josh, Forrest)
      • Heuristics (Jonah?)
        • PCA
        • AI for load balancing
    • 1 Tensor timestep on one block
    • Mesh-level data (Philipp)
    • Markdown in Sphinx docs (Jonah, Ben)
    • Problem generators refactor (Jonah)
  • Remove legacy packs (Luke, Jonah, Downstreams)
    • PackDescriptor caching MR (Adam)
    • Phoebus removing them
    • Philipp to test
    • Update at next sync

Things discussed

  • Inputs
    • TOML (Brandon?)
    • Compilable input files (Adam D.)
  • Spherical-polar meshblock connectivity/transmitting boundaries (Adam D., Josh, Luke, Ben)
  • Coordinates caching (Jonah, Adam D., Luke, Josh)
  • Mesh initialization (Jonah, Adam D., Luke, Josh)
  • HPSF
    • Broadly we're all interested
    • Someone needs to help organize the administrative pieces (Philipp)
      • github PR template with questionnaire. Philipp to investigate and then report back at next sync
        • Code of conduct
        • Governance plan
        • Multiple levels to enter
          • big leagues
          • small, single person/institution
          • in between
    • Someone at LANL needs to figure out the legal questions from our side. Josh to email the Feynman Center.
  • Tensors
    • SPARC team on the hook
    • Interest from broader Parthenon community
    • Lots of uncertainty
      • Unlikely to be intrusive to the rest of the code
    • Not sure what tooling necessary/available yet
      • Boba (from LLNL)
      • Memory pools/Umpire?
      • Tensorflow?
    • How general do we want to be?
      • Boltzmann only?
      • Quantum many body?
      • Core that pulls all of space
      • Architectures?
        • Only tensor trains?
        • Hierarchical Tucker?
  • Sophisticated Coordinates. Supports:
    • 2D Cubed sphere (Josh, Adam D., Jonah, Luke)
    • 2D box mesh
    • simple Mesh reader
  • 3D forest neighbor finding. Supports:
    • 3D box mesh
    • more exciting 3d meshes
  • We gotta get away from templating on Real
  • C++20
    • Concepts don't seem to work with NVIDIA compilers?
      • Adam D says nvcc does support them in a limited way?
      • May be an opportunity to remove concepts lite when they work
    • Kokkos will soon demand C++20. Summer 2025. Pretty imminent.

Things tabled

  • Julia bindings?
  • Local timestepping

New time for parthenon syncs

  • Every other Thursday, 3pm ET

Auxiliary Topics

In no particular order with no particular priority.

Extracted from past meeting notes, discussion, last dev meeting agenda, issues, etc.

Please add/edit as needed (also during the week).

  • State of the union
  • Next "big" features/improvements
    • Params <-> pinput <-> plus info from where parameters came from
    • Upstream support for non-cart coordinates (and caching coord data)
    • Cleanup example directory
      • need to update/document examples, benchmarks
      • move some examples to regression tests
      • highlight features of each examples
    • Update parthenon-hydro
    • Enroll/reporting various timestep constraints (per package) -> vector
    • Add capability to redo a timestep (might need to move calculation of dt out of Step())
    • Consistency of calculating dt for AMR sims
    • Mesh refinement strategies (other than 2x2x2) <-> Frankenblocks
    • CMake presets instead of machine files (or use shell scripts) or keep as is
    • Report/estimate memory usage requirements
    • Particles load balancing/AMR
    • Autotuning of kernels #1200
      • evaluate different abstraction to handle (hierarchical) parallelism that works for small and large number of blocks
    • New par dispatch machinery
    • Code manifest #1227
    • Derived/virtual field registry #1228
    • standardized format for input files
    • UnReal fields
    • Mesh variables
    • buffer communications (performance, packing, ...)
    • performance regression tests
    • integrated performance monitoring #1180
    • get rid of warnings
    • single/mixed precision
    • Tensors/linear algebra
    • load balancing/timers
    • upstream support for multi-patch
    • remove legacy packs/maybe wrap sparse packs
    • Threading for tasks
    • in situ vis
    • adaptive timestepping
    • ...
  • Open issues and PRs
    • Coalesced buffer comm #1192
    • Unified par_dispatch #1142
    • ...
  • Documentation
    • Overall reorg/cleanup
    • Tutorial/intro
    • Best practices/performance hints
    • FAQ
  • Administrative
    • future meeting time
    • Github cleanup (issues, PRs, branches)
    • Logo
    • Update of outreach/impact here