Developer Workshop 2025 - parthenon-hpc-lab/parthenon GitHub Wiki
- 16-20 Jun in Ann Arbor, MI
- 4th floor conference room of 3520 Green Ct.
## AthenaPK
- RKL2 algorithm in parthenon?
- Flux-based pushing of tracers — use Riemann fluxes to push.
- Legacy packs with single vector for `cons` and `prim`
- Inline C2P is seemingly faster? Need to carry both `cons` and `prim` packs?
- Will remove legacy packs this week?
- Compile time: ~10 min on 16 cores
- pack_size —> npartitions —> npacks_per_rank?
- Where should `if` vs `if constexpr` go? Breaking up kernels seems to pay dividends…
- Craziness on Frontier (compilers failing, or even failing silently)
- Reading/writing from 40TB outputs
- STS with 26 ghost zones to prevent comms in between sub-cycles
- Coalesced comms
- HDF5 seemingly “unsolvable”
- Embedded performance monitoring (permit aborting if not getting expected performance)
- Balancing where you don’t lose allocation --> just make bad node go idle
- Watchdog monitors in parthenon?
- Parallel `yt` doesn’t seem to work at scale... Select a sub-volume, go from there
- Ascent is no longer part of @pgrete’s workflow. Working for Wibking
- Swarms and load balancing... diverging flows
- Monitor different timestep constraints
## Artemis
- When adding something to a package, can we add some sort of Metadata info so that it can be absorbed into documentation
- `gravity_pkg->template Param("gx1")`: can `template` go away?
- `VI(n, DIR)` <-- can we remove?
- Should we adopt parth coordinates?
## Jaybenne
- Ryan reports overall positive experience with Parthenon
- Reports our documentation is good (!?!?!)
- Showed first demonstration of coupling two Parthenon-based codes (i.e., Artemis + Jaybenne)
## Phoebus
- Code “theft” is quite straightforward given we all speak the same language
## Riot
- Arbitrary variables, arbitrary blocks —> Efficient pack
- Coalesced comms can buy back lost MPI performance
- Fused loops i,j for CPU performance… but how about GPUs?
- Sparse physics <— only include subsets of blocks that are “active”
- How about implicit with sparse physics?
- Can ignore dt constraint for inactive blocks?
- Timer-based load balancing works ~perfectly on CPUs... but performance portability is challenging
- Need to get per-block timings, not an average
- Load balancing for implicit?
- Domain decomposition itself should be interrogated... sweeps
## Multiscale Hybrid MHD-runaway
- Parthenon mesh is used to discretize momentum space, not physical space
- Particles are attached to blocks, but blocks are associated with momentum space, not physical space
- Needs particle AMR
## Flash with Parthenon
- Path to parth: OpenACC was a no-go
- Flash already employed PARAMESH… one-to-one mapping to Parthenon
- Template dispatcher to avoid gross nested `if`s associated with functions templated on, e.g., recon, riemann
- Generate documentation in CI / at compile time via `rps->Add<std::string>("parthenon/mesh", "refinement")`
- Problem with pin->GetOrAddReal()... many downstream codes extract the same input from multiple packages
## Assail
- Use of Metadata::Fine
- Box mesh has weird connectivity
- Remapping between old and new meshes is a computational geometry nightmare
- Lots of thoughts about Adam’s input parsing
- Need multiple fields inside AMR routines
- Need mesh generators
## Parthenon in 2025
- Non-cell-centered fields
- Sparse/swarm packs
- Forest of octrees
- Ownership model is complicated
- We do not share edge fluxes across same-level boundaries
- Added new prolongation/restriction ops for new non-CC fields
- Metadata::CellMemAligned is set by default for fluxes
- Flux divergence updates do not exist (in parts) for non-CC fields
- Boundary conditions need work with non-CC fields, particularly with linear solvers
- Need to put the macro type-based stuff in parthenon, accessible to downstream codes
- MG boundary conditions are WIP... multipole BCs for, e.g., self-gravity may not be immediately straightforward
- Filling memory space that doesn’t exist for 3-valent node positions seems somewhat ill-posed… what is the right decision?
- In 3D, allowing 3-valent, 4-valent, and 5-valent edges leads to a zoo of craziness
- OpenPMD is the standard; ADIOS2 is the framework (supporting different file types)
- Permits derived variables (e.g., not actually store, but computed from e.g., a “cons” vector)
- No XDMF required anymore with OpenPMD/ADIOS2
- GPU-aware IO (no channeling through Host)
- Histograms in Ascent were so slow that they were manually added to Parthenon
## Singularity Ecosystem
- How do we prevent having four different copies of Kokkos, ports-of-call, etc. among different submodules?
- Multigroup opacities (i.e., accesses passing a group index) are not available yet
- Interest in more complicated mixture models
- Interest in rate equations… does it belong in singularity ecosystem?
- Submodule vs fetch: Miller favors submodule; Grete asks if Jonah’s concerns over cmake fetch are alleviated by shipping our own tar files
- Logs are slow… transcendental functions are a performant path forward (3x on Intel?)
- On ARM systems, transcendental functions for logs are not performant (because there is dedicated hardware for log)
- small meshblock performance
- big meshblock performance/hierarchical parallelism
- Params/Pinput/Auto-capturing inputs for documentation
- Number of packs per rank
- Synchronize coordinates implementations
- Expose type-based variable macros
- Clean up examples and regression tests
- Improve documentation, and/or make a tutorial
- Pack interfaces
- Upstreaming features:
- Update utilities for CT
- High-order utilities
- Maybe some discussion to be had about curating downstream reconstructions
- settling on an interface
- Adam to walk us through how subpacks work at next parthenon sync
- Ben to walk us through method in KHARMA at an upcoming sync
- where does this all go in repository?
- Separate concerns a bit more:
- think about a name for what this might be
- shared packages?
- should also refactor/re-organize interface folder
- Separate concerns a bit more:
- Additional indexing utilities
- Clean up type-based variable naming to better intermix sparse and tensor indices
- RKL2/IMEX
- CI:
- more automated checks
- downstream code tests
- performance benchmarking
- ParameterInput auto doc building
- clang tidy
- Warning free compilation
- Are we happy with our current IO and vis toolchain?
- how much more could/should parthenon own in vis world
- Ray tracing
- slices
- Code-sharing database
- Timestep related stuff
- Two dts -> 1 dt
- Different packages report dt automatically
- Report which package is controlling the timestep (Sam)
- Report cell that controls the timestep
- Trial and restart timestep
- Boundary conditions
- shearing boxes
- forest boundaries
- Combine reductions across reduction calls
- More realistic benchmarking
- More benchmarks in CI like our real codes
- Parthenon-VIBE needs to be updated
- User-defined domain decomposition
- Single/mixed precision
- Frankenblocks
- Autotuning of kernels
- Additional data types for fields?
- refactor which things are templated on underlying variable type
- Downstream help
- Unified par dispatch (Forrest + Adam R.)
- Embedded performance monitoring/timeouts
- OpenPMD/Adios2 (Philipp, Patrick)
- Update Parthenon Hydro (Carolyn)
- Swarm AMR (Oleksii)
- Time-dependent boundary conditions in a backwards compatible way (Brandon, Adam R., Ben P.)
- Type-based scratch (Luke, Adam R., Ben P.)
- Remeshing on load (Jonah, Ben P.)
- Two meshes
- Load and interpolate arbitrary data
- Automatically capture/document Parameter Input (Jonah, Adam R.)
- Not much progress
- Coalesced comms
- Load balancing (Josh, Forrest)
- Heuristics (Jonah?)
- PCA
- AI for load balancing
- 1 Tensor timestep on one block
- Mesh-level data (Philipp)
- Markdown in Sphinx docs (Jonah, Ben)
- Problem generators refactor (Jonah)
- Remove legacy packs (Luke, Jonah, Downstreams)
- PackDescriptor caching MR (Adam)
- Phoebus removing them
- Philipp to test
- Update at next sync
- Inputs
- TOML (Brandon?)
- Compilable input files (Adam D.)
- Spherical-polar meshblock connectivity/transmitting boundaries (Adam D., Josh, Luke, Ben)
- Coordinates caching (Jonah, Adam D., Luke, Josh)
- Mesh initialization (Jonah, Adam D., Luke, Josh)
- HPSF
- Broadly we're all interested
- Someone needs to help organize the administrative pieces (Philipp)
- github PR template with questionnaire. Philipp to investigate and then report back at next sync
- Code of conduct
- Governance plan
- Multiple levels to enter
- big leagues
- small, single person/institution
- in between
- Someone at LANL needs to figure out the legal questions from our side. Josh to email the Feynman Center.
- Tensors
- SPARC team on the hook
- Interest from broader Parthenon community
- Lots of uncertainty
- Unlikely to be intrusive to the rest of the code
- Not sure what tooling necessary/available yet
- Boba (from LLNL)
- Memory pools/Umpire?
- Tensorflow?
- How general do we want to be?
- Boltzmann only?
- Quantum many body?
- Core that pulls all of space
- Architectures?
- Only trains?
- Hierarchical Tucker?
- Sophisticated Coordinates. Supports:
- 2D Cubed sphere (Josh, Adam D., Jonah, Luke)
- 2D box mesh
- simple Mesh reader
- 3D forest neighbor finding. Supports:
- 3D box mesh
- more exciting 3d meshes
- We've got to get away from templating on `Real`
- C++20
- Concepts don't seem to work with NVIDIA compilers?
- Adam D. says nvcc does support them in a limited way?
- May be an opportunity to remove concepts-lite when they work
- Kokkos will soon demand C++20. Summer 2025. Pretty imminent.
- Julia bindings?
- Local timestepping
- Every other Thursday, 3pm ET
In no particular order with no particular priority.
Extracted from past meeting notes, discussion, last dev meeting agenda, issues, etc.
Please add/edit as needed (also during the week).
- State of the union
- Next "big" features/improvements
- `Params` <-> `pinput` <-> plus info on where parameters came from
- Upstream support for non-Cartesian coordinates (and caching coord data)
- Cleanup example directory
- need to update/document examples, benchmarks
- move some examples to regression tests
- highlight features of each example
- Update parthenon-hydro
- Enroll/reporting various timestep constraints (per package) -> vector
- Add capability to redo a timestep (might need to move calculation of dt out of `Step()`)
- Consistency of calculating `dt` for AMR sims
- Mesh refinement strategies (other than 2x2x2) <-> Frankenblocks
- CMake presets instead of machine files (or use shell scripts) or keep as is
- Report/estimate memory usage requirements
- Particles load balancing/AMR
- Autotuning of kernels #1200
- evaluate different abstraction to handle (hierarchical) parallelism that works for small and large number of blocks
- New par dispatch machinery
- Code manifest #1227
- Derived/virtual field registry #1228
- standardized format for input files
- Un-`Real` fields
- `Mesh` variables
- buffer communications (performance, packing, ...)
- performance regression tests
- integrated performance monitoring #1180
- get rid of warnings
- single/mixed precision
- Tensors/linear algebra
- load balancing/timers
- upstream support for multi-patch
- remove legacy packs/maybe wrap sparse packs
- Threading for tasks
- in situ vis
- adaptive timestepping
- ...
- Open issues and PRs
- Documentation
- Overall reorg/cleanup
- Tutorial/intro
- Best practices/performance hints
- FAQ
- Administrative
- future meeting time
- Github cleanup (issues, PRs, branches)
- Logo
- Update of outreach/impact here