Proceedings 2025 ESPResSo meetings
- particle slice access went from 35.4 ms/loop to 4.7 ms/loop for 1000 particles
- this is now fast enough for use in a tutorial (see the sketch after this list)
- further performance improvements will need to be done at the C++ level
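For context, a minimal sketch of the access pattern in question, assuming the vectorized slice API of `espressomd` (`system.part.all().pos`); box size and particle count are placeholders matching the quoted benchmark:

```python
import numpy as np
import espressomd

# toy setup: 1000 particles in a cubic box (placeholder values)
system = espressomd.System(box_l=[10.0, 10.0, 10.0])
system.part.add(pos=np.random.uniform(0.0, 10.0, (1000, 3)))

# vectorized slice access: all 1000 positions fetched in a single call
pos = system.part.all().pos  # (1000, 3) NumPy array

# the slow pattern this optimization targets: one core round-trip per particle
pos_slow = np.array([p.pos for p in system.part.all()])
```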
- FFT on GPU achieved with heFFTe
- double precision incurs a factor-2.5 slow-down compared to single precision
- GPU implementation is about 5 times faster than the CPU implementation
- sometimes an exception in the core cannot be safely propagated to the Python interface without leaving the ESPResSo system in an invalid state
- such exceptions are queued in the core and then output to stderr at the Python level, with only the first message being transformed into a Python exception
- this leads to poor user experience, since fixing the Python exception is only the first step: one also has to fix all error messages that were printed to stderr, otherwise even more cryptic errors will appear at the next integration step
- when this situation happens, it is almost always necessary to restart the simulation in a fresh Python session
- an `ExceptionGroup` is the Pythonic way of handling such situations
- the number of errors printed to the terminal depends on an implementation-defined constant
- the associated clause is `except*`
- novice Python users might not know about this clause, but they are also not supposed to attempt to fix such a runtime error and should restart the simulation
- the only individuals who should attempt to fix such an exception group are developers of the feature that triggers the runtime error (e.g. in a unit test)
- `unittest` doesn't natively support `ExceptionGroup` at the time of writing (see feature request python/cpython#137311); a workaround is sketched below
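Until native support lands, a test can unpack the group manually; a minimal sketch, where `run_failing_step` is a hypothetical stand-in for an ESPResSo call that raises an `ExceptionGroup` of `RuntimeError` sub-exceptions:

```python
import unittest

def run_failing_step():
    # hypothetical stand-in for a core call that queues multiple errors
    raise ExceptionGroup("core errors", [RuntimeError("volume became negative")])

class TestCoreErrors(unittest.TestCase):
    def test_npt_failure(self):
        # assertRaises can match the ExceptionGroup itself, but not assert
        # on a specific sub-exception type; inspect the group manually
        with self.assertRaises(ExceptionGroup) as cm:
            run_failing_step()
        self.assertTrue(any(isinstance(exc, RuntimeError)
                            for exc in cm.exception.exceptions))

if __name__ == "__main__":
    unittest.main()
```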
old behavior
```
$ ./pypresso mwe.py
ERROR: Particle 65 moved more than one local box length in one timestep
ERROR: Particle 64 moved more than one local box length in one timestep
ERROR: Particle 44 moved more than one local box length in one timestep
ERROR: Particle 35 moved more than one local box length in one timestep
ERROR: Particle 85 moved more than one local box length in one timestep
ERROR: Particle 17 moved more than one local box length in one timestep
ERROR: Particle 12 moved more than one local box length in one timestep
Traceback (most recent call last):
  File "mwe.py", line 53, in <module>
    system.thermostat.set_npt(kT=1.0, gamma0=0.1, gammav=1e-3, seed=42)
  File "script_interface.pyx", line 488, in espressomd.script_interface.ScriptInterfaceHelper.generate_caller.template_method
  File "script_interface.pyx", line 179, in espressomd.script_interface.PScriptInterface.call_method
  File "utils.pyx", line 213, in espressomd.utils.handle_errors
Exception: while calling method set_npt(): ERROR: your choice of piston=0.0001,
dt=0.01, p_epsilon=-1.66385 just caused the volume to become negative, decrease dt
```
Here, fixing the box length is not sufficient: one also has to delete all particles, because their image box has become undefined. However, one cannot resize the box as long as particles are present, and one cannot delete particles when the box size is not positive. This is a soft lock, and expert knowledge of the cell structure is required to resolve it safely.
proposed new behavior
```
$ ./pypresso mwe.py
  + Exception Group Traceback (most recent call last):
  |   File "mwe.py", line 53, in <module>
  |     system.thermostat.set_npt(kT=1.0, gamma0=0.1, gammav=1e-3, seed=42)
  |   File "script_interface.pyx", line 488, in espressomd.script_interface.ScriptInterfaceHelper.generate_caller.template_method
  |   File "script_interface.pyx", line 179, in espressomd.script_interface.PScriptInterface.call_method
  |   File "utils.pyx", line 226, in espressomd.utils.handle_errors
  | ExceptionGroup: Raised while calling method set_npt() (101 sub-exceptions)
  +-+---------------- 1 ----------------
    | RuntimeError: your choice of piston=0.0001, dt=0.01, p_epsilon=-1.66385
    | just caused the volume to become negative, decrease dt in function void
    | velocity_verlet_npt_propagate_AVOVA_And(const ParticleRangeNPT&,
    | const IsotropicNptThermostat&, double, System::System&)
    | (/home/user/espresso/src/core/integrators/velocity_verlet_npt_Andersen.cpp:109)
    +---------------- 2 ----------------
    | RuntimeError: Particle 8 moved more than one local box length in one timestep
    | in function virtual void RegularDecomposition::resort(bool,
    | std::vector<std::variant<RemovedParticle, ModifiedList> >&)
    | (/home/user/espresso/src/core/cell_system/RegularDecomposition.cpp:231)
    +---------------- ... ----------------
    | and 99 more exceptions
    +------------------------------------
```
Here we can see which C++ functions triggered the exception, with line number information. This is incredibly useful to developers. Regular users should save their work and terminate the Python session.
Exception groups
Exception groups are handled with this syntax:
```python
try:
    raise ExceptionGroup("there were problems",
                         [OSError("error 1"), SystemError("error 2")])
except* OSError as e:
    print("There were OSErrors")
except* SystemError as e:
    print("There were SystemErrors")
```
or with this syntax (not a good practice, as it handles all sub-exceptions indiscriminately):
```python
try:
    raise ExceptionGroup("there were problems",
                         [OSError("error 1"), SystemError("error 2")])
except Exception:
    print("skipping all exceptions")
```
- Verlet list is fully parallel
- force calculation scales reasonably well
- conversion from AoS to SoA currently doesn't scale well and needs further investigation
- zero-centered LB
- background fluid density is normalized to 1
- LB population field now stores deviations from the equilibrium population (formalized after this list)
- improves numerical accuracy in single-precision kernels
- combined stream-collide kernels
- single-step streaming and collide
- 5% speed-up on CPU, 20% speed-up on GPU
- LB populations now refer to post-collide populations
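The zero-centered scheme can be written compactly (our notation, not verbatim from the waLBerla docs; $w_i$ are the lattice weights): with the background density normalized to 1, the stored field is the deviation from the rest-state equilibrium,

$$f_i(\vec{x}, t) = f_i^{\mathrm{eq}}(\rho = 1, \vec{u} = 0) + \delta f_i(\vec{x}, t) = w_i + \delta f_i(\vec{x}, t)$$

Since only the small deviation $\delta f_i$ is stored and $|\delta f_i| \ll w_i$ near equilibrium, single-precision kernels lose far fewer significant bits to the constant background.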
- MPI+OpenMP LB
- OpenMP scales better than pure MPI up to 4 threads; beyond that, MPI is more favorable
- some parts of the LB code that are MPI-parallel are OpenMP-serial, needs further work
- MPI: tasks share data by serializing/communicating/deserializing buffers
- SMP: tasks can access each other's data (both read and write)
- care must be taken to avoid race conditions (see the sketch after this list)
- use abstraction layers (Kokkos, Cabana)
- single-node calculations use OpenMP, multi-node calculations use MPI+OpenMP
- for small parallel jobs, users will most often use OpenMP instead of MPI
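A minimal Python sketch contrasting the two sharing models, with `mpi4py` and `threading` standing in for the MPI and OpenMP layers of the C++ core (requires `mpi4py`; run the MPI part under `mpirun -n 2`):

```python
import threading

from mpi4py import MPI

# --- message passing: objects are serialized, communicated, deserialized ---
comm = MPI.COMM_WORLD
if comm.size >= 2:
    if comm.rank == 0:
        comm.send({"forces": [0.1, 0.2]}, dest=1)  # pickled and sent
    elif comm.rank == 1:
        data = comm.recv(source=0)                 # received and unpickled

# --- shared memory: threads read/write the same buffer directly ---
forces = [0.0] * 100
lock = threading.Lock()

def accumulate(i, df):
    # without the lock, concurrent "+=" on the same entry is a data race
    with lock:
        forces[i] += df

threads = [threading.Thread(target=accumulate, args=(0, 0.5)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```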
- MSD accumulator seems to yield incorrect values after 1 million time steps
- LB GPU implementation doesn't yet support more than 1024 grid points in any direction (walberla/walberla#255)
- EK flux boundary conditions are not stable in the ghost layer
- heFFTe-based re-implementation of the P3M algorithm now works on 1 MPI rank
- current NpT implementation has convergence issues
- current NpT implementation yields incorrect compressibility for a WCA gas (a fluctuation-based check is sketched after this list)
- Nosé–Hoover implementation yields correct compressibility for a WCA gas
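One standard way to check such compressibility results is the NpT volume-fluctuation estimator $\kappa_T = (\langle V^2 \rangle - \langle V \rangle^2) / (k_B T \langle V \rangle)$; a minimal sketch, assuming a recorded time series of box volumes from an equilibrated run (the file name is a placeholder):

```python
import numpy as np

# volumes.dat: one box volume per sampled NpT configuration (hypothetical file)
V = np.loadtxt("volumes.dat")
kT = 1.0  # reduced units

kappa_T = (np.mean(V**2) - np.mean(V)**2) / (kT * np.mean(V))
print(f"isothermal compressibility: {kappa_T:.4f}")
```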
- leverage shared-memory parallelization with Cabana data structures
- store particle cells as a struct of arrays (SoA) instead of the currently implemented array of structs (AoS); the difference is illustrated after this list
- Cabana leverages Kokkos to run algorithms with various backends, such as OpenMP, CUDA, SYCL, HPX
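The AoS/SoA distinction, illustrated with NumPy as a stand-in for the Cabana data structures (which are C++ in the actual core):

```python
import numpy as np

n = 1000

# AoS: one record per particle; the fields of one particle are contiguous
aos = np.zeros(n, dtype=[("pos", "f8", 3), ("force", "f8", 3), ("q", "f8")])

# SoA: one array per field; a given field is contiguous across particles,
# which vectorizes (SIMD) and coalesces (GPU) much better
soa_pos = np.zeros((n, 3))
soa_force = np.zeros((n, 3))
soa_q = np.zeros(n)

# a kernel touching only forces streams one dense array in SoA layout,
# but strides over interleaved records in AoS layout
norms_soa = np.linalg.norm(soa_force, axis=1)
norms_aos = np.linalg.norm(aos["force"], axis=1)
```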
- Diffoscope packager workflow articles on Fedora Magazine and Fedora Project
- diamond lattice builder currently being implemented for pyMBE
- offers finer control over network topology, functionalization and coarse-graining workflows