ROMS Parallel IO - myroms/roms GitHub Wiki
Introduction
Generally, writing is a more frequent and complicated operation than reading. There are four strategies for writing:
- Single file, single writer: Serial I/O in non-parallel or parallel applications. It is the default strategy in ROMS using the NetCDF3 or NetCDF4 libraries.
- Single file, multiple writers: Parallel I/O allows each partition tile (task) to write its data into a single file. This capability is achieved in ROMS by activating PARALLEL_IO and HDF5. It is only possible with the NetCDF4/HDF5 libraries. However, its performance is questionable because the single-writer strategy (the first option above) is faster in some applications.
- Single file, collective writers: Parallel I/O with either one or a subset of processes performing I/O operations. It can be synchronous or asynchronous. In ROMS, this capability uses the Parallel-IO (PIO) library developed at NCAR. It is available when the PIO_LIB CPP option is activated.
- Multiple files, multiple writers: Parallel I/O in which each distributed-memory or shared-memory tile decomposition writes its data into a separate NetCDF file. However, post-processing is required to pack the data into a single file. This feature is undesirable in applications running on hundreds of processors. This capability is unavailable in ROMS because it is inefficient and complicates ensemble and 4D-Var drivers.
PIO Library
The PIO library has two modes of parallel I/O, both available only in MPI applications.
- Synchronous: MPI intra-communication mode. A subset of the processes, or all of them, perform both I/O and computations. The user specifies the number of I/O tasks with the PIO_IOTASKS parameter and how they are distributed across HPC nodes as a function of the ROMS MPI communicator object, OCN_COMM_WORLD. It is often desirable to use the PIO_BASE parameter to shift the first I/O task away from the first computational task, since that task has higher memory requirements than the other processes. If the MPI processes are scattered across several compute nodes, it is highly recommended to spread the I/O tasks evenly across all nodes using the PIO_STRIDE parameter. Avoid placing all I/O processes on the same node. This strategy is illustrated below.
  - In the Box Rearrangement, data is continuously rearranged from computational to I/O processes according to the data ordering in the file. Since data ordering between computational and I/O partitions may differ, the rearrangement requires all-to-all MPI communications. Each computing tile may transfer data to one or more I/O processes.
  - In the Subset Rearrangement, each I/O process is associated with a subset of computing processes. Each computing tile sends its data to a unique I/O process. The data on the I/O processes may be more fragmented than the ordering on disk, which may increase the communication to the storage medium. However, this method scales better since all-to-all MPI communications are unnecessary.
- Asynchronous: MPI inter-communication mode. The I/O tasks are a disjoint set of dedicated I/O processes that do not perform computations. It is possible to have groups of computational units running separate models, as in model coupling, where all the I/O data is sent to dedicated processes. In ROMS, this I/O mode is activated with ASYNCHRONOUS_PIO and DISJOINTED communicators. This mode is not straightforward and requires further work.
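The placement advice for the synchronous mode can be sketched in a few lines of Python. This is only an illustration, assuming that the k-th I/O task sits at rank base + k*stride within the communicator and that MPI ranks fill nodes in contiguous blocks. The function names and the tasks_per_node argument are hypothetical and not part of ROMS or PIO, and the sketch simply rejects layouts that run past the last rank.

```python
def io_task_ranks(ntasks, iotasks, base=0, stride=1):
    """Return the 0-based ranks assumed to act as I/O tasks.

    ntasks, iotasks, base, and stride stand in for the communicator size
    and the PIO_IOTASKS, PIO_BASE, and PIO_STRIDE values (assumed mapping).
    """
    if iotasks < 1 or stride < 1 or base < 0:
        raise ValueError("iotasks and stride must be >= 1, base >= 0")
    ranks = [base + k * stride for k in range(iotasks)]
    if ranks[-1] >= ntasks:
        raise ValueError("I/O tasks do not fit inside the communicator")
    return ranks


def io_tasks_per_node(ranks, tasks_per_node):
    """Count I/O tasks per node, assuming ranks fill nodes in contiguous blocks."""
    counts = {}
    for rank in ranks:
        node = rank // tasks_per_node
        counts[node] = counts.get(node, 0) + 1
    return counts


# 16 MPI tasks on 4 nodes (4 tasks per node), 4 I/O tasks.
packed = io_task_ranks(16, 4, base=0, stride=1)   # all I/O tasks on node 0
spread = io_task_ranks(16, 4, base=1, stride=4)   # one I/O task per node, rank 0 spared

print(packed, io_tasks_per_node(packed, 4))
print(spread, io_tasks_per_node(spread, 4))
```

In this toy layout, the second choice keeps rank 0 free of I/O duties and puts exactly one I/O task on each node, which is the situation the text recommends.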
ROMS PIO Configuration and Implementation
The PIO configuration for a particular application is set in the ROMS standard input file roms.in. It depends on the application and the computational resources. The user needs to experiment with these parameters to evaluate the performance.
Notice that the standard NetCDF3/NetCDF4 and PIO libraries coexist in a ROMS executable. We can choose which library is used for reading (INP_LIB) and writing (OUT_LIB). The ROMS design is very flexible, as shown below: the library used for a given input or output file depends, for example, on the value of HIS(ng)%IOtype in the derived-type T_IO structure, which can be either io_nf90 or io_pio.
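The per-file choice of library can be pictured with a small Python analogy. It is a sketch rather than ROMS code: ROMS itself is written in Fortran, the integer codes given to io_nf90 and io_pio are assumptions, the T_IO class keeps only two illustrative fields, and the file names and writer functions are hypothetical.

```python
from dataclasses import dataclass

# Assumed integer codes; only the names io_nf90 and io_pio come from the text.
io_nf90 = 1
io_pio = 2


@dataclass
class T_IO:
    """Toy counterpart of the ROMS T_IO structure (only two fields kept)."""
    name: str
    IOtype: int


def write_with_nf90(f, varname):
    # Placeholder for a write through the standard NetCDF library.
    return f"standard NetCDF write of {varname} to {f.name}"


def write_with_pio(f, varname):
    # Placeholder for a write through the PIO library.
    return f"PIO write of {varname} to {f.name}"


def write_field(f, varname):
    """Pick the I/O library from the file's own IOtype."""
    if f.IOtype == io_nf90:
        return write_with_nf90(f, varname)
    if f.IOtype == io_pio:
        return write_with_pio(f, varname)
    raise ValueError(f"unsupported IOtype {f.IOtype} for {f.name}")


# Hypothetical file names; in ROMS the choice would follow INP_LIB / OUT_LIB.
his = T_IO(name="roms_his.nc", IOtype=io_pio)
ini = T_IO(name="roms_ini.nc", IOtype=io_nf90)

print(write_field(his, "zeta"))
print(write_field(ini, "zeta"))
```

Because the selector lives on each file object, one executable can read its input with one library and write its output with the other.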
I/O Descriptors
To accelerate reading and writing with PIO, ROMS declares and initializes the I/O descriptors once at the beginning of the computations. They are used in the parallel decomposition mapping from computational to I/O processes and vice versa for all ROMS C-type variables, array ranks, and array kinds. Each descriptor specifies how data in memory should be written to or read from disk. The I/O descriptors are declared in mod_pio_netcdf.F and initialized in routine set_iodecomp of module set_pio.F.
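The idea behind such a descriptor can be illustrated with a toy Python example. It assumes a decomposition is described by listing, for every local tile point, its 1-based offset in the global array as stored on disk, with the i index varying fastest. The function names, tile names, and grid sizes are hypothetical, and the real set_iodecomp routine handles many more variable types, ranks, and kinds.

```python
def tile_dof(istr, iend, jstr, jend, im):
    """Global 1-based file offsets of a tile's points.

    The tile covers i = istr..iend and j = jstr..jend (inclusive, 1-based)
    of a global array with im points in i, stored with i varying fastest.
    """
    return [(j - 1) * im + i
            for j in range(jstr, jend + 1)
            for i in range(istr, iend + 1)]


def write_tile(file_buffer, dof, local_values):
    """Scatter a tile's local values into the flat file image."""
    for offset, value in zip(dof, local_values):
        file_buffer[offset - 1] = value


# Global 4 x 3 field split into two tiles along i (hypothetical sizes).
im, jm = 4, 3
descriptors = {            # built once, then reused for every write
    "west": tile_dof(1, 2, 1, jm, im),
    "east": tile_dof(3, 4, 1, jm, im),
}

file_buffer = [None] * (im * jm)
for name, dof in descriptors.items():
    local = [float(offset) for offset in dof]   # fake tile data
    write_tile(file_buffer, dof, local)

print(descriptors["west"])
print(file_buffer)
```

Building the offset lists once and reusing them for every record mirrors the reason the text gives for initializing the descriptors at the beginning of the computations.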