Design considerations - fabm-model/fabm GitHub Wiki

Design considerations

The primary aim of FABM is to provide consistent, complete and future-proof programming interfaces to which hydrodynamic and biogeochemical models can attach. The interfaces are designed to place minimal constraints on the structure of either type of model. Effectively, FABM serves as a low-level coupler that enables the minimum of information exchange required for hydrodynamic-biogeochemical coupling; more elaborate frameworks that specify in detail how biogeochemical models should be structured could be built on top of it. The coupling layer that connects models is designed to remain thin: to preserve performance, information is passed between models with no or minimal processing by the coupler. These principles underlie the design of FABM. More specific considerations that have guided its implementation are discussed in the following sections.

Programming language

The majority of present-day hydrodynamic models are written in Fortran. A generic physical-biogeochemical coupler must therefore be able to interface with Fortran code. While this does not require the coupler to be written in Fortran as well, doing so avoids the problems of mixed-language solutions (e.g., Fortran-C++), which tend to involve complex and potentially computationally expensive code for inter-language communication, and which often have stringent requirements with respect to software environments (for instance, they may require specific compiler combinations, or run only on specific platforms).

Moreover, a pure Fortran coupler can make optimal use of all functionality that the language has to offer, including object-oriented (OO) features introduced in Fortran 2003, whereas a mixed-language solution is often forced to fall back to the subset of functionality supported by all languages used. For instance, custom solutions that provide access to Fortran from other programming languages (e.g., f2py for Python, cfortran.h for C and C++) often do not support Fortran objects (“derived types”). Even the standardized C interoperability layer introduced in Fortran 2003 does not support access to objects that inherit from (“extend”) some base type. As a result, mixed-language solutions must either avoid many useful OO features of Fortran 2003 or develop an intermediate software layer in Fortran that converts OO constructs to non-OO constructs. These workarounds can negatively affect code quality (notably modularity) and performance. To avoid these issues, FABM is designed as a pure Fortran solution. It thus supports all platforms that the hosting hydrodynamic model runs on, provided that a Fortran 2003 compiler is available. This includes Linux, Windows and Mac OS X.

Object-oriented programming features introduced in the 2003 update to the Fortran standard facilitate the development of a coupling framework. In particular, support for objects (“derived types”) that combine both data and functionality (“type-bound procedures”) now makes it possible to create isolated, self-contained biogeochemical modules that communicate without a global software component having to be aware of the complete suite of biogeochemical models. We have found that most modern Fortran compilers (e.g., gfortran 4.7, Intel Fortran Compiler 12.1, Cray Fortran 8.1.9) support the subset of Fortran 2003 that is needed for object-oriented programming.

Fortran 2003 does not provide all functionality that the coupling framework requires. In particular, the framework is designed to be independent of the dimensionality of the spatial domain: it should represent a 0D well-mixed box as easily as a 1D water column, a 2D depth-integrated basin, or a full 3D vertically structured basin. Thus, the number of dimensions (rank) of arrays with biogeochemical variables varies between 0 and 3, depending on the physical host that FABM is embedded in. Fortran 2003 does not support this: the rank of an array must be known at compile time. To overcome this limitation, we represent domain-dependent features (e.g., dimensionality of spatially explicit arrays, indices of spatial dimensions, loops over the spatial domain) with preprocessor macros, which all modern Fortran compilers support. Hydrodynamic models set a small number of preprocessor macros at compile time to control the dimensionality of the spatial domain. This allows the preprocessor to replace macros in biogeochemical model code with domain-specific Fortran constructs, appropriate for the hydrodynamic host. Thus, biogeochemical models only use FABM-provided preprocessor macros within their code; they do not communicate with the preprocessor by setting macros themselves.

Disentangling hydrodynamics and biogeochemistry

A key role of the framework is to partition the functionality of large coupled physical-biogeochemical models into isolated, self-contained modules that interact through the coupler. The first step towards this is the separation of hydrodynamics and biogeochemistry, with FABM nested in between. This is visualized in Figure 1. By having the hydrodynamic and biogeochemical models communicate exclusively through FABM, it becomes possible to swap hydrodynamic models as well as biogeochemical models without affecting other parts of the coupled model. Moreover, the framework shields biogeochemical models from details of the spatial domain: it converts spatially explicit, on-grid representations of variables in the hydrodynamic model into local descriptions in the biogeochemistry layer. Biogeochemical models receive local variable values (e.g., local temperature, light, and biogeochemistry expressed as local concentrations) and return local sink and source terms; they need not be aware of their physical location. As a result, it is possible to move from a 0D well-mixed box, via a 1D water column, to a 3D basin, while leaving the source code and configuration of biogeochemistry completely unchanged.
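As a minimal illustration of this locality, the Python sketch below (FABM itself is Fortran) defines one local rate function and lets a host-side helper evaluate it on a 0D box, a 1D column and a 3D basin. The names `phyto_growth` and `apply_pointwise` and the growth formula are hypothetical; FABM generates the spatial loops with preprocessor macros, for which the recursion over nesting depth merely stands in.

```python
def phyto_growth(nutrient, phyto, temperature):
    """Local sink/source term for phytoplankton (hypothetical Monod-type growth).

    The function sees the values at a single point only; it has no notion of a grid.
    """
    mu_max = 1.0 * 1.066 ** (temperature - 20.0)  # illustrative temperature dependence
    half_saturation = 0.5                         # illustrative, same units as nutrient
    return mu_max * nutrient / (half_saturation + nutrient) * phyto


def apply_pointwise(local_func, *fields):
    """Host-side helper: evaluate a local function at every point of a domain.

    A 0D box is a plain number, a 1D column a list, a 3D basin a list of lists of lists.
    All fields must share the same nesting structure.
    """
    if isinstance(fields[0], list):
        return [apply_pointwise(local_func, *point) for point in zip(*fields)]
    return local_func(*fields)


box = apply_pointwise(phyto_growth, 1.0, 0.2, 20.0)
column = apply_pointwise(phyto_growth, [1.0, 0.5], [0.2, 0.1], [20.0, 15.0])
basin = apply_pointwise(phyto_growth, [[[1.0, 0.5]]], [[[0.2, 0.1]]], [[[20.0, 15.0]]])
```

The rate function is identical in all three cases; only the data layout handled by the host changes.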

FABM specifies what roles the hydrodynamic and biogeochemical models fulfil with respect to the management of biogeochemical variables. In particular, the coupled advection-diffusion-reaction equation that governs the behaviour of biogeochemical tracers is conceptually split into the reaction part (i.e., sink and source terms) provided by biogeochemical models, and “everything else”, to be handled by the hydrodynamic model. “Everything else” includes transport (advection, diffusion) and residual vertical movement (sinking or floating), as well as any parameterizations of unresolved physical processes (e.g., eddy-induced mixing), dilution by freshwater input (precipitation, rivers), and concentration by evaporation. Effectively, the advection-diffusion-reaction equation is solved (time-integrated) by the hydrodynamic model, with FABM providing the reaction terms that it in turn obtains from active biogeochemical models.
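The division of labour can be sketched for a 1D column in Python, assuming a simple operator-splitting scheme (the text does not prescribe one): the host diffuses the tracer and then adds the reaction term returned by the model. `diffuse`, `host_step` and `first_order_decay` are illustrative names, not FABM routines.

```python
def diffuse(c, kappa, dz, dt):
    """Explicit vertical diffusion with zero-flux boundaries (the host's job)."""
    n = len(c)
    new = []
    for i in range(n):
        from_above = c[i - 1] - c[i] if i > 0 else 0.0
        from_below = c[i + 1] - c[i] if i < n - 1 else 0.0
        new.append(c[i] + dt * kappa * (from_above + from_below) / dz ** 2)
    return new


def host_step(c, reaction, kappa, dz, dt):
    """One split time step: transport first, then the model's local reaction terms."""
    c = diffuse(c, kappa, dz, dt)
    return [ci + dt * reaction(ci) for ci in c]


def first_order_decay(ci):
    """Stand-in for a biogeochemical model: returns a sink term only (rate 0.1 per unit time)."""
    return -0.1 * ci


column = host_step([1.0, 0.0, 0.0], first_order_decay, kappa=1.0, dz=1.0, dt=0.1)
```

Diffusion conserves the column total; only the model-supplied decay term changes it. This explicit scheme is stable for `kappa * dt / dz**2 <= 0.5`.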

The design proposed in Figure 1 is not unique to FABM: it also appears in custom coupling layers of specific physical-biogeochemical models, such as the original “bio” API of GOTM (Burchard et al., 2006) and the “generic tracer” package of MOM. Nevertheless, some aspects of the task division, such as allowing biogeochemical models to outsource the application of residual vertical transport (sinking/floating) to the physical model, are not yet ubiquitous in hydrodynamic models.

In addition to transporting biogeochemical tracers and time-integrating their sink and source terms, hydrodynamic models are responsible for handling data input and output. With respect to input, they ideally allow the user to provide arbitrary data fields at run time, which may then be used to force biogeochemistry through FABM. With respect to output, the hydrodynamic model should offer a mechanism to include the biogeochemical tracers defined by FABM in its output, along with any non-transported diagnostic variables defined by biogeochemical models.

In short, the emphasis in FABM lies on interfaces for communication between hydrodynamics and biogeochemistry, not on the provision of numerical schemes or input/output logic. This choice was made to maintain a lean framework that does not duplicate functionality that is widely available in hydrodynamic models. There is an additional benefit to using the functionality of hydrodynamic models wherever possible: it ensures that biogeochemical tracers are treated in the same manner as physical tracers (e.g., temperature, salinity), which is essential for consistent simulations.

What is represented?

Spatial domain

Coupled physical-biogeochemical models focus on the behaviour of water parcels and the variables (tracers) that these contain. However, biogeochemistry within the water column is often influenced by exchange across its boundaries. For instance, dissolved gases are exchanged across the air-water interface, and nutrients are exchanged across the water-sediment interface. In some cases, this exchange is mediated by biogeochemistry at the interfaces themselves, involving variables that are part of the bottom (e.g., benthic communities) or the surface (e.g., algal mats, microlayer constituents). In the context of hydrodynamic models, surface- and bottom-attached variables are outside the water column, as they are not affected by water movement; nevertheless, they can have a significant impact on in-column biogeochemistry.

Accordingly, FABM distinguishes three domains: the pelagic, the water surface and the bottom. The real-world pelagic is viewed as a 3D, vertically structured environment, whereas the bottom and surface are viewed as 2D, horizontal-only slices (this does not preclude the modelling of vertically structured benthic communities, as discussed later). Biogeochemical state variables can be associated with any one of these domains; only those associated with the pelagic are transported. FABM provides separate interfaces to retrieve process rates (sink and source terms, surface exchange rates) for the pelagic, bottom and surface; this enables the hydrodynamic host to retrieve boundary fluxes of pelagic variables on demand, e.g., for use as boundary conditions in advection-diffusion schemes.

Model variables

In models, biogeochemistry is generally described by a set of state variables or “prognostic variables”, their dynamics governed by coupled advection-diffusion-reaction equations. State variable values are initialized at the start of the simulation and evolved in time by integrating their governing equations. Accordingly, FABM allows biogeochemical models to register any number of state variables, for which sink and source terms must be provided on demand. Time integration and transport are handled by the hydrodynamic host.

In addition to state variables, FABM supports diagnostic variables: quantities that can be calculated at any time from the biogeochemical state and environmental conditions. Hydrodynamic models are expected to include diagnostic variables directly in their output, perhaps after time-averaging or time-integrating their value across the model time step.

FABM further allows biogeochemical models to contribute to aggregate quantities, shared across all active biogeochemistry. For instance, models can let one or more of their variables contribute to total chlorophyll, total primary production or total carbon. This mechanism is also used to keep track of conserved quantities, e.g., totals of energy (J m-3) or specific chemical elements (mol m-3). For these conserved aggregate quantities, the host can compute integrals across the spatial domain, which permits the user of the coupled model to check energy and mass balances.
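A Python sketch of aggregate quantities, assuming each model registers (variable, factor) contributions; the variable names and the chlorophyll factor are illustrative. Because uptake only moves nitrogen between variables, the aggregated total is unchanged, which is the kind of mass-balance check a host can perform after integrating over the domain.

```python
from collections import defaultdict

# Hypothetical registrations: which variables contribute to which aggregate, and with what factor.
CONTRIBUTIONS = {
    "total_nitrogen": [("nitrate", 1.0), ("phyto_n", 1.0), ("detritus_n", 1.0)],
    "total_chlorophyll": [("phyto_n", 1.5)],  # illustrative chlorophyll:N ratio
}


def aggregate(contributions, state):
    """Combine per-variable contributions into aggregate quantities at one point."""
    totals = defaultdict(float)
    for quantity, terms in contributions.items():
        for variable, factor in terms:
            totals[quantity] += factor * state[variable]
    return dict(totals)


def domain_integral(values, cell_volumes):
    """Host-side integral of a concentration (e.g. mol m-3) over cell volumes (m3)."""
    return sum(v * vol for v, vol in zip(values, cell_volumes))


state = {"nitrate": 2.0, "phyto_n": 0.5, "detritus_n": 0.25}
before = aggregate(CONTRIBUTIONS, state)["total_nitrogen"]
# A conservative process: uptake moves nitrogen from nitrate into phytoplankton.
after_uptake = dict(state, nitrate=state["nitrate"] - 0.25, phyto_n=state["phyto_n"] + 0.25)
after = aggregate(CONTRIBUTIONS, after_uptake)["total_nitrogen"]
```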

Sink and source terms for state variables often do not depend only on the local value of biogeochemical variables, but also on environmental variables such as temperature, salinity, pressure, or pH. These variables may be part of the hydrodynamic model or of another active biogeochemical model (e.g., pH may be provided by a module describing the carbonate system). FABM provides mechanisms to pass these data between models: biogeochemical models simply register any dependencies during initialization, and the framework guarantees that the required variables will be available whenever it requests sink and source terms. To fulfil registered dependencies, the framework searches its global variable registry, which combines fields provided by the hydrodynamic model and variables registered by all active biogeochemical models. Dependencies that cannot be fulfilled are reported to the hydrodynamic model, which should then enable the user to provide the needed data as separate forcing fields during the simulation.
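The registration-then-resolution pattern can be sketched in Python as follows. `VariableRegistry` and its methods are hypothetical; the point is that resolution happens only after all models have registered, so the order of registration does not matter, and anything left over (here `par`, photosynthetically active radiation) is handed to the host as a forcing requirement.

```python
class VariableRegistry:
    """Hypothetical global registry: host fields plus variables exported by models."""

    def __init__(self):
        self.providers = {}  # variable name -> who provides it
        self.requests = []   # (requesting model, variable name)

    def provide(self, provider, name):
        self.providers[name] = provider

    def require(self, model, name):
        self.requests.append((model, name))

    def resolve(self):
        """Link each request to a provider; return links and names the host must supply."""
        links, missing = {}, []
        for model, name in self.requests:
            if name in self.providers:
                links[(model, name)] = self.providers[name]
            elif name not in missing:
                missing.append(name)
        return links, missing


registry = VariableRegistry()
registry.provide("host", "temperature")
registry.provide("host", "salinity")
registry.provide("carbonate", "pH")       # exported by another biogeochemical model
registry.require("calcifier", "temperature")
registry.require("calcifier", "pH")
registry.require("phytoplankton", "par")  # nobody provides light in this example
links, missing = registry.resolve()
```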

Information exchanged between hydrodynamics and biogeochemistry

FABM enables biogeochemical models to pass information other than sink and source terms to the hydrodynamic host. These include the rate of vertical movement of biogeochemical state variables (e.g., floating or sinking), which the hydrodynamic model should translate into a residual vertical advection term and solve. Furthermore, FABM supports different types of feedbacks to physics, including light absorption (resulting in heat production), and changes to surface albedo and wind drag (Sonntag and Hense, 2011).
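From the host's perspective, a sinking speed supplied by a model becomes a residual vertical advection term. The Python sketch below assumes a first-order upwind scheme and a closed bottom; both are choices of this sketch, and a real host may instead hand material reaching the bottom to bottom-attached models.

```python
def sinking_step(c, w_sink, dz, dt):
    """Move material down at a constant sinking speed w_sink (>= 0) with first-order upwind.

    Index 0 is the surface layer and the last index the bottom layer. The bottom is treated
    as closed (zero flux), so material accumulates in the lowest layer and the column total
    is conserved. Stable for w_sink * dt / dz <= 1.
    """
    n = len(c)
    # Downward flux through the lower face of each layer, taken from that layer (upwind).
    flux = [w_sink * c[i] if i < n - 1 else 0.0 for i in range(n)]
    return [
        c[i] + dt * ((flux[i - 1] if i > 0 else 0.0) - flux[i]) / dz
        for i in range(n)
    ]


profile = sinking_step([1.0, 0.0, 0.0], w_sink=1.0, dz=1.0, dt=0.5)
```

The model only reports `w_sink`; the discretization, and its stability constraint, remain the host's responsibility.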

Not all hydrodynamic models may implement all functionality that FABM supports. Feedbacks to temperature, albedo and wind drag can be difficult to implement. Furthermore, models may not support separate time-integration of surface and bottom fields, or the reading of arbitrary biogeochemical forcing fields during simulation. Initial omission of this functionality is deemed acceptable, provided the hydrodynamic model supports the core functionality of FABM: time-integration and transport of biogeochemical state variables in the pelagic. This is sufficient for the majority of biogeochemical models.

Coding biogeochemical models

FABM offers a comprehensive set of interfaces through which biogeochemical models pass information about their variables and processes. These interfaces are exposed in object-oriented fashion: a biogeochemical model is coded as an object (“derived type”) which supports numerous methods (“type-bound procedures”), each responsible for providing specific information to FABM. These include methods for providing sink and source terms, surface and bottom fluxes, vertical movement rates (e.g., sinking, floating), light absorption coefficients, and feedbacks to wind drag and albedo. A key design criterion of FABM is to minimize the number of lines of code needed to create a complete biogeochemical model. For that reason, the use of nearly all interfaces is optional: models only need to implement methods for the functionality that they support. The sole exception is a model’s initialization routine, which must be implemented by every model to provide FABM with information on the model’s variables and parameters. The steps required to introduce a biogeochemical model in FABM are described in the section "Developing a new biogeochemical model".
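The optional-method design maps naturally onto a base class with default implementations. The Python sketch below is an analogue, not FABM's Fortran API: the method names are hypothetical, and only `initialize` is abstract, so a complete decaying-tracer model needs just two methods.

```python
from abc import ABC, abstractmethod


class BiogeochemicalModel(ABC):
    """Hypothetical Python analogue of a model base type.

    initialize() is the only mandatory method; every other method has a neutral default,
    so a model implements just the processes it actually describes.
    """

    @abstractmethod
    def initialize(self, parameters):
        """Register state variables and read parameter values (required)."""

    def sources(self, state, env):
        return {}    # pelagic sink and source terms: none by default

    def surface_fluxes(self, state, env):
        return {}    # no exchange across the water surface

    def bottom_fluxes(self, state, env):
        return {}    # no exchange across the water-sediment interface

    def vertical_velocity(self, state, env):
        return {}    # no sinking or floating

    def light_extinction(self, state, env):
        return 0.0   # no contribution to light attenuation


class DecayingTracer(BiogeochemicalModel):
    """A complete model in a few lines: one state variable, one process."""

    def initialize(self, parameters):
        self.rate = parameters.get("rate", 0.1)  # per day (illustrative)
        self.state_variables = ["tracer"]

    def sources(self, state, env):
        return {"tracer": -self.rate * state["tracer"]}


model = DecayingTracer()
model.initialize({"rate": 0.5})
```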

Biogeochemical processes typically operate locally in space. This is reflected in process models: knowing local state variable values and local environmental conditions suffices to calculate local process rates. Thus, biogeochemical models are agnostic with respect to their spatial domain and its dimensionality (0D, 1D, 2D, 3D). FABM recognizes this and does not require models to manage spatially explicit fields: information on the spatial domain is passed implicitly through preprocessor macros, and loops over the spatial domain are defined in similar fashion. Thus, biogeochemical models in FABM describe local processes only: they retrieve the local state and environment and use these to compute local sink and source terms, local rates of vertical movement, etc.

Finally, FABM aims to facilitate the debugging of biogeochemical models. In particular, the framework has been designed such that common coding mistakes are either (a) prevented altogether (e.g., addressing spatially explicit arrays with invalid indices is not possible) or (b) guaranteed to be detected by the compiler, rather than triggering run-time crashes (e.g., attempting to change the value of read-only environmental variables triggers a compiler error, and so does addressing bottom fields as if they were pelagic, even in host models where bottom and pelagic fields have the same number of dimensions).

Coupling biogeochemical models

Biogeochemical models are coded as isolated, self-contained objects, joined at run time by the user to construct a complete coupled biogeochemical model. Thus, far from being all-inclusive monolithic codes, biogeochemical models in FABM are compact modules that each describe the behaviour of a single compound, process or organism. Complex descriptions of biogeochemistry can thus be partitioned over numerous modules, as demonstrated by modular implementations of the Aquatic EcoDynamics (AED) model and the European Regional Seas Ecosystem Model (ERSEM) (Baretta et al., 1995). FABM provides the glue between all active biogeochemical models and presents the coupled result as a single biogeochemical system to the hydrodynamic host.

To couple biogeochemical models, they need a mechanism to share variables. For instance, a zooplankton model may need to obtain prey densities from a separate phytoplankton model. Such links between models are established in two steps. First, the zooplankton model registers “prey” as an external state variable, to be provided by some other model. This is defined in the code of the zooplankton model, and thus frozen at compile time. Second, this “prey” state variable is coupled at run time to a specific state variable of another model. An example of the coupling between isolated nutrient-phytoplankton-zooplankton-detritus models is shown in Figure 2. FABM currently offers two mechanisms to make the final run-time coupling. The first is implicit: models can assign their variables an unambiguous identity (e.g., nitrate, phosphate, pH), taken from a master list defined by FABM. If multiple models register variables with the same identity, FABM couples these. The second coupling mechanism is explicit: the user can specify in FABM’s run-time configuration file that specific variables (identified by name) must be coupled.
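Both coupling mechanisms can be sketched in Python. The data layout is hypothetical: each instance lists the variables it provides (optionally with a standard identity) and the external links it requires, which is the part fixed in code. The user's explicit links are applied first and standard identities second; that precedence, and letting the first provider of an identity win, are assumptions of this sketch.

```python
def couple(models, explicit_links):
    """Resolve each model's external links ('prey', 'nutrient', ...) at run time.

    models: {instance: {"provides": {variable: identity or None},
                        "requires": {link: identity or None}}}
    explicit_links: {"instance/link": "other_instance/variable"}, as a user would write it.
    """
    by_identity = {}
    for instance, spec in models.items():
        for variable, identity in spec["provides"].items():
            if identity is not None:
                # Assumption of this sketch: the first provider of an identity wins.
                by_identity.setdefault(identity, f"{instance}/{variable}")

    resolved, unresolved = {}, []
    for instance, spec in models.items():
        for link, identity in spec["requires"].items():
            key = f"{instance}/{link}"
            if key in explicit_links:        # explicit coupling by name
                resolved[key] = explicit_links[key]
            elif identity in by_identity:    # implicit coupling by standard identity
                resolved[key] = by_identity[identity]
            else:
                unresolved.append(key)
    return resolved, unresolved


models = {
    "nut": {"provides": {"n": "nitrate"}, "requires": {}},
    "phy": {"provides": {"c": None}, "requires": {"nutrient": "nitrate"}},
    "zoo": {"provides": {"c": None}, "requires": {"prey": None}},
    "det": {"provides": {"c": None}, "requires": {"source": None}},
}
resolved, unresolved = couple(models, {"zoo/prey": "phy/c"})
```

Here `phy/nutrient` is coupled implicitly through the shared `nitrate` identity, `zoo/prey` explicitly by name, and `det/source` remains unresolved, to be reported to the user.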

FABM’s support for concurrent biogeochemical models can also be exploited for other purposes. By running several instances of the same biogeochemical model (or the same set of coupled models) in parallel with different parameterizations, it is possible to perform ensemble simulations or parameter sensitivity studies with a single simulation.

Run-time information exchange

FABM emphasizes run-time information exchange. On the one hand, this involves the run-time selection and configuration (e.g., provision of parameter values and initial state) of biogeochemical models and the coupling between them. On the other, it involves the models being able to completely describe themselves in terms of metadata, notably the names and attributes of the variables and parameters they contain.

FABM reads its run-time configuration from a single text file. This step is independent of the hydrodynamic host model that FABM is embedded in, which means that it is possible to transfer a complete biogeochemical model configuration from one hydrodynamic model to the next, simply by copying this one file. The format of the configuration file is based on a subset of the YAML standard, designed to store hierarchically structured data in human-readable form. The benefit of YAML over alternatives such as Fortran namelists, XML and JSON is that it is non-verbose (cf. XML), it can use indentation rather than hard-to-read nested braces or brackets (cf. JSON), it can be read and written by a large number of programming languages (cf. namelists), and it does not rely on compile-time definition of all required inputs (cf. namelists). In short, the configuration file contains a section for each biogeochemical model instance that the user wants to activate. Each instance-specific section specifies the parameter values to use, the initial state variable values to use, and any couplings that need to be made with other biogeochemical models. By allowing complete run-time configuration, it is possible to compile a hydrodynamic model once, after which the biogeochemical model structure and parameterization can be manipulated at run time by changing the configuration file. Thus, designing and running the coupled biogeochemical model comes down to editing a text file, running the model executable, and viewing the output. It does not require software engineering expertise.
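A Python sketch of configuration-driven instantiation follows. Parsing YAML would require a third-party library, so a dictionary stands in for the parsed file; the key names `instances`, `model`, `parameters`, `initialization` and `coupling`, the model name `example/tracer` and the `register` decorator are assumptions of this sketch. Two instances of the same model type coexist with different parameter values, which is the basis for the ensemble use mentioned above.

```python
MODEL_LIBRARY = {}  # model name -> class; in FABM the available model types are fixed at compile time


def register(name):
    """Class decorator that adds a model type to the library under a given name."""
    def wrap(cls):
        MODEL_LIBRARY[name] = cls
        return cls
    return wrap


@register("example/tracer")
class Tracer:
    defaults = {"decay_rate": 0.1}

    def __init__(self, name, parameters, initialization, coupling):
        self.name = name
        self.parameters = {**self.defaults, **parameters}  # user values override defaults
        self.state = {"c": initialization.get("c", 0.0)}
        self.coupling = dict(coupling)


def build(config):
    """Create one model object per entry under 'instances' in a parsed configuration."""
    return {
        name: MODEL_LIBRARY[section["model"]](
            name,
            section.get("parameters", {}),
            section.get("initialization", {}),
            section.get("coupling", {}),
        )
        for name, section in config["instances"].items()
    }


# What a parsed configuration file might look like: two instances of the same model type.
config = {
    "instances": {
        "fast": {"model": "example/tracer", "parameters": {"decay_rate": 0.5},
                 "initialization": {"c": 1.0}},
        "slow": {"model": "example/tracer", "initialization": {"c": 2.0}},
    }
}
instances = build(config)
```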

In addition to allowing complete configuration at run-time, FABM also allows the host to retrieve complete information on the biogeochemical model configuration. This information includes metadata such as names, units, and valid ranges of variables and parameters, as well as the actual and default values for parameters. Such data can be used by the host model. An obvious application is to add the variable metadata to the model output. However, more imaginative uses are possible: host models can enumerate parameters in order to present them to the user for further configuration, e.g., through a Graphical User Interface. Furthermore, the host can automatically select and reconfigure parameters in model calibration experiments and sensitivity studies. This would allow FABM models to be used in automated model test benches (Hemmings and Challenor, 2012).
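A sketch of how a host might use such metadata, assuming a hypothetical `ParameterInfo` record with units, default, current value and valid range: the same records drive a human-readable listing and an automated perturbation of parameters for a sensitivity study.

```python
import random
from dataclasses import dataclass


@dataclass
class ParameterInfo:
    """Hypothetical metadata record a host might retrieve for one parameter."""
    name: str
    units: str
    default: float
    value: float
    minimum: float
    maximum: float


def describe(parameters):
    """Produce human-readable lines, e.g. for a GUI listing or output attributes."""
    return [f"{p.name} [{p.units}] = {p.value} (default {p.default}, range {p.minimum} to {p.maximum})"
            for p in parameters]


def perturb(parameters, fraction, rng):
    """Draw new values uniformly within +/- fraction of the current value, clipped to the
    valid range; assumes non-negative values. Usable as one member of a sensitivity ensemble."""
    new_values = {}
    for p in parameters:
        low = max(p.minimum, p.value * (1.0 - fraction))
        high = min(p.maximum, p.value * (1.0 + fraction))
        new_values[p.name] = rng.uniform(low, high)
    return new_values


params = [
    ParameterInfo("max_growth", "d-1", 1.0, 1.2, 0.0, 5.0),
    ParameterInfo("half_saturation", "mmol m-3", 0.5, 0.5, 0.01, 0.55),
]
sample = perturb(params, fraction=0.2, rng=random.Random(42))
```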

Interfacing to different hydrodynamic models

Hydrodynamic models vary considerably in the way they store spatially explicit variables such as biogeochemical tracers, and they vary in the manner and order in which they process the terms that contribute to variable dynamics (e.g., transport, sinks and sources). A generic coupling framework can therefore make few assumptions about the structure of the hydrodynamic model.

To allow for the variability in the way hydrodynamic models store tracers, FABM makes only one assumption: the values of a single variable across the full spatial domain are assumed to reside in one Fortran array, which allows them to be accessed with a single Fortran pointer. This requirement is met in all hydrodynamic models that we are aware of. It should be noted that it is not required that data for all variables combined are stored in a single array; values of different variables may be located in different arrays. Additionally, surface and bottom values of pelagic variables do not need to be addressable as a contiguous slice in a pelagic array. Thus, FABM can be used in models that position the bottom at a depth index that varies in horizontal space; this is the norm in “z coordinate” models. Finally, FABM allows part of the spatial domain (typically: land) to be masked; this area is automatically excluded during all biogeochemical computations. Through this mechanism, irregular spatial domains can be handled.
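The following Python sketch shows why bottom values need not form a contiguous slice: in a z-coordinate grid the bottom layer index differs per column, and land columns are masked out. Function names and the nested-list layout are illustrative.

```python
def bottom_values(field, bottom_index, mask):
    """Pick the bottom value of a pelagic variable in every water column.

    field[i][k]: value in horizontal column i, layer k (k = 0 at the surface).
    bottom_index[i]: deepest wet layer of column i; varies in space in z-coordinate models.
    mask[i]: False for land columns, which are skipped (None is returned for them).
    """
    return [field[i][bottom_index[i]] if mask[i] else None for i in range(len(field))]


def masked_sum(values, mask):
    """Sum only over unmasked (wet) points, as any domain-wide computation must."""
    return sum(v for v, wet in zip(values, mask) if wet)


field = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
mask = [True, False, True]
bottom = bottom_values(field, bottom_index=[2, 0, 1], mask=mask)
```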

As FABM permits complete run-time configuration of the biogeochemical model, the number of biogeochemical state variables is not known at compile time. This places one further requirement on hydrodynamic models: they should not hard-code the number of biogeochemical tracers. Instead, memory for biogeochemical tracers should be allocated dynamically at run-time. This does not exclude the option of hard-coding the extents of the spatial domain, which is sometimes done to improve performance: by combining per-variable information in a Fortran derived type, the extents of spatially explicit fields can be defined at compile-time, while biogeochemical variables can still be added dynamically at run-time by creating multiple instances of the derived type. The steps needed to embed FABM within a hydrodynamic model are described in the section "Coupling FABM to a new physical_model".

Performance

FABM is designed as a light-weight framework. Most of its code is active at the start of the simulation to manage run-time configuration and coupling. During the simulation itself, information is passed between hydrodynamic and biogeochemical models with minimal overhead. To further optimize performance, copying of data between memory locations is avoided and subroutines are designed to process entire slices of the spatial domain at a time, rather than individual grid points.

By design, biogeochemical models in FABM do not create spatially-explicit arrays; even the framework itself does so sparingly (currently only to store diagnostic variables). Arrays are created and managed by the hydrodynamic host instead, and passed to FABM for biogeochemical models to operate upon. Persistent variable data (e.g., values of biogeochemical state variables) are passed in the form of Fortran pointers, and temporary data (e.g., arrays to hold the instantaneous change of state variables) are passed as assumed-shape arrays. This avoids any performance penalty associated with copying data between the hydrodynamic and biogeochemical models.

During simulation, the hydrodynamic model obtains information about biogeochemistry by calling subroutines provided by FABM. To minimize the overhead associated with these calls, FABM allows all subroutines to operate on a 1D slice of the spatial domain, rather than on individual grid points. This further enables the compiler to replace loops over the spatial domain by faster vectorized instructions. By setting preprocessor macros, the hydrodynamic model has full control over (a) whether slice-based operations are enabled (if not, each subroutine call processes a single grid point), and (b) which spatial dimension is vectorized. For instance, in the FABM coupling to the 1D General Ocean Turbulence Model (GOTM), the vertical dimension is vectorized, while for the 3D Modular Ocean Model the first horizontal dimension is vectorized; in FABM's 0D driver vectorization is not used at all.
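The effect of slice-based calls on call overhead can be sketched in Python; vectorization itself is a compiler optimization with no equivalent in this sketch. `CountingModel`, `host_pointwise` and `host_sliced` are hypothetical names. Both hosts produce identical results, but the sliced host makes one call instead of one per grid point.

```python
class CountingModel:
    """Stand-in for a biogeochemical model whose source routine processes a 1D slice per call."""

    def __init__(self):
        self.calls = 0

    def sources(self, nutrient, phyto):
        self.calls += 1
        # The loop over the slice lives inside the model routine (vectorizable in Fortran).
        return [0.5 * n / (1.0 + n) * p for n, p in zip(nutrient, phyto)]


def host_pointwise(model, nutrient, phyto):
    """Slice-based operation disabled: one call per grid point (slices of length 1)."""
    out = []
    for n, p in zip(nutrient, phyto):
        out.extend(model.sources([n], [p]))
    return out


def host_sliced(model, nutrient, phyto):
    """Slice-based operation enabled: one call for the whole slice."""
    return model.sources(nutrient, phyto)


nutrient, phyto = [1.0, 2.0, 3.0, 4.0], [0.1, 0.1, 0.2, 0.2]
pointwise_model, sliced_model = CountingModel(), CountingModel()
pointwise = host_pointwise(pointwise_model, nutrient, phyto)
sliced = host_sliced(sliced_model, nutrient, phyto)
```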

Compile-time and run-time extensibility

By adopting an object-oriented approach to code biogeochemical models, they become reusable: new models can build upon earlier work by inheriting data and methods from existing models and adding new variables or functionality. For instance, a basic zooplankton model may be extended with the ability to perform vertical migration by inheriting from the original model type and overriding the method that provides vertical movement rates. As in other object-oriented programming languages (C++, Java), this works “out of the box”, without the base model code having been designed specifically to enable inheritance.
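The vertical-migration example can be sketched with Python classes. This is the text's own example expressed in a different language; all names and the day/night rule are illustrative. The derived type overrides a single method and inherits everything else unchanged.

```python
class Zooplankton:
    """Basic zooplankton model (illustrative grazing formulation)."""

    def __init__(self, grazing_rate=1.0):
        self.grazing_rate = grazing_rate

    def sources(self, zoo, prey):
        return self.grazing_rate * prey / (1.0 + prey) * zoo

    def vertical_velocity(self, hour):
        return 0.0  # no active vertical movement


class MigratingZooplankton(Zooplankton):
    """Adds diel vertical migration by overriding one method only."""

    def __init__(self, grazing_rate=1.0, speed=50.0):
        super().__init__(grazing_rate)
        self.speed = speed  # m d-1 (illustrative)

    def vertical_velocity(self, hour):
        # Positive = upward: ascend at night, descend during the day.
        return self.speed if hour < 6 or hour >= 18 else -self.speed


base, migrating = Zooplankton(), MigratingZooplankton()
```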

To illustrate inheritance-based extensibility, one could imagine an abstract model type for depth-structured sediment models. This type would provide methods for handling (registering, retrieving, setting) depth-structured variables, and internally map these to the unstructured bottom fields supported by FABM. The base type could further implement methods (e.g., numerical schemes) that perform vertical diffusion of tracers within the sediment column. By deriving from the abstract base type, depth-structured sediment biogeochemistry could be described with a minimum of code.
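A minimal Python sketch of such a base type, assuming that each depth-structured variable is stored as one flat bottom field per layer and that the sediment column is closed at both ends. All names are hypothetical. The derived `OxygenSediment` only declares its variable and inherits the layer mapping and the diffusion scheme.

```python
class LayeredSedimentModel:
    """Hypothetical base type for depth-structured sediment models.

    Each depth-structured variable is stored as n_layers flat 'bottom fields',
    mimicking how such a base type could map layers onto FABM's 2D bottom variables.
    """

    def __init__(self, n_layers, layer_thickness):
        self.n_layers = n_layers
        self.dz = layer_thickness
        self.bottom_fields = {}  # flat name -> value

    def _field(self, name, k):
        return f"{name}_layer{k + 1}"

    def register_layered(self, name, initial=0.0):
        for k in range(self.n_layers):
            self.bottom_fields[self._field(name, k)] = initial

    def get_profile(self, name):
        return [self.bottom_fields[self._field(name, k)] for k in range(self.n_layers)]

    def set_profile(self, name, values):
        for k, value in enumerate(values):
            self.bottom_fields[self._field(name, k)] = value

    def diffusion_tendency(self, name, diffusivity):
        """Explicit diffusion within the sediment column, closed at top and bottom."""
        c = self.get_profile(name)
        tendency = []
        for k in range(self.n_layers):
            above = c[k - 1] - c[k] if k > 0 else 0.0
            below = c[k + 1] - c[k] if k < self.n_layers - 1 else 0.0
            tendency.append(diffusivity * (above + below) / self.dz ** 2)
        return tendency


class OxygenSediment(LayeredSedimentModel):
    """A derived model declares its variables; layering and diffusion are inherited."""

    def __init__(self):
        super().__init__(n_layers=3, layer_thickness=0.5)
        self.register_layered("oxygen")


sediment = OxygenSediment()
sediment.set_profile("oxygen", [3.0, 1.0, 0.0])
tendency = sediment.diffusion_tendency("oxygen", diffusivity=0.25)
```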

In addition to compile-time extensibility through type inheritance, FABM supports a run-time mechanism that allows model users to selectively add, remove or change model functionality. This mechanism exploits the fact that the framework represents all active biogeochemical models in a hierarchy, traversed by starting at the root and repeatedly drilling down to deeper levels. Models high up in the hierarchy function as gateways to models nested below. As a result, high-level models can override properties or functionality of more deeply placed models. This makes it possible to create “filter models”, which do not describe a complete biogeochemical process, but position themselves between the root of the model hierarchy and a specific child model in order to override specific functionality. For instance, a generic filter could be written to disable or change surface fluxes of pelagic state variables. These filters can be activated and applied to specific biogeochemical models at run time, placing further control in the hands of end users.
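A filter model can be sketched in Python as a node that wraps one child in the hierarchy and post-processes what the child returns. The class names and flux values are hypothetical. `GasExchange` itself is never modified; the filter takes effect purely through how the hierarchy is assembled, which in FABM would happen at run time.

```python
class Model:
    """Node in a hypothetical model hierarchy; children are traversed from the root down."""

    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)

    def surface_fluxes(self):
        return {}  # this node's own fluxes

    def collect_surface_fluxes(self):
        """Merge this node's fluxes with those of everything nested below it."""
        fluxes = dict(self.surface_fluxes())
        for child in self.children:
            for variable, value in child.collect_surface_fluxes().items():
                fluxes[variable] = fluxes.get(variable, 0.0) + value
        return fluxes


class GasExchange(Model):
    def surface_fluxes(self):
        return {"oxygen": 2.0, "co2": -0.5}  # illustrative values


class SurfaceFluxFilter(Model):
    """Filter model: wraps one child and scales its surface fluxes (factor 0 disables them)."""

    def __init__(self, name, child, factor):
        super().__init__(name, [child])
        self.factor = factor

    def collect_surface_fluxes(self):
        return {v: self.factor * f for v, f in super().collect_surface_fluxes().items()}


plain = Model("root", [GasExchange("gas")])
disabled = Model("root", [SurfaceFluxFilter("no_flux", GasExchange("gas"), factor=0.0)])
halved = Model("root", [SurfaceFluxFilter("half", GasExchange("gas"), factor=0.5)])
```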

A proof-of-concept of run-time extensibility is provided in the form of a “duplicator” module. This module positions itself below the root of the model tree and creates a number of copies of another biogeochemical model. These copies differ only in the value of a single parameter, which is drawn at random from a uniform distribution with user-specified bounds. Copies of the duplicated model run concurrently during simulation. If the duplicated model is representative of a specific species or functional type, this effectively creates a heterogeneous community. This enables Darwinian selection experiments such as proposed by Follows et al. (2007). A crucial feature is that both the model that is duplicated and the parameter that is manipulated are specified by name at run time. Thus, the user can take any species-specific model, and use it as the basis of a community of species. While this end result might be obtained through other mechanisms as well, e.g., through the introduction of a pre-processing step that manipulates the run-time model configuration, the ability of FABM to handle it completely within the framework is evidence of its extensibility support.
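The duplicator idea can be sketched in Python. `duplicate` and `Phytoplankton` are hypothetical. As in the text, both the template model and the parameter to vary are chosen by name at run time (here via `setattr`), and each copy receives a value drawn from a uniform distribution with user-specified bounds.

```python
import copy
import random


class Phytoplankton:
    """Illustrative species-specific model with two parameters."""

    def __init__(self, name, max_growth=1.0, half_saturation=0.5):
        self.name = name
        self.max_growth = max_growth
        self.half_saturation = half_saturation


def duplicate(template, parameter, count, low, high, seed=None):
    """Return `count` copies of `template` that differ only in the named parameter,
    each value drawn from a uniform distribution on [low, high]."""
    if not hasattr(template, parameter):
        raise AttributeError(f"{template.name} has no parameter {parameter!r}")
    rng = random.Random(seed)
    community = []
    for i in range(count):
        member = copy.deepcopy(template)
        setattr(member, parameter, rng.uniform(low, high))  # parameter chosen by name at run time
        member.name = f"{template.name}_{i + 1}"
        community.append(member)
    return community


community = duplicate(Phytoplankton("diatom"), "max_growth", count=5, low=0.5, high=2.0, seed=1)
```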

FABM is designed to provide the minimum of functionality needed for hydrodynamic and biogeochemical models to communicate. As such, it does not place demands on the conceptual structure of biogeochemical models. It is also agnostic with respect to the identity of biogeochemical variables (e.g., whether a variable represents nitrate, CO2, fish or anything else); if such identities are specified, they are passed as-is without FABM interpreting them. More elaborate frameworks are conceivable: one could imagine specifications that define a unified approach to coding biogeochemistry, e.g., by explicitly defining model currencies (e.g., a list of chemical elements that are to be tracked), or by providing templates for large numbers of species or functional types, and interfaces that pass information on specific biogeochemical processes. Such detail was intentionally left out of FABM in order to minimize restrictions on biogeochemical models. Nevertheless, support for object-oriented programming within FABM makes it easy to define more elaborate (and restrictive) frameworks, by defining templates in the form of abstract model types and methods, from which new models can inherit. We therefore view FABM as a low-level coupler on which more elaborate frameworks could be built.

Similarly, while FABM is written in Fortran to integrate optimally in Fortran-based hydrodynamic models, extensions could be developed to translate biogeochemical model specifications written in higher-level languages (Muetzelfeldt, 2004; Villa et al., 2009) into FABM-specific Fortran. This approach could be used to enable more compact and intuitive model specifications.

In the most abstract sense, FABM describes the behaviour of state variables in a domain with undefined dimensionality. Space is not explicitly referred to, with the exception of the vertical dimension implied by the distinction of surface and bottom, and the presence of interfaces to specify vertical movement. In fact, nothing necessitates that FABM is used only for spatially contiguous aquatic environments. This is demonstrated by a proof-of-principle that uses the framework to describe biogeochemistry in vertically structured sediment columns, rather than water columns. A benefit of this approach is that it enables unified models of some biogeochemical processes, such as carbonate and redox chemistry, to be used in both the pelagic and the sediment. Similarly, the framework could be used to describe biogeochemistry within vertically structured models of sea ice. We also anticipate that FABM will be used to describe the dynamics of particles featured in individual-based or Lagrangian models. Here, the domain has a single “spatial” dimension that corresponds to the index of the particle.
