ModelScenario - openghg/openghg GitHub Wiki

A ModelScenario object links related data together so that modelled measurements can be created and compared with observations. It also serves as a basis for the atmospheric inversion setup (see the openghg_inversions repository).

This generally includes measurement data ($y$), footprint data, emissions data and boundary conditions data. The modelled equivalent of the measurements is given by:

$y_{mod}=Hx$

where $y_{mod}$ are the modelled measurements, $x$ are the inputs that contribute to these measurements, such as emissions and boundary conditions, and $H$ is the sensitivity matrix that relates $x$ to $y_{mod}$.

  • Measurement data is what we wish to compare $y_{mod}$ to, so $y_{mod}$ needs to be evaluated at time points that map onto the measurement times.
  • Footprint data generally contains the details of $H$ for both the emissions domain and the boundary condition domains (curtains).
  • Emissions and Boundary conditions are components of $x$ which contribute to $y_{mod}$.
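
One way to read the bullets above is to split $H$ and $x$ into an emissions part and a boundary conditions part. The subscripts $flux$ and $bc$ are labels introduced here for illustration and are not notation taken from OpenGHG:

$y_{mod} = Hx = H_{flux} x_{flux} + H_{bc} x_{bc}$

Under this reading, the first term is the contribution from emissions inside the domain and the second is the contribution from the boundary curtains.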

Collecting data

A ModelScenario object acts as a way to bring together associated data of different data types. As described above, these data types include:

  • surface
  • column
  • footprints
  • flux
  • boundary_conditions

Different combinations of these data types are needed to produce the outputs we want. For instance, footprints and flux data are needed to create a modelled mole fraction, and footprints and boundary_conditions data are needed to produce a modelled baseline. We would often then want to align these so that they are directly comparable to surface or column data.
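
The dependency between data types and outputs can be written down as a small lookup. This is a minimal sketch for illustration only: `REQUIREMENTS` and `available_outputs` are hypothetical names, and the output labels are not OpenGHG identifiers.

```python
# Hypothetical lookup: which data types each output needs.
# The output names are illustrative labels, not OpenGHG identifiers.
REQUIREMENTS = {
    "modelled_mole_fraction": {"footprints", "flux"},
    "modelled_baseline": {"footprints", "boundary_conditions"},
}


def available_outputs(attached):
    """Return the outputs that can be produced from the attached data types."""
    attached = set(attached)
    return sorted(
        name for name, needed in REQUIREMENTS.items() if needed <= attached
    )


# With surface, footprints and flux data attached, only the modelled
# mole fraction can be produced; the baseline also needs boundary conditions.
print(available_outputs(["surface", "footprints", "flux"]))
```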

When defining a ModelScenario object there are several ways to collect and attach data from the object stores:

  1. Pass appropriate keywords to ModelScenario when it is first created. This will then attempt to search using the appropriate keywords for each data type and attach the data found. Note: if there is any ambiguity for a data type, the data will not be attached and a warning will be printed.
  2. Search the object store manually and pass the resulting data objects directly when creating a ModelScenario instance.
  3. After initialisation, a ModelScenario can be populated using the add_* methods with appropriate keywords or direct data objects.
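
The ambiguity rule in option 1 can be pictured with a toy helper. This is not OpenGHG code: `attach_if_unique` is a hypothetical function, `matches` stands in for whatever a search returns, and using Python's `warnings` module is an assumption about how the warning is raised.

```python
import warnings


def attach_if_unique(data_type, matches):
    """Return the single match, or None with a warning if zero or several were found."""
    if len(matches) == 1:
        return matches[0]
    warnings.warn(
        f"{data_type}: expected exactly one match, found {len(matches)}; "
        "nothing attached"
    )
    return None


# One match is attached; two candidates are ambiguous, so nothing is attached.
flux = attach_if_unique("flux", ["flux_source_a"])
footprint = attach_if_unique("footprints", ["fp_10m", "fp_100m"])
print(flux, footprint)
```

Narrowing the search keywords until only one candidate remains, or passing the data object directly, avoids the ambiguous case.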

See the module-level docstring of the analyse module for more details, including code snippets.

Creating modelled timeseries

Once the necessary data types have been linked to a ModelScenario instance, modelled timeseries can be created from the footprints. Combining footprints with flux data gives the modelled mole fraction, and combining footprints with boundary conditions data gives the modelled baseline. The main method to create these outputs is the footprint_data_merge method but it is also possible to create the two outputs individually.
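
To make the arithmetic concrete, here is a toy calculation in plain Python. It is only a sketch of the idea and does not call OpenGHG: the grid size, the numbers and the names `weighted_sum`, `footprint`, `curtain_sensitivity` and so on are all made up for illustration, and units are ignored.

```python
# Toy data: 2 time steps on a 2x2 emissions grid. All numbers are made up.
footprint = [  # sensitivity of each measurement to each grid cell
    [[0.1, 0.2], [0.0, 0.3]],
    [[0.4, 0.0], [0.1, 0.1]],
]
flux = [[1.0, 2.0], [3.0, 4.0]]  # emissions per grid cell, constant in time here

# Toy boundary data: 2 time steps, 2 curtain elements.
curtain_sensitivity = [[0.5, 0.5], [0.25, 0.75]]
boundary_conditions = [1900.0, 1920.0]


def weighted_sum(sensitivity, field):
    """Sum of sensitivity * field over a 2D grid."""
    return sum(
        s * f
        for s_row, f_row in zip(sensitivity, field)
        for s, f in zip(s_row, f_row)
    )


# Footprints with flux: one modelled mole fraction value per time step.
modelled_mf = [weighted_sum(fp_t, flux) for fp_t in footprint]

# Footprints with boundary conditions: one modelled baseline value per time step.
modelled_baseline = [
    sum(s * bc for s, bc in zip(sens_t, boundary_conditions))
    for sens_t in curtain_sensitivity
]

# The two parts added together are what would be compared with the measurements.
modelled_total = [mf + base for mf, base in zip(modelled_mf, modelled_baseline)]
print(modelled_mf, modelled_baseline, modelled_total)
```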

Aligning datasets

When creating modelled timeseries, one of the steps we need to take is to align the stored data across the different data types. To create $y_{mod}$, for instance, we need to start by making sure the footprint component is representative of the obs we want to compare this to. This can involve aligning the datasets and/or resampling the data to an appropriate frequency.

How this is done depends on several factors:

  • platform of the data, as this makes a difference to the created footprints
  • resample_to input

When considering platform:

  • "satellite" - this will not align or resample the data as it is expected that satellite footprints are produced on a per point basis
  • "flask" - this will align the data (using "ffill") but not resample the data. Flask data is irregular and so we should not impose a regular resample frequency on this.
  • any other value (or not specified) - this will continue with the resampling and alignment steps.
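
The platform rules above can be sketched as a small dispatch, together with what a forward fill ("ffill") alignment does for flask data. Both functions are hypothetical helpers written for this page, not OpenGHG functions, and the example assumes that the footprint is the series being filled forward onto the observation times.

```python
from bisect import bisect_right


def alignment_plan(platform=None):
    """Return which steps run for a platform, following the rules listed above."""
    if platform == "satellite":
        return {"align": False, "resample": False}
    if platform == "flask":
        return {"align": True, "resample": False}
    return {"align": True, "resample": True}


def ffill_align(source_times, source_values, target_times):
    """For each target time, take the most recent source value at or before it.

    source_times must be sorted. Targets earlier than the first source time
    have nothing to fill from and get None.
    """
    aligned = []
    for t in target_times:
        i = bisect_right(source_times, t)
        aligned.append(source_values[i - 1] if i > 0 else None)
    return aligned


# Footprints every 2 hours (hours since start), flask samples at irregular times.
fp_hours = [0, 2, 4, 6]
fp_values = ["fp0", "fp2", "fp4", "fp6"]
flask_hours = [1.5, 2.0, 5.25]

print(alignment_plan("flask"))
print(ffill_align(fp_hours, fp_values, flask_hours))  # ['fp0', 'fp2', 'fp4']
```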

The resample_to options are:

  • "obs" - align and resample "footprint" to match "obs" data
  • "footprint" - align and resample "obs" to match "footprint" data
  • "coarsest" - examine the periods for each dataset and choose the one with the lower frequency (coarser resolution)
  • specified frequency (e.g. "1h") - resample both sets of data to the specified frequency
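
As a rough sketch, the choice of target period for these options could look like the following. `target_period` and `parse_frequency` are hypothetical helpers, not OpenGHG functions, and the small frequency parser only understands a few units ("s", "min", "h", "d"), which is an assumption made to keep the example self-contained.

```python
import re
from datetime import timedelta

# Units understood by this toy parser (an assumption, not an OpenGHG list).
_UNITS = {"s": "seconds", "min": "minutes", "h": "hours", "d": "days"}


def parse_frequency(freq):
    """Turn a string such as "1h" or "30min" into a timedelta."""
    match = re.fullmatch(r"(\d+)\s*(s|min|h|d)", freq.strip().lower())
    if match is None:
        raise ValueError(f"unrecognised frequency: {freq!r}")
    count, unit = match.groups()
    return timedelta(**{_UNITS[unit]: int(count)})


def target_period(resample_to, obs_period, footprint_period):
    """Pick the period both datasets should end up on."""
    if resample_to == "obs":
        return obs_period
    if resample_to == "footprint":
        return footprint_period
    if resample_to == "coarsest":
        # The coarser dataset is the one with the longer period.
        return max(obs_period, footprint_period)
    return parse_frequency(resample_to)


# Minutely observations and hourly footprints: "coarsest" picks one hour.
print(target_period("coarsest", timedelta(minutes=1), timedelta(hours=1)))
```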

To determine the period represented by each observation time point, the attributes are examined to look for those which indicate the averaged period or known sampling period:

  • "averaged_period"
  • "sampling_period"
  • "sampling_period_estimate"

If these attributes are (a) not found or (b) have values indicating they are not set, this will attempt to infer the sampling period from the data frequency.
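
The lookup and fallback can be sketched as below. Several details are assumptions made for illustration rather than confirmed OpenGHG behaviour: that the attributes are checked in the order listed, that their values are numbers of seconds, that the strings in `UNSET_VALUES` are the markers for "not set", and that the fallback uses the most common gap between consecutive time points.

```python
from collections import Counter
from datetime import datetime, timedelta

PERIOD_KEYS = ("averaged_period", "sampling_period", "sampling_period_estimate")
UNSET_VALUES = {"", "NOT_SET", "NA", "None"}  # assumed "not set" markers


def observation_period(attrs, times):
    """Return the period from the attributes, else infer it from the time points."""
    for key in PERIOD_KEYS:
        value = attrs.get(key)
        if value is None or str(value).strip() in UNSET_VALUES:
            continue  # not found, or present but not set
        return timedelta(seconds=float(value))  # assumes a value in seconds

    # Fallback: the most common spacing between consecutive time points.
    gaps = Counter(later - earlier for earlier, later in zip(times, times[1:]))
    if not gaps:
        raise ValueError("cannot infer a period from fewer than two time points")
    return gaps.most_common(1)[0][0]


hourly = [datetime(2020, 1, 1, hour) for hour in range(4)]
print(observation_period({"sampling_period": "60.0"}, hourly))
print(observation_period({"sampling_period": "NOT_SET"}, hourly))
```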

Current workflow: image

Update this to include code or diagram rather than image

Plotting

Future plans

Within an inversion system we will often run with data from multiple measurement inputs. For this we need to consider how we could create a super object which can contain and link multiple ModelScenario objects. Within openghg_inversions this is currently done using a dictionary syntax (akin to the original acrg repository setup), so a first step could be to mimic this setup before building on it.

Known interactions

  • Within openghg_inversions, some of the data collection and the processing steps that follow it may duplicate what is done here. There is likely some useful functionality that could or should be incorporated back into openghg.