ModelScenario - openghg/openghg GitHub Wiki
For creating modelled measurement comparisons and as a basis for the atmospheric inversion setup (see: openghg_inversions repository), a ModelScenario object can be created to connect together related data.
This generally includes measurement data ($y$), footprint data, emissions data and boundary conditions data. The modelled measurements are related to the inputs by:
$y_{mod}=Hx$
where $y_{mod}$ are the modelled measurements, $x$ are the inputs which contribute to these measurements, such as emissions and boundary conditions, and $H$ is the sensitivity matrix which relates $x$ to $y_{mod}$.
- Measurement data is what we wish to compare $y_{mod}$ to, so the modelled time points need to map onto the measurement times.
- Footprint data generally contains the details of $H$ for both the emissions domain and the boundary condition domains (curtains).
- Emissions and Boundary conditions are components of $x$ which contribute to $y_{mod}$.
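As a worked reading of this relationship, assume the inputs are stacked so that emissions and boundary conditions occupy separate blocks of $x$. The subscripted symbols $H_{flux}$, $H_{bc}$, $x_{flux}$ and $x_{bc}$ are illustrative notation introduced here, not names taken from the openghg code:

$$
y_{mod} = Hx = \begin{pmatrix} H_{flux} & H_{bc} \end{pmatrix} \begin{pmatrix} x_{flux} \\ x_{bc} \end{pmatrix} = H_{flux}\,x_{flux} + H_{bc}\,x_{bc}
$$

Under that assumption, the first term is the contribution from emissions within the domain and the second term is the contribution from the boundary conditions on the curtains.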
Collecting data
A ModelScenario object acts as a way to bring together associated data across the different data types. As described above, these data types include:
- surface
- column
- footprints
- flux
- boundary_conditions
Different combinations of these data types are needed to produce the outputs we want. For instance, we need footprints and flux data to create a modelled mole fraction, and footprints and boundary_conditions data to produce a modelled baseline. We would often then want to align these so that they are directly comparable to surface or column data.
When defining a ModelScenario object there are several ways to collect and attach data from the object stores:
- Pass appropriate keywords to ModelScenario when it is first created. It will then attempt to search using the appropriate keywords for each data type and attach the data found. Note: if there is any ambiguity for a data type, the data will not be attached and a warning will be printed.
- Search the object store manually and pass these data objects directly when creating a ModelScenario instance.
- After initialisation, a ModelScenario can be populated using the add_* methods with appropriate keywords or direct data objects.
See the analyse module level documentation string for more details on this including code snippets.
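The "attach only when unambiguous" behaviour noted for the keyword search can be illustrated with a small stand-alone sketch. This is not the openghg implementation: the function name `attach_if_unambiguous` and the list of search matches are hypothetical, and treating "no matches" the same way as "several matches" is an assumption.

```python
import warnings


def attach_if_unambiguous(data_type, matches):
    """Return the single match for a data type, or None (with a warning).

    Hypothetical helper: `matches` stands in for whatever a search of the
    object store returned for this data type.
    """
    if len(matches) == 1:
        return matches[0]
    # Zero matches or several matches: attach nothing and tell the user.
    warnings.warn(
        f"Found {len(matches)} matches for {data_type!r}; nothing attached."
    )
    return None


# Exactly one flux result is attached; two footprint results are not.
flux = attach_if_unambiguous("flux", ["flux_result"])
footprint = attach_if_unambiguous("footprints", ["fp_a", "fp_b"])
```

In the ambiguous case the data would then need to be supplied another way, for example by narrowing the keywords or passing the data object directly.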
Creating modelled timeseries
Once the necessary data types have been linked to a ModelScenario instance, model timeseries can be created using footprints with flux and boundary conditions data. This will create the modelled mole fraction and modelled baseline respectively. The main method to create these outputs is the footprint_data_merge method, but it is also possible to create the two outputs individually.
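The arithmetic behind the two outputs can be sketched with toy data. This is a minimal illustration, assuming each time step has one footprint grid and one curtain sensitivity grid; plain nested lists stand in for the gridded datasets, units are ignored, and none of the names below come from openghg.

```python
# Toy sketch only: nested lists stand in for gridded datasets.
# All names and values are illustrative, not part of openghg.


def weighted_sum(sensitivity, field):
    """Sum sensitivity * field over every cell of two same-shaped 2D grids."""
    return sum(
        s * f
        for s_row, f_row in zip(sensitivity, field)
        for s, f in zip(s_row, f_row)
    )


# Flux on a 2x2 (lat, lon) grid, assumed constant in time.
flux = [[1.0, 2.0],
        [3.0, 4.0]]

# One footprint grid per time step (sensitivity to emissions in each cell).
footprints = [
    [[0.1, 0.0], [0.0, 0.2]],
    [[0.0, 0.5], [0.5, 0.0]],
]

# Boundary condition values on a (flattened) curtain, and one curtain
# sensitivity grid per time step.
bc = [[1900.0, 1910.0]]
bc_sensitivity = [
    [[0.5, 0.5]],
    [[1.0, 0.0]],
]

# Footprints with flux give the modelled mole fraction;
# curtain sensitivities with boundary conditions give the modelled baseline.
modelled_mf = [weighted_sum(fp, flux) for fp in footprints]
modelled_baseline = [weighted_sum(s, bc) for s in bc_sensitivity]
modelled_total = [mf + base for mf, base in zip(modelled_mf, modelled_baseline)]
```

Each entry of the resulting lists corresponds to one time step, which is what would later be aligned against the measurement times.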
Aligning datasets
When creating modelled timeseries, one of the steps we need to take is to align the stored data across the different data types. To create $y_{mod}$, for instance, we need to start by making sure the footprint component is representative of the obs we want to compare it to. This can involve aligning the datasets and/or resampling the data to an appropriate frequency.
How this is done depends on two factors:
- the platform of the data, as this makes a difference to the created footprints
- the resample_to input
When considering platform:
- "satellite" - this will not align or resample the data as it is expected that satellite footprints are produced on a per point basis
- "flask" - this will align the data (using "ffill") but not resample the data. Flask data is irregular and so we should not impose a regular resample frequency on this.
- any other value (or not specified) - this will continue with the resampling and alignment steps.
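The platform handling above can be sketched as a small decision function, together with a forward-fill alignment for the "flask" case. This is a stand-alone illustration rather than the openghg code: the function names are hypothetical, times are plain numbers, and filling footprint values forward onto the observation times is an assumption about the direction of the alignment.

```python
from bisect import bisect_right


def alignment_steps(platform=None):
    """Return (align, resample) flags for a platform, following the list above."""
    if platform == "satellite":
        return (False, False)  # footprints already produced per point
    if platform == "flask":
        return (True, False)   # align, but keep the irregular time points
    return (True, True)        # any other value, or not specified


def forward_fill(source_times, source_values, target_times):
    """For each target time, take the latest source value at or before it.

    `source_times` must be sorted. Returns None where no source value
    exists at or before the target time.
    """
    filled = []
    for t in target_times:
        i = bisect_right(source_times, t)
        filled.append(source_values[i - 1] if i > 0 else None)
    return filled


# Footprints every 2 hours, flask samples at irregular times (in hours).
fp_times = [0, 2, 4]
fp_values = ["fp0", "fp2", "fp4"]
flask_times = [0.5, 2.0, 3.7, 5.0]

align, resample = alignment_steps("flask")
aligned = forward_fill(fp_times, fp_values, flask_times) if align else fp_values
```

The irregular flask times are kept exactly as they are; only the footprint chosen for each sample changes.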
The resample_to options are:
- "obs" - align and resample "footprint" to match "obs" data
- "footprint" - align and resample "obs" to match "footprint" data
- "coarsest" - examine the periods for each dataset and choose the one with the lower frequency (coarser resolution)
- specified frequency (e.g. "1h") - resample both sets of data to the specified frequency
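How these options translate into a target period can be sketched as follows. This is a minimal illustration under stated assumptions: `target_period` and `parse_frequency` are hypothetical helpers, the frequency parser only understands a handful of units, and the periods of each dataset are assumed to be already known as `timedelta` values.

```python
import re
from datetime import timedelta

# Units understood by this toy parser (an assumption, not an openghg list).
_UNITS = {"s": "seconds", "min": "minutes", "h": "hours", "d": "days"}


def parse_frequency(freq):
    """Parse strings such as "1h" or "30min" into a timedelta."""
    match = re.fullmatch(r"(\d+)\s*(s|min|h|d)", freq.strip().lower())
    if match is None:
        raise ValueError(f"Unrecognised frequency: {freq!r}")
    value, unit = match.groups()
    return timedelta(**{_UNITS[unit]: int(value)})


def target_period(resample_to, obs_period, footprint_period):
    """Pick the period that both datasets should end up on."""
    if resample_to == "obs":
        return obs_period
    if resample_to == "footprint":
        return footprint_period
    if resample_to == "coarsest":
        # Lower frequency means a longer period between points.
        return max(obs_period, footprint_period)
    return parse_frequency(resample_to)


obs_period = timedelta(minutes=1)
footprint_period = timedelta(hours=1)
chosen = target_period("coarsest", obs_period, footprint_period)
```

With one-minute observations and hourly footprints, "coarsest" selects the hourly period, so in this case it gives the same result as "footprint".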
To determine the period represented by each observation time point, the attributes are examined to look for those which indicate the averaged period or known sampling period:
- "averaged_period"
- "sampling_period"
- "sampling_period_estimate"
If these attributes are (a) not found or (b) have values indicating they are not set, this will attempt to infer the sampling period from the data frequency.
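The lookup described above can be sketched as a stand-alone function. Several details are assumptions made for illustration: the attributes are checked in the order listed, their values are taken to be a number of seconds, the placeholder values in `UNSET_VALUES` are guesses at what "not set" looks like, and the data frequency is inferred as the median gap between time points.

```python
from statistics import median

PERIOD_KEYS = ("averaged_period", "sampling_period", "sampling_period_estimate")

# Assumed placeholders meaning "not set"; the real values may differ.
UNSET_VALUES = {None, "", "NOT_SET", "NaN"}


def period_seconds(attrs, times_seconds):
    """Return the period for each observation point, in seconds.

    `attrs` is a dict of dataset attributes and `times_seconds` is a sorted
    list of observation times expressed in seconds.
    """
    for key in PERIOD_KEYS:
        value = attrs.get(key)
        if value in UNSET_VALUES:
            continue
        try:
            return float(value)
        except (TypeError, ValueError):
            continue  # unparseable value: try the next attribute

    # Fall back to inferring the period from the spacing of the data.
    gaps = [later - earlier for earlier, later in zip(times_seconds, times_seconds[1:])]
    if not gaps:
        raise ValueError("Need at least two time points to infer a period.")
    return float(median(gaps))


hourly_times = [0, 3600, 7200]
from_attribute = period_seconds({"sampling_period": "60.0"}, hourly_times)
inferred = period_seconds({"averaged_period": "NOT_SET"}, hourly_times)
```

Here the first call uses the stored attribute, while the second falls back to the hourly spacing of the time points.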
Current workflow:
Update this to include code or diagram rather than image
Plotting
Future plans
Within an inversion system we will often run with data from multiple measurement inputs. For this we need to consider how we could create a super object which can contain and link multiple ModelScenario objects. Within openghg_inversions this is currently done using a dictionary syntax (akin to the original acrg repository setup), so a first step could be to mimic this setup before building on it.
Known interactions
- Within openghg_inversions, there is some possibility of duplication of some of the collection of data and the steps this undertakes. There is likely some useful functionality that could/should be incorporated back into openghg.