HISTORY Samplers - GEOS-ESM/MAPL GitHub Wiki
Introduction
An observing system simulation experiment (OSSE) is a modeling experiment used to evaluate the value of a new observing system when actual observational data are not available. An OSSE system includes a nature run (Atlas, 1997), a data assimilation system (Atlas et al., 2015), and software to simulate “observations” from the nature run and to add realistic observation errors. OSSEs are designed to assess the impact of instruments that do not yet exist on numerical weather prediction (NWP) (Boukabara et al., 2016) and analysis; to make design decisions for a new observing system or network; and to investigate the behavior of data assimilation systems, and thereby tune them optimally, in an environment where the “truth”, and hence the system’s behavior, is known.
Any OSSE activity starts with a realistic representation of nature, typically by means of a high-resolution simulation by a comprehensive Earth system model without assimilation, the so-called Nature Run (NR). These models are run for a period long enough to capture the relevant natural variability, such as the seasonal cycle, and to spin up to a well-equilibrated state. Any OSSE needs a procedure to extract synthetic observations that mimic the distribution of real observations, and the impacts of synthetic data should be equivalent to the corresponding impacts of real observations. The process of simulating the observations amounts to sampling the NR at the appropriate times and locations.
NWP models used in OSSEs generate outputs across a grid system, essentially providing forecast information at specific points in space and time. If we want to obtain data at locations of interest (as seen by instruments), we can use offline techniques such as the Model Output Statistics (MOS) to statistically interpolate the model data to those locations. It is more attractive for models to have the capability to produce fields at any location and any frequency at runtime, instead of doing it offline.
In recent years, ESMF has incorporated robust parallel and scalable functionality for interpolation and regridding.
This allows the GEOS model to perform these tasks (interpolation and regridding) on the fly during the model integration.
Initially, the GEOS model implemented the ability to read input data files and produce output files of different grid types and resolutions (horizontal and vertical).
This work was extended with Sampler, a tool to generate data files at the user's prescribed locations (fixed or dynamic).
Sampler is a HISTORY subcomponent that maps gridded model geophysical variables onto observation locations, be they fixed ground stations, aircraft trajectories, or satellite swaths.
With Sampler, we can configure the entire HISTORY pipeline to directly generate any desired GEOS quantity at any static or time-dependent location (or group of locations) of interest (stations, moving-object trajectories, satellite swaths, etc.).
In this document, we describe the different options for Sampler and explain how to use each of them while running the GEOS model.
Types of samplers
Station sampler
The station sampler is used to produce geophysical variables at a set of time-independent geospatial coordinates corresponding to fixed ground stations (for instance, NASA AERONET or NOAA GHCNd land surface stations).
Station sampler: list of stations
The user needs to create a CSV file that lists all the stations of interest. Each row should contain at least the following information:
- station name
- station latitude
- station longitude
The user may specify other parameters (such as the station ID) to add more description of a station as long as all the lines have the same number of columns. Currently, the code supports files with any of the following line contents:
station_id, station_name, station_longitude, station_latitude
station_name, station_id, station_longitude, station_latitude
station_name, station_longitude, station_latitude
station_name, station_latitude, station_longitude
Remark: Since the most important parameters are the station name and its position, the source code will be refactored in the future so that the station file could include any number of columns as long as the key parameters are present in a consistent order.
Here is a sample station file:
List of stations from AERONET
name,lon,lat
Anchorage,-149.9,61.2
Atlanta,-84.4,33.7
Greenbelt,-76.9,39.1
Bismarck,-100.8,46.8
It obeys the line formatting:
station_name, station_longitude, station_latitude
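To make the file layout concrete, here is a minimal Python sketch that reads a station file in the `station_name, station_longitude, station_latitude` layout. It is only an illustration and not the parser used by MAPL: the function name `read_stations` and its `skip_lines` argument are hypothetical, with `skip_lines` playing the role of the number of header lines at the top of the file.

```python
import csv

def read_stations(text, skip_lines=0):
    """Parse rows of the form: station_name, station_longitude, station_latitude."""
    lines = text.splitlines()[skip_lines:]   # drop the header lines at the top of the file
    stations = []
    for row in csv.reader(lines):
        if not row:                          # ignore blank lines
            continue
        name, lon, lat = (item.strip() for item in row)
        stations.append((name, float(lon), float(lat)))
    return stations

# The sample station file shown above: two header lines, then one station per row.
SAMPLE = """List of stations from AERONET
name,lon,lat
Anchorage,-149.9,61.2
Atlanta,-84.4,33.7
Greenbelt,-76.9,39.1
Bismarck,-100.8,46.8
"""

stations = read_stations(SAMPLE, skip_lines=2)
for name, lon, lat in stations:
    print(f"{name:10s} lon={lon:7.1f} lat={lat:5.1f}")
```

Because the sketch unpacks exactly three columns, a row with a different number of columns raises an error, which mirrors the requirement that all lines have the same number of columns.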
Station sampler: settings in HISTORY.rc
The HISTORY.rc file settings for the station sampler follow the same syntax as described in the MAPL History Component document. However, the following specific parameters are required to exercise the station sampler:
- sampler_spec: A string that needs to be set to 'station'.
- station_id_file: Full path to the file containing the list of stations and their locations (latitude and longitude).
- station_skip_line: An integer specifying the number of lines to skip at the top of the station file.
- regrid_method: A string specifying the regridding method (for instance 'BILINEAR' or 'CONSERVATIVE') used to interpolate the model fields to the different stations.
COLLECTIONS: Aeronet
::
Aeronet.sampler_spec: 'station'
Aeronet.station_id_file: FULL_PATH/my_station_file.csv
Aeronet.station_skip_line: 2
Aeronet.template: %y4%m2%d2_%h2%n2.nc4
Aeronet.format: 'CFIO'
Aeronet.frequency: 001000,
Aeronet.duration: 240000,
Aeronet.regrid_method: 'BILINEAR' ,
Aeronet.fields: 'var2', 'Root',
'var3', 'Root',
'GOCART::var2', 'Root',
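As an illustration of what a 'BILINEAR' regrid method does conceptually when a gridded field is evaluated at a station, here is a minimal pure-Python sketch on a regular latitude-longitude grid. It is a stand-in under simplifying assumptions, not the ESMF-based regridding that GEOS uses at runtime: the function `bilinear`, the grid, and the test field are all hypothetical, and longitude wraparound is not handled.

```python
from bisect import bisect_right

def bilinear(lons, lats, field, lon, lat):
    """Interpolate field[j][i] (j: latitude index, i: longitude index) to (lon, lat).

    lons and lats must be strictly increasing, and the point must lie inside the grid.
    """
    if not (lons[0] <= lon <= lons[-1] and lats[0] <= lat <= lats[-1]):
        raise ValueError("point outside grid")
    # Index of the lower-left corner of the cell that contains the point.
    i = min(bisect_right(lons, lon) - 1, len(lons) - 2)
    j = min(bisect_right(lats, lat) - 1, len(lats) - 2)
    # Fractional position of the point inside the cell, between 0 and 1.
    tx = (lon - lons[i]) / (lons[i + 1] - lons[i])
    ty = (lat - lats[j]) / (lats[j + 1] - lats[j])
    south = (1 - tx) * field[j][i] + tx * field[j][i + 1]
    north = (1 - tx) * field[j + 1][i] + tx * field[j + 1][i + 1]
    return (1 - ty) * south + ty * north

# Hypothetical 20 x 10 degree grid carrying a field that is linear in lon and lat,
# so bilinear interpolation reproduces it up to rounding.
lons = list(range(-180, 181, 20))
lats = list(range(-90, 91, 10))
field = [[lon + 2 * lat for lon in lons] for lat in lats]

value = bilinear(lons, lats, field, -76.9, 39.1)   # Greenbelt from the sample file
print(round(value, 6))
```

Each station value is a weighted average of the four surrounding grid points, with weights given by the fractional position of the station inside its grid cell.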
Trajectory sampler
The trajectory sampler is used to produce geophysical variables at time-dependent geospatial points along a defined path or trajectory through the atmosphere (corresponding to tracks of aircraft, balloons, ships, or nadir-viewing spaceborne assets). The goal is to provide a snapshot of atmospheric conditions as an object would experience them while moving along that path.
This component can be configured either with an explicit list of coordinates (longitude, latitude, time), or by specifying a Two-Line Element (TLE) file describing an orbit, with orbital propagators such as the Simplified General Perturbations-4 (SGP4, see Vallado et al., 2006) used online to generate a sequence of geospatial locations.
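The idea of sampling along an explicit list of coordinates can be sketched as follows. This is an illustration only: the source does not describe how the trajectory sampler matches points to model times or interpolates in space, so nearest-neighbour selection in both time and space is an assumption made here for simplicity, and the names `nearest_index` and `sample_trajectory` are hypothetical.

```python
def nearest_index(values, x):
    """Index of the entry of values closest to x (first one wins on ties)."""
    return min(range(len(values)), key=lambda k: abs(values[k] - x))

def sample_trajectory(times, lons, lats, snapshots, track):
    """Sample snapshots[n][j][i] at each (lon, lat, t) of track.

    n indexes model times, j latitudes, i longitudes. Nearest neighbours are
    used in time and space (an assumption of this sketch).
    """
    samples = []
    for lon, lat, t in track:
        n = nearest_index(times, t)
        j = nearest_index(lats, lat)
        i = nearest_index(lons, lon)
        samples.append(snapshots[n][j][i])
    return samples

# Hypothetical model output: three hourly snapshots on a 3 x 3 grid.
times = [0.0, 3600.0, 7200.0]            # seconds since the start of the run
lons = [0.0, 10.0, 20.0]
lats = [0.0, 10.0, 20.0]
snapshots = [[[100 * n + 10 * j + i for i in range(3)] for j in range(3)]
             for n in range(3)]

# Hypothetical track given as (longitude, latitude, time) triplets.
track = [(1.0, 2.0, 100.0), (12.0, 9.0, 3500.0), (19.0, 21.0, 7000.0)]
samples = sample_trajectory(times, lons, lats, snapshots, track)
print(samples)
```

The output has one value per trajectory point rather than one value per grid cell, which is what distinguishes trajectory output from regular gridded output.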
Swath sampler
The swath sampler is used to produce geophysical variables at time-dependent geospatial coordinates corresponding to the two-dimensional swath of an orbiting instrument. Swaths are typically represented by logically rectangular curvilinear grids that may have higher or lower resolution than the NR. When the swath has lower resolution than the NR, conservative regridding is performed. However, when the observing system has a much higher resolution than the NR, it may be more advantageous to use masked samplers and perform any necessary interpolation offline.
Masked sampler
The masked sampler is used when the observing system has a much higher resolution than the NR. In this case, gridded geophysical variables are masked so that values are preserved at the grid points that have been visited by the satellite, possibly with the addition of a “halo” to aid offline interpolation, while all other grid points receive a constant undefined value. These gridded fields can be output efficiently using the internal compression algorithms available in most modern formats (e.g., NetCDF-4, HDF5), or alternatively using a sparse storage scheme.
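The masking step can be illustrated with a short Python sketch. It is a simplified picture and not the MAPL implementation: the function `mask_field`, the square halo measured in grid cells, and the fill value `UNDEF` are assumptions made for this example, and periodicity in longitude is ignored.

```python
UNDEF = 1.0e15   # assumed fill value for this sketch

def mask_field(field, visited, halo=0, undef=UNDEF):
    """Keep field[j][i] within `halo` cells of a visited cell; set the rest to undef."""
    nj, ni = len(field), len(field[0])
    keep = [[False] * ni for _ in range(nj)]
    for j, i in visited:
        # Mark the visited cell and its square neighbourhood, clipped at the edges.
        for jj in range(max(0, j - halo), min(nj, j + halo + 1)):
            for ii in range(max(0, i - halo), min(ni, i + halo + 1)):
                keep[jj][ii] = True
    return [[field[j][i] if keep[j][i] else undef for i in range(ni)]
            for j in range(nj)]

# Hypothetical 5 x 5 field in which a single cell was visited by the satellite.
field = [[5 * j + i for i in range(5)] for j in range(5)]
masked = mask_field(field, visited=[(2, 2)], halo=1)
kept = sum(value != UNDEF for row in masked for value in row)
print(kept)   # number of grid points that keep their value
```

Because most of the masked field holds the same constant value, it compresses well, which is why compressed output formats or sparse storage are attractive here.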
References
- Atlas, R., 1997: Atmospheric observations and experiments to assess their usefulness in data assimilation. J. Meteor. Soc. Japan, 75, 111–130, https://doi.org/10.2151/jmsj1965.75.1B_111.
- Atlas, R., L. Bucci, B. Annane, R. Hoffman, and S. Murillo, 2015: Observing system simulation experiments to assess the potential impact of new observing systems on hurricane forecasting. Mar. Technol. Soc. J., 49, 140–148, https://doi.org/10.4031/MTSJ.49.6.3.
- Boukabara, S. A., and Coauthors, 2016: Community Global Observing System Simulation Experiment (OSSE) Package (CGOP): Description and usage. J. Atmos. Oceanic Technol., 33, 1759–1777, https://doi.org/10.1175/JTECH-D-16-0012.1.