HISTORY Samplers - GEOS-ESM/MAPL GitHub Wiki
The goal of the MAPL History Sampler code is to output geophysical fields and simulated (virtual) observations on the fly during GEOS model runs, eliminating the need to post-process model output. Some of these outputs are real, highly accurate HofX computational results at the model's temporal resolution, which are more advanced than results from the conventional DA procedure. This time-dependent sampling functionality is needed by OSSEs and is useful to both the modeling and instrument communities.
Types | Input obs platforms | Output geometry | Input file / example frequency | Campaign |
---|---|---|---|---|
Station Sampler | Aeronet, GHCND | static | text | |
Trajectory Sampler | Aircraft, Satellite | time-dependent | IODA / 6hr | CAMP2EX, FIREXAQ |
Swath Sampler | ATMS, Aqua, Terra | time-dependent | .nc4 / 6 min | |
Geostationary Sampler | ABI_geostationary | static | netCDF4 | |
Mask Sampler | GOES-R | static on the cubed sphere | netCDF4 | |
Among the five types of samplers, the station sampler and the geostationary sampler accept one input file with static geometry, while the trajectory sampler and the swath sampler process time-dependent geometry and accept file name templates with wildcard characters (e.g., SNDR.J1.ATMS.%y4%m2%d2T%h2%n2*.nc). The required settings for each sampler type are specified in HISTORY.rc as detailed below.
The station sampler is used to produce geophysical variables at a set of time-independent geospatial coordinates corresponding to fixed ground stations (for instance NASA AERONET and NOAA GHCNd land surface stations [ghcnd-stations.txt]).
The user needs to provide a CSV file listing all the stations of interest. Each row should have at least the following information:
- station name
- station latitude
- station longitude
The user may specify other parameters (such as the station ID) to add more description of a station as long as all the lines have the same number of columns. Currently, the code supports files with any of the following line contents:
station_id, station_name, station_longitude, station_latitude
station_name, station_id, station_longitude, station_latitude
station_name, station_longitude, station_latitude
station_name, station_latitude, station_longitude
Note
Since the most important parameters are the station name and its position, the source code will be refactored in the future so that the station file could include any number of columns as long as the key parameters are present in a consistent order.
Here is a sample station file:
List of stations from AERONET
name,lon,lat
Anchorage,-149.9,61.2
Atlanta,-84.4,33.7
Greenbelt,-76.9,39.1
Bismarck,-100.8,46.8
It obeys the line formatting:
station_name, station_longitude, station_latitude
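As a rough illustration of the layout above, the following sketch reads a station file written in the station_name, station_longitude, station_latitude ordering after skipping a fixed number of header lines. The function name `read_stations` and its `skip_lines` argument are hypothetical; they only mimic the role of station_skip_line and are not part of MAPL.

```python
import csv
import io


def read_stations(text, skip_lines=0):
    """Parse 'name, lon, lat' rows after skipping `skip_lines` header lines.

    Hypothetical helper for illustration only; MAPL reads the file itself.
    """
    lines = text.splitlines()[skip_lines:]
    stations = []
    for row in csv.reader(io.StringIO("\n".join(lines))):
        if not row:
            continue  # ignore blank lines
        name, lon, lat = (item.strip() for item in row[:3])
        stations.append((name, float(lon), float(lat)))
    return stations


sample = """List of stations from AERONET
name,lon,lat
Anchorage,-149.9,61.2
Atlanta,-84.4,33.7
Greenbelt,-76.9,39.1
Bismarck,-100.8,46.8
"""

# Two header lines (title and column names) are skipped.
stations = read_stations(sample, skip_lines=2)
print(stations[0])
```

With the sample file above, two header lines precede the data, so a skip count of 2 leaves exactly the four station rows.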
The HISTORY.rc file settings for the station sampler follow the same syntax as described in the MAPL History Component document.
However, specific parameters are required to exercise the station sampler:
- sampler_spec: A string that needs to be set to 'station' to select a station sampler collection.
- station_id_file: Full path to the file containing the list of stations and their locations (latitude and longitude in degrees).
- station_skip_line: An integer specifying the number of lines to skip at the top of the station file.
- regrid_method: A string specifying the regridding method (for instance 'BILINEAR' or 'CONSERVATIVE') used to interpolate the model fields to the station locations.
Sample HISTORY.rc settings for a station sampler
COLLECTIONS:
Aeronet
::
Aeronet.sampler_spec: 'station'
Aeronet.station_id_file: FULL_PATH/my_station_file.csv
Aeronet.station_skip_line: 2
Aeronet.template: %y4%m2%d2_%h2%n2.nc4
Aeronet.format: 'CFIO'
Aeronet.frequency: 001000,
Aeronet.duration: 240000,
Aeronet.regrid_method: 'BILINEAR' ,
Aeronet.fields: 'PHIS' , 'AGCM' , 'phis' ,
'TROPT' , 'AGCM' ,
'TS' , 'SURFACE' , 'ts' ,
'TSOIL1' , 'SURFACE' ,
'PS' , 'DYN' , 'ps' ,
'Q' , 'MOIST' , 'sphu' ,
::
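The template setting builds the time-stamp portion of each output file name from date tokens. The sketch below shows one way such a template could be expanded. It assumes the tokens %y4, %m2, %d2, %h2 and %n2 stand for the four-digit year and the two-digit month, day, hour and minute; `expand_template` is a hypothetical helper, not a MAPL routine.

```python
from datetime import datetime

# Assumed meaning of the tokens used in the templates on this page.
TOKENS = {
    "%y4": "%Y",  # four-digit year
    "%m2": "%m",  # two-digit month
    "%d2": "%d",  # two-digit day of month
    "%h2": "%H",  # two-digit hour
    "%n2": "%M",  # two-digit minute
}


def expand_template(template, when):
    """Replace each known token with the matching field of `when`."""
    for token, directive in TOKENS.items():
        template = template.replace(token, when.strftime(directive))
    return template


name = expand_template("%y4%m2%d2_%h2%n2.nc4", datetime(2019, 8, 1, 6, 0))
print(name)  # 20190801_0600.nc4
```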
The trajectory sampler is used to produce geophysical variables at specific time-dependent points along a defined latitude-longitude-height path or trajectory (corresponding to tracks of aircraft, balloons, ships or nadir-viewing spaceborne assets). The goal is to provide a snapshot of atmospheric conditions as an object would experience them while moving along that path. This input trajectory class can include both conventional and satellite observations. The supported input file format is the JCSDA IODA file, whose metadata contains a location index and arrays of [lon, lat, height, time] as a function of the location index.
Two schema versions have been developed and implemented in MAPL for trajectory samplers. Schema version 1 is the default and the version recommended to users, because of its convenient user interface and fast IO performance. Schema version 2 is an earlier version that uses fewer ESMF route handles for regridding; it can still be useful if memory usage becomes a bottleneck, possibly at about 1-km resolution. We focus on schema version 1, taking an aircraft trajectory as an example.
schema.version: 1 is the recommended version for the trajectory sampler, with fast IO performance. An example input that creates the trajectory sampler collections is shown below.
COLLECTIONS:
aircraft
airs_aqua
::
GRID_LABELS:
aircraft_GEOM
airs_aqua_GEOM
::
aircraft_GEOM.GRID_TYPE: trajectory
aircraft_GEOM.schema: IODA
aircraft_GEOM.file_name_template: /discover/nobackup/projects/gmao/aist-nr/data/ioda_reshuffle/%y4%m2%d2/geos_atmosphere/aircraft.%y4%m2%d2T%h2%n2%S2Z.nc4
airs_aqua_GEOM.GRID_TYPE: trajectory
airs_aqua_GEOM.schema: IODA
airs_aqua_GEOM.file_name_template: /discover/nobackup/projects/gmao/aist-nr/data/ioda_reshuffle/%y4%m2%d2/geos_atmosphere/airs_aqua.%y4%m2%d2T%h2%n2%S2Z.nc4
schema.version: 1 # 1 : use grid_label for traj sampler
Trajectory_Schema::
IODA.index: Location # name for index
IODA.lon: MetaData/longitude # lon
IODA.lat: MetaData/latitude # lat
IODA.time: MetaData/dateTime # time
IODA.reftime: 2019-07-31T21:00:00 # reference synoptic time
IODA.frequency: '000000 060000' # frequency of ioda files
::
aircraft.sampler_type: trajectory
aircraft.grid_label: aircraft_GEOM
aircraft.template: '%y4%m2%d2_%h2%n2z.nc4',
aircraft.format: 'CFIO',
aircraft.Epoch: 060000 # fixed integer format: hhmmss
aircraft.use_NWP_1_file: .true.
aircraft.restore_2_obs_vector: .true.
aircraft.regrid_method: 'BILINEAR' ,
aircraft.splitField: 1,
aircraft.fields:
'U' , 'DYN' , 'u' ,
::
airs_aqua.sampler_type: trajectory
airs_aqua.grid_label: airs_aqua_GEOM
airs_aqua.template: '%y4%m2%d2_%h2%n2z.nc4',
airs_aqua.format: 'CFIO',
airs_aqua.Epoch: 060000 # fixed integer format: hhmmss
airs_aqua.use_NWP_1_file: .true.
airs_aqua.restore_2_obs_vector: .true.
airs_aqua.regrid_method: 'BILINEAR' ,
airs_aqua.splitField: 1,
airs_aqua.fields:
'PHIS' , 'AGCM' , 'phis' ,
::
The input above contains two collections, aircraft and airs_aqua. The trajectory of aircraft is defined on one meta-grid, aircraft_GEOM. This grid uses the IODA keyword defined in Trajectory_Schema to specify the metadata contained in the files matching file_name_template: …/aircraft.%y4%m2%d2T%h2%n2%S2Z.nc4. The IODA definition can be reused for other observation platforms, since many JCSDA IODA files have the same metadata format.
A few keywords in Trajectory_Schema are noteworthy:
- index: name for the index in obs files
- lon: name for the longitude in obs files
- lat: name for the latitude in obs files
- time: name for the time in obs files
- reftime: $T_0$, a reference time for the set of obs files defined in file_name_template
- frequency: $\Delta T$, the smallest time interval between the time stamps of two obs file names. Note that the sampler will search for file names matching file_name_template whose time stamp equals $T_0 + n \, \Delta T$, with $n$ an integer, within the interval of Epoch time (see below).
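To make the role of reftime and frequency concrete, the sketch below lists the time stamps T0 + n * deltaT (integer n) that fall inside one Epoch window. It is an illustration under stated assumptions, not the MAPL implementation: the window is taken as half-open, [window_start, window_start + Epoch), only the hhmmss part of the frequency setting is used, and the helper names `parse_hhmmss` and `candidate_times` are hypothetical.

```python
from datetime import datetime, timedelta


def parse_hhmmss(text):
    """Convert an 'hhmmss' string such as '060000' into a timedelta."""
    text = text.strip().zfill(6)
    return timedelta(hours=int(text[:-4]), minutes=int(text[-4:-2]),
                     seconds=int(text[-2:]))


def candidate_times(reftime, delta, window_start, epoch):
    """List T0 + n * dT (integer n) inside [window_start, window_start + epoch)."""
    # Smallest n with T0 + n * dT >= window_start (ceiling division).
    n = -((reftime - window_start) // delta)
    t = reftime + n * delta
    times = []
    while t < window_start + epoch:
        times.append(t)
        t += delta
    return times


reftime = datetime(2019, 7, 31, 21, 0, 0)          # value of IODA.reftime
delta = parse_hhmmss("000000 060000".split()[1])    # time part of IODA.frequency
epoch = parse_hhmmss("060000")                      # collection Epoch
times = candidate_times(reftime, delta, datetime(2019, 8, 1, 0, 0, 0), epoch)
print(times)
```

With these values, the only candidate time stamp in the window starting at 2019-08-01T00:00:00 is 2019-08-01T03:00:00, so a single obs file would be searched for.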
The aircraft collection itself contains a few important keywords:
- sampler_type: choose trajectory
- grid_label: use the grid defined above from the observation files
- template: same as in common HISTORY, for the output file
- format: same as in common HISTORY
- Epoch: equal to the frequency at which the sampler outputs interpolated fields. It also sets up a time interval; the set of observation locations that fall within this interval defines an effective output grid
- use_NWP_1_file: for the NWP case, instructs the sampler to choose only one obs file for each Epoch time (i.e., the first file that matches T0 + deltaT * integer)
- restore_2_obs_vector: only specify when use_NWP_1_file = .true. If true, it instructs the sampler to generate output fields with observation locations restored to the sequence in which the obs points appear in the input file. Otherwise, the output obs locations will be in time order.
- splitField: use 1 when an input field that contains an ungridded dimension needs to be split into multiple output fields. Example: 'TOTEXTTAU' , 'GOCART2G' , 'TOTEXTTAU470;TOTEXTTAU550;TOTEXTTAU870',
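The difference between time-ordered output and output restored to the input-file order (restore_2_obs_vector) can be pictured with a small sketch. It is only an illustration of the two orderings using plain Python lists; the function `sample_in_time_order` and the toy `sample` callback are hypothetical and do not reflect how MAPL performs the sampling internally.

```python
def sample_in_time_order(obs_times, sample):
    """Sample observations in time order, then also restore the file order."""
    # Indices that sort the observations by time (sorted() is stable).
    order = sorted(range(len(obs_times)), key=lambda i: obs_times[i])
    time_ordered = [sample(obs_times[i]) for i in order]
    # Scatter the results back to the positions they had in the input file.
    restored = [None] * len(obs_times)
    for position, original_index in enumerate(order):
        restored[original_index] = time_ordered[position]
    return time_ordered, restored


obs_times = [30, 10, 20]  # order in the input file, arbitrary time units
time_ordered, restored = sample_in_time_order(obs_times, lambda t: t * 2)
print(time_ordered)  # [20, 40, 60]
print(restored)      # [60, 20, 40]
```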
schema.version 2 is a less-used feature that concatenates and groups multiple observation platforms into a single collection, e.g., jedi. Hence there is only one ESMF regridding route handle for multiple platforms, which reduces the memory footprint.
A practical example of an input HISTORY.rc file using two observation platforms is shown below.
COLLECTIONS:
jedi
::
schema.version: 2 # 1 : use grid_label for traj sampler (default)
# 2 : use multiple platforms as a supercollection
Trajectory_Schema::
IODA.index: Location # name for index
IODA.lon: MetaData/longitude # lon
IODA.lat: MetaData/latitude # lat
IODA.time: MetaData/dateTime # time
IODA.obs_reftime: 2019-07-31T21:00:00 # reference synoptic time
IODA.obs_frequency: '000000 060000' # frequency of ioda files
::
jedi.sampler_type: trajectory
jedi.ObsPlatforms: aircraft airs_aqua
jedi.template: '%y4%m2%d2_%h2%n2z.nc4',
jedi.format: 'CFIO',
jedi.schema: IODA
jedi.Epoch: 060000 # fixed integer format: hhmmss
jedi.use_NWP_1_file: .true.
jedi.restore_2_obs_vector: .true.
jedi.regrid_method: 'BILINEAR' ,
jedi.splitField: 1,
::
DEFINE_OBS_PLATFORM::
PLATFORM.aircraft::
IODA_SCHEMA::
file_name_template: data/%y4%m2%d2/geos_atmosphere/aircraft.%y4%m2%d2T%h2%n2*Z.nc4
::
GEOVALS_SCHEMA::
fields::
'U', 'DYN', 'u' ,
::
::
::
PLATFORM.airs_aqua::
IODA_SCHEMA::
file_name_template: data/%y4%m2%d2/geos_atmosphere/airs_aqua.%y4%m2%d2T%h2%n2*Z.nc4
::
GEOVALS_SCHEMA::
fields::
'U', 'DYN', 'u' ,
::
::
::
Note that our jedi collection contains two platforms, aircraft and airs_aqua, which are defined by PLATFORM.aircraft:: and PLATFORM.airs_aqua::. Each platform defines its own input file_name_template (in the IODA_SCHEMA:: section) and output fields (in the GEOVALS_SCHEMA:: section). jedi.schema: IODA is the keyword we use to define the common metadata contained in the observation files.
The parameter Epoch is set (with the format hhmmss) in the collection definition section of the HISTORY.rc file. It determines the output frequency of the trajectory collection. The value of Epoch influences the experiments in many ways:
- The value of Epoch is used by the code to identify the number of observation files available between two consecutive trajectory output times. The code reads these files to collect all the locations (lat/lon) and times, and writes the fields at those locations and times to the trajectory collection.
Note
Unlike the swath sampler, the trajectory sampler will not crash if observation files from one or more platforms cannot be found on disk. These platforms will be skipped during the run. This is needed because there are cases when a field campaign does not produce data on certain days. We choose not to disrupt the GCM run even if no sampler output is generated for the trajectory. Users should make sure that observation files exist for the time intervals of interest, to avoid consuming computing time without generating the desired sampling output.
The swath sampler is used to produce geophysical variables at time-dependent geospatial coordinates corresponding to the two-dimensional swath of an orbiting instrument. Swaths are typically represented by logically rectangular curvilinear grids that may have higher or lower resolution than the NR. When the swath has lower resolution than the NR, conservative regridding will be performed. However, when the observing system has a much higher resolution than the NR, it may be more advantageous to use masked samplers and perform any necessary interpolation offline.
There are two groups of parameters in the HISTORY.rc file that need to be properly set to exercise the swath sampler. The first group is within the GRID_LABELS definition of the swath and the second group is for the swath HISTORY collection.
GRID_LABELS definition
- GRID_TYPE: The grid type, set here to Swath.
- GRID_FILE: A file template providing the full path to the location of the observation swath file.
- index_name_lon: Name of the longitude dimension in the observation file.
- index_name_lat: Name of the latitude dimension in the observation file.
- var_name_lon: Name of the longitude array in the observation file.
- var_name_lat: Name of the latitude array in the observation file.
- tunit: time unit in the format seconds since YYYY-MM-DD hh:mm:ss.
- obs_reftime: a reference date (YYYY-MM-DDThh:mm:ss) at which an observation file exists.
- obs_frequency: the date/time interval (yymmdd hhmmss) between two consecutive observation files.
- Epoch: The model output frequency in the format hhmmss. If it is not provided or is set to 000000, the code will crash.
Swath collection
The settings are the same as in any standard HISTORY.rc collection. Two parameters are required and need particular attention:
- sampler_type: A string that needs to be set to 'swath' to select a swath sampler collection.
- frequency: the frequency of the swath output file in the format hhmmss. It must be exactly equal to Epoch in the swath grid definition, otherwise the code will crash.
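Because a mismatch between Epoch and frequency, or a missing or zero Epoch, makes the code crash, it can be worth checking the two values before launching a run. The sketch below is one possible pre-flight check; `hhmmss_to_seconds` and `check_swath_timing` are hypothetical helpers written for this page, not MAPL routines.

```python
def hhmmss_to_seconds(text):
    """Convert an 'hhmmss' string (e.g. '060000') to seconds."""
    text = text.strip().zfill(6)
    return int(text[:-4]) * 3600 + int(text[-4:-2]) * 60 + int(text[-2:])


def check_swath_timing(epoch, frequency):
    """Return a list of problems; an empty list means the pair is consistent."""
    problems = []
    if epoch is None or hhmmss_to_seconds(epoch) == 0:
        problems.append("Epoch must be provided and non-zero")
    elif hhmmss_to_seconds(epoch) != hhmmss_to_seconds(frequency):
        problems.append("frequency must equal Epoch")
    return problems


print(check_swath_timing("060000", "060000"))  # []
print(check_swath_timing("060000", "010000"))  # ['frequency must equal Epoch']
```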
Here is an example of a swath sampler `HISTORY.rc` file:
COLLECTIONS:
SNDR_ATMS
::
GRID_LABELS:
SwathGrid_ATMS
::
SwathGrid_ATMS.GRID_TYPE: Swath
SwathGrid_ATMS.GRID_FILE: /discover/nobackup/projects/gmao/aist-nr/data/SNDR.J1.ATMS.L1B.v02/Y%y4/%D3/SNDR.J1.ATMS.%y4%m2%d2T%h2%n2*.nc
SwathGrid_ATMS.LM: 2
SwathGrid_ATMS.index_name_lon: xtrack
SwathGrid_ATMS.index_name_lat: atrack
SwathGrid_ATMS.var_name_lon: lon
SwathGrid_ATMS.var_name_lat: lat
SwathGrid_ATMS.var_name_time: obs_time_tai93
SwathGrid_ATMS.tunit: 'seconds since 1993-01-01 00:00:00'
SwathGrid_ATMS.obs_reftime: '2019-08-01T00:00:00'
SwathGrid_ATMS.obs_frequency: '000000 000600' # yymmdd hhmmss
SwathGrid_ATMS.Epoch: 060000 # hhmmss
SNDR_ATMS.sampler_type: 'swath'
SNDR_ATMS.template: '%y4%m2%d2_%h2%n2.nc4',
SNDR_ATMS.format: 'CFIO',
SNDR_ATMS.frequency: 060000 # must equal SwathGrid_ATMS.Epoch
SNDR_ATMS.grid_label: SwathGrid_ATMS,
SNDR_ATMS.regrid_method: 'BILINEAR' ,
SNDR_ATMS.fields: 'PHIS' , 'AGCM' , 'phis' ,
'TROPT' , 'AGCM' ,
'TS' , 'SURFACE' , 'ts' ,
'TSOIL1' , 'SURFACE' ,
'PS' , 'DYN' , 'ps' ,
'Q' , 'MOIST' , 'sphu' ,
::
The parameter Epoch is set (with the format hhmmss) in the observation grid definition section of the HISTORY.rc file. It determines the output frequency of the swath collection and needs to be equal to the frequency parameter in the swath collection definition. The value of Epoch influences the experiments in many ways.
First, a swath grid will be generated only if at least one observation file can be found according to the model start time, Epoch, obs_reftime, and obs_frequency. The code will crash if the swath grid fails to initialize.
Second, the value of Epoch determines how many observation files will be used to gather the locations (latitude and longitude pairs) that will all be included in the swath output file. The larger Epoch is, the more locations are used, and the more time and memory the code needs to produce the swath file. It is therefore important to select a value of Epoch that will not slow down the experiments.
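A quick way to gauge the cost of a given Epoch is to count how many observation files can fall inside one Epoch window. The sketch below gives an upper bound, assuming one file every obs_frequency and a sub-daily interval (the yymmdd part equal to zero); the helper names are hypothetical and not part of MAPL.

```python
def hhmmss_to_seconds(text):
    """Convert an 'hhmmss' string (e.g. '000600') to seconds."""
    text = text.strip().zfill(6)
    return int(text[:-4]) * 3600 + int(text[-4:-2]) * 60 + int(text[-2:])


def max_files_per_epoch(epoch, obs_frequency):
    """Upper bound on the number of obs files gathered in one Epoch window."""
    date_part, time_part = obs_frequency.split()
    if int(date_part) != 0:
        raise ValueError("this sketch handles sub-daily obs_frequency only")
    return hhmmss_to_seconds(epoch) // hhmmss_to_seconds(time_part)


# With 6-minute files, a longer Epoch gathers proportionally more files.
for epoch in ("010000", "060000"):
    print(epoch, max_files_per_epoch(epoch, "000000 000600"))
```

For 6-minute files this gives 10 files for a one-hour Epoch and 60 files for a six-hour Epoch, which is why a large Epoch increases both run time and memory use.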
Warning
The value of Epoch should be the same as that of frequency. The larger Epoch is, the longer it will take to create the swath output file.
The geostationary sampler interpolates GEOS output to image locations captured by geostationary satellites. It is currently configured for the GOES-16 ABI instrument. The input schema comprises the grid definition, which uses satellite specifications, and the conventional HISTORY.rc collection setup, which specifies output files, frequency, fields, etc. An example input is shown below.
COLLECTIONS:
ABI_M6C14_Geostation
::
GRID_LABELS:
abi_grid_GOES16_M6C14
::
abi_grid_GOES16_M6C14.GRID_TYPE: XY
abi_grid_GOES16_M6C14.GRIDNAME: ABI_Fixed_Grid
abi_grid_GOES16_M6C14.index_name_x: x
abi_grid_GOES16_M6C14.index_name_y: y
abi_grid_GOES16_M6C14.var_name_x: x
abi_grid_GOES16_M6C14.var_name_y: y
abi_grid_GOES16_M6C14.var_name_proj: goes_imager_projection
abi_grid_GOES16_M6C14.att_name_proj: longitude_of_projection_origin
abi_grid_GOES16_M6C14.thin_factor: 300
abi_grid_GOES16_M6C14.LM: 72
abi_grid_GOES16_M6C14.GRID_FILENAME: GOES-16-ABI/OR_ABI-L1b-RadF-M6C04_G16_s20192340800216_e20192340809524_c20192340809552.nc
ABI_M6C14_Geostation.template: '%y4%m2%d2_%h2%n2.nc4'
ABI_M6C14_Geostation.format: 'CFIO',
ABI_M6C14_Geostation.frequency: 060000,
ABI_M6C14_Geostation.grid_label: abi_grid_GOES16_M6C14
ABI_M6C14_Geostation.fields: 'var2', 'Root',
'var3', 'Root'
::
A few of the keywords need clarification.
- GRID_TYPE: choosing "XY" means we will create a MAPL XY-grid using ESMF geometry. GOES-16 images are stored in a rectangular matrix with certain values masked out based on the projection algorithm. The XY-grid interpolates efficiently with the mask function.
- GRIDNAME: using 'ABI' as part of the name invokes the MAPL XY-grid factory to read in a GRID_FILENAME file of the ABI image type; other options have not been implemented.
- GRID_FILENAME: the image data file.
- The variables that define the metadata from GRID_FILENAME include index_name_x/y, var_name_x/y, var_name_proj, att_name_proj.
- thin_factor: the factor used to uniformly thin the obs data. Because the ABI image has a resolution from a few km to a few hundred meters, the interpolation is highly time-consuming. We recommend that users choose thin_factor=300 for testing purposes.
- LM: the number of levels for the output field, which should be the same as that of the GEOS output fields that serve as input to the sampler.
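The effect of thin_factor on the amount of work can be pictured with a strided subsampling of a 2-D image. The sketch below assumes that uniform thinning keeps every thin_factor-th point along each image dimension; how MAPL actually selects the retained points is not described on this page, and the `thin` function is hypothetical.

```python
def thin(values_2d, thin_factor):
    """Keep every thin_factor-th row and column; a factor <= 1 keeps all points."""
    if thin_factor <= 1:
        return [row[:] for row in values_2d]
    return [row[::thin_factor] for row in values_2d[::thin_factor]]


# A 6 x 6 toy image whose entries encode (row, column) as 10 * row + column.
image = [[10 * r + c for c in range(6)] for r in range(6)]
thinned = thin(image, 3)
print(thinned)  # [[0, 3], [30, 33]]
```

Under this assumption the number of retained points drops by roughly the square of thin_factor, which is why a large value such as 300 is convenient for testing.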
Here the word mask means that, given a geostationary image, we mask those points on the cubed sphere that overlap with the image and those that form the halo for the image. A masked sampler is used when the observing system has a much higher resolution than the NR. In this case, gridded geophysical variables are masked in such a way that values are preserved at those grid-points that have been visited/imaged by the satellite, with possibly the addition of a "halo" for aiding off-line interpolation, with all other grid-points receiving a constant undefined value. These gridded fields can be efficiently output using internal compression algorithms available with most modern formats (e.g., NetCDF-4, HDF-5), or alternatively using a sparse storage scheme.
To exercise this sampler, users must provide a netCDF observation file that has the locations that need to be masked. The task of the code is to identify the non-masked grid-points that correspond to the locations visited by a moving object. The files produced in this collection will contain surface values of the selected fields at the non-masked locations.
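The masking idea can be sketched on a small rectangular grid: values are kept at visited grid-points and within an optional halo around them, and every other point receives a constant undefined value. This is a simplified picture only. The real sampler works on the cubed sphere, the value used for UNDEF here is an assumption, and `apply_mask` is a hypothetical helper.

```python
UNDEF = 1.0e15  # assumed placeholder for the constant undefined value


def apply_mask(field, visited, halo=0, undef=UNDEF):
    """Keep values at visited points (plus a halo); set all others to undef."""
    keep = set()
    for (i, j) in visited:
        for di in range(-halo, halo + 1):
            for dj in range(-halo, halo + 1):
                keep.add((i + di, j + dj))
    return [
        [value if (i, j) in keep else undef for j, value in enumerate(row)]
        for i, row in enumerate(field)
    ]


field = [[1.0, 2.0, 3.0, 4.0],
         [5.0, 6.0, 7.0, 8.0],
         [9.0, 10.0, 11.0, 12.0]]

no_halo = apply_mask(field, visited={(1, 1)})
with_halo = apply_mask(field, visited={(1, 1)}, halo=1)
print(no_halo[1])    # only the visited point keeps its value
print(with_halo[0])  # the halo keeps the first three columns of row 0
```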
Note
The masked sampler applies only to geostationary satellites. Each masked sampler collection will be based on the view of a specific satellite.
Key variables for a masked sampler collection
- sampler_spec: A string that needs to be set to 'mask' to select a masked sampler collection.
- obs_files: Full path to the netCDF file containing the masked locations.
- obs_file_begin: Beginning date (YYYY-MM-DDThh:mm:ss) of the availability of the observation files. An observation file should exist at this date.
- obs_file_end: End date (YYYY-MM-DDThh:mm:ss) of the availability of the observation files. An observation file should exist at this date.
- obs_file_interval: date/time interval (yymmdd hhmmss) between two consecutive observation files. This setting indicates what the code should expect. If an observation file does not exist at the expected date, the code has a mechanism to use the nearest file.
- index_name_x: Name of the longitude dimension in the observation file (default is x).
- index_name_y: Name of the latitude dimension in the observation file (default is y).
- var_name_x: Name of the longitude array in the observation file (default is x).
- var_name_y: Name of the latitude array in the observation file (default is y).
- var_name_proj: Name of the variable providing map projection information.
- att_name_proj: Attribute for the longitude origin in the map projection.
- thin_factor: Integer value used to reduce regridding matrix size (default is -1).
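The interplay of obs_file_begin, obs_file_end and obs_file_interval, and the fallback to the nearest file, can be sketched as follows. This is an illustration under assumptions rather than the MAPL logic: the expected dates are taken as begin + n * interval up to and including end, and ties between two equally distant files are resolved in favor of the earlier one. Both helper names are hypothetical.

```python
from datetime import datetime, timedelta


def expected_times(begin, end, interval):
    """Dates at which an observation file is expected, from begin to end."""
    times = []
    t = begin
    while t <= end:
        times.append(t)
        t += interval
    return times


def nearest_available(target, available):
    """Pick the available file date closest to the expected date."""
    return min(sorted(available), key=lambda t: abs(t - target))


begin = datetime(2019, 7, 31, 0, 0, 0)
interval = timedelta(minutes=10)  # corresponds to '000000 001000'
expected = expected_times(begin, begin + timedelta(minutes=30), interval)

# Suppose the files expected at 00:10 and 00:30 are missing on disk.
available = [datetime(2019, 7, 31, 0, 0), datetime(2019, 7, 31, 0, 20),
             datetime(2019, 7, 31, 1, 0)]
chosen = nearest_available(expected[3], available)
print(chosen)  # 2019-07-31 00:20:00
```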
Here is an example of a masked sampler `HISTORY.rc` file:
COLLECTIONS:
ABI_M6C01_Mask
::
ABI_M6C01_Mask.sampler_spec: 'mask'
ABI_M6C01_Mask.obs_file_begin: '2019-07-31T00:00:00'
ABI_M6C01_Mask.obs_file_end: '2019-11-01T00:00:00'
ABI_M6C01_Mask.obs_file_interval: '000000 001000'
ABI_M6C01_Mask.obs_files: /discover/nobackup/projects/gmao/aist-nr/data/GOES-X/OR_ABI-L1b-RadF-M6C01_G16_s20192340840216_e20192340849524_c20192340849582.nc
ABI_M6C01_Mask.index_name_x: x
ABI_M6C01_Mask.index_name_y: y
ABI_M6C01_Mask.var_name_x: x
ABI_M6C01_Mask.var_name_y: y
ABI_M6C01_Mask.var_name_proj: goes_imager_projection
ABI_M6C01_Mask.att_name_proj: longitude_of_projection_origin
ABI_M6C01_Mask.thin_factor: 100
ABI_M6C01_Mask.template: '%y4%m2%d2_%h2%n2.nc4',
ABI_M6C01_Mask.format: 'CFIO',
ABI_M6C01_Mask.frequency: 001000,
ABI_M6C01_Mask.duration: 001000,
ABI_M6C01_Mask.regrid_method: 'BILINEAR' ,
ABI_M6C01_Mask.fields: 'PHIS' , 'AGCM' , 'phis' ,
'TROPT' , 'AGCM' ,
'TS' , 'SURFACE' , 'ts' ,
'TSOIL1' , 'SURFACE' ,
'PS' , 'DYN' , 'ps' ,
'Q' , 'MOIST' , 'sphu' ,
::
Note
There is no restriction on the start and end time of the experiment. The code will produce files for the masked collection during the entire duration of the experiment using the settings of frequency and duration. The code uses the masked locations of the obs_files file to select the grid-points where the field values will be written out as a two-dimensional array (time, location).