Declaring the Raw NWM Data Source - NOAA-OWP/wres GitHub Wiki
Table of Contents
- What evaluation declaration is required to read raw NWM data?
- What are the data requirements of an NWM data source?
- Are there example declarations available?
- Example declaration for NWM short-range forecasts
- Example declaration for NWM single-valued, medium-range forecasts
- Example declaration for NWM ensemble, medium-range forecasts using the legacy NWM Medium Range Ensemble
- Example for another variable, such as qSfcLatRunoff
- Applicable to NWC WRDS-Store: Example combining the NWM medium-range ensemble forecasts for NWM v2.1 and v2.2 into a single evaluation (since the model was unchanged between those versions)
- Example use of nomads
Raw NWM data is provided in netCDF format, with files organized in a specific directory structure and named according to a specific convention. A complete archive of that data is available in Google Cloud (with all versions lumped together) and in the WRDS-Store hosted at the National Water Center (NWC; organized by version number). Two days of data are also available on NCEP's Nomads server. Instructions for pointing the WRES to those data sources are provided below.
NOTE: The WRDS-Store is only available at the NWC and is accessible via the Central OWP WRES hosted at the NWC, so that users at River Forecast Centers can make use of the archive in their evaluations. For more information on the use of the WRDS-Store, see the VLab WRES User Support wiki. This wiki will focus on the publicly available Google Cloud service for most declaration examples.
What are the data requirements of an NWM data source?
NWM short-range, medium-range deterministic, medium-range ensemble, long-range, and analysis-and-assimilation data can be read by the WRES for various configurations (CONUS, Alaska, Hawaii, Puerto Rico). The analyses are less straightforward to configure. The features that the WRES requests from the netCDF dataset are determined from the features tag of the WRES project declaration, either directly from a declared NWM feature id or indirectly from the feature correlations that the WRES is aware of.
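As a minimal sketch of the direct case, a features entry might name the NWM feature id on the predicted side. The identifiers below are illustrative placeholders, and the exact layout should be checked against the schema before use:

```yaml
features:
  # observed: an illustrative gauge identifier for the observed dataset
  # predicted: an illustrative NWM feature id, requested from the netCDF data
  - observed: '09165000'
    predicted: '18384141'
```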
What evaluation declaration is required to read raw NWM data?
reference_dates
In all cases, whether using the WRDS-Store archive at the National Water Center, a filesystem, an S3 bucket (e.g., AWS), or Nomads, a reference_dates range is required in the declaration so that the WRES software can set the scope of requests to open and read datasets.
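A minimal sketch of such a range follows, assuming the minimum and maximum keys of the YAML declaration language. The dates are illustrative only and should fall within the period covered by the chosen data source:

```yaml
reference_dates:
  # Illustrative bounds; only forecasts issued within this window are requested
  minimum: 2023-10-01T00:00:00Z
  maximum: 2023-10-02T00:00:00Z
```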
interface
The interface within a sources entry must also be specified. Interface options starting with nwm are short-hand for exact netCDF blob layouts with particular attributes. While the “streamflow” variable has been tested extensively, other variables can be declared as long as they were packed into the netCDF data in the same way as the streamflow data. To list the variables available in a particular dataset, download a sample netCDF and run the ncdump tool to obtain the header. For example, visit https://nomads.ncep.noaa.gov/pub/data/nccf/com/nwm/prod/, go to one of the available dates and datasets, download a netCDF, and run ncdump -h [netCDF file] to see the variables. Note that the netCDF reader in the WRES assumes a 32-bit integer value packed with a scale and offset value stored as either a 32-bit float or a 64-bit double. The enumeration of available interface strings is found in the YAML schema (as of WRES Release 6.14):
https://github.com/NOAA-OWP/wres/blob/master/wres-config/nonsrc/schema.yml (search for SourceInterfaceEnum).
variable
The variable for the sources entry must be declared, as it defines verbatim the data variable to be obtained from the netCDF.
uri
The uri for the sources entry specifies a path prefix up to the point where the netCDF blob layout changes, which is the nwm.yyyyMMdd directory/folder/bucket. After that path prefix, the date range from the reference_dates, combined with the interface short-hand, is used to fill out the remainder of the paths for all blobs.
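To illustrate how the pieces combine, the fragment below annotates a source with the kind of blob path that would be expected to result. The date and the file name in the closing comment are assumptions based on the public NWM naming convention, not output captured from the WRES:

```yaml
predicted:
  sources:
    # Path prefix: everything above the nwm.yyyyMMdd directory
    - uri: cdms3://storage.googleapis.com/national-water-model
      # The interface short-hand selects the blob layout beneath that prefix
      interface: nwm short range channel rt conus
  variable: streamflow
# Assuming a reference date of 2023-10-01T00:00:00Z, one requested blob
# would be expected to resemble:
#   <uri>/nwm.20231001/short_range/nwm.t00z.short_range.channel_rt.f001.conus.nc
```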
Any URI that starts with cdms3:// instructs the WRES to read using the CDMS3 (Common Data Model S3) scheme, which makes HTTP range requests to a web service that implements the S3 or Google Cloud Storage (GCS) API. That leads to much quicker reads (by, perhaps, an order of magnitude) than the HTTP scheme (i.e., http or https). However, CDMS3 can only be used if the web service implements the S3 or GCS API; otherwise, the requests will fail and the HTTP scheme should be used instead. For all web services that implement the S3 or GCS API, declaring the CDMS3 scheme is highly recommended.
The following uri values obtain data from Google Cloud and Nomads, both of which are publicly available (again, for information on the WRDS-Store, which is only available at the NWC, see the WRES User Support VLab project wiki):
- Google Cloud (S3): cdms3://storage.googleapis.com/national-water-model
- Nomads (not available in S3, so "https://" must be used): https://nomads.ncep.noaa.gov/pub/data/nccf/com/nwm/prod
Are there example declarations available?
Below are examples of declaring the use of raw NWM data in a netCDF format.
All of the examples below assume that the NWM data is obtained from Google Cloud unless otherwise specified. To read from a local disk, change "https://" or "cdms3://" to "file://" and modify the path that follows accordingly. Furthermore, in most cases NWM 3.0 data is being read, with the corresponding directory structure assumed below the declared path. Lastly, only the CONUS data is evaluated in the examples that follow. However, interface options are also available for Puerto Rico, Hawaii, and Alaska. Please review the schema at the link provided above for a complete listing.
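For instance, a local-disk variant of the short-range declaration might look like the sketch below. The directory shown is hypothetical and is assumed to contain the nwm.yyyyMMdd folders directly beneath it:

```yaml
predicted:
  label: NWM Short Range
  sources:
    # Hypothetical local directory holding the nwm.yyyyMMdd folders
    - uri: file:///path/to/national-water-model
      interface: nwm short range channel rt conus
  variable: streamflow
```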
Example declaration for NWM short-range forecasts
predicted:
  label: NWM Short Range
  sources:
    - uri: cdms3://storage.googleapis.com/national-water-model
      interface: nwm short range channel rt conus
  variable: streamflow
Example declaration for NWM single-valued, medium-range forecasts
predicted:
  label: NWM Medium Range Deterministic
  sources:
    - uri: cdms3://storage.googleapis.com/national-water-model
      interface: nwm medium range deterministic channel rt conus hourly
  variable: streamflow
NOTE: To use the NWM Medium Range Deterministic v2.1 and later hourly data, use interface nwm medium range deterministic channel rt conus hourly. Use of nwm medium range deterministic channel rt conus, designed for NWM v2.0 and earlier data, will result in every third hour being evaluated (NWM forecasts had a three-hour time step with NWM v2.0 and earlier).
Example declaration for NWM ensemble, medium-range forecasts using the legacy NWM Medium Range Ensemble
predicted:
  label: NWM Medium Range Ensemble
  sources:
    - uri: cdms3://storage.googleapis.com/national-water-model
      interface: nwm medium range ensemble channel rt conus hourly
  variable: streamflow
NOTE: To use the NWM Medium Range Ensemble v2.1 and later hourly data, use interface nwm medium range ensemble channel rt conus hourly. Use of nwm medium range ensemble channel rt conus, designed for NWM v2.0 and earlier data, will result in every third hour being evaluated (NWM forecasts had a three-hour time step with NWM v2.0 and earlier).
Example for another variable, such as qSfcLatRunoff:
predicted:
  label: NWM Short Range
  sources:
    - uri: cdms3://storage.googleapis.com/national-water-model
      interface: nwm short range channel rt conus
  variable: qSfcLatRunoff
Applicable to NWC WRDS-Store: Example combining the NWM medium-range ensemble forecasts for NWM v2.1 and v2.2 into a single evaluation:
The NWC WRDS-Store archive includes a version number within the URLs that allows for readily delineating between NWM versions. This example will allow for combining data across versions, in this case 2.1 and 2.2:
predicted:
  label: NWM Medium Range Ensemble
  sources:
    - uri: https://[omitted WRDS-Store URL]/nwm/2.2
      interface: nwm medium range ensemble channel rt conus hourly
    - uri: https://[omitted WRDS-Store URL]/nwm/2.1
      interface: nwm medium range ensemble channel rt conus hourly
  variable: streamflow
Example use of nomads
Nomads holds only the most recent one or two days of data, so the available date range changes constantly:
predicted:
  label: NWM Long Range Ensemble
  sources:
    - uri: https://nomads.ncep.noaa.gov/pub/data/nccf/com/nwm/prod/
      interface: nwm long range channel rt conus
  variable: streamflow