Declaring the Raw NWM Data Source - NOAA-OWP/wres GitHub Wiki

Raw NWM data is provided in netCDF format, with the files organized in a specific directory structure and named according to a specific convention. A complete archive of that data is available in Google Cloud (with all versions lumped together) and in the WRDS-Store hosted at the National Water Center (NWC; organized by version number). Two days of data are also available on NCEP's Nomads server. Instructions for pointing the WRES to those data sources are provided below.

NOTE: The WRDS-Store is only available at the NWC and is accessible via the Central OWP WRES hosted at the NWC, so that users at River Forecast Centers can make use of the archive in their evaluations. For more information on the use of the WRDS-Store, see the VLab WRES User Support wiki. This wiki will focus on the publicly available Google Cloud service for most declaration examples.

What are the data requirements of an NWM data source?

NWM short-range, medium-range deterministic, medium-range ensemble, long-range, and analysis-and-assimilation data can be read by the WRES for various configurations (CONUS, Alaska, Hawaii, Puerto Rico). The analyses are less straightforward to configure. The features that WRES requests from the netCDF dataset are determined from the WRES project declaration’s features tag, either directly from a declared NWM feature ID or indirectly from the feature correlations that WRES is aware of.
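
As a minimal sketch of the direct route, a declaration might name the NWM feature ID explicitly. This assumes the features entry accepts observed and predicted names, as in the WRES declaration language; both identifiers below are hypothetical placeholders rather than real locations, so check the schema before relying on this form.

```yaml
# Hypothetical feature pairing; replace both IDs with real ones.
features:
  - observed: '01234567'   # placeholder ID for the observed side (quoted to keep the leading zero)
    predicted: '12345678'  # placeholder NWM feature ID to request from the netCDF data
```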

What evaluation declaration is required to read raw NWM data?

reference_dates

In all cases, whether using the WRDS-Store archive at the National Water Center, a filesystem, an S3 bucket (e.g., AWS), or Nomads, a reference_dates range is required in the declaration so that the WRES software can set the scope of requests to open and read datasets.
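
A minimal sketch of such a range follows. It assumes the minimum and maximum keys of the WRES declaration language and ISO 8601 timestamps in UTC; the dates are illustrative only and should be replaced with a period for which data actually exists at the chosen source.

```yaml
# Illustrative dates only; bounds the forecast reference times that WRES will request.
reference_dates:
  minimum: 2024-01-01T00:00:00Z
  maximum: 2024-01-03T00:00:00Z
```

A narrow range keeps the number of netCDF blobs that must be opened small, which is worth considering for a first test run.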

interface

The interface within a sources entry must also be specified. Interface options starting with nwm are short-hand for exact netCDF blob layouts with particular attributes. While the “streamflow” variable has been tested extensively, other variables can be declared as long as they were packed into the netCDF data in the same way as the streamflow data. To get a list of the variables available in a particular dataset, download a sample netCDF file and run the tool ncdump to obtain the header. For example, visit https://nomads.ncep.noaa.gov/pub/data/nccf/com/nwm/prod/, go to one of the available dates and datasets, download a netCDF file, and run ncdump -h [netCDF file] to see the variables. Note that the netCDF reader in WRES assumes a 32-bit integer value packed with a scale and offset value that is either a 32-bit float or a 64-bit double. The available interface strings are enumerated in the YAML schema (as of WRES Release 6.14):

https://github.com/NOAA-OWP/wres/blob/master/wres-config/nonsrc/schema.yml (search for SourceInterfaceEnum).

variable

The variable for the sources entry must be declared, as it defines verbatim the data variable to be obtained from the netCDF.

uri

The uri for the sources entry specifies a path prefix up until the point where the netCDF blob layout changes, which is the nwm.yyyyMMdd directory/folder/bucket. After that path prefix, the date ranges from the reference_dates combined with the interface short-hand are used to fill out the remainder of the paths for all blobs.
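
To illustrate where the prefix ends, here is a sketch of a sources entry with the kind of blob path it would expand to shown in comments. The expanded path follows the usual NWM file naming and is an assumption about the layout, not output taken from the WRES.

```yaml
sources:
  # Path prefix only: stop before the nwm.yyyyMMdd level.
  - uri: cdms3://storage.googleapis.com/national-water-model
    interface: nwm short range channel rt conus
# For a reference date of 2024-01-01T00:00:00Z, a requested blob would then
# have roughly this shape (assumed layout, shown for illustration):
#   <uri>/nwm.20240101/short_range/nwm.t00z.short_range.channel_rt.f001.conus.nc
```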

Any URI that starts with cdms3:// instructs the WRES to read using the CDMS3 (Common Data Model S3) scheme, which makes HTTP range requests to a web service that implements the S3 or Google Cloud Storage (GCS) APIs. That leads to much quicker reads (by, perhaps, an order of magnitude) than using the HTTP scheme in the URI (i.e., http or https). However, the CDMS3 scheme can only be used if the web service implements the S3 or GCS APIs; otherwise, the requests will fail and the HTTP scheme should be used instead. For all web services that do implement those APIs, declaring the CDMS3 scheme is highly recommended.

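The following are the uri values to use to obtain data from Google Cloud and Nomads, both of which are publicly available (again, for information on the WRDS-Store, which is only available at the NWC, see the WRES User Support VLab project wiki):

Google Cloud: cdms3://storage.googleapis.com/national-water-model

Nomads: https://nomads.ncep.noaa.gov/pub/data/nccf/com/nwm/prod/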

Are there example declarations available?

Below are examples of declaring the use of raw NWM data in a netCDF format.

All of the examples below are written assuming NWM data is to be obtained from Google Cloud unless otherwise specified. To read from a local disk, change "https://" or "cdms3://" to "file://" and modify the path that follows accordingly. Furthermore, NWM 3.0 data is being read, in most cases, with the directory structure assumed below that point. Lastly, only the CONUS data is evaluated in the examples that follow. However, interface options are also available for Puerto Rico, Hawaii, and Alaska. Please review the schema at the link provided above for a complete listing.
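
As a sketch of the local-disk variant described above, the short-range declaration might look as follows. The directory /data/nwm is a hypothetical path, assumed to contain the nwm.yyyyMMdd folders directly beneath it.

```yaml
predicted:
  label: NWM Short Range
  sources:
    # Hypothetical local directory holding the nwm.yyyyMMdd folders
    - uri: file:///data/nwm
      interface: nwm short range channel rt conus
  variable: streamflow
```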

Example declaration for NWM short-range forecasts

predicted:
  label: NWM Short Range
  sources:
    - uri: cdms3://storage.googleapis.com/national-water-model
      interface: nwm short range channel rt conus
  variable: streamflow

Example declaration for NWM single-valued, medium-range forecasts

predicted:
  label: NWM Medium Range Deterministic
  sources:
    - uri: cdms3://storage.googleapis.com/national-water-model
      interface: nwm medium range deterministic channel rt conus hourly
  variable: streamflow

NOTE: To use the NWM Medium Range Deterministic v2.1 and later hourly data, use interface nwm medium range deterministic channel rt conus hourly. Use of nwm medium range deterministic channel rt conus, designed for NWM v1.2 and earlier data, will result in every third hour being evaluated (NWM forecasts were for every three hours with NWM v1.2 and earlier).

Example declaration for NWM ensemble, medium-range forecasts using the legacy NWM Medium Range Ensemble

predicted:
  label: NWM Medium Range Ensemble
  sources:
    - uri: cdms3://storage.googleapis.com/national-water-model
      interface: nwm medium range ensemble channel rt conus hourly
  variable: streamflow

NOTE: To use the NWM Medium Range Ensemble v2.1 and later hourly data, use interface nwm medium range ensemble channel rt conus hourly. Use of nwm medium range ensemble channel rt conus, designed for NWM v2.0 and earlier data, will result in every third hour being evaluated (NWM forecasts were for every three hours with NWM v2.0 and earlier).

Example for another variable, such as qSfcLatRunoff:

predicted:
  label: NWM Short Range
  sources:
    - uri: cdms3://storage.googleapis.com/national-water-model
      interface: nwm short range channel rt conus
  variable: qSfcLatRunoff

Applicable to NWC WRDS-Store: Example combining the NWM medium-range ensemble forecasts for NWM v2.1 and v2.2 into a single evaluation:

The NWC WRDS-Store archive includes a version number within the URLs that allows for readily delineating between NWM versions. This example will allow for combining data across versions, in this case 2.1 and 2.2:

predicted:
  label: NWM Medium Range Ensemble
  sources:
    - uri: https://[omitted WRDS-Store URL]/nwm/2.2
      interface: nwm medium range ensemble channel rt conus hourly
    - uri: https://[omitted WRDS-Store URL]/nwm/2.1
      interface: nwm medium range ensemble channel rt conus hourly
  variable: streamflow

Example use of Nomads

Nomads holds only the most recent one or two days of data, so the available date range changes constantly:

predicted:
  label: NWM Long Range Ensemble
  sources:
    - uri: https://nomads.ncep.noaa.gov/pub/data/nccf/com/nwm/prod/
      interface: nwm long range channel rt conus
  variable: streamflow