Data Services TDS - glos/Documentation GitHub Wiki

THREDDS is used at GLOS to provide data streaming services for numerical model and remote sensing datasets that cover the entire Great Lakes Basin. It is also used to serve certain observation datasets that do not fit the near real-time data model used for continuous sensor data.

Because of the large volume of data it serves and the way its caching is implemented, THREDDS is generally memory-bound. A dedicated virtual machine with 24 GB of memory running CentOS 6 was therefore configured to host only the GLOS THREDDS instance. The datasets exposed through THREDDS are stored on a disk array that this virtual machine accesses over NFS through a gigabit LAN switch. At the application level, Tomcat 7 was installed with Oracle JRE 1.7, and several Tomcat-level optimizations were applied, such as enabling the Tomcat Native (APR) library to boost performance and configuring the jsvc daemon/watchdog for better control and reliability.

Tomcat 7, as the Java EE server, listens only on tcp/8080 on localhost. Nginx runs as a reverse proxy in front of Tomcat; its asynchronous I/O gives better performance for proxying. Here is a snippet of the nginx configuration for the THREDDS proxy:

    location ~^/thredds/* {
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header Host $host;
        proxy_pass http://localhost:8080;
    }

The THREDDS dataset inventory catalog (catalog.xml) is the master configuration file for the datasets served by THREDDS. It acts as an index: each entry in catalog.xml provides a virtual directory of available data and its associated metadata, and points to a sub-catalog, a separate XML file that holds the configuration for that particular dataset.

For example, the index entry below indicates that the configuration for the dataset “Cumulative Impact Assessment – Water Use Datasets” is in a file called wateruse.xml in the folder “glc”, and that the title displayed on the THREDDS web page is set in the attribute “xlink:title”.

    <dataset name="Cumulative Impact Assessment -- Water Use Datasets">
      <catalogRef xlink:title="GLC - Water Use Historical 2006 to 2011" xlink:href="glc/wateruse.xml" name=""/>
    </dataset>
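To illustrate how such index entries map dataset names to sub-catalog files, here is a minimal Python sketch that lists the catalogRef entries in a catalog. It assumes the entry is wrapped in a `catalog` root element that declares the standard THREDDS InvCatalog and XLink namespaces (the wrapper is not shown in the snippet above), and the function name `list_sub_catalogs` is purely illustrative.

```python
import xml.etree.ElementTree as ET

# Namespaces assumed to be declared on the <catalog> root element.
CATALOG_NS = "http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"
XLINK_NS = "http://www.w3.org/1999/xlink"

# The index entry from the text, wrapped in an assumed <catalog> root.
SAMPLE = f"""<catalog xmlns="{CATALOG_NS}" xmlns:xlink="{XLINK_NS}">
  <dataset name="Cumulative Impact Assessment -- Water Use Datasets">
    <catalogRef xlink:title="GLC - Water Use Historical 2006 to 2011"
                xlink:href="glc/wateruse.xml" name=""/>
  </dataset>
</catalog>"""


def list_sub_catalogs(catalog_xml):
    """Return (dataset name, displayed title, sub-catalog path) tuples."""
    root = ET.fromstring(catalog_xml)
    refs = []
    for dataset in root.iter(f"{{{CATALOG_NS}}}dataset"):
        # Only catalogRef elements that are direct children of this dataset.
        for ref in dataset.findall(f"{{{CATALOG_NS}}}catalogRef"):
            refs.append((
                dataset.get("name"),
                ref.get(f"{{{XLINK_NS}}}title"),
                ref.get(f"{{{XLINK_NS}}}href"),
            ))
    return refs


if __name__ == "__main__":
    for name, title, href in list_sub_catalogs(SAMPLE):
        print(f"{name}: '{title}' -> {href}")
```

The relative path in xlink:href is resolved against the location of the catalog that contains it, which is why the sub-catalog lives in the “glc” folder.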

Both static and dynamic datasets are presented in the THREDDS catalog.

Static datasets are backed either by individual NetCDF files or a series of NetCDF files that will not be updated once they have been created. An example of the catalog configuration for the former case is listed below.

    <dataset name="ER43 - Hyperpro II Optical Profile" ID="er43_hyperpro" urlPath="glop/er43_hyperpro.nc">
      <netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2" location="/var/thredds/glop_datasets/er43_hyperpro.nc">
        <attribute name="title" value="ER43 - Hyperpro II Optical Profile" />
        <attribute name="summary" value="Great Lakes Optical Properties" />
        <attribute name="time" value="2007-09-12 13:00:00-04:00" />
        <attribute name="metadata_link" type="String" value="http://data.glos.us/portal/" />
        <attribute name="Metadata_Conventions" type="String" value="Unidata Dataset Discovery v1.0"/>
        <attribute name="standard_name_vocabulary" type="String" value="http://www.cgd.ucar.edu/cms/eaton/cf-metadata/standard_name.html" />
      </netcdf>
    </dataset>

For the latter case, a virtual aggregation over the NetCDF files is defined using the NetCDF Markup Language (NcML). The example below joins all NetCDF files in a directory along their existing time dimension (aggregation type joinExisting):

    <dataset name="GLOS Glider - unit_236 - Aggregation" urlPath="gliders/unit_236-Agg" ID="unit_236-Agg" dataType="Point">
      <netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
        <attribute name="title" value="GLOS Glider - Aggregation" />
        <attribute name="summary" value="Aggregation of Glider data from The Great Lakes." />
        <attribute name="metadata_link" type="String" value="http://data.glos.us/portal/" />
        <attribute name="geospatial_vertical_min" type="double" value="0.0" />
        <attribute name="geospatial_vertical_max" type="double" value="0.0" />
        <attribute name="geospatial_vertical_units" type="String" value="meters" />
        <attribute name="geospatial_vertical_resolution" type="double" value="0.0" />
        <attribute name="geospatial_vertical_positive" type="String" value="up" />
        <aggregation dimName="time" type="joinExisting" recheckEvery="60 min">
          <scan location="/var/thredds/glider/unit_236/" suffix=".nc"/>
        </aggregation>
      </netcdf>
    </dataset>
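Conceptually, a joinExisting aggregation presents many files as one dataset by concatenating their records along a dimension that already exists in each file. The following is a toy Python sketch of that idea only, not how THREDDS implements it: files are modelled as plain dictionaries, the file names and values are hypothetical, and processing the files in filename order is an assumption of the sketch.

```python
def join_existing(files, dim_name="time"):
    """Concatenate per-file variables along one existing dimension.

    `files` maps a file name to a dict of variable name -> list of values,
    where every variable is indexed by `dim_name` (a toy model of a file).
    """
    joined = {}
    # Assumption of this sketch: files are processed in filename order.
    for file_name in sorted(files):
        variables = files[file_name]
        if dim_name not in variables:
            raise ValueError(f"{file_name} has no '{dim_name}' coordinate")
        for var_name, values in variables.items():
            joined.setdefault(var_name, []).extend(values)
    return joined


# Hypothetical glider files, deliberately listed out of order.
FILES = {
    "unit_236_b.nc": {"time": [3, 4], "temperature": [11.5, 11.0]},
    "unit_236_a.nc": {"time": [0, 1, 2], "temperature": [12.0, 12.5, 12.1]},
}

if __name__ == "__main__":
    print(join_existing(FILES))
```

In the catalog entry above, the recheckEvery attribute tells THREDDS how often to rescan the directory, so newly added files become part of the virtual dataset without editing the catalog.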

Dynamic datasets, on the other hand, are served using Feature Collections, the TDS mechanism for handling collections of CDM Feature Datasets. The feature collections used here are collections of gridded datasets; THREDDS creates a virtual dataset on top of the raw collection of files. An example of a feature collection (of type FMRC) on the GLOS THREDDS server is below:

    <featureCollection name="Lake Michigan - Nowcast - 2D - 2009" featureType="FMRC" harvest="true" path="glos/glcfs/archive2009/michigan/ncfmrc-2d">
      <collection spec="/var/thredds/GLCFS/Archive/2009/m#yyyyDDDHH#\.out1\.nc$" olderThan="5 min" />
      <protoDataset choice="Penultimate">
        <netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
          <remove type="dimension" name="validtime_length" />
          <remove type="variable" name="validtime"/>
          <attribute name="validtime" value="01-JAN-2009 00:00 GMT" />
          <attribute name="validtime_DOY" value="001, 2009 00:00 GMT" />
          <attribute name="comment2" value="1-hourly model 2D output starting at validtime plus 1 hr" />
          <attribute name="Conventions" type="String" value="CF-1.6"/>
        </netcdf>
      </protoDataset>
      <fmrcConfig datasetTypes="Best Files" />
      <netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
        <variable name="depth" shape="ny nx" type="float">
          <attribute name="long_name" value="Bathymetry " />
          <attribute name="units" value="meters" />
          <attribute name="positive" value="down" />
          <attribute name="standard_name" value="depth" />
          <attribute name="coordinates" value="lat lon"/>
        </variable>
        <variable name="wvh" shape="time ny nx" type="float">
          <attribute name="long_name" value="Significant Wave Height" />
          <attribute name="units" value="meters" />
          <attribute name="missing_value" type="float" value="-99999.0" />
          <attribute name="standard_name" value="wave_height" />
          <attribute name="coordinates" value="time lat lon"/>
        </variable>
      </netcdf>
    </featureCollection>
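The collection spec in this example embeds a date pattern, `#yyyyDDDHH#`, between the `m` prefix and the `.out1.nc` suffix of each file name. As a rough Python sketch of what that pattern encodes, the function below extracts a timestamp from a file name, reading `yyyy` as the year, `DDD` as the day of the year and `HH` as the hour (the Java date-format meaning of those letters). The function name and the sample file names are hypothetical.

```python
import re
from datetime import datetime, timedelta

# Regex equivalent of the spec's file-name part: m#yyyyDDDHH#\.out1\.nc$
SPEC_PATTERN = re.compile(r"m(\d{4})(\d{3})(\d{2})\.out1\.nc$")


def run_time_from_filename(path):
    """Return the datetime encoded in a file name, or None if it does not match."""
    match = SPEC_PATTERN.search(path)
    if match is None:
        return None
    year, day_of_year, hour = (int(group) for group in match.groups())
    # Day 001 is January 1, so offset by day_of_year - 1 from the start of the year.
    return datetime(year, 1, 1) + timedelta(days=day_of_year - 1, hours=hour)


if __name__ == "__main__":
    # Hypothetical file names following the pattern in the collection spec.
    for name in ("m200900100.out1.nc", "m200903212.out1.nc", "readme.txt"):
        print(name, "->", run_time_from_filename(name))
```

Files that do not match the pattern are simply not part of the collection, which is why the sketch returns None for them rather than raising an error.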

Besides the elements that describe the dataset, the catalog file also contains elements that describe the metadata associated with it. This information is displayed on the THREDDS web page and can also be retrieved through the ncISO metadata service. The metadata section is usually defined under the dataset tag:

    <metadata inherited="true">
      <serviceName>all</serviceName>
      <keyword vocabulary="GCMD Science Keywords">GLOS, GLCFS, Nowcast, Great Lakes</keyword>
      <date type="created">2012-01-01</date>
      <date type="modified">2012-01-01</date>
      <date type="issued">2012-01-01</date>
      <creator>
        <name vocabulary="DIF">Dr. Dave Schwab</name>
        <contact url="http://www.glerl.noaa.gov/" email="[email protected]"/>
      </creator>
      <publisher>
        <name>GLOS DMAC</name>
        <contact url="http://glos.us" email="[email protected]"/>
      </publisher>
      <documentation type="rights">No usage restrictions</documentation>
      <documentation xlink:href="http://www.glerl.noaa.gov/res/glcfs/" xlink:title="Great Lakes Coastal Forecasting System"/>
      <documentation type="Summary"> Great Lakes Coastal Forecasting System</documentation>
      <documentation type="Disclaimer"> NOAA GLERL is providing this data "as is," and NOAA GLERL and its partners cannot be held responsible, nor assume any liability for any damages caused by inaccuracies in this data or documentation, or as a result of the failure of the data or software to function in a particular manner. NOAA GLERL and its partners make no warranty, expressed or implied, as to the accuracy, completeness, or utility of this information, nor does the fact of distribution constitute a warranty. Real-time data have not been subjected to quality control or quality assurance procedures. Timely delivery of data and products through the Internet is not guaranteed. Before using information obtained from this server, special attention should be given to the date and time of the data and products being displayed. </documentation>
      <contributor role="distributor">GLOS DMAC</contributor>
      <contributor role="producer">GLERL</contributor>
      <property name="viewer" value="http://data.glos.us/portal/, GLOS Data Portal" />
    </metadata>

GLOS currently offers several web services through THREDDS: OPeNDAP, WMS, NcML, UDDC and ncISO. With these services, users can discover and retrieve data efficiently and in an automated way.
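As a rough illustration of automated access, the Python sketch below builds per-service URLs for a dataset from its urlPath. The service base paths are the usual TDS defaults and are an assumption here, since the GLOS service configuration is not shown in this page; the host name is a placeholder and the function name is illustrative.

```python
# Typical TDS service base paths (assumed; the actual GLOS service
# definitions are not shown in this page and may differ).
SERVICE_BASES = {
    "OPeNDAP": "/thredds/dodsC/",
    "WMS": "/thredds/wms/",
    "NcML": "/thredds/ncml/",
    "UDDC": "/thredds/uddc/",
    "ISO": "/thredds/iso/",
}


def service_urls(server, url_path):
    """Return a dict of service name -> access URL for one dataset urlPath."""
    root = server.rstrip("/")
    path = url_path.lstrip("/")
    return {name: f"{root}{base}{path}" for name, base in SERVICE_BASES.items()}


if __name__ == "__main__":
    # Placeholder host; the urlPath comes from the static dataset example.
    urls = service_urls("http://thredds.example.org", "glop/er43_hyperpro.nc")
    for name, url in urls.items():
        print(f"{name}: {url}")
```

A client would normally read these URLs from the catalog itself rather than construct them by hand; the sketch only shows how a urlPath relates to the individual service endpoints under those assumed defaults.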
