Data preparation - npolar/api.npolar.no GitHub Wiki

Preparing scientific data prior to publishing

The following is a guide for scientists and others planning to deliver data to the Norwegian Polar Data Centre (NPDC) staff for publishing on the web, in the Npolar API system.

Elevator-pitch version

  • Create UTF-8 text files
  • Format data as CSV or JSON
  • Consult the destination API's schema for variable names and format rules
  • Bundle properties measured at the same space/time into documents (equivalent to one row of CSV)

About the Npolar API

The Npolar API system consists of searchable web data stores open for any kind of data originating from the Norwegian Polar Institute's activities.

When data is published in a Npolar API, it is machine readable in multiple formats by any client with web access.

The Npolar API is developed by NPDC staff on top of the Lucene search engine and the JSON database CouchDB.

Data atoms

Before publishing data you need to break it down into a simple 2-dimensional (tabular) structure where each data atom (equivalent to one row of CSV) contains:

  • latitude
  • longitude
  • time ("measured")
  • one or more properties (preferably scalar values, but vectors or arrays of objects are also possible)
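As a rough illustration of the structure listed above, the following Python sketch checks that a candidate data atom carries the three mandatory keys plus at least one property. The function name `check_atom`, the coordinate range checks and the exact timestamp pattern are assumptions made for this example; the destination API's schema remains the authority on what is valid.

```python
from datetime import datetime

REQUIRED = ("latitude", "longitude", "measured")


def in_range(value, low, high):
    """True if value is a number between low and high (inclusive)."""
    return isinstance(value, (int, float)) and low <= value <= high


def check_atom(atom):
    """Return a list of problems found in one data atom (empty if none)."""
    problems = [f"missing {key}" for key in REQUIRED if key not in atom]
    if problems:
        return problems
    if not in_range(atom["latitude"], -90, 90):
        problems.append("latitude out of range")
    if not in_range(atom["longitude"], -180, 180):
        problems.append("longitude out of range")
    try:
        # Assumption: UTC timestamps written like 2000-09-04T16:45:10Z
        datetime.strptime(atom["measured"], "%Y-%m-%dT%H:%M:%SZ")
    except (TypeError, ValueError):
        problems.append("measured is not a UTC timestamp")
    if len(atom) == len(REQUIRED):
        # Only position and time are present, nothing was measured
        problems.append("no measured properties")
    return problems
```

Running such a check over every row before delivery catches missing positions or timestamps early; it does not replace validation against the API's own schema.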

Example (oceanography profile)

The mould above, expressed in three formats for an oceanography profile:

CSV (tab separated):

latitude	longitude	measured	sea_water_salinity	sea_water_temperature	sea_water_pressure_due_to_sea_water	cruise	station
77.5	3.0	2000-09-04T16:45:10Z	35.0057	3.0158	65.0	Framstrait-2000	59

JSON (array)

[{ "sea_water_temperature": 3.0158,
  "station": "59",
  "sea_water_pressure_due_to_sea_water": 65,
  "measured": "2000-09-04T16:45:10Z",
  "longitude": 3,
  "latitude": 77.5,
  "sea_water_salinity": 35.0057,
  "cruise": "Framstrait-2000"
}]

And lastly: GeoJSON (straight from the Oceanography API).

All of the above documents are compatible with the variables defined in the Oceanography API's JSON schema.

Use any of the above data formats for points, and GeoJSON for other geometries.
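The relationship between the formats above can be sketched in Python: one tab-separated row becomes one JSON document, which can in turn be wrapped as a GeoJSON point. The helper names `rows_to_documents` and `to_geojson_feature` and the `NUMERIC` set are hypothetical, and the Feature layout follows the general GeoJSON convention (longitude before latitude); it is not necessarily the exact shape the Oceanography API returns.

```python
import csv
import io
import json

# Assumption: these columns hold numbers, every other column stays text
NUMERIC = {"latitude", "longitude", "sea_water_salinity",
           "sea_water_temperature", "sea_water_pressure_due_to_sea_water"}

SAMPLE = "\n".join([
    "\t".join(["latitude", "longitude", "measured", "sea_water_salinity",
               "sea_water_temperature", "sea_water_pressure_due_to_sea_water",
               "cruise", "station"]),
    "\t".join(["77.5", "3.0", "2000-09-04T16:45:10Z", "35.0057",
               "3.0158", "65.0", "Framstrait-2000", "59"]),
])


def rows_to_documents(tsv_text):
    """Turn tab-separated text with a header row into a list of dicts."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [
        {key: float(value) if key in NUMERIC else value
         for key, value in row.items()}
        for row in reader
    ]


def to_geojson_feature(doc):
    """Wrap one document as a GeoJSON Point Feature."""
    properties = {key: value for key, value in doc.items()
                  if key not in ("latitude", "longitude")}
    return {
        "type": "Feature",
        "geometry": {"type": "Point",
                     "coordinates": [doc["longitude"], doc["latitude"]]},
        "properties": properties,
    }


documents = rows_to_documents(SAMPLE)
print(json.dumps(documents, indent=2))
print(json.dumps(to_geojson_feature(documents[0]), indent=2))
```

Note that `station` is deliberately kept as text, matching the JSON example above.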

Retrieving data

By breaking the data down into space-time points of measurement, the same data model applies whether the position is fixed or shifting, and whether the measured time is fixed or varies (in contrast to e.g. NOAA's approach).

Retrieving data that belongs together is simple. Here's the CSV of the entire CTD profile of station 59 in the Framstrait-2000 cruise.

Permanent URIs

Notice the filter-property=value parameters in the web address above; these make it easy to link to any combination of data for any property. The permanent address of the entire Framstrait-2000 dataset is then simply:

http://api.npolar.no/oceanography/?q=&filter-cruise=Framstrait-2000
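Such addresses can also be assembled programmatically. The sketch below builds the filter-property=value pattern with Python's standard library; the function name `filter_url` is hypothetical, and combining several filters (for example a `station` filter alongside `cruise`) is assumed to follow the same pattern.

```python
from urllib.parse import urlencode

BASE = "http://api.npolar.no/oceanography/"


def filter_url(base=BASE, **filters):
    """Build a search address with one filter-<property>=<value> per keyword."""
    params = [("q", "")]  # empty query, as in the address above
    params += [(f"filter-{name}", value) for name, value in filters.items()]
    return base + "?" + urlencode(params)


# Whole cruise
print(filter_url(cruise="Framstrait-2000"))
# Assumption: further filters are simply appended in the same way
print(filter_url(cruise="Framstrait-2000", station="59"))
```

Using `urlencode` rather than string concatenation keeps values with spaces or special characters correctly escaped.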

Download data

curl

curl --compressed "http://api.npolar.no/oceanography/?q=&filter-cruise=Framstrait-2000&limit=all&format=csv" > fs2000-oceanography.csv
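For those who prefer a script, a comparable download can be sketched with Python's standard library. The function names `build_request`, `decode_body` and `download` are hypothetical. Because the request asks for gzip, the sketch decompresses the body whenever the server labels the response as gzip, so that the saved file is plain CSV.

```python
import gzip
import urllib.request

URL = ("http://api.npolar.no/oceanography/"
       "?q=&filter-cruise=Framstrait-2000&limit=all&format=csv")


def build_request(url=URL):
    """Prepare a GET request that accepts a gzip-compressed response."""
    return urllib.request.Request(url, headers={"Accept-Encoding": "gzip"})


def decode_body(raw, content_encoding):
    """Decompress the body if the server says it is gzip, else pass it through."""
    if content_encoding == "gzip":
        return gzip.decompress(raw)
    return raw


def download(path, url=URL):
    """Fetch url and write the (decompressed) bytes to path."""
    with urllib.request.urlopen(build_request(url)) as response:
        body = decode_body(response.read(),
                           response.headers.get("Content-Encoding"))
    with open(path, "wb") as out:
        out.write(body)
```

Calling `download("fs2000-oceanography.csv")` would then write the dataset to the same file name as the curl example.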

Normative references