NOAA_Weather_API_Usage - Flight-Path-Analysis/FlightPathAnalysis GitHub Wiki

Basic Data Request

To access the National Oceanic and Atmospheric Administration (NOAA) database through the API, you first need to create a token from this page. It takes less than a minute.

Once acquired, we can use the requests Python package to send requests to the server and read the responses. Be sure to have the package installed with:

pip install requests

A request is made up of three main components:

  • A header: a dictionary containing your credentials (your token).
  • The URL you're requesting information from, in this case https://www.ncei.noaa.gov/cdo-web/api/v2/data for data.
  • The body of your request: what you're requesting, which dates, which stations, and so on. This is the params variable in the code.

Example Requests and Response

The code snippet below is an example of a request that should work.

import requests

headers = {
    "token": "MY_TOKEN" # Replace this with your token. For security, consider reading it from an environment variable or a file kept out of version control.
}

base_url = "https://www.ncei.noaa.gov/cdo-web/api/v2/data" # The URL where data requests are sent.

# These are all the parameters of your data request; we'll go over them later.
params = {
    "datasetid": "GHCND",
    "stations": "USC00457180,USC00390043",
    "startdate": "2023-01-01",
    "enddate": "2023-01-05",
    "dataTypes": "MLY-PRCP-NORMAL,MLY-TMIN-NORMAL,TMIN,TMAX",
    "format": "json",
    "includeAttributes": "true",
    "includeStationName": "true",
    "includeStationLocation": "1",
    "units": "metric"
}

response = requests.get(base_url, params=params, headers=headers)

# Check the status code
if response.status_code == 200:
    data = response.json() # JSON maps directly onto a Python dictionary.
    print(data)
else:
    print(f"Error: {response.status_code}")
    print(response.text)

This should give you an output like:

{'metadata': {'resultset': {'offset': 1, 'count': 512672, 'limit': 25}}, 'results': [{'date': '2023-01-01T00:00:00', 'datatype': 'PRCP', 'station': 'GHCND:AE000041196', 'attributes': 'D,,S,', 'value': 0.0}, {'date': '2023-01-01T00:00:00', 'datatype': 'TAVG', 'station': 'GHCND:AE000041196', 'attributes': 'H,,S,', 'value': 20.7}, {'date': '2023-01-01T00:00:00', 'datatype': 'TMAX', 'station': 'GHCND:AE000041196', 'attributes': ',,S,', 'value': 25.2}, {'date': '2023-01-01T00:00:00', 'datatype': 'TMIN', 'station': 'GHCND:AE000041196', 'attributes': ',,S,', 'value': 14.9}, {'date': '2023-01-01T00:00:00', 'datatype': 'PRCP', 'station': 'GHCND:AEM00041194', 'attributes': ',,S,', 'value': 0.0}, {'date': '2023-01-01T00:00:00', 'datatype': 'TAVG', 'station': 'GHCND:AEM00041194', 'attributes': 'H,,S,', 'value': 22.3}, {'date': '2023-01-01T00:00:00', ...

That is, a dictionary of all the results that fit the parameters you gave, containing the station, date, data type, and value of each measurement requested. Let's break down each part of the request, as well as the response.
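For instance, the results list can be regrouped by station and datatype with a few lines of Python. This is a sketch working on hand-copied sample entries shaped like the response above, so no token or network call is needed:

```python
from collections import defaultdict

# Two entries shaped like the API response shown above (sample values)
results = [
    {'date': '2023-01-01T00:00:00', 'datatype': 'TMAX',
     'station': 'GHCND:AE000041196', 'attributes': ',,S,', 'value': 25.2},
    {'date': '2023-01-01T00:00:00', 'datatype': 'TMIN',
     'station': 'GHCND:AE000041196', 'attributes': ',,S,', 'value': 14.9},
]

# Group values by station, then by datatype
by_station = defaultdict(dict)
for r in results:
    by_station[r['station']][r['datatype']] = r['value']

print(by_station['GHCND:AE000041196'])  # {'TMAX': 25.2, 'TMIN': 14.9}
```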

Request Parameters

datasetid

As the name suggests, this is the ID of the dataset we're requesting data from. There are 11 datasets in total, each with its own available datatypes. You can see them below:

| Name | ID |
| --- | --- |
| Daily Summaries | GHCND |
| Global Summary of the Month | GSOM |
| Global Summary of the Year | GSOY |
| Weather Radar (Level II) | NEXRAD2 |
| Weather Radar (Level III) | NEXRAD3 |
| Normals Annual/Seasonal | NORMAL_ANN |
| Normals Daily | NORMAL_DLY |
| Normals Hourly | NORMAL_HLY |
| Normals Monthly | NORMAL_MLY |
| Precipitation 15 Minute | PRECIP_15 |
| Precipitation Hourly | PRECIP_HLY |

For this project, we'll be focusing on "Weather Radar (Level II)", which contains three-dimensional reflectivity (precipitation) and radial velocity (wind speed) information. Further explanation of how to read and interpret this data can be found on the corresponding wiki page.

To get this list of datasets, you can run the code below:

## All available datasets
import requests

headers = {
    "token": "MY_TOKEN"
}

url = "https://www.ncei.noaa.gov/cdo-web/api/v2/datasets"
response = requests.get(url, headers=headers)

all_datasets = response.json()
print(all_datasets)

stations

This parameter expects a string listing, separated by commas, all the weather stations we want information from. At the time of writing, there are a total of 149,411 weather stations we can get data from, each with its own sensors and available data. You can get a list of the first 25 with code just like the one above.

## Some available stations
import requests

headers = {
    "token": "MY_TOKEN"
}

url = "https://www.ncei.noaa.gov/cdo-web/api/v2/stations"
response = requests.get(url, headers=headers)

all_stations = response.json()
print(all_stations)

By default, this request is limited to 25 responses; we can (and will) extend it later with while loops. Notice that in the response, each station comes with latitude and longitude information, as well as the minimum and maximum dates for which data is available. That'll be crucial when we're deciding which stations we care about.
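Those coverage dates make it easy to filter out stations that can't serve a given period. Here's a sketch on sample entries shaped like the /stations response (the field names match what the API returns; the values are made up for illustration):

```python
# Sample station records shaped like the /stations response
stations = [
    {'id': 'GHCND:USC00457180', 'latitude': 46.6, 'longitude': -120.5,
     'mindate': '1998-04-01', 'maxdate': '2023-06-30'},
    {'id': 'GHCND:USC00390043', 'latitude': 45.5, 'longitude': -98.3,
     'mindate': '2010-01-01', 'maxdate': '2015-12-31'},
]

startdate, enddate = '2023-01-01', '2023-01-05'

# ISO dates compare correctly as plain strings, so no parsing is needed
usable = [s['id'] for s in stations
          if s['mindate'] <= startdate and s['maxdate'] >= enddate]
print(usable)  # ['GHCND:USC00457180']
```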

startdate and enddate

Pretty self-explanatory: the starting and ending dates we're interested in getting data from, in YYYY-MM-DD format.
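If you're generating these dates in code rather than typing them by hand, date.isoformat() from the standard library already produces exactly this format:

```python
from datetime import date, timedelta

end = date(2023, 1, 5)
start = end - timedelta(days=4)

# isoformat() yields the YYYY-MM-DD strings the API expects
params = {
    "startdate": start.isoformat(),
    "enddate": end.isoformat(),
}
print(params)  # {'startdate': '2023-01-01', 'enddate': '2023-01-05'}
```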

dataTypes

This is the other crucial part of the process: it tells the API what kind of data we want from those stations, on those dates, from that dataset. The possible values for dataTypes can be listed with the code below (at least the first 25).

## All available datatypes
import requests

headers = {
    "token": "MY_TOKEN"
}

url = "https://www.ncei.noaa.gov/cdo-web/api/v2/datatypes"
response = requests.get(url, headers=headers)

all_datatypes = response.json()
print(all_datatypes)

As of this writing, there are 1,566 different datatypes we can get, but as you can see from the response, many of them are redundant or too specialized to be useful here. NOTICE: not every combination of station, dataset, and date range will have every datatype available.

format

The format we want the data returned in. json is the best for our purposes, since it maps directly onto Python dictionaries, but we could also choose csv or pdf.
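If you do opt for csv, the response body comes back as plain text that the standard csv module can parse. A sketch on a hand-written sample body (the real column set depends on the include* parameters you send, so treat these headers as illustrative):

```python
import csv
import io

# What a format=csv response body might look like (sample rows)
body = '"DATE","DATATYPE","STATION","VALUE"\n' \
       '"2023-01-01T00:00:00","TMAX","GHCND:AE000041196","25.2"\n'

# DictReader maps each row to a dict keyed by the header row;
# note that all values arrive as strings and need explicit conversion
rows = list(csv.DictReader(io.StringIO(body)))
print(rows[0]['DATATYPE'], float(rows[0]['VALUE']))  # TMAX 25.2
```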

includeAttributes

Whether or not to include attributes with the data, explained below under results.

includeStationName

Whether or not to include the name of the station the data was drawn from.

includeStationLocation

Whether or not to include the location of the station the data was drawn from.

units

The units the data is returned in. Can be metric or standard.

Response parameters

metadata

Information the API returns about the result set: the offset into the full list of matches, the total count of matching results, and the limit applied to this response. The count field is what tells you how much data is left to fetch.
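For example, from the metadata in the response shown earlier you can work out how many paged requests a full download would take:

```python
import math

# The resultset from the example response near the top of this page
metadata = {'resultset': {'offset': 1, 'count': 512672, 'limit': 25}}

total = metadata['resultset']['count']
limit = metadata['resultset']['limit']

# Number of requests needed to page through everything at this limit
pages = math.ceil(total / limit)
print(pages)  # 20507
```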

results

This is an array containing all the results from our query, each result is a dictionary containing:

  • date: The date relating to the available data
  • datatype: The type of data in the dictionary (since we can specify many)
  • station: The ID of the station it was gathered from.
  • attributes: Some flags pertaining to the data, may convey warning signals or data quality flags
  • value: the actual value of the data requested. In the case of our example, the first result is:
{'date': '2023-01-01T00:00:00', 'datatype': 'PRCP', 'station': 'GHCND:AE000041196', 'attributes': 'D,,S,', 'value': 0.0}

That is, the precipitation measured by station "GHCND:AE000041196" on Jan 1st, 2023, was 0.0, with the attributes D,,S. Further information about the flags can be found here.
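The attributes string is comma-separated and positional, so it can be unpacked with a plain split. The field names below are an assumption based on the GHCND flag conventions (measurement, quality, source, observation time); check the linked documentation before relying on them:

```python
attributes = 'D,,S,'

# Split into positional fields; empty strings mean the flag is not set.
# Names are assumed from the GHCND conventions, not confirmed by this API.
mflag, qflag, sflag, obs_time = attributes.split(',')

print(mflag)  # D
print(sflag)  # S
```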

Accessing more data than the default limit

As mentioned before, the API imposes a default limit of 25 responses. To access the rest, one can use a while loop, as shown in the example below, which gathers all available datatypes (see Request Parameters above).

import requests

headers = {
    "token": "MY_TOKEN"
}

url = "https://www.ncei.noaa.gov/cdo-web/api/v2/datatypes"
offset = 1
chunk_size = 100 # Maximum number of responses expected in each loop. Keep it manageable.
all_datatypes = []

while True:
    params = {"limit": chunk_size, "offset": offset}
    response = requests.get(url, params=params, headers=headers)
    response.raise_for_status() # Stop with an error if the request failed
    data = response.json()

    # Append results to the list
    all_datatypes.extend(data['results'])

    # Check if there's more data to fetch
    if len(data['results']) < chunk_size:
        break

    # Update the offset
    offset += chunk_size

print(all_datatypes)

This code fetches datatypes 100 at a time until there are no more to be requested.
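The same loop works for any of the endpoints on this page, so it's worth wrapping in a small helper. A sketch; fetch_all is our own name, not part of the API:

```python
import requests

def fetch_all(url, headers, params=None, chunk_size=100):
    """Page through a CDO endpoint until every result has been fetched."""
    params = dict(params or {})
    params["limit"] = chunk_size
    offset = 1
    results = []
    while True:
        params["offset"] = offset
        response = requests.get(url, params=params, headers=headers)
        response.raise_for_status()
        chunk = response.json().get("results", [])
        results.extend(chunk)
        # A short (or empty) chunk means we've reached the end
        if len(chunk) < chunk_size:
            break
        offset += chunk_size
    return results

# Usage (requires a valid token):
# headers = {"token": "MY_TOKEN"}
# datatypes = fetch_all("https://www.ncei.noaa.gov/cdo-web/api/v2/datatypes", headers)
```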