NOAA_Weather_API_Usage - Flight-Path-Analysis/FlightPathAnalysis GitHub Wiki
Basic Data Request
To access the National Oceanic and Atmospheric Administration (NOAA) database through the API, you first need to create a token from this page. It should take less than a minute.
Once acquired, we can use the requests Python package to send requests to the server and read the responses. Be sure to have the package installed with:
pip install requests
A request is made up of 3 main components:
- A header: a dictionary which will contain your credentials (your token).
- The url you're requesting information from, in this case https://www.ncei.noaa.gov/cdo-web/api/v2/data for data.
- The body of your request: what you're requesting, what dates, what stations, etc. This is the params variable in the code.
Example Requests and Response
The code snippet below is an example of a request that should work.
import requests

headers = {
    "token": "MY_TOKEN"  # Ensure to replace this with your token. For security reasons, consider reading it from a secure environment variable or a confidential file.
}

base_url = "https://www.ncei.noaa.gov/cdo-web/api/v2/data"  # The URL where data requests are supposed to be sent to.

# These are all the parameters of your data request; we'll go over them later.
params = {
    "datasetid": "GHCND",
    "stations": "USC00457180,USC00390043",
    "startdate": "2023-01-01",
    "enddate": "2023-01-05",
    "dataTypes": "MLY-PRCP-NORMAL,MLY-TMIN-NORMAL,TMIN,TMAX",
    "format": "json",
    "includeAttributes": "true",
    "includeStationName": "true",
    "includeStationLocation": "1",
    "units": "metric"
}

response = requests.get(base_url, params=params, headers=headers)

# Check the status code before trying to use the response
if response.status_code == 200:
    data = response.json()  # JSON is nothing but a dictionary format, very convenient for Python.
    print(data)
else:
    print(f"Error: {response.status_code}")
    print(response.text)
This should give you an output like:
{'metadata': {'resultset': {'offset': 1, 'count': 512672, 'limit': 25}}, 'results': [{'date': '2023-01-01T00:00:00', 'datatype': 'PRCP', 'station': 'GHCND:AE000041196', 'attributes': 'D,,S,', 'value': 0.0}, {'date': '2023-01-01T00:00:00', 'datatype': 'TAVG', 'station': 'GHCND:AE000041196', 'attributes': 'H,,S,', 'value': 20.7}, {'date': '2023-01-01T00:00:00', 'datatype': 'TMAX', 'station': 'GHCND:AE000041196', 'attributes': ',,S,', 'value': 25.2}, {'date': '2023-01-01T00:00:00', 'datatype': 'TMIN', 'station': 'GHCND:AE000041196', 'attributes': ',,S,', 'value': 14.9}, {'date': '2023-01-01T00:00:00', 'datatype': 'PRCP', 'station': 'GHCND:AEM00041194', 'attributes': ',,S,', 'value': 0.0}, {'date': '2023-01-01T00:00:00', 'datatype': 'TAVG', 'station': 'GHCND:AEM00041194', 'attributes': 'H,,S,', 'value': 22.3}, {'date': '2023-01-01T00:00:00', ...
That is, a dictionary of all the responses that fit the parameters you gave, containing the station, date, data type, and values of what was requested. Let's break down each part of the request, as well as the response.
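Since every entry in the results list carries one value for one (station, date, datatype) triple, it's often convenient to regroup them into one row per station and date. Here is a minimal sketch of that reshaping, using a hardcoded slice of the example response above in place of a live request:

```python
# A small slice of the 'results' array from the response shown above.
results = [
    {'date': '2023-01-01T00:00:00', 'datatype': 'PRCP',
     'station': 'GHCND:AE000041196', 'attributes': 'D,,S,', 'value': 0.0},
    {'date': '2023-01-01T00:00:00', 'datatype': 'TMAX',
     'station': 'GHCND:AE000041196', 'attributes': ',,S,', 'value': 25.2},
    {'date': '2023-01-01T00:00:00', 'datatype': 'TMIN',
     'station': 'GHCND:AE000041196', 'attributes': ',,S,', 'value': 14.9},
]

# Group values by (station, date) so each row holds all datatypes at once.
table = {}
for r in results:
    key = (r['station'], r['date'][:10])  # keep only the YYYY-MM-DD part
    table.setdefault(key, {})[r['datatype']] = r['value']

print(table)
# {('GHCND:AE000041196', '2023-01-01'): {'PRCP': 0.0, 'TMAX': 25.2, 'TMIN': 14.9}}
```

The same loop works unchanged on the full results list returned by a real request.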
Request Parameters
datasetid
As the name suggests, this is the ID of the dataset we're requesting data from. There are a total of 11 datasets to request data from, each with its own available datatypes and things to request. You can see them below:
| Name | ID |
|---|---|
| Daily Summaries | GHCND |
| Global Summary of the Month | GSOM |
| Global Summary of the Year | GSOY |
| Weather Radar (Level II) | NEXRAD2 |
| Weather Radar (Level III) | NEXRAD3 |
| Normals Annual/Seasonal | NORMAL_ANN |
| Normals Daily | NORMAL_DLY |
| Normals Hourly | NORMAL_HLY |
| Normals Monthly | NORMAL_MLY |
| Precipitation 15 Minute | PRECIP_15 |
| Precipitation Hourly | PRECIP_HLY |
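If you find yourself switching between datasets in code, it can help to keep the table above as a lookup. This is just the table transcribed into a Python dictionary, nothing more:

```python
# The dataset IDs from the table above, keyed by name for convenience.
DATASETS = {
    "Daily Summaries": "GHCND",
    "Global Summary of the Month": "GSOM",
    "Global Summary of the Year": "GSOY",
    "Weather Radar (Level II)": "NEXRAD2",
    "Weather Radar (Level III)": "NEXRAD3",
    "Normals Annual/Seasonal": "NORMAL_ANN",
    "Normals Daily": "NORMAL_DLY",
    "Normals Hourly": "NORMAL_HLY",
    "Normals Monthly": "NORMAL_MLY",
    "Precipitation 15 Minute": "PRECIP_15",
    "Precipitation Hourly": "PRECIP_HLY",
}

print(DATASETS["Weather Radar (Level II)"])  # NEXRAD2
```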
For this project, we'll be focusing on "Weather Radar (Level II)", which contains three-dimensional reflectivity (precipitation) and radial speed (wind speed) information. Further explanation of how to read and interpret this data can be found on the corresponding wiki page.
To get this list of datasets, you can run the code below:
## All available datasets
import requests
headers = {
"token": "MY_TOKEN"
}
url = "https://www.ncei.noaa.gov/cdo-web/api/v2/datasets"
response = requests.get(url, headers=headers)
all_datasets = response.json()
print(all_datasets)
stations
This parameter expects a string listing all the weather stations we want information from. At the time of writing, there are a total of 149,411 weather stations we can get data from, each with its own sensors and available data. You can get a list of the first 25 with code just like the one above.
## Some available stations
import requests
headers = {
"token": "MY_TOKEN"
}
url = "https://www.ncei.noaa.gov/cdo-web/api/v2/stations"
response = requests.get(url, headers=headers)
all_stations = response.json()
print(all_stations)
By default, this request is limited to 25 responses; we can (and will) extend it later with while loops. Notice that in the response, each station comes with latitude and longitude information, as well as the minimum and maximum dates for which data is available. That'll be crucial when we're deciding which stations we care about.
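For instance, here is a sketch of filtering stations by their date coverage. The two entries below are illustrative stand-ins shaped like the /stations response (the coordinates and date ranges are made up, not real station metadata):

```python
# Illustrative sample entries shaped like the /stations endpoint response.
# The coordinates and date ranges here are invented for the example.
stations = [
    {'id': 'GHCND:USC00457180', 'name': 'SAMPLE STATION A',
     'latitude': 47.24, 'longitude': -121.17,
     'mindate': '1948-01-01', 'maxdate': '2023-06-30'},
    {'id': 'GHCND:USC00390043', 'name': 'SAMPLE STATION B',
     'latitude': 43.48, 'longitude': -99.06,
     'mindate': '1990-05-01', 'maxdate': '2001-12-31'},
]

# Keep only stations whose records cover our window of interest.
# ISO dates compare correctly as plain strings.
start, end = '2023-01-01', '2023-01-05'
covering = [s['id'] for s in stations
            if s['mindate'] <= start and s['maxdate'] >= end]

print(covering)  # ['GHCND:USC00457180']
```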
startdate and enddate
Pretty self-explanatory: the starting and ending dates we're interested in getting data for, in YYYY-MM-DD format.
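If you're building these dates in code rather than typing them by hand, Python's date.isoformat() already produces the YYYY-MM-DD form, so there's no need for manual string formatting:

```python
from datetime import date, timedelta

# Build startdate/enddate strings in the YYYY-MM-DD format the API expects.
end = date(2023, 1, 5)
start = end - timedelta(days=4)

date_params = {"startdate": start.isoformat(), "enddate": end.isoformat()}
print(date_params)  # {'startdate': '2023-01-01', 'enddate': '2023-01-05'}
```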
dataTypes
Now this is the other crucial part of the process: this tells the API what kind of data we want from those stations, on those dates, from that dataset. The possible values for dataTypes can be listed with the code below (at least the first 25).
## All available datatypes
import requests
headers = {
"token": "MY_TOKEN"
}
url = "https://www.ncei.noaa.gov/cdo-web/api/v2/datatypes"
response = requests.get(url, headers=headers)
all_datatypes = response.json()
print(all_datatypes)
As of this writing, there are 1,566 different datatypes we can get, but as you can see from the response, most of them are pretty useless or redundant. NOTICE: Not all stations, datasets, and dates will have all datatypes available.
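With that many datatypes, you'll usually want to narrow the list down by name. A minimal sketch, using hardcoded entries shaped like the /datatypes response (the field values here are illustrative, not guaranteed to match the live API):

```python
# Illustrative sample entries shaped like the /datatypes endpoint response.
datatypes = [
    {'id': 'TMAX', 'name': 'Maximum temperature', 'datacoverage': 1},
    {'id': 'TMIN', 'name': 'Minimum temperature', 'datacoverage': 1},
    {'id': 'AWND', 'name': 'Average wind speed', 'datacoverage': 1},
]

# Narrow the list down to temperature-related datatypes by name.
temps = [d['id'] for d in datatypes if 'temperature' in d['name'].lower()]
print(temps)  # ['TMAX', 'TMIN']
```

The same filter applies directly to the full list gathered with the pagination loop shown later.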
format
The format we want our data returned in. json is the best for our purposes, since we can directly interpret it as a Python dictionary, but we could also choose csv or pdf.
includeAttributes
Whether or not to include attributes with the data, explained below under results.
includeStationName
Whether or not to include the name of the station the data was drawn from.
includeStationLocation
Whether or not to include the location of the station the data was drawn from.
units
What units the data is returned in. Can be metric or standard.
Response parameters
metadata
This is information the API returns about the result set: the offset and limit applied to this response, and the total count of results matching the query. The count will matter later, when we request more data than the default limit allows.
results
This is an array containing all the results from our query, each result is a dictionary containing:
- date: The date relating to the available data
- datatype: The type of data in the dictionary (since we can specify many)
- station: The ID of the station it was gathered from
- attributes: Some flags pertaining to the data; may convey warning signals or data quality flags
- value: The actual value of the data requested

In the case of our example, the first result is:
{'date': '2023-01-01T00:00:00', 'datatype': 'PRCP', 'station': 'GHCND:AE000041196', 'attributes': 'D,,S,', 'value': 0.0}
That is, the precipitation measured by station "GHCND:AE000041196" on Jan 1st, 2023, was 0.0 and has the flags D,,S,. Further information about the flags can be found here.
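The attribute string is just comma-separated positional flags, so it can be split into named fields. A small sketch, assuming the four positions for daily data are measurement flag, quality flag, source flag, and observation time (check the flag documentation linked above before relying on this ordering):

```python
# Split a GHCND attribute string into named flags. The field order here is
# an assumption: measurement flag, quality flag, source flag, observation time.
def parse_attributes(attrs):
    fields = ('measurement', 'quality', 'source', 'time')
    return dict(zip(fields, attrs.split(',')))

print(parse_attributes('D,,S,'))
# {'measurement': 'D', 'quality': '', 'source': 'S', 'time': ''}
```

An empty string in a position simply means that flag was not set, which is the common case.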
Accessing more data than the default limit
As mentioned before, the API imposes a default limit of 25 responses. To access the rest, one can use a while loop, as shown in the example below, which gathers all available datatypes.
import requests

headers = {
    "token": "MY_TOKEN"
}

url = "https://www.ncei.noaa.gov/cdo-web/api/v2/datatypes"

offset = 1
chunk_size = 100  # Maximum number of responses expected in each loop. Keep it manageable.
all_datatypes = []

while True:
    response = requests.get(f"{url}?limit={chunk_size}&offset={offset}", headers=headers)
    data = response.json()
    # Append results to the list
    all_datatypes.extend(data['results'])
    # Check if there's more data to fetch
    if len(data['results']) < chunk_size:
        break
    # Update the offset
    offset += chunk_size

print(all_datatypes)
This code gathers up to 100 datatypes per iteration until there are no more to be requested.
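Since we'll paginate against several endpoints (datatypes, stations, data), the loop above can be factored into a reusable helper. In this sketch, fetch_page is a hypothetical callable supplied by the caller; in real use it would wrap requests.get as shown above, which also makes the pagination logic easy to test without network access:

```python
# The pagination loop from above, factored into a reusable helper.
# `fetch_page(offset, limit)` is a hypothetical callable returning one page
# of results; in real use it would wrap requests.get against an endpoint.
def fetch_all(fetch_page, chunk_size=100):
    offset, collected = 1, []
    while True:
        page = fetch_page(offset, chunk_size)
        collected.extend(page)
        if len(page) < chunk_size:
            break
        offset += chunk_size
    return collected

# Usage with a fake page source standing in for the API:
items = list(range(250))
fake_fetch = lambda offset, limit: items[offset - 1:offset - 1 + limit]
print(len(fetch_all(fake_fetch)))  # 250
```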