Amazon Open Data - projecthorus/sondehub-infra GitHub Wiki
Sondehub.org aggregates telemetry data uploaded from community-run radiosonde receiver stations. Weather balloons are launched by numerous weather organisations around the world, with the data used to build weather models such as the GFS and ECMWF. The goal of SondeHub is to collect the community-sourced data in a central location to allow organisations to develop their own models and forecasting, and to assist hobbyists in the recovery of the radiosondes themselves (which are otherwise discarded).
Weather balloons are typically launched to coincide with the common model cycles of 00:00Z, 06:00Z, 12:00Z, and 18:00Z, with the most common launches occurring for 00Z and 12Z. The launches generally occur ~45 minutes prior to the model time, e.g. 11:15Z for a 12Z model. Some stations will launch balloons on demand to capture data on unusual weather systems, or to assist with fire-weather forecasting. These launches typically fly radiosondes which report timestamped data including positional information (latitude / longitude / altitude), wind speed/direction, temperature, humidity, and pressure. Flights may reach altitudes of 25-30 km.
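As a small worked example of the timing above, the following sketch subtracts the lead time from a model cycle time. The 45-minute value is the approximate figure quoted in the text, not a fixed rule, and the function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Approximate lead time, taken from the "~45 minutes" figure above (an assumption,
# individual stations may differ).
LAUNCH_LEAD = timedelta(minutes=45)


def expected_launch_time(cycle: datetime, lead: timedelta = LAUNCH_LEAD) -> datetime:
    """Return the approximate launch time for a given model cycle time."""
    return cycle - lead


cycle_12z = datetime(2021, 2, 4, 12, 0, tzinfo=timezone.utc)
print(expected_launch_time(cycle_12z).strftime("%H:%MZ"))  # 11:15Z
```

Note that a launch for the 00Z cycle falls on the previous UTC day (around 23:15Z), which matters when searching for a flight by date.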
In addition to regular radiosonde launches, some stations launch additional sensors, the most common of which are ozone sensors. These launches generally occur less regularly due to the higher cost of these sensors.
Data Access
Data is stored in Amazon S3 and can be accessed using AWS S3 tools. Alternatively, basic access is available using our Python SDK. Sonde data is uploaded as soon as it is received by the contributing station.
SDK
Install SDK
pip3 install sondehub # todo add examples
Example
import sondehub
frames = sondehub.download(serial="serial", datetime_prefix="2018-10-01")
Amazon
Since the data is stored in Amazon S3, it can be downloaded quickly to an EC2 instance rather than to your local machine. You can use Amazon's SDKs or CLI to do this.
CLI Example
aws s3 cp --no-sign-request s3://sondehub-history/serial/{serial}.json.gz /tmp/sonde_data
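For the SDK route, a minimal Python sketch of the same download might look like the following. It assumes boto3 is installed, that the object is a gzipped JSON array of frames, and that unsigned (anonymous) requests are accepted just as with the CLI's --no-sign-request flag. The function names are hypothetical.

```python
import gzip
import json

BUCKET = "sondehub-history"


def serial_key(serial: str) -> str:
    """Build the object key for one sonde, following the serial/ layout."""
    return f"serial/{serial}.json.gz"


def decode_frames(raw: bytes) -> list:
    """Gunzip and parse one object. Assumes the payload is a JSON array of frames."""
    return json.loads(gzip.decompress(raw).decode("utf-8"))


def fetch_frames(serial: str) -> list:
    """Download one sonde's frames anonymously (needs boto3 and network access)."""
    # Imported lazily so the helpers above can be used without boto3 installed.
    import boto3
    from botocore import UNSIGNED
    from botocore.config import Config

    # Unsigned requests are the SDK counterpart of the CLI's --no-sign-request.
    s3 = boto3.client("s3", config=Config(signature_version=UNSIGNED))
    body = s3.get_object(Bucket=BUCKET, Key=serial_key(serial))["Body"].read()
    return decode_frames(body)
```

Calling fetch_frames with a sonde serial number should then return that sonde's frames as a list of dictionaries, provided the assumptions above hold.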
Data Types and Structure
Frames are collated and then uploaded by a batch process as a single gzipped JSON file in the S3 bucket. They are uploaded in Universal Sonde Telemetry Format and are indexed by datetime. No filtering or modification of the frames has occurred at this point, and it is the user's responsibility to check that all required fields are acceptable for their use case.
Some important notes:
- Data has not been normalised in any way.
- The SondehubV1 API is forwarded to SondehubV2, but it provides only a subset of the available fields. These frames can be filtered out by checking for SondehubV1 in the software_name field.
- All data prior to 2021 has been imported from SondehubV1.
- ⚠️ The data provided by most decoders should be considered to be uncalibrated, due to limited information on sensor calibration.
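The SondehubV1 check described in the notes above could be sketched as follows. This assumes frames are plain dictionaries and that imported frames carry exactly the string SondehubV1 in software_name, as in the sample frame in this document. The function names and the second sample frame are illustrative only.

```python
def is_v1_import(frame: dict) -> bool:
    """True when the frame was forwarded or imported from SondehubV1."""
    return frame.get("software_name") == "SondehubV1"


def split_by_origin(frames: list) -> tuple:
    """Separate SondehubV1 frames (reduced field set) from all other frames."""
    v1 = [f for f in frames if is_v1_import(f)]
    others = [f for f in frames if not is_v1_import(f)]
    return v1, others


# Illustrative frames; only the fields needed for the check are shown.
frames = [
    {"serial": "S4640152", "software_name": "SondehubV1"},
    {"serial": "EXAMPLE01", "software_name": "some_other_decoder"},
]
v1, others = split_by_origin(frames)
print(len(v1), len(others))  # 1 1
```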
Data Types
Universal Sonde Telemetry Format provides a JSON object per frame.
{
"subtype": "SondehubV1",
"temp": "-61.2",
"manufacturer": "SondehubV1",
"serial": "S4640152",
"lat": "44.20318",
"frame": "6147",
"datetime": "2021-02-04T00:32:30.157239Z",
"software_name": "SondehubV1",
"humidity": "1.9",
"alt": "22010",
"vel_h": "6.5",
"uploader_callsign": "F4ERG",
"lon": "-2.50013",
"software_version": "SondehubV1",
"type": "SondehubV1",
"time_received": "2021-02-04T00:32:30.157239Z",
"position": "44.20318,-2.50013"
}
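In the sample frame above, numeric values arrive as strings. A minimal sketch of converting them before analysis is shown below. The list of numeric fields is an assumption based only on this sample, and other decoders may supply different fields or native numbers.

```python
from datetime import datetime

# Fields treated as numeric; chosen from the sample frame above (an assumption).
NUMERIC_FIELDS = ("lat", "lon", "alt", "temp", "humidity", "vel_h")


def coerce_frame(frame: dict) -> dict:
    """Return a copy of the frame with numeric strings and the datetime parsed."""
    out = dict(frame)
    for key in NUMERIC_FIELDS:
        if key in out:
            try:
                out[key] = float(out[key])
            except (TypeError, ValueError):
                out[key] = None  # keep the key, but mark unparseable values
    if "datetime" in out:
        # Replace the trailing "Z" so older Python versions can parse the offset.
        out["datetime"] = datetime.fromisoformat(out["datetime"].replace("Z", "+00:00"))
    return out


sample = {
    "serial": "S4640152",
    "temp": "-61.2",
    "alt": "22010",
    "datetime": "2021-02-04T00:32:30.157239Z",
}
print(coerce_frame(sample)["temp"])  # -61.2
```

Unparseable values become None rather than raising, since the data has not been filtered or validated upstream.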
Data Structure
The S3 bucket sondehub-history is partitioned into /date/ (summary data), /launchsites/ (summary data) and /serial/{serial}.json.gz. S3 might not contain the latest data, as it may take up to 24 hours for data to be uploaded.
Summary data contains the first, highest and last frame of a flight.
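To illustrate what such a summary represents, the sketch below derives the first, highest and last frame from a full list of frames. It assumes every frame has datetime and alt fields, and that the timestamps share one ISO 8601 UTC format so they sort correctly as strings. The returned dictionary is a hypothetical shape, not the bucket's actual summary layout.

```python
def summarise_flight(frames: list) -> dict:
    """Pick the first, highest and last frame of a flight."""
    if not frames:
        raise ValueError("no frames to summarise")
    # ISO 8601 UTC strings in a common format sort chronologically as text.
    ordered = sorted(frames, key=lambda f: f["datetime"])
    # Altitude may be a string (as in SondehubV1 frames), so convert before comparing.
    highest = max(ordered, key=lambda f: float(f["alt"]))
    return {"first": ordered[0], "highest": highest, "last": ordered[-1]}


flight = [
    {"datetime": "2021-02-04T00:32:30.157239Z", "alt": "22010"},
    {"datetime": "2021-02-04T00:02:30.000000Z", "alt": "150"},
    {"datetime": "2021-02-04T01:10:00.000000Z", "alt": "900"},
]
summary = summarise_flight(flight)
print(summary["highest"]["alt"])  # 22010
```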
XDATA
The xdata field can be decoded according to the vendor's specifications. For more details, check out NOAA's XDATA specification.