Umweltbundesamt Air Quality Data Importer
Authorship
Version | Date | Modified by | Summary of changes
---|---|---|---
0.1 | 2017-07-12 | Rohullah & Jawid | Initial version
0.2 | 2017-07-13 | Rohullah & Jawid | JSON schema added + formatting
Umweltbundesamt Air Quality Data Importer
The Umweltbundesamt importer is developed as part of the extensible ETL framework. It reads comma-separated values (CSV) data from the source, processes the data into our own schema, and then writes it to the Kafka queue.
Importer Components: this importer is composed of the following components:
Data source
The data source for this importer provides daily air quality data (Current Air Data) from about 500 measuring stations operated by the federal states and the Federal Environment Agency (UBA) across the 16 federal states of Germany.
States
- Brandenburg
- Berlin
- Hesse
- Hamburg
- Baden-Württemberg
- Bavaria
- Bremen
- Mecklenburg-Vorpommern
- Lower Saxony
- North Rhine-Westphalia
- Rheinland-Pfalz
- Saarland
- Schleswig-Holstein
- Saxony
- Saxony-Anhalt
- Thuringia
- UBA
Data Measuring Stations:
- There are about 500 data measuring stations across the states listed above.
- Every station has a unique station code.
Note: the geographic locations of the measuring stations are not provided by the source. We manually mapped every station to its geographic coordinates in Java classes, as sketched below.
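As an illustration, here is a minimal sketch of such a hard-coded mapping (the class and method names are hypothetical, not the actual importer code; the coordinates for station DEBB007 in Elsterwerda are taken from the sample message below):

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative sketch of a manual station-code-to-coordinates mapping. */
public class StationLocations {

    /** Simple latitude/longitude pair. */
    public static class Coordinates {
        public final double lat;
        public final double lon;

        public Coordinates(double lat, double lon) {
            this.lat = lat;
            this.lon = lon;
        }
    }

    // One entry per station code; the real importer covers roughly 500 stations.
    private static final Map<String, Coordinates> LOCATIONS = new HashMap<>();

    static {
        LOCATIONS.put("DEBB007", new Coordinates(51.462734, 13.526796)); // Elsterwerda (Brandenburg)
    }

    /** Returns the coordinates for a station code, or null if the station is unknown. */
    public static Coordinates lookup(String stationCode) {
        return LOCATIONS.get(stationCode);
    }
}
```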
Supported Data Format: CSV
Measured Pollutants:
- Fine Dust (PM10)
- Sulfur Dioxide
- Ozone
- Nitrogen Dioxide
- Carbon Monoxide
Note:
- All measured concentrations are reported in µg/m³.
- Max (8h): the daily maximum of the 8-hour mean values (illustrated in the sketch below).
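For illustration, a minimal sketch of how a daily Max (8h) value could be computed from 24 hourly readings, assuming it denotes the maximum of the 8-hour running means (the source already delivers this value; the importer does not compute it):

```java
/** Returns the largest 8-hour running mean over the given hourly values. */
public static double max8hAverage(double[] hourlyValues) {
    double max = Double.NEGATIVE_INFINITY;
    for (int start = 0; start + 8 <= hourlyValues.length; start++) {
        double sum = 0.0;
        for (int i = start; i < start + 8; i++) {
            sum += hourlyValues[i];      // sum of one 8-hour window
        }
        max = Math.max(max, sum / 8.0);  // keep the largest 8-hour mean
    }
    return max;
}
```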
URL: https://www.umweltbundesamt.de/
Batch Jobs
This importer is a single 'Job' with five 'Steps', implemented in the Spring Batch framework.
- Spring Batch 'Steps'
A single step of a Spring Batch job consists of three phases: Read, Process, and Write (see the sketch after this list).
  - Read: reads the data from the source.
  - Process: in our case, converts the date into ISO format.
  - Write: writes the data into our predefined JSON schema.
Note: every pollutant has its own schema, so there is one step per pollutant; this gives the batch job its five steps.
- Spring Cloud Task
This feature of Spring is used in our case to run the whole data import framework as microservices.
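As an illustration, here is a minimal sketch of how such a five-step job could be wired together in Spring Batch (the bean names, chunk size, and reader/processor/writer types are assumptions, not the actual importer code):

```java
import org.springframework.batch.core.Job;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.item.ItemProcessor;
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.ItemWriter;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
@EnableBatchProcessing
public class AirQualityJobConfig {

    // One step per pollutant: read CSV rows, convert the date to ISO format and
    // map the row into the JSON schema, then write the message to Kafka.
    // The reader, processor, and writer beans are assumed to be defined elsewhere.
    @Bean
    public Step pm10Step(StepBuilderFactory steps,
                         ItemReader<String[]> pm10CsvReader,
                         ItemProcessor<String[], String> pm10Processor,
                         ItemWriter<String> kafkaWriter) {
        return steps.get("pm10Step")
                .<String[], String>chunk(100)   // CSV row in, JSON message out
                .reader(pm10CsvReader)
                .processor(pm10Processor)
                .writer(kafkaWriter)
                .build();
    }

    // so2Step, o3Step, no2Step, and coStep would be built the same way.

    @Bean
    public Job umweltbundesamtImportJob(JobBuilderFactory jobs,
                                        Step pm10Step, Step so2Step, Step o3Step,
                                        Step no2Step, Step coStep) {
        // A single job chaining the five pollutant steps in sequence.
        return jobs.get("umweltbundesamtImportJob")
                .start(pm10Step)
                .next(so2Step)
                .next(o3Step)
                .next(no2Step)
                .next(coStep)
                .build();
    }
}
```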
Data Model
In order to write the extracted data into the Kafka queue, this importer generates messages in the required format with the following fields:
- sensor_id
- device
- timestamp
- location
- sensors
- extras
- [height]
- license
as already documented in: Open Data APIs for Input Data
Sample JSON Schema for Fine Dust (PM10)
{
"source_id": "umweltbundesamt_de",
"device": "Elsterwerda",
"timestamp": "2017-07-12T22:00:00Z",
"location": {
"lat": 51.462734,
"lon": 13.526796
},
"license": "find out",
"sensors": {
"PM10DailyAverage": {
"sensor": "Particles PM10",
"observation_value": 11.0
}
},
"extra": {
"pollutant": "Feinstaub (PM10)",
"network": "BB",
"dataType": "Tagesmittel (1TMW)",
"stationCode": "DEBB007"
}
}
Other measurements
"sensors": {
"SO2DailyAverage": {
"sensor": "Sulfur Dioxide",
"observation_value": 1.0
}
"sensors": {
"O3Max8hAverage": {
"sensor": "Ozone",
"observation_value": 72.0
}
"sensors": {
"NO2Max1hAverage": {
"sensor": "Nitrogen Dioxide",
"observation_value": 19.0
}
"sensors": {
"COMax8hAverage": {
"sensor": "Carbon Monoxide",
"observation_value": 300.0
}
Data Producer for Kafka Queue
Kafka Producer
- produces the extracted data to the Kafka queue using the specified JSON schema
- Kafka Configuration: basic properties for the Kafka producer, such as the broker address and the key and value serializers
- Queue Topic: a topic with a specific name is set for the Kafka queue
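As an illustration, a minimal sketch of such a producer (the broker address, topic name, message key, and payload are placeholder assumptions):

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class AirQualityKafkaProducer {

    public static void main(String[] args) {
        // Basic producer configuration: broker plus key and value serializers.
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");              // placeholder broker
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        String topic = "umweltbundesamt";                               // placeholder topic name
        String message = "{\"source_id\": \"umweltbundesamt_de\"}";     // truncated example payload

        try (Producer<String, String> producer = new KafkaProducer<>(props)) {
            // Keyed by station code so all messages of one station land in the same partition.
            producer.send(new ProducerRecord<>(topic, "DEBB007", message));
        }
    }
}
```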