Weather Data Importer - OpenData-tu/documentation GitHub Wiki
Version | Date | Modified by | Summary of changes |
---|---|---|---|
0.1 | 2017-06-17 | Rohullah & Jawid | Initial version |
0.1a | 2017-06-20 | Rohullah & Jawid | Batch Job 'Steps' Part edited |
0.1b | 2017-07-20 | Rohullah & Jawid | JSON Schema added |
WeatherDataImporter is part of our extensible ETL framework. In this importer, a specific data source is used to extract data from, process them and write them into the Kafka queue.
Components Basic Components for Weather Data Importer and Producer are the followings:
Weather Data for Berlin, Germany The data source which is used for this importer is from luftdaten.info. This datasource is part of the OK Lab Stuttgart project which includes 300 fine dust sensors.
It’s possible to filter different data such as:
- Humidity
- Temperature
- Pressure
with their locations and timestamps.
Supported formates: CSV
URLs:
This importer is Single 'Job' with four 'Steps' which is implemented in Spring Batch framework.
a single step for a single job in Spring Batch is contained: Read, Process, Write
- Read: read data from the source
- Process: in our case we have processed to change the date into ISO format
- Write: the importer write the data into our predefined JSON schema
Note: we used four steps because in current data source there are sensors with different types. Each sensor type requires a schema to be created, each in an individual 'Steps'. Thus we have four steps in our Batch Job.
this feature of 'Spring' is used in our case to make the whole data import framework as micro services.
- Schema Validation: in order to write the extracted data into the queue (Kafka), the importer validates the data into our required data format with the fields of:
- sensorId
- sensorType
- location
- lat
- lon
- timestamp
- pressure
- altitude
- pressureSeaLevel
- temperature
- humidity
as already documented in: Open Data APIs for Input Data
Sample JSON Schema for Measurement Units:
{
"sourceId": "luftdaten_info",
"device": "2036",
"timestamp": "2017-07-01T22:40:05.000+02:00",
"location": {
"lat": 42.657,
"lon": 23.349
},
"license": "find out",
"sensors": {
"pressure": {
"sensor": "BME280",
"observation_value": 94267.06
},
"altitude": {
"sensor": "BME280",
"observation_value": null
},
"pressure_sealevel": {
"sensor": "BME280",
"observation_value": null
},
"temperature": {
"sensor": "BME280",
"observation_value": 26.19
},
"humidity": {
"sensor": "BME280",
"observation_value": 49.8
}
},
"extra": {
"location": "1023"
}
Kafka Producer
- produces the extracted data into Kafka queue with a specified JSON schema
- Kafka Configuration: basic properties for Kafka producer such as 'broker', 'serializer', key and value
- Queue Topic: setting a topic with a specific name for Kafka queue