# Running a full Pipeline
| Version | Date | Modified by | Summary of changes |
|---|---|---|---|
| 0.3 | 2017-07-02 | Oliver Bruski | Extend Kafka part |
| 0.2 | 2017-06-18 | Rohullah & Jawid | Describe the data importer in detail |
| 0.1 | 2017-06-15 | Nico Tasche | Initial version |
Just a starter: this page walks through running the full pipeline, from the importer through Kafka and the consumer into Elasticsearch.
## Importer
### Weather Data Importer + Kafka Data Producer Project
- WeatherDataImporter + Data Producer for the Kafka queue is part of our extensible ETL framework.
- Weather data for Berlin, Germany (an OK Lab Stuttgart project) is used as the data source.
- Possible Measurements:
  - Humidity
  - Temperature
  - Pressure
- Implementation:
  - Spring Cloud Task: a single 'Job' and a single 'Step'
  - Data model: the importer converts the data into our required data format, as already documented here
  - Producer for the Kafka queue: writes the data into the Kafka queue under a specific topic (see the sketch after this list)
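As an illustration of the producer step, here is a minimal sketch that pushes one record into a topic with Kafka's bundled console producer. The topic name `weather-berlin` and the JSON fields are hypothetical placeholders, not the project's documented data format.

```sh
# Hypothetical sketch: pipe a single JSON measurement into a Kafka topic.
# Topic name and field names are placeholders; the real importer writes
# records in our documented data format.
echo '{"sensor":"berlin-001","measurement":"temperature","value":21.3}' \
  | bin/kafka-console-producer.sh --broker-list localhost:9092 --topic weather-berlin
```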
## Kafka
### Start Kafka and ZooKeeper
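Assuming a standard Kafka distribution unpacked on the server (paths and config file names may differ in your setup), ZooKeeper and the broker can be started with the bundled scripts:

```sh
# Start ZooKeeper first (Kafka depends on it), then the Kafka broker.
# Paths assume the working directory is the root of a Kafka distribution.
bin/zookeeper-server-start.sh config/zookeeper.properties &
bin/kafka-server-start.sh config/server.properties &
```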
Run an instance of Kafka on a server with attached persistent storage. Each importer (in Kafka terms, a producer) should create a topic for its source if it does not already exist, so we have a 1-to-1 mapping of data source and topic. We can then create a consumer that listens to a topic and reads the data from its assigned topic/source. In future development, several consumers can read from one topic and write into the DB; here we have to be aware of the synchronisation overhead so as not to create duplicates. Duplicates can be avoided by using a predefined pattern for the id together with a PUT request that overwrites existing data.
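The importer can create its topic idempotently with the bundled topics tool; the topic name, ZooKeeper address, and replication/partition counts below are example values:

```sh
# Create the topic only if it does not already exist (one topic per source).
bin/kafka-topics.sh --create --if-not-exists \
  --zookeeper localhost:2181 \
  --replication-factor 1 --partitions 1 \
  --topic weather-berlin
```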
This gives us a mapping of 1-to-1-to-1 (source/importer-to-topic-to-consumer)!
## Consumer
Project: `kafkaToElasticsearchConsumer`
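Before wiring up the `kafkaToElasticsearchConsumer`, it can be useful to check that data is actually arriving on the topic. A sketch with the bundled console consumer, the topic name again being a placeholder:

```sh
# Print everything written to the topic so far, starting from the beginning.
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 \
  --topic weather-berlin --from-beginning
```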
## Elasticsearch
TODO: document the Docker setup for Elasticsearch.
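Until that is documented, a single-node instance can be started from the official image; the version tag is an example and should match whatever the consumer targets, and depending on the image you may also need to configure X-Pack security:

```sh
# Single-node Elasticsearch for development; the version tag is an example.
docker run -d --name elasticsearch \
  -p 9200:9200 -p 9300:9300 \
  -e "discovery.type=single-node" \
  docker.elastic.co/elasticsearch/elasticsearch:5.4.0
```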
## Access Server
TODO: write a shell script
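Until that script exists, a minimal sketch of what it could check, assuming the server exposes Elasticsearch's HTTP port (the host argument is a placeholder):

```sh
#!/usr/bin/env bash
# Hypothetical sketch: check that Elasticsearch answers on the given server.
HOST=${1:-localhost}
curl -s "http://${HOST}:9200/_cluster/health?pretty"
```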