Running a full Pipeline

| Version | Date | Modified by | Summary of changes |
| --- | --- | --- | --- |
| 0.3 | 2017-07-02 | Oliver Bruski | extend Kafka Part |
| 0.2 | 2017-06-18 | Rohullah & Jawid | describing the data importer in detail |
| 0.1 | 2017-06-15 | Nico Tasche | initial Version |

Just a starting point: this page walks through running a full pipeline, from the importer through Kafka and the consumer to Elasticsearch.

Importer

Weather Data Importer + Kafka Data Producer Project

  • The WeatherDataImporter together with the data producer for the Kafka queue is part of our extensible ETL framework.
  • Weather data for Berlin, Germany (from the OK Lab Stuttgart project) is used as the data source.
    • Possible measurements:
      • Humidity
      • Temperature
      • Pressure
  • Spring Cloud Task: the importer runs as a single 'Job' with a single 'Step'.
  • Data model: the importer converts the data into our required data format, as already documented here.
  • Producer for the Kafka queue: writes the data into the Kafka queue under a specific topic (see the sketch after this list).
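A minimal sketch of the producing step, assuming string-serialized JSON messages and a local broker; the topic name weather-berlin, the record key, and the sample measurement are placeholders, not the importer's actual values:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class WeatherProducerSketch {

    public static void main(String[] args) {
        // Connection settings; the broker address is an assumption for a local setup.
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        // Hypothetical measurement already converted into our data format (JSON here).
        String measurement = "{\"sensor\":\"berlin-001\",\"type\":\"temperature\",\"value\":21.3}";

        try (Producer<String, String> producer = new KafkaProducer<>(props)) {
            // One topic per data source; "weather-berlin" is a placeholder name.
            producer.send(new ProducerRecord<>("weather-berlin", "berlin-001", measurement));
        }
    }
}
```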

Kafka

Start Kafka and ZooKeeper

Run an instance of Kafka on a server with attached persistent storage. Each importer (in Kafka terms, a producer) should create a topic for its source if it does not already exist, so we have a 1-to-1 mapping of data source to topic. We can then create a consumer that listens to a topic and reads the data from its assigned topic/source. In future development, several consumers can read from one topic and write into the DB; here we have to be aware of the synchronization overhead so that no duplicates are created. Duplicates can be avoided by using a predefined pattern for the document id together with a PUT request that overwrites existing data.

This gives us a mapping of 1-to-1-to-1 (source/importer-to-topic-to-consumer)!
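One possible way to bring this up with the scripts that ship with the Kafka distribution; paths, addresses, and the topic name weather-berlin are assumptions for a local test setup:

```sh
# Start ZooKeeper, then a single Kafka broker
# (run from the root of the Kafka installation).
bin/zookeeper-server-start.sh config/zookeeper.properties &
bin/kafka-server-start.sh config/server.properties &

# Create the per-source topic only if it does not exist yet;
# one topic per data source, as described above.
bin/kafka-topics.sh --create --if-not-exists \
    --zookeeper localhost:2181 \
    --replication-factor 1 --partitions 1 \
    --topic weather-berlin
```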

Consumer

Project: kafkaToElasticsearchConsumer
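A minimal sketch of such a consumer, assuming JSON string messages and a local Elasticsearch instance; the index weather, the type measurement, and the id pattern are placeholders, and the real project may use an Elasticsearch client library instead of raw HTTP:

```java
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class KafkaToElasticsearchSketch {

    public static void main(String[] args) throws Exception {
        // Consumer settings; broker address and group id are assumptions.
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "weather-berlin-consumer");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("weather-berlin"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(1000);
                for (ConsumerRecord<String, String> record : records) {
                    // Derive the document id from the record following a
                    // predefined pattern, so that re-processing the same
                    // record overwrites instead of duplicating.
                    String docId = record.topic() + "-" + record.key() + "-" + record.offset();
                    put("http://localhost:9200/weather/measurement/" + docId, record.value());
                }
            }
        }
    }

    // Idempotent write: a PUT with an explicit id overwrites any existing document.
    private static void put(String url, String json) throws Exception {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setRequestMethod("PUT");
        conn.setRequestProperty("Content-Type", "application/json");
        conn.setDoOutput(true);
        try (OutputStream out = conn.getOutputStream()) {
            out.write(json.getBytes(StandardCharsets.UTF_8));
        }
        conn.getResponseCode(); // force the request; real code should check this
        conn.disconnect();
    }
}
```

Because the document id is derived from the record rather than generated, re-processing the same message results in an overwrite instead of a duplicate, which is exactly the PUT-based deduplication described in the Kafka section above.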

Elasticsearch

TODO: set up and run Elasticsearch with Docker.
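A minimal sketch for a single-node setup, assuming the official Elasticsearch image; the 5.5.0 tag matches the era of this page but is an assumption, and a real deployment would mount persistent storage:

```sh
# Single-node Elasticsearch for testing; data is lost when the container is removed.
docker run -d --name elasticsearch \
    -p 9200:9200 -p 9300:9300 \
    -e "discovery.type=single-node" \
    -e "xpack.security.enabled=false" \
    docker.elastic.co/elasticsearch/elasticsearch:5.5.0
```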

Access Server

TODO: write a shell script