Running a full Pipeline

| Version | Date | Modified by | Summary of changes |
| --- | --- | --- | --- |
| 0.3 | 2017-07-02 | Oliver Bruski | extend Kafka Part |
| 0.2 | 2017-06-18 | Rohullah & Jawid | describing the data importer in detail |
| 0.1 | 2017-06-15 | Nico Tasche | initial Version |

Just a starting point: this page walks through running a full pipeline, from the importer through Kafka and the consumer to Elasticsearch.

Importer

Weather Data Importer + Kafka Data Producer Project

  • The WeatherDataImporter together with the data producer for the Kafka queue is part of our extensible ETL framework.
  • Weather data for Berlin, Germany (from the OK Lab Stuttgart project) is used as the data source.
    • Possible measurements:
      • Humidity
      • Temperature
      • Pressure
  • Spring Cloud Task: the importer runs as a single 'Job' with a single 'Step'.
  • Data model: the importer converts the data into our required data format, as already documented here.
  • Producer for the Kafka queue: writes the data into the Kafka queue under a specific topic (see the sketch after this list).
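A minimal sketch of the producing step, assuming string-serialized JSON messages and a local broker; the topic name weather-berlin, the record key, and the sample measurement are placeholders, not the importer's actual values:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class WeatherProducerSketch {

    public static void main(String[] args) {
        // Connection settings; the broker address is an assumption for a local setup.
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        // Hypothetical measurement already converted into our data format (JSON here).
        String measurement = "{\"sensor\":\"berlin-001\",\"type\":\"temperature\",\"value\":21.3}";

        try (Producer<String, String> producer = new KafkaProducer<>(props)) {
            // One topic per data source; "weather-berlin" is a placeholder name.
            producer.send(new ProducerRecord<>("weather-berlin", "berlin-001", measurement));
        }
    }
}
```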

Kafka

Start Kafka and ZooKeeper

Run an instance of Kafka on a server with attached persistent storage. Each importer (in Kafka terms, a producer) should create a topic for its source if it does not already exist, so we have a 1-to-1 mapping of data source to topic. We can then create a consumer that listens to a topic and reads the data from its assigned topic/source. In future development, several consumers can read from one topic and write into the DB; here we have to be aware of the synchronization overhead so that no duplicates are created. Duplicates can be avoided by using a predefined pattern for the document id together with a PUT request that overwrites existing data.

This gives us a mapping of 1-to-1-to-1 (source/importer-to-topic-to-consumer)!
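One possible way to bring this up with the scripts that ship with the Kafka distribution; paths, addresses, and the topic name weather-berlin are assumptions for a local test setup:

```sh
# Start ZooKeeper, then a single Kafka broker
# (run from the root of the Kafka installation).
bin/zookeeper-server-start.sh config/zookeeper.properties &
bin/kafka-server-start.sh config/server.properties &

# Create the per-source topic only if it does not exist yet;
# one topic per data source, as described above.
bin/kafka-topics.sh --create --if-not-exists \
    --zookeeper localhost:2181 \
    --replication-factor 1 --partitions 1 \
    --topic weather-berlin
```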

Consumer

Project: kafkaToElasticsearchConsumer
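A minimal sketch of such a consumer, assuming JSON string messages and a local Elasticsearch instance; the index weather, the type measurement, and the id pattern are placeholders, and the real project may use an Elasticsearch client library instead of raw HTTP:

```java
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class KafkaToElasticsearchSketch {

    public static void main(String[] args) throws Exception {
        // Consumer settings; broker address and group id are assumptions.
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "weather-berlin-consumer");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("weather-berlin"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(1000);
                for (ConsumerRecord<String, String> record : records) {
                    // Derive the document id from the record following a
                    // predefined pattern, so that re-processing the same
                    // record overwrites instead of duplicating.
                    String docId = record.topic() + "-" + record.key() + "-" + record.offset();
                    put("http://localhost:9200/weather/measurement/" + docId, record.value());
                }
            }
        }
    }

    // Idempotent write: a PUT with an explicit id overwrites any existing document.
    private static void put(String url, String json) throws Exception {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setRequestMethod("PUT");
        conn.setRequestProperty("Content-Type", "application/json");
        conn.setDoOutput(true);
        try (OutputStream out = conn.getOutputStream()) {
            out.write(json.getBytes(StandardCharsets.UTF_8));
        }
        conn.getResponseCode(); // force the request; real code should check this
        conn.disconnect();
    }
}
```

Because the document id is derived from the record rather than generated, re-processing the same message results in an overwrite instead of a duplicate, which is exactly the PUT-based deduplication described in the Kafka section above.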

Elasticsearch

TODO: set up and run Elasticsearch with Docker.
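A minimal sketch for a single-node setup, assuming the official Elasticsearch image; the 5.5.0 tag matches the era of this page but is an assumption, and a real deployment would mount persistent storage:

```sh
# Single-node Elasticsearch for testing; data is lost when the container is removed.
docker run -d --name elasticsearch \
    -p 9200:9200 -p 9300:9300 \
    -e "discovery.type=single-node" \
    -e "xpack.security.enabled=false" \
    docker.elastic.co/elasticsearch/elasticsearch:5.5.0
```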

Access Server

TODO: write a shell script