Lab 7 & 8 Kafka Storm Communication Demo - meetsriharsha/RTBDA_5543 GitHub Wiki

In this tutorial, we establish a connection between Apache Storm and Apache Kafka.

We extract the key frames from a video and pass them to Storm through Kafka. The frames enter Storm through a Kafka spout. When Storm receives them, it performs basic analytics, such as counting the number of frames and the number of SIFT features in a particular video stream. To distribute the work, we created several Storm bolts, each responsible for one analytical operation. The results of the analytics are then stored in MongoDB.
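The flow above can be modeled in plain Python to show how the pieces fit together. This is a minimal sketch, not Storm or Kafka code. A `deque` stands in for the Kafka topic, `run_pipeline` plays the role of the Kafka spout, and the two classes are hypothetical bolts, one per analytic. The `KeyFrame` fields and the SIFT counts in the usage lines are illustrative assumptions, and the final list of dicts only mimics the documents that would be written to MongoDB.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class KeyFrame:
    """One extracted key frame; only the metadata the bolts need (hypothetical fields)."""
    stream_id: str
    frame_no: int
    sift_count: int  # number of SIFT keypoints detected in this frame


class FrameCountBolt:
    """Analytic 1: counts frames seen per video stream."""

    def __init__(self):
        self.counts = {}

    def execute(self, frame):
        self.counts[frame.stream_id] = self.counts.get(frame.stream_id, 0) + 1


class SiftTotalBolt:
    """Analytic 2: sums SIFT features per video stream."""

    def __init__(self):
        self.totals = {}

    def execute(self, frame):
        self.totals[frame.stream_id] = self.totals.get(frame.stream_id, 0) + frame.sift_count


def run_pipeline(topic, bolts):
    """Drain the stand-in topic like a spout and hand each frame to every bolt."""
    while topic:
        frame = topic.popleft()  # spout: next message from the broker
        for bolt in bolts:       # each bolt owns exactly one analytic
            bolt.execute(frame)


def to_documents(count_bolt, sift_bolt):
    """Shape the results as one document per stream, like rows bound for MongoDB."""
    return [
        {"stream": s, "frame_count": n, "sift_total": sift_bolt.totals.get(s, 0)}
        for s, n in sorted(count_bolt.counts.items())
    ]


# Usage with made-up frames from two video streams.
topic = deque([
    KeyFrame("video1", 0, 120),
    KeyFrame("video1", 1, 95),
    KeyFrame("video2", 0, 40),
])
frames, sift = FrameCountBolt(), SiftTotalBolt()
run_pipeline(topic, [frames, sift])
docs = to_documents(frames, sift)
print(docs)
```

Because each bolt subscribes to the spout's stream, every frame reaches every analytic. In a real Storm topology running several tasks of the same bolt in parallel, a fields grouping on the stream ID would keep all frames of one video on the same task so the per-stream counts stay consistent.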

Storm: Storm is a distributed, real-time computation framework for stream processing that performs analytics on data using spouts and bolts. It is developed in Java and Clojure. Spouts are sources that emit streams of tuples, and bolts consume those streams and process them. Together they form a topology, a graph-like structure with spouts and bolts as its vertices and streams as its edges. A topology is similar to a MapReduce job, with two main differences: data is processed in real time as it arrives rather than in batches, and a Storm topology runs indefinitely until it is killed, whereas a MapReduce job eventually finishes.
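To make the graph view concrete, the sketch below describes a topology like this lab's as an adjacency map and derives a processing order with Python's standard `graphlib` module (Python 3.9+). The component names, including a `mongo-writer-bolt` that persists results, are hypothetical. The checks (spouts have no inputs, bolts have at least one, no cycles) are rules this sketch enforces, not a reproduction of Storm's own topology validation.

```python
from graphlib import CycleError, TopologicalSorter  # standard library, Python 3.9+

# Hypothetical topology: each component maps to the components it subscribes to.
SPOUTS = {"kafka-spout"}
TOPOLOGY = {
    "kafka-spout": set(),
    "frame-count-bolt": {"kafka-spout"},
    "sift-count-bolt": {"kafka-spout"},
    "mongo-writer-bolt": {"frame-count-bolt", "sift-count-bolt"},
}


def processing_order(topology, spouts):
    """Check the graph's shape and return one upstream-before-downstream ordering."""
    for name, inputs in topology.items():
        if name in spouts and inputs:
            raise ValueError(f"spout {name!r} should not subscribe to anything")
        if name not in spouts and not inputs:
            raise ValueError(f"bolt {name!r} has no input stream")
    # static_order() raises CycleError if the subscriptions form a loop
    return list(TopologicalSorter(topology).static_order())


print(processing_order(TOPOLOGY, SPOUTS))
```

The printed order starts at `kafka-spout` and ends at `mongo-writer-bolt`, mirroring data flowing from source to sink. The relative order of the two counting bolts is arbitrary because neither depends on the other.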

Kafka: A distributed publish-subscribe message broker. Here it transfers data from the client, which produces the extracted video frames, to Apache Storm.
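What makes a broker useful between the frame producer and Storm is that Kafka keeps the messages of each topic partition in an append-only log and lets each consumer track its own offset, so reading does not remove data. The toy classes below model only that idea for a single partition. `Topic`, `Consumer`, `produce`, `fetch` and `poll` are hypothetical names for this sketch; they are not the Kafka client API or the Storm Kafka spout API.

```python
class Topic:
    """Toy single-partition topic: an append-only log read by offset."""

    def __init__(self):
        self._log = []

    def produce(self, message):
        """Append a message and return the offset it was stored at."""
        self._log.append(message)
        return len(self._log) - 1

    def fetch(self, offset, max_messages=10):
        """Return up to max_messages starting at offset; nothing is deleted on read."""
        return self._log[offset:offset + max_messages]


class Consumer:
    """Tracks its own position in the log, independently of other consumers."""

    def __init__(self, topic):
        self.topic = topic
        self.offset = 0

    def poll(self, max_messages=10):
        batch = self.topic.fetch(self.offset, max_messages)
        self.offset += len(batch)  # advance the position (a real consumer would commit it)
        return batch


# Usage: the frame extractor produces, Storm's side consumes.
topic = Topic()
for i in range(3):
    topic.produce(f"frame-{i}")

storm_side = Consumer(topic)
first = storm_side.poll(max_messages=2)  # ['frame-0', 'frame-1']
rest = storm_side.poll()                 # ['frame-2']
replay = Consumer(topic).poll()          # a fresh consumer at offset 0 sees all three
```

Because the log is retained after it is read, a consumer that goes back to an earlier offset can re-read frames, which is the basis for replaying messages into Storm after a failure.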

MongoDB: A NoSQL database that avoids the traditional table-based relational structure. It stores JSON-like documents with dynamic schemas, using BSON (Binary JSON) as its storage format.
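BSON is a binary, typed encoding of JSON-like documents. As a sketch of what that means at the byte level, the encoder below handles only a flat document with int32 and UTF-8 string values, following the layout in the BSON specification: a little-endian int32 total length, typed elements, then a terminating NUL byte. `encode_document` is a hypothetical helper for illustration, not a replacement for the MongoDB driver, which does this encoding for you.

```python
import struct


def _cstring(s):
    """BSON cstring: UTF-8 bytes followed by a NUL terminator."""
    return s.encode("utf-8") + b"\x00"


def encode_document(doc):
    """Encode a flat dict of str -> int32 | str as BSON (a tiny subset of the spec)."""
    body = b""
    for key, value in doc.items():
        if isinstance(value, bool):  # bool is a subclass of int in Python
            raise TypeError("booleans are not handled in this sketch")
        if isinstance(value, int):
            # type 0x10 = int32, stored little-endian
            body += b"\x10" + _cstring(key) + struct.pack("<i", value)
        elif isinstance(value, str):
            data = _cstring(value)
            # type 0x02 = string, prefixed by its byte length including the NUL
            body += b"\x02" + _cstring(key) + struct.pack("<i", len(data)) + data
        else:
            raise TypeError(f"unsupported type for {key!r}: {type(value).__name__}")
    # total length counts the 4 length bytes, the elements, and the trailing NUL
    return struct.pack("<i", 4 + len(body) + 1) + body + b"\x00"


print(encode_document({"a": 1}).hex())  # 0c0000001061000100000000
```

Because each document carries its own field names and type tags, two documents in the same collection can have different fields, which is what gives MongoDB its dynamic schema.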

## Output

Storm UI:

[Image: Storm UI]

Storm Topology:

[Image: Storm topology]

Topology Visualization:

[Image: Topology visualization]

Demo:

Communication between Kafka and Storm