ICP 4 Module 2 - Gnkhakimova/CS5590-BigData GitHub Wiki

ICP 4

Spark Streaming

Source code

Tasks

  1. Spark Streaming using log file generator.
  2. Write a Spark Streaming word count program for text received from a data server listening on a TCP socket.

Configuration

  • Linux Mint
  • IntelliJ
  • Apache Spark
  • nc (netcat)

Features

For ICP 4, I used the IntelliJ IDE to complete the tasks, which required using Spark Streaming to perform a word count.

Task 1

For this task, we had to write two classes: the first generates log files from a given text file, and the second reads the stream of logs and performs a word count.
The log generator reads the text file line by line and writes log files to the desired "log" folder.
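The original generator code is not preserved on this page, so the following is only a minimal sketch of how such a generator might look in plain Scala. The object name LogGenerator, the input file input.txt, the one-file-per-line layout, the log-staging folder, and the one-second delay are all assumptions made for illustration. Each file is written in a staging folder first and then moved into "log", so that a reader watching the folder should only ever see complete files.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path, Paths, StandardCopyOption}
import scala.io.Source

// Hypothetical generator: one log file per input line (an assumption, not the original code).
object LogGenerator {

  /** Writes each line of `input` to its own file under `logDir` and returns the created paths. */
  def generate(input: Path, logDir: Path, delayMs: Long): Seq[Path] = {
    Files.createDirectories(logDir)
    // Staging folder next to the log folder, so the final move stays on one filesystem.
    val staging = logDir.resolveSibling("log-staging")
    Files.createDirectories(staging)

    val source = Source.fromFile(input.toFile, "UTF-8")
    try {
      source.getLines().zipWithIndex.map { case (line, i) =>
        // Write the complete file outside the watched folder first.
        val tmp = Files.createTempFile(staging, "log-", ".tmp")
        Files.write(tmp, (line + "\n").getBytes(StandardCharsets.UTF_8))
        // Then move it into the watched folder in a single step.
        val target = logDir.resolve(s"log-$i.txt")
        Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE)
        Thread.sleep(delayMs) // pause so files arrive gradually, like a live log
        target
      }.toList
    } finally source.close()
  }

  def main(args: Array[String]): Unit = {
    val created = generate(Paths.get("input.txt"), Paths.get("log"), 1000L)
    println(s"Generated ${created.size} log files")
  }
}
```

The delay is a parameter so that the generator can be slowed down to match the streaming batch interval.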

The stream reader then picks up the generated log files and performs the word count.
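Since the reader code is also missing here, this is a rough sketch of what a file-based Spark Streaming word count might look like, following the pattern in the Spark Streaming programming guide. The object name LogStreamWordCount, the helper tokenize, the local[2] master, and the 5-second batch interval are assumptions, and the "log" path is taken to be relative to the working directory.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Hypothetical reader: counts words in files that appear in the "log" folder.
object LogStreamWordCount {

  /** Splits a line on whitespace and drops empty tokens. */
  def tokenize(line: String): Seq[String] =
    line.split("\\s+").filter(_.nonEmpty).toSeq

  def main(args: Array[String]): Unit = {
    // local[2] and the 5-second batch interval are assumed values.
    val conf = new SparkConf().setMaster("local[2]").setAppName("LogStreamWordCount")
    val ssc = new StreamingContext(conf, Seconds(5))

    // Monitor the folder; files that appear after start() are read as lines of text.
    val lines = ssc.textFileStream("log")

    // Classic word count: tokenize, pair each word with 1, sum per word within each batch.
    val counts = lines.flatMap(tokenize).map(word => (word, 1)).reduceByKey(_ + _)
    counts.print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```

The counts printed here are per batch, not running totals; a running total would need a stateful operation such as updateStateByKey.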

Output:

First we need to run the stream reader class and then the log generator class, so that the stream reader can read the files as they are generated.

Task 2

We had to perform a word count on text received from a server.
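The code for this task is not shown on this page either. A sketch along the lines of the network word count in the Spark Streaming programming guide could look like the following. The host localhost, port 9999, the object name SocketWordCount, and the 5-second batch interval are assumptions. The data server can be started beforehand with nc -lk 9999, and anything typed into that terminal is then sent to the program.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Hypothetical socket reader: counts words in text typed into a netcat session.
object SocketWordCount {

  /** Splits a line on whitespace and drops empty tokens. */
  def splitWords(line: String): Seq[String] =
    line.split("\\s+").filter(_.nonEmpty).toSeq

  def main(args: Array[String]): Unit = {
    // At least two local threads: one for the socket receiver, one for processing.
    val conf = new SparkConf().setMaster("local[2]").setAppName("SocketWordCount")
    val ssc = new StreamingContext(conf, Seconds(5))

    // Host and port are assumed; they must match the running nc server.
    val lines = ssc.socketTextStream("localhost", 9999)

    val counts = lines.flatMap(splitWords).map(word => (word, 1)).reduceByKey(_ + _)
    counts.print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```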

Input:

Limitations

  • Had to research how to generate log files in Scala.

References

  1. https://spark.apache.org/docs/latest/streaming-programming-guide.html
  2. https://www.dataneb.com/post/streamingcontext-spark-streaming-word-count-example-scala