ICP_11 - PallaviArikatla/Big-Data-Programming GitHub Wiki

INTRODUCTION: Working with Apache Spark Streaming.

IMPLEMENTATION:

Question 1: Spark Streaming using a log file generator.

  • First, run file.py.
  • This program creates log files in the specified location.
  • It creates 30 log files, one after another at 5-second intervals, and writes the contents of lorem.txt into each of them.
  • At the same time, run streaming.py. This program performs a word count on the log files as they are created.
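The behavior described for file.py can be sketched as below. This is a minimal sketch under stated assumptions, not the author's code: the function name generate_logs, the output directory, and the log{i}.txt file names are hypothetical. Each file is written under a hidden temporary name and then renamed into place, since the Spark Streaming programming guide asks that files appear in a monitored directory atomically (by moving or renaming them in).

```python
import os
import time


def generate_logs(source, out_dir, count=30, interval=5.0):
    """Hypothetical stand-in for file.py: copy `source` into `count` log files.

    One file is produced every `interval` seconds. Returns the list of paths.
    """
    os.makedirs(out_dir, exist_ok=True)
    with open(source, encoding="utf-8") as f:
        text = f.read()

    paths = []
    for i in range(count):
        final_path = os.path.join(out_dir, f"log{i}.txt")
        # Write under a hidden temporary name first, then rename it into place,
        # so a directory watcher never sees a half-written log file.
        tmp_path = os.path.join(out_dir, f".log{i}.txt.tmp")
        with open(tmp_path, "w", encoding="utf-8") as out:
            out.write(text)
        os.replace(tmp_path, final_path)
        paths.append(final_path)
        if i < count - 1:
            time.sleep(interval)  # wait before creating the next file
    return paths
```

Calling generate_logs("lorem.txt", "logs") would produce 30 files at 5-second intervals. Assuming streaming.py reads them with PySpark's StreamingContext.textFileStream pointed at the same directory, it has to be running while the files are created, because a file stream only picks up files that arrive after the stream starts.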

OUTPUT:

  • The following screenshot shows the log files being created.

  • Afterwards, the word count for each log file is computed and displayed.

Question 2: Spark Streaming word count on data received from a data server listening on a TCP socket.

  • First, we create a streaming context with 2 threads and a batch interval of 5 seconds.
  • A DStream is created that connects to hostname:port.
  • Each line is split into words, a word count is performed on each batch, and the result is printed in the terminal.
  • First, execute wordcount.py; at the same time, run the command "nc -l -p <port>" in a command prompt and type the input there.
  • The word count is then performed on the entered input, and the corresponding output is displayed.
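The steps above can be sketched in PySpark as follows. This sketch is modeled on the NetworkWordCount example from the Spark Streaming programming guide, not taken from wordcount.py; the helper names split_words, to_pair and add are hypothetical. The transformations are plain functions so they can be checked without Spark, and pyspark is imported only inside main().

```python
import sys


def split_words(line):
    """Split one received line into words on whitespace."""
    return line.split()


def to_pair(word):
    """Pair each word with an initial count of 1."""
    return (word, 1)


def add(a, b):
    """Combine two partial counts for the same word."""
    return a + b


def main(host, port):
    # Imported here so the helpers above can be used without Spark installed.
    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext

    sc = SparkContext("local[2]", "NetworkWordCount")  # 2 local threads
    ssc = StreamingContext(sc, 5)                       # 5-second batches
    lines = ssc.socketTextStream(host, port)            # DStream of text lines
    counts = lines.flatMap(split_words).map(to_pair).reduceByKey(add)
    counts.pprint()  # print the first elements of each batch's counts
    ssc.start()
    ssc.awaitTermination()


# Start streaming only when a host and port are given on the command line.
if __name__ == "__main__" and len(sys.argv) == 3:
    main(sys.argv[1], int(sys.argv[2]))
```

For example, start "nc -l -p 9999" in one terminal and run the script with the arguments localhost 9999 in another (port 9999 is only an example). The "local[2]" master matters: the socket receiver occupies one thread, so at least one more is needed to process the batches.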

OUTPUT:

  • The following shows the input that was entered.

  • The following screenshots show the output: the word counts of the manually entered input.

Bonus Question: Spark Streaming for Character Frequency using a TCP Socket.

  • First, we create a streaming context with 2 threads and a batch interval of 5 seconds.
  • A DStream is created that connects to hostname:port.
  • Each line is split into words, the length of each word is calculated in each batch, and the result is printed in the terminal.
  • First, execute characterfreq.py; at the same time, run the command "nc -l -p <port>" in a command prompt.
  • Then type the input there.
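A comparable PySpark sketch for the bonus program, assuming that "the word length of each word" means emitting a (word, number of characters) pair for every word in a batch; characterfreq.py may compute this differently. The helper names split_words and word_length are hypothetical, and the streaming setup mirrors the Spark guide's socket example.

```python
import sys


def split_words(line):
    """Split one received line into words on whitespace."""
    return line.split()


def word_length(word):
    """Pair each word with its number of characters."""
    return (word, len(word))


def main(host, port):
    # Imported here so the helpers above can be used without Spark installed.
    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext

    sc = SparkContext("local[2]", "CharacterFrequency")  # 2 local threads
    ssc = StreamingContext(sc, 5)                         # 5-second batches
    lines = ssc.socketTextStream(host, port)
    lengths = lines.flatMap(split_words).map(word_length)
    lengths.pprint()  # print the first (word, length) pairs of each batch
    ssc.start()
    ssc.awaitTermination()


# Start streaming only when a host and port are given on the command line.
if __name__ == "__main__" and len(sys.argv) == 3:
    main(sys.argv[1], int(sys.argv[2]))
```

If the intent is instead to count how often each word length occurs, mapping each word to (len(word), 1) and adding the counts with reduceByKey, as in the word-count program, would give that under the same setup.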

OUTPUT: