ICP_11 - PallaviArikatla/Big-Data-Programming GitHub Wiki
INTRODUCTION: Working on Apache Spark Streaming.
IMPLEMENTATION:
Question 1: Spark Streaming using log file generator.
- First, run file.py.
- This program creates log files in the specified location.
- It creates 30 log files one after another at 5-second intervals, and each log file stores the content of lorem.txt.
- While file.py is still running, run streaming.py. This program performs a word count on the log files as they are created.
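The two steps above can be sketched roughly as follows. This is a minimal illustration, not the actual file.py or streaming.py: the function names, the log1.txt ... log30.txt naming scheme, and the application name are assumptions. The generator writes each file under a dot-prefixed temporary name and renames it, because Spark's file stream expects files to appear in the watched directory atomically.

```python
import os
import shutil
import time


def write_log_files(source_path, out_dir, count=30, interval=5.0):
    """Copy source_path into out_dir `count` times, pausing `interval` seconds between files.

    The file names (log1.txt, log2.txt, ...) are hypothetical.
    """
    os.makedirs(out_dir, exist_ok=True)
    created = []
    for i in range(1, count + 1):
        target = os.path.join(out_dir, f"log{i}.txt")
        # Write under a temporary hidden name, then rename, so a reader
        # watching the directory never picks up a half-written file.
        tmp = os.path.join(out_dir, f".log{i}.txt.tmp")
        shutil.copyfile(source_path, tmp)
        os.replace(tmp, target)
        created.append(target)
        if i < count:
            time.sleep(interval)
    return created


def run_word_count(log_dir, batch_seconds=5):
    """Count words in every new file that appears in log_dir (blocks until stopped)."""
    # Imported lazily so the generator above works without Spark installed.
    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext

    sc = SparkContext("local[2]", "LogFileWordCount")  # app name is an assumption
    ssc = StreamingContext(sc, batch_seconds)
    lines = ssc.textFileStream(log_dir)
    counts = (lines.flatMap(lambda line: line.split())
                   .map(lambda word: (word, 1))
                   .reduceByKey(lambda a, b: a + b))
    counts.pprint()  # print the first counts of each batch to the terminal
    ssc.start()
    ssc.awaitTermination()
```

To try it, call write_log_files("lorem.txt", "log") in one process and run_word_count("log") in another; the "log" directory name is only an example.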
OUTPUT:
- The following screenshot shows the log files being created.
- The word count for each log file is then computed and displayed.
Question 2: Spark word count program of Spark Streaming received from a data server listening on a TCP socket.
- First, a streaming context is created with 2 threads and a batch interval of 5 seconds.
- A DStream is created that connects to hostname:port.
- Each line is split into words, a word count is performed on each batch, and the result is printed in the terminal.
- Run the Python file wordcount.py and, at the same time, run the command "nc -l -p port number" in the command prompt, then type the input there.
- The word count is performed on the text entered, and the corresponding output is displayed.
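A minimal sketch of what a program like wordcount.py might look like, under stated assumptions: the host localhost, the port 9999, the application name, and the helper names are placeholders, not values taken from the original script.

```python
def split_words(line):
    """Split one line of input into whitespace-separated words."""
    return line.split()


def build_counts(lines):
    """Turn a DStream of lines into a DStream of (word, count) pairs per batch."""
    return (lines.flatMap(split_words)
                 .map(lambda word: (word, 1))
                 .reduceByKey(lambda a, b: a + b))


def main(host="localhost", port=9999):
    # host and port are assumptions; use the values passed to nc.
    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext

    sc = SparkContext("local[2]", "SocketWordCount")  # 2 local threads
    ssc = StreamingContext(sc, 5)                     # 5-second batch interval
    lines = ssc.socketTextStream(host, port)          # DStream fed by the TCP socket
    build_counts(lines).pprint()                      # print counts for each batch
    ssc.start()
    ssc.awaitTermination()
```

Calling main() starts the stream; with nc listening on the same port, each line typed into nc is counted in the next 5-second batch. Two local threads are used because one is occupied by the socket receiver, leaving the other to process the batches.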
OUTPUT:
- The following is the input given.
- The screenshots show the word counts of the manually entered input.
Bonus Question: Spark Streaming for Character Frequency using TCP Socket.
- First, a streaming context is created with 2 threads and a batch interval of 5 seconds.
- A DStream is created that connects to hostname:port.
- Each line is split into words, the length (number of characters) of each word is calculated in each batch, and the result is printed in the terminal.
- Run the Python file characterfreq.py and, at the same time, run the command "nc -l -p port number" in the command prompt.
- Then type the input there.
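The steps above could be sketched as follows. This is an illustration only, not the actual characterfreq.py: it follows the description and reports each word together with its character count, and the host, port, application name, and helper name are assumptions.

```python
def word_lengths(line):
    """Return (word, number_of_characters) pairs for one line of input."""
    return [(word, len(word)) for word in line.split()]


def main(host="localhost", port=9999):
    # host and port are assumptions; use the values passed to nc.
    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext

    sc = SparkContext("local[2]", "CharacterFrequency")  # 2 local threads
    ssc = StreamingContext(sc, 5)                        # 5-second batch interval
    lines = ssc.socketTextStream(host, port)
    lines.flatMap(word_lengths).pprint()                 # print pairs for each batch
    ssc.start()
    ssc.awaitTermination()
```

For example, word_lengths("spark is fun") yields [("spark", 5), ("is", 2), ("fun", 3)]; calling main() applies the same helper to every line received from the socket.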
OUTPUT: