Module 2: ICP #4 - VidyullathaKaza/BigData_Programming_Spring2020 GitHub Wiki

Spark Streaming and Data Analysis

Part 1 : Spark Streaming using Log File Generator

Text files are written into a log directory, and a separate process then streams them from that directory.

Code to generate the log files. There is a 5-second delay between the creation of successive files.
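A minimal sketch of such a generator, in plain Python. The function name `generate_log_files`, the file naming scheme, and the `WORDS` vocabulary are hypothetical choices for illustration; only the 5-second delay comes from the description above.

```python
import os
import random
import time

# Hypothetical vocabulary used to fill the generated files.
WORDS = ["spark", "streaming", "log", "file", "word", "count"]


def generate_log_files(log_dir, num_files, delay_seconds=5, lines_per_file=10, seed=None):
    """Write num_files text files into log_dir, pausing between files.

    Returns the list of paths that were created, in creation order.
    """
    rng = random.Random(seed)
    os.makedirs(log_dir, exist_ok=True)
    created = []
    for i in range(num_files):
        final_path = os.path.join(log_dir, f"log_{i}.txt")
        tmp_path = os.path.join(log_dir, f".log_{i}.txt.tmp")
        # Write under a temporary name first, then rename, so that a
        # process watching the directory never sees a half-written file.
        with open(tmp_path, "w") as f:
            for _ in range(lines_per_file):
                f.write(" ".join(rng.choices(WORDS, k=5)) + "\n")
        os.replace(tmp_path, final_path)
        created.append(final_path)
        # Pause before creating the next file (no pause after the last one).
        if i < num_files - 1:
            time.sleep(delay_seconds)
    return created
```

Calling `generate_log_files("log", 30)` would create 30 files in a `log` directory, one every 5 seconds. The write-then-rename step is a precaution, since a streaming reader generally expects files to appear in the directory fully written.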

Input : The input is typed in the console using netcat on port 9999

Output : The word count for the streamed text is displayed
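The streaming side of Part 1 could look roughly like the sketch below, assuming PySpark's DStream API (`StreamingContext.textFileStream`). The names `split_words` and `run_file_word_count`, the `log` directory, and the 5-second batch interval are assumptions made for illustration.

```python
def split_words(line):
    """Split one line of text into whitespace-separated words."""
    return line.split()


def run_file_word_count(log_dir="log", batch_seconds=5):
    """Count words in text files that appear in log_dir (blocks until stopped)."""
    # Imported here so split_words can be used without Spark installed.
    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext

    sc = SparkContext("local[2]", "FileWordCount")
    ssc = StreamingContext(sc, batch_seconds)

    # Each batch contains the lines of files newly added to log_dir.
    lines = ssc.textFileStream(log_dir)
    counts = (lines.flatMap(split_words)
                   .map(lambda word: (word, 1))
                   .reduceByKey(lambda a, b: a + b))
    counts.pprint()  # print the first (word, count) pairs of each batch

    ssc.start()
    ssc.awaitTermination()
```

`run_file_word_count("log")` would be started first, while another process writes files into the same directory. Only files that appear after the stream starts are picked up.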

Part 2 : Spark Streaming for TCP Socket

Code that listens for words on a TCP socket and implements the word count logic.

Input : The input is typed in the console using netcat on TCP port 9999

Output : The output shows that the input words are streamed from the console and counted
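A sketch of the socket-based word count, assuming PySpark's DStream API (`StreamingContext.socketTextStream`). The function names, the `localhost` host, and the batch interval are hypothetical; port 9999 is the one named above.

```python
def word_pairs(line):
    """Turn a line into (word, 1) pairs, one per whitespace-separated word."""
    return [(word, 1) for word in line.split()]


def run_socket_word_count(host="localhost", port=9999, batch_seconds=5):
    """Count words arriving on a TCP socket (blocks until stopped)."""
    # Imported here so word_pairs can be used without Spark installed.
    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext

    # At least two local threads: one for the socket receiver,
    # one for processing the received batches.
    sc = SparkContext("local[2]", "SocketWordCount")
    ssc = StreamingContext(sc, batch_seconds)

    lines = ssc.socketTextStream(host, port)
    counts = lines.flatMap(word_pairs).reduceByKey(lambda a, b: a + b)
    counts.pprint()  # print the first (word, count) pairs of each batch

    ssc.start()
    ssc.awaitTermination()
```

To try it, `nc -lk 9999` would be started in one terminal, then `run_socket_word_count()` in another. Each line typed into the netcat terminal is counted in the next batch.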

Bonus : Spark Streaming for Character Frequency using TCP Socket.

Code

Input : The input is typed in the console using netcat on TCP port 9999

Output : The output shows that the input text is streamed from the console and the frequency of each character is counted
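For the character frequency variant, one possible sketch under the same PySpark DStream assumption is shown here. Skipping whitespace characters is an assumption of this sketch, and the names `chars_of` and `run_socket_char_count` are hypothetical.

```python
def chars_of(line):
    """Return the non-whitespace characters of a line, one per list item."""
    # Assumption: spaces and tabs are not counted as characters.
    return [ch for ch in line if not ch.isspace()]


def run_socket_char_count(host="localhost", port=9999, batch_seconds=5):
    """Count character frequencies of text arriving on a TCP socket."""
    # Imported here so chars_of can be used without Spark installed.
    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext

    sc = SparkContext("local[2]", "SocketCharCount")
    ssc = StreamingContext(sc, batch_seconds)

    lines = ssc.socketTextStream(host, port)
    counts = (lines.flatMap(chars_of)
                   .map(lambda ch: (ch, 1))
                   .reduceByKey(lambda a, b: a + b))
    counts.pprint()  # print the first (character, count) pairs of each batch

    ssc.start()
    ssc.awaitTermination()
```

The only change from a word count is the `flatMap` step, which emits one item per character instead of one per word.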