SPARK ICP 4 - Apoorvag2597/BDP_Revised GitHub Wiki
Name - Apoorva Geetanjali Avadhanula
Class ID - 34
- To create a Spark Streaming application that reads log files produced by a log file generator.
Tools used - PyCharm
Approach Used
To perform Spark Streaming over a TCP socket - To write a Spark Streaming word count program for text received from a data server listening on a TCP socket.
- To create the files for Spark Streaming, a script, file.py, is used to generate the log files. The text for the log files is loaded from lorem.txt.
Output -
1. Log files are generated.
3. The data is streamed and the word count is displayed.
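The log file generation step can be sketched as follows. This is a minimal illustration, not the actual file.py: the function name `generate_log_files`, the output directory, the file naming scheme, and the default counts and delay are all assumptions made for the example.

```python
import os
import random
import time


def generate_log_files(source_path, out_dir, count=5, lines_per_file=10, delay=1.0):
    """Write `count` log files into `out_dir`, each filled with random lines
    taken from the text file at `source_path` (for example lorem.txt).

    All names and defaults here are hypothetical, chosen for illustration.
    """
    # Load the non-empty lines of the source text once.
    with open(source_path, encoding="utf-8") as src:
        lines = [line.strip() for line in src if line.strip()]
    if not lines:
        raise ValueError("source file contains no text")

    os.makedirs(out_dir, exist_ok=True)
    written = []
    for i in range(count):
        final_path = os.path.join(out_dir, f"log{i}.txt")
        # Write under a temporary dot-prefixed name, then rename, so a
        # directory watcher is less likely to see a half-written file.
        tmp_path = os.path.join(out_dir, f".log{i}.txt.tmp")
        with open(tmp_path, "w", encoding="utf-8") as out:
            for _ in range(lines_per_file):
                out.write(random.choice(lines) + "\n")
        os.replace(tmp_path, final_path)
        written.append(final_path)
        # Pause so the files arrive over time, like a stream.
        time.sleep(delay)
    return written
```

Calling `generate_log_files("lorem.txt", "log")` would write five files into a `log` directory, one per second. A Spark Streaming job could then watch that directory, for instance with `StreamingContext.textFileStream`, and apply the word count to each new file.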
In this part, the word count is performed on text received from a host. netcat is started on localhost with the command 'nc -l -p port number', and the lines typed into it are sent to the Spark Streaming program, which counts the words in the usual way.
Input given in the command prompt -
Output
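For reference, the TCP socket word count described above could look roughly like the sketch below. It assumes the DStream API in `pyspark.streaming`. The function names, port 9999, and the 5-second batch interval are assumptions for illustration, not values taken from this ICP.

```python
def split_words(line):
    """Split one line of input into words, dropping empty tokens."""
    return line.split()


def run_word_count(host="localhost", port=9999, batch_seconds=5):
    """Count words in text received from a TCP socket (hypothetical defaults)."""
    # Imported here so split_words can be used without Spark installed.
    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext

    # local[2]: one thread receives data, the other processes it.
    sc = SparkContext("local[2]", "SocketWordCount")
    ssc = StreamingContext(sc, batch_seconds)

    # Each record in the stream is one line of text sent by the server.
    lines = ssc.socketTextStream(host, port)
    counts = (lines.flatMap(split_words)
                   .map(lambda word: (word, 1))
                   .reduceByKey(lambda a, b: a + b))

    # Print the first few (word, count) pairs of every batch.
    counts.pprint()

    ssc.start()
    ssc.awaitTermination()
```

To try it, start netcat on the chosen port first, then call `run_word_count()` with the same port and type a few lines into the netcat window. The counts are computed per batch, so each batch interval prints the totals for the lines received in that interval only.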
- Character Frequency - This is similar to the previous part of this ICP. The only difference is that there we count the words, whereas here we count the individual characters.
Output -
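The character frequency variant can be sketched the same way: only the splitting step changes, from words to characters. As before, this assumes the DStream API in `pyspark.streaming`, and the function names, port, and batch interval are hypothetical. Skipping whitespace characters is also an assumption of this sketch. The source does not say whether spaces are counted.

```python
def split_chars(line):
    """Return the individual characters of a line.

    Assumption: whitespace characters are not counted.
    """
    return [ch for ch in line if not ch.isspace()]


def run_char_count(host="localhost", port=9999, batch_seconds=5):
    """Count characters in text received from a TCP socket (hypothetical defaults)."""
    # Imported here so split_chars can be used without Spark installed.
    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext

    sc = SparkContext("local[2]", "SocketCharCount")
    ssc = StreamingContext(sc, batch_seconds)

    lines = ssc.socketTextStream(host, port)
    counts = (lines.flatMap(split_chars)           # one record per character
                   .map(lambda ch: (ch, 1))
                   .reduceByKey(lambda a, b: a + b))

    # Print the first few (character, count) pairs of every batch.
    counts.pprint()

    ssc.start()
    ssc.awaitTermination()
```

Under these assumptions, the input `ab a` yields the characters `a`, `b`, `a`, so the batch would report a count of 2 for `a` and 1 for `b`.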