ICP2_4 - Hiresh12/Big-Data-Programming GitHub Wiki
Apache Spark Streaming
Task:
To write a Spark Streaming word count program that counts the words received from a data server listening on a TCP socket.
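A minimal sketch of the task, assuming PySpark's DStream API (`StreamingContext.socketTextStream`) and a text server on `localhost:9999`. The host, port, batch interval and the helper `count_words` are illustrative choices, not taken from the original notebook. `count_words` shows in plain Python what each micro-batch computes. `main` wires the same flatMap/map/reduceByKey steps onto a socket DStream and imports PySpark lazily, so the counting logic can be checked without Spark installed.

```python
def count_words(lines):
    """Count whitespace-separated words across an iterable of text lines.

    Plain-Python equivalent of what the DStream pipeline in main() computes
    for a single micro-batch.
    """
    counts = {}
    for line in lines:
        for word in line.split():
            counts[word] = counts.get(word, 0) + 1
    return counts


def main(host="localhost", port=9999, batch_seconds=1):
    # Imported here so count_words() works even without PySpark installed.
    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext

    # "local[2]": one thread for the socket receiver, one for processing.
    sc = SparkContext("local[2]", "NetworkWordCount")
    ssc = StreamingContext(sc, batch_seconds)

    # DStream of text lines read from the TCP server at host:port.
    lines = ssc.socketTextStream(host, port)
    counts = (lines.flatMap(lambda line: line.split())
                   .map(lambda word: (word, 1))
                   .reduceByKey(lambda a, b: a + b))
    counts.pprint()  # prints the first elements of each batch's counts

    ssc.start()             # begin receiving and processing
    ssc.awaitTermination()  # block until the streaming job is stopped
```

In a Jupyter Notebook cell, start a text server first (for example `nc -lk 9999`), then call `main()` and type lines into the server. Each batch prints its own counts, because `reduceByKey` on a DStream does not carry state between batches.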
Features:
- Spark
- Python
- Jupyter Notebook
- DStream
- Spark Streaming
- Socket
Tasks:
Part 1:
1. Spark Streaming using a Log File Generator:
Log Files : https://github.com/Hiresh12/Big-Data-Programming/tree/master/ICP11/log
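One possible shape for a log file generator, sketched under assumptions: the function names, file naming and chunk sizes are hypothetical, not the repository's actual generator. The generator drops chunks of text into a directory one file at a time. The Spark side uses `StreamingContext.textFileStream`, which only picks up files that appear in the directory after the stream starts. Each file is therefore written under a dot-prefixed temporary name and then renamed. The intent is that the stream never reads a half-written file, assuming Spark's default file filter skips dot-files.

```python
import os
import time


def generate_log_files(source_lines, out_dir, num_files=3,
                       lines_per_file=2, delay_seconds=0.0):
    """Write chunks of source_lines as new files in out_dir, one at a time.

    Returns the list of final file paths that were created.
    """
    os.makedirs(out_dir, exist_ok=True)
    paths = []
    for i in range(num_files):
        start = i * lines_per_file
        chunk = source_lines[start:start + lines_per_file]
        if not chunk:
            break  # ran out of source text
        final_path = os.path.join(out_dir, f"log_{i}.txt")
        tmp_path = os.path.join(out_dir, f".log_{i}.txt.tmp")  # hidden temp name
        with open(tmp_path, "w") as f:
            f.write("\n".join(chunk) + "\n")
        os.replace(tmp_path, final_path)  # atomic rename within one filesystem
        paths.append(final_path)
        time.sleep(delay_seconds)  # spread files across streaming batches
    return paths


def stream_word_count_from_dir(log_dir, batch_seconds=5):
    # PySpark is imported lazily so the generator above runs without it.
    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext

    sc = SparkContext("local[2]", "FileStreamWordCount")
    ssc = StreamingContext(sc, batch_seconds)
    # Reads files that appear in log_dir after the stream has started.
    lines = ssc.textFileStream(log_dir)
    counts = (lines.flatMap(lambda line: line.split())
                   .map(lambda word: (word, 1))
                   .reduceByKey(lambda a, b: a + b))
    counts.pprint()
    ssc.start()
    ssc.awaitTermination()
```

Start `stream_word_count_from_dir("log")` first, then run `generate_log_files(...)` from a separate process with a non-zero `delay_seconds` so that new files keep arriving while the stream is running.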
Spark Streaming for TCP Socket:
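The "data server listening on a TCP socket" is often just `nc -lk 9999`. Below is a rough pure-Python stand-in using only the standard `socket` module, useful when netcat is unavailable. The function name, default port, delay and the optional `ready` event (a `threading.Event`) are hypothetical. The server accepts one client and sends newline-terminated lines, which is the record format `socketTextStream` expects.

```python
import socket
import time


def serve_lines(lines, host="localhost", port=9999, delay_seconds=0.5,
                ready=None):
    """Listen on host:port, accept one client, and send each line to it."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        server.bind((host, port))
        server.listen(1)
        if ready is not None:
            ready.set()  # signal a caller that the server is listening
        conn, _addr = server.accept()  # blocks until a client (e.g. Spark) connects
        with conn:
            for line in lines:
                # socketTextStream splits the byte stream on newlines.
                conn.sendall((line + "\n").encode("utf-8"))
                time.sleep(delay_seconds)  # trickle data in over several batches
```

Run it in its own process or terminal, for example `serve_lines(["hello world", "hello spark"] * 10)`, then point a `socketTextStream` at the same host and port. Unlike `nc -lk`, this sketch serves a single connection and closes after sending all the lines.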
Bonus