ICP2_4 - Hiresh12/Big-Data-Programming GitHub Wiki

Apache Spark Streaming

Task:

Write a Spark Streaming word count program that counts the words in text data received from a data server listening on a TCP socket.

Features:

  • Spark
  • Python
  • Jupyter Notebook
  • DStream
  • Spark Streaming
  • Socket

Tasks:

Part 1:

1. Spark Streaming using Log File Generator:

Log Files : https://github.com/Hiresh12/Big-Data-Programming/tree/master/ICP11/log
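
The page does not show the code for this step, so the following is only a minimal sketch of how a log file generator and a file-based DStream word count could fit together. The function names, the `log` directory default, the `log<N>.txt` file naming, and the 5-second batch interval are all assumptions made for illustration, not the original implementation. `textFileStream` only picks up files that appear in the watched directory after the stream has started, so the generator writes each file elsewhere first and then moves it into place.

```python
import os
import tempfile


def write_log_file(log_dir, index, lines):
    """Hypothetical generator step: drop one new log file into log_dir."""
    os.makedirs(log_dir, exist_ok=True)
    final_path = os.path.join(log_dir, "log{}.txt".format(index))
    # Write in the parent directory first, then move the finished file in,
    # so the streaming job never sees a half-written file.
    parent = os.path.dirname(os.path.abspath(log_dir))
    fd, tmp_path = tempfile.mkstemp(dir=parent, suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        handle.write("\n".join(lines) + "\n")
    os.replace(tmp_path, final_path)
    return final_path


def run_log_word_count(log_dir="log", batch_seconds=5):
    """Count words in every new file that appears in log_dir."""
    # Imported here so the generator above can be used without Spark.
    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext

    sc = SparkContext("local[2]", "LogFileWordCount")
    ssc = StreamingContext(sc, batch_seconds)

    lines = ssc.textFileStream(log_dir)  # DStream of lines from new files
    counts = (lines.flatMap(lambda line: line.split())
                   .map(lambda word: (word, 1))
                   .reduceByKey(lambda a, b: a + b))
    counts.pprint()  # print the first counts of each batch

    ssc.start()
    ssc.awaitTermination()
```

To try it, start `run_log_word_count()` in one notebook or terminal and call `write_log_file("log", n, [...])` from another while the stream is running.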

Spark Streamingfor TCP Socket:
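
As a rough illustration of the task stated above, a DStream word count over a TCP socket might look like the sketch below. The host `localhost`, port `9999`, the 5-second batch interval, and the function names are assumptions chosen for the example, not values taken from the original notebook.

```python
def split_words(line):
    """Split one line of text into whitespace-separated words."""
    return line.split()


def run_socket_word_count(host="localhost", port=9999, batch_seconds=5):
    """Count words in text lines received from a TCP data server."""
    # Imported here so split_words can be used without Spark installed.
    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext

    # At least two local threads: one for the receiver, one for processing.
    sc = SparkContext("local[2]", "SocketWordCount")
    ssc = StreamingContext(sc, batch_seconds)

    lines = ssc.socketTextStream(host, port)  # DStream of incoming lines
    counts = (lines.flatMap(split_words)
                   .map(lambda word: (word, 1))
                   .reduceByKey(lambda a, b: a + b))
    counts.pprint()  # print the first counts of each batch

    ssc.start()
    ssc.awaitTermination()
```

A simple data server for testing can be started in a separate terminal with `nc -lk 9999`; each line typed there is counted in the next batch once `run_socket_word_count()` is running.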

Bonus
