ICP 4 Module II
Data Streaming: Data Streaming is a technique for transferring data so that it can be processed as a steady and continuous stream. Streaming technologies are becoming increasingly important with the growth of the Internet.
Data Streaming Features:

- Scaling: Spark Streaming can easily scale to hundreds of nodes.
- Speed: It achieves low latency.
- Fault Tolerance: Spark can efficiently recover from failures.
- Integration: Spark integrates batch and real-time processing.
- Business Analysis: Spark Streaming can track customer behavior, which is useful for business analysis.

In-Class Exercise:

1. Spark Streaming using Log File Generator:
Run Spark Streaming on files produced by a log file generator, following the instructions in the slides. A sketch of such a generator is shown below.
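A minimal sketch of a log file generator, assuming a local directory named `log` and made-up log levels and messages (these details are illustrative, not the exact generator from the slides). It writes a new file every few seconds so the streaming job can pick up newly created files:

```python
import os
import random
import time

# Assumed directory and sample log content, for illustration only.
LOG_DIR = "log"
LEVELS = ["INFO", "WARN", "ERROR", "DEBUG"]
MESSAGES = ["user login", "user logout", "file uploaded", "connection timeout"]

os.makedirs(LOG_DIR, exist_ok=True)

# Create a new log file every few seconds; Spark's file stream only reads
# files that appear in the directory after the streaming job has started.
for i in range(10):
    path = os.path.join(LOG_DIR, "log_{}.txt".format(i))
    with open(path, "w") as f:
        for _ in range(20):
            f.write("{} {}\n".format(random.choice(LEVELS), random.choice(MESSAGES)))
    time.sleep(5)
```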
Log generator input:
Log generator output:
Spark Streaming using the generated log files in the log directory:
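A minimal PySpark sketch of this step, assuming the DStream API, a 10-second batch interval, and a placeholder path to the log directory (adjust both to your setup):

```python
from pyspark import SparkContext
from pyspark.streaming import StreamingContext

sc = SparkContext("local[2]", "LogDirectoryWordCount")
ssc = StreamingContext(sc, 10)  # 10-second batch interval (assumed)

# textFileStream monitors the directory and processes files created
# after the streaming context starts.
lines = ssc.textFileStream("file:///path/to/log")  # placeholder path

counts = (lines.flatMap(lambda line: line.split(" "))
               .map(lambda word: (word, 1))
               .reduceByKey(lambda a, b: a + b))

counts.pprint()

ssc.start()
ssc.awaitTermination()
```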
Output:
2. Spark Streaming for TCP Socket:
Write a Spark Streaming word count program that processes text received from a data server listening on a TCP socket; a sketch follows below. Hint: for a Netcat utility on Windows, see https://github.com/rsanchez-wsu/jfiles/wiki/Windows-10-Telnet-&-NetCat
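A minimal PySpark word count sketch for this exercise, assuming the DStream API, a 5-second batch interval, and port 9999 (the interval and port are assumptions):

```python
from pyspark import SparkContext
from pyspark.streaming import StreamingContext

sc = SparkContext("local[2]", "SocketWordCount")
ssc = StreamingContext(sc, 5)  # 5-second batch interval (assumed)

# Connect to a data server on localhost:9999 (port is an assumption).
lines = ssc.socketTextStream("localhost", 9999)

counts = (lines.flatMap(lambda line: line.split(" "))
               .map(lambda word: (word, 1))
               .reduceByKey(lambda a, b: a + b))

counts.pprint()

ssc.start()
ssc.awaitTermination()
```

Start the data server first (for example `nc -lk 9999` on Linux/macOS, or the Windows Netcat utility from the link above), then run the streaming job and type lines into the Netcat terminal to see the word counts printed each batch.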
Localhost streaming input:
Output:
Limitations:
Setting up the project directory in PyCharm was difficult.
Video Link: https://drive.google.com/file/d/1RnVFN8VuP_PdSR3Y2iv7DBlbEmCiOZXP/view?usp=sharing