ICP 11 - Nagu34/Big-data-programming GitHub Wiki
Welcome to ICP-11
INTRODUCTION
Data Streaming
Data streaming is a technique for transferring data so that it can be processed as a steady, continuous stream. Streaming technologies are becoming increasingly important with the growth of the Internet.
Spark Streaming Features
- Scaling: Spark Streaming can easily scale to hundreds of nodes.
- Speed: Spark Streaming achieves low latency.
- Fault Tolerance: Spark can recover efficiently from failures.
- Integration: Spark combines batch and real-time processing in one framework.
- Business Analysis: Spark Streaming is used to track customer behavior, which can then be used in business analysis.
InClass Exercise
1. Spark Streaming using Log File Generator:
Run Spark Streaming against the output of a log file generator, following the instructions in the slides.
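The slides are not reproduced on this page, so the following is only a minimal sketch of what a log file generator could look like, not the one from the slides. It assumes the Spark side watches a directory (for example with `StreamingContext.textFileStream`), which expects complete files to appear in that directory, so each file is written in a staging directory first and then moved into place. The log levels, messages, file names, and directory layout are all hypothetical.

```python
import os
import random
import sys
import time

# Hypothetical log vocabulary, chosen only for illustration.
LEVELS = ["INFO", "WARN", "ERROR"]
MESSAGES = ["user login", "page view", "payment failed"]


def make_log_line(rng, timestamp):
    """Build one log line: '<epoch seconds> <LEVEL> <message>'."""
    return f"{timestamp:.0f} {rng.choice(LEVELS)} {rng.choice(MESSAGES)}"


def write_log_batch(target_dir, staging_dir, batch_id, num_lines=5, rng=None):
    """Write one log file in staging_dir, then move it into target_dir.

    The move means a directory watcher never sees a half-written file.
    Both directories should live on the same filesystem.
    """
    rng = rng or random.Random()
    os.makedirs(target_dir, exist_ok=True)
    os.makedirs(staging_dir, exist_ok=True)

    name = f"log_{batch_id:05d}.txt"
    staged_path = os.path.join(staging_dir, name)
    with open(staged_path, "w", encoding="utf-8") as handle:
        for _ in range(num_lines):
            handle.write(make_log_line(rng, time.time()) + "\n")

    final_path = os.path.join(target_dir, name)
    os.replace(staged_path, final_path)
    return final_path


def generate(target_dir, staging_dir, batches=10, interval_s=1.0):
    """Emit a new log file every interval_s seconds."""
    for batch_id in range(batches):
        write_log_batch(target_dir, staging_dir, batch_id)
        time.sleep(interval_s)


# Runs only when both directories are given on the command line.
if __name__ == "__main__" and len(sys.argv) == 3:
    generate(sys.argv[1], sys.argv[2])
```

Saved under a hypothetical name such as `log_generator.py`, it could be started with `python log_generator.py logs logs_staging` while the streaming job reads from `logs`.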
2. Spark Streaming for TCP Socket:
Write a Spark Streaming word count program that counts the words in text received from a data server listening on a TCP socket. Hint: for the Netcat utility on Windows, see https://github.com/rsanchez-wsu/jfiles/wiki/Windows-10-Telnet-&-NetCat
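One possible shape for this exercise, sketched with PySpark's DStream API (`socketTextStream`, `flatMap`, `map`, `reduceByKey`, `pprint`). It is an illustration under assumptions rather than the expected solution: the application name, the `local[2]` master, the one-second batch interval, and the helper names `split_words`, `to_pair`, and `add` are all chosen here. PySpark is imported inside `run_word_count` so that the helper functions can be tried without a Spark installation.

```python
import sys


def split_words(line):
    """Split a line of text into words on any whitespace."""
    return line.split()


def to_pair(word):
    """Map a word to a (word, 1) pair so counts can be summed by key."""
    return (word, 1)


def add(left, right):
    """Combine two partial counts for the same word."""
    return left + right


def run_word_count(host, port, batch_seconds=1):
    """Count words in each batch of text received from host:port."""
    # Imported here so the helpers above work without PySpark installed.
    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext

    # At least two local threads: one receives data, one processes it.
    sc = SparkContext("local[2]", "SocketWordCount")
    ssc = StreamingContext(sc, batch_seconds)

    lines = ssc.socketTextStream(host, port)
    counts = lines.flatMap(split_words).map(to_pair).reduceByKey(add)
    counts.pprint()  # print the first few counts of every batch

    ssc.start()
    ssc.awaitTermination()


# Runs only when a host and port are given on the command line.
if __name__ == "__main__" and len(sys.argv) == 3:
    run_word_count(sys.argv[1], int(sys.argv[2]))
```

To try it, start a data server first, for example `nc -lk 9999` on Linux or macOS (Netcat flags on Windows may differ), then submit the script with `spark-submit socket_wordcount.py localhost 9999`, where the file name is hypothetical. Text typed into the Netcat terminal should show up as per-batch word counts in the Spark output.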