Spark ICP4 - neerajpadarthi/Big-Data-Programming GitHub Wiki
Name: Neeraj Padarthi
Class ID: 19
Spark ICP: 4
Objective
- Perform Spark Streaming on log files produced by a log file generator
- Perform Spark Streaming on data received from a TCP socket
Apache Spark Streaming
- Spark Streaming is an extension of the core Spark API that enables scalable, high-throughput, fault-tolerant processing of live data streams.
- It provides a high-level abstraction called a discretized stream, or DStream, which represents a continuous stream of data. Internally, a DStream is a sequence of RDDs, one per batch interval.
- In this ICP, one stream is read from files written by a log file generator, and another is received from a data server listening on a TCP socket.
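The DStream model can be illustrated without Spark. The sketch below is a plain-Python toy and does not use Spark's API. It groups hypothetical (arrival time, line) records into fixed batch intervals. It then applies the same word count to every batch, which is roughly what a DStream transformation does to each RDD it contains. The names `discretize` and `word_count_per_batch` are made up for this example.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def discretize(records: Iterable[Tuple[float, str]],
               batch_interval: float) -> Dict[int, List[str]]:
    """Group (arrival_time_in_seconds, line) records into batches of length batch_interval."""
    batches: Dict[int, List[str]] = {}
    for arrival_time, line in records:
        batch_id = int(arrival_time // batch_interval)
        batches.setdefault(batch_id, []).append(line)
    # Note: unlike Spark, intervals with no data get no entry here.
    return batches


def word_count_per_batch(batches: Dict[int, List[str]]) -> Dict[int, Counter]:
    """Apply the same word count to every batch, like a DStream transformation applied to each RDD."""
    return {
        batch_id: Counter(word for line in lines for word in line.split())
        for batch_id, lines in sorted(batches.items())
    }


if __name__ == "__main__":
    stream = [(0.2, "error disk full"), (0.7, "info ok"),
              (1.1, "error timeout"), (2.5, "info ok")]
    for batch_id, counts in word_count_per_batch(discretize(stream, 1.0)).items():
        print(batch_id, dict(counts))
```

In real Spark Streaming, an interval with no data still yields an empty RDD; this toy simply omits it.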
Log File Generator
Input File

Writing Logs



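A log file generator for file streaming could look roughly like the sketch below. It is an assumption and is not the generator used in this ICP. The message list, the file names, and the `generate_logs` helper are all hypothetical. Each file is first written to a staging directory and then renamed into the monitored directory. This matters because PySpark's `textFileStream` expects files to be moved into the directory from another location on the same file system, not written in place.

```python
import os
import random
import tempfile
import time
from typing import List, Optional

# Hypothetical sample messages; the ICP's actual input file is not reproduced here.
SAMPLE_MESSAGES = [
    "INFO user logged in",
    "WARN disk usage above 80 percent",
    "ERROR connection timed out",
]


def generate_logs(staging_dir: str, log_dir: str, n_files: int,
                  lines_per_file: int, delay_s: float = 0.0,
                  seed: Optional[int] = None) -> List[str]:
    """Write n_files log files, moving each one into log_dir once it is complete."""
    rng = random.Random(seed)
    os.makedirs(staging_dir, exist_ok=True)
    os.makedirs(log_dir, exist_ok=True)
    written = []
    for i in range(n_files):
        name = f"log_{i:04d}.txt"
        staging_path = os.path.join(staging_dir, name)
        # Write the complete file outside the monitored directory first.
        with open(staging_path, "w", encoding="utf-8") as f:
            for _ in range(lines_per_file):
                f.write(rng.choice(SAMPLE_MESSAGES) + "\n")
        # Rename into the monitored directory; atomic when both dirs share a file system.
        final_path = os.path.join(log_dir, name)
        os.replace(staging_path, final_path)
        written.append(final_path)
        time.sleep(delay_s)  # optionally spread files across batch intervals
    return written


if __name__ == "__main__":
    root = tempfile.mkdtemp()
    files = generate_logs(os.path.join(root, "staging"), os.path.join(root, "logs"),
                          n_files=3, lines_per_file=5, seed=42)
    print(files)
```

Keeping the staging and log directories under one parent keeps them on the same file system, so the rename stays atomic.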
Reading Files - File Streaming


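On the reading side, Spark's `textFileStream` monitors a directory and, at each batch interval, reads files that have newly appeared there. The class below is a hypothetical, simplified stand-in for that behaviour; `NewFileWatcher` is not a Spark class. It treats files already present at start-up as processed, mirroring Spark's default of handling only new files. It also skips names beginning with `.`, as PySpark does. On each `poll()` it returns the lines of any new files. Unlike Spark, it tracks files by name rather than by modification time.

```python
import os
import tempfile
from typing import List, Set


class NewFileWatcher:
    """Toy model of a file stream: each poll() returns lines from files that appeared since the last poll."""

    def __init__(self, directory: str) -> None:
        self.directory = directory
        # Files that already exist when the watcher starts are treated as old.
        self.seen: Set[str] = set(self._visible_files())

    def _visible_files(self) -> List[str]:
        # Skip names starting with '.', and anything that is not a regular file.
        return sorted(
            name for name in os.listdir(self.directory)
            if not name.startswith(".")
            and os.path.isfile(os.path.join(self.directory, name))
        )

    def poll(self) -> List[str]:
        """Return the lines of every file that is new since the previous poll."""
        lines: List[str] = []
        for name in self._visible_files():
            if name in self.seen:
                continue
            self.seen.add(name)
            with open(os.path.join(self.directory, name), encoding="utf-8") as f:
                lines.extend(f.read().splitlines())
        return lines


if __name__ == "__main__":
    watch_dir = tempfile.mkdtemp()
    watcher = NewFileWatcher(watch_dir)
    with open(os.path.join(watch_dir, "log_0000.txt"), "w", encoding="utf-8") as f:
        f.write("ERROR connection timed out\nINFO user logged in\n")
    print(watcher.poll())  # lines from the new file
    print(watcher.poll())  # nothing new: []
```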
Netcat/TCP
Input Socket - Netcat

Input Socket - TCP

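The Netcat setup can be reproduced with Python's standard library. The sketch below is an assumption about the setup, and its helper names are hypothetical. `serve_lines` is a simplified, single-client stand-in for a netcat server such as `nc -lk 9999`, the command used in the Spark programming guide's quick example. `receive_lines` plays the role of the receiver that `socketTextStream` runs: it connects to the server and reads newline-delimited text. Binding to port 0 lets the OS choose a free port, so the demo does not clash with a running netcat.

```python
import socket
import threading
from typing import List


def serve_lines(server_sock: socket.socket, lines: List[str]) -> None:
    """Accept one client and send each line newline-terminated, then close (a stand-in for nc -lk)."""
    conn, _ = server_sock.accept()
    with conn:
        for line in lines:
            conn.sendall((line + "\n").encode("utf-8"))


def receive_lines(host: str, port: int) -> List[str]:
    """Connect to the data server and read newline-delimited UTF-8 text until it closes."""
    with socket.create_connection((host, port)) as sock:
        with sock.makefile("r", encoding="utf-8") as stream:
            return [line.rstrip("\n") for line in stream]


def run_demo(lines: List[str]) -> List[str]:
    """Start a local server on a free port, send `lines`, and return what the client received."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.bind(("127.0.0.1", 0))  # port 0: the OS picks a free port
        server.listen(1)
        port = server.getsockname()[1]
        sender = threading.Thread(target=serve_lines, args=(server, lines))
        sender.start()
        received = receive_lines("127.0.0.1", port)
        sender.join()
    return received


if __name__ == "__main__":
    print(run_demo(["hello spark", "hello streaming"]))
```

Because the server closes the connection after sending, the client sees end-of-stream and returns. A real netcat session instead stays open until it is interrupted.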
Output - Netcat


Output - TCP

Bonus
