Spark ICP4 - neerajpadarthi/Big-Data-Programming GitHub Wiki

Name: Neeraj Padarthi

Class ID: 19

Spark ICP: 4

Objective

  • To perform Spark Streaming on files produced by a log file generator
  • To perform Spark Streaming on text received over a TCP socket

Apache Spark Streaming

  • Spark Streaming is an extension of the core Spark API
  • It enables scalable, high-throughput, fault-tolerant stream processing of live data streams
  • Spark Streaming provides a high-level abstraction called a discretized stream, or DStream, which represents a continuous stream of data
  • In this ICP, the first stream reads files written by a log file generator
  • The second stream receives text from a data server listening on a TCP socket
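
To make the DStream idea concrete: Spark represents a DStream internally as a sequence of RDDs, one per batch interval. The sketch below imitates that grouping in plain Python, without Spark. The function name `discretize`, the sample log lines, and the 5-second interval are illustrative assumptions, not taken from the ICP code.

```python
from collections import defaultdict


def discretize(events, batch_interval):
    """Group (timestamp, record) pairs into consecutive batches.

    Batch k holds every record whose timestamp falls in
    [k * batch_interval, (k + 1) * batch_interval).
    """
    batches = defaultdict(list)
    for timestamp, record in events:
        batches[int(timestamp // batch_interval)].append(record)
    # Return non-empty batches in time order. Spark would also emit an
    # empty RDD for an interval with no data; this sketch skips those.
    return [batches[k] for k in sorted(batches)]


# Hypothetical log lines arriving at 0.5 s, 1.2 s, 4.9 s and 5.1 s.
events = [(0.5, "GET /a"), (1.2, "GET /b"), (4.9, "POST /c"), (5.1, "GET /d")]
batches = discretize(events, batch_interval=5)
print(batches)  # [['GET /a', 'GET /b', 'POST /c'], ['GET /d']]
```

Each inner list plays the role of one RDD: transformations applied to a DStream are applied to every such batch in turn.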

Log File Generator

Input File

Writing LOGs
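
The original generator is not reproduced on this page, so the following is only a minimal sketch of what a log file generator for file streaming might look like. The directory name `log`, the file naming scheme, and the line format (date, time, level, message) are all assumptions. Each file is written outside the watched directory and then moved in with a single rename, because Spark's file stream expects files to appear in the monitored directory atomically.

```python
import os
import random
import tempfile
import time

LEVELS = ["INFO", "WARN", "ERROR"]  # hypothetical log levels


def write_log_file(target_dir, index, num_lines=5):
    """Write one log file, then move it into target_dir with one rename."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    lines = [f"{stamp} {random.choice(LEVELS)} event {i}" for i in range(num_lines)]
    # Stage the file in the parent directory (same filesystem) so that
    # the final os.replace is an atomic rename, not a copy.
    parent = os.path.dirname(os.path.abspath(target_dir))
    fd, tmp_path = tempfile.mkstemp(suffix=".tmp", dir=parent)
    with os.fdopen(fd, "w") as handle:
        handle.write("\n".join(lines) + "\n")
    final_path = os.path.join(target_dir, f"log_{index}.txt")
    os.replace(tmp_path, final_path)
    return final_path


def generate_logs(target_dir="log", num_files=3, delay_seconds=1.0):
    """Drop num_files log files into target_dir, pausing between files."""
    os.makedirs(target_dir, exist_ok=True)
    paths = []
    for index in range(num_files):
        paths.append(write_log_file(target_dir, index))
        time.sleep(delay_seconds)
    return paths
```

Calling `generate_logs("log", num_files=30, delay_seconds=5)` while a streaming job watches `log` would feed it one new file roughly every batch.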

Reading Files - File streaming
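
As a rough sketch of the file-streaming side, the function below uses PySpark's `StreamingContext.textFileStream` to watch a directory and count log lines per level in each batch. The directory name `log`, the 5-second batch interval, and the assumption that the level is the third whitespace-separated field of each line are illustrative choices, not the ICP's actual settings.

```python
def parse_level(line):
    """Return the log level, assumed to be the third whitespace-separated field."""
    fields = line.split()
    return fields[2] if len(fields) > 2 else "UNKNOWN"


def run_file_stream(log_dir="log", batch_seconds=5):
    """Count log lines per level for every new file that appears in log_dir."""
    # Imported here so parse_level stays usable without a Spark install.
    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext

    sc = SparkContext("local[2]", "FileStreamLogCount")
    ssc = StreamingContext(sc, batch_seconds)

    # Only files that appear in log_dir after the stream starts are read.
    lines = ssc.textFileStream(log_dir)
    counts = (
        lines.map(lambda line: (parse_level(line), 1))
        .reduceByKey(lambda a, b: a + b)
    )
    counts.pprint()  # print the first few (level, count) pairs per batch

    ssc.start()
    ssc.awaitTermination()
```

Start `run_file_stream()` first, then start writing log files into the directory, since files already present when the stream starts are ignored.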

NetCat/TCP
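
For the socket case, a minimal word-count sketch using PySpark's `StreamingContext.socketTextStream` might look like the following. The host `localhost`, port `9999`, and the 5-second batch interval are assumptions. The master is set to `local[2]` because the socket receiver occupies one thread and at least one more is needed to process the batches.

```python
def tokenize(line):
    """Split a line into lowercase words."""
    return line.lower().split()


def run_socket_word_count(host="localhost", port=9999, batch_seconds=5):
    """Print per-batch word counts for text received from host:port."""
    # Imported here so tokenize stays usable without a Spark install.
    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext

    sc = SparkContext("local[2]", "SocketWordCount")
    ssc = StreamingContext(sc, batch_seconds)

    lines = ssc.socketTextStream(host, port)
    counts = (
        lines.flatMap(tokenize)
        .map(lambda word: (word, 1))
        .reduceByKey(lambda a, b: a + b)
    )
    counts.pprint()  # print the first few (word, count) pairs per batch

    ssc.start()
    ssc.awaitTermination()
```

With NetCat as the data server, running `nc -lk 9999` in one terminal and `run_socket_word_count()` in another lets each typed line show up in the next batch's counts.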

Input Socket - NETCAT

Input Socket - TCP
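
The TCP server used here is not shown on this page. One possible stand-in for NetCat, using only Python's standard `socket` module, is sketched below: it listens on a port, accepts a single client (for example a Spark socket receiver), and sends it newline-terminated lines. The function names, the default port `9999`, and the one-second delay are hypothetical.

```python
import socket
import time


def open_server(host="localhost", port=9999):
    """Create a TCP socket that is already listening on host:port."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind((host, port))
    server.listen(1)
    return server


def serve_lines(server, lines, delay_seconds=1.0):
    """Accept one client and send it each line, newline-terminated."""
    conn, _addr = server.accept()  # blocks until a client connects
    with conn:
        for line in lines:
            conn.sendall((line + "\n").encode("utf-8"))
            time.sleep(delay_seconds)  # spread lines across batches
```

A typical use would be `server = open_server()` followed by `serve_lines(server, ["hello spark", "hello streaming"])`, then `server.close()` once the lines have been sent.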

Output - NETCAT

Output - TCP

Bonus