ICP 11 - Nagu34/Big-data-programming GitHub Wiki

Welcome to ICP-11

INTRODUCTION

Data Streaming

Data streaming is a technique for transferring data so that it can be processed as a steady, continuous stream instead of waiting for the complete data set to arrive. Streaming technologies are becoming increasingly important with the growth of the Internet.

Spark Streaming Features

  1. Scaling: Spark Streaming can easily scale to hundreds of nodes.
  2. Speed: Spark Streaming achieves low latency.
  3. Fault Tolerance: Spark Streaming can recover efficiently from failures.
  4. Integration: Spark combines batch and real-time (streaming) processing.
  5. Business Analysis: Spark Streaming can track customer behavior, which can then be used in business analysis.

In-Class Exercise

1. Spark Streaming using Log File Generator:

Run a Spark Streaming job on the output of a log file generator, following the instructions in the slides.
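The slides are not reproduced on this page, so the following is only a minimal sketch of one way the exercise could look, not the method from the slides. It assumes the generator drops small text files into a directory and that Spark Streaming watches that directory with `textFileStream`. The log line format, the `LEVELS` list, the file names, and the function names `write_log_batch`, `parse_level`, and `count_levels` are all hypothetical.

```python
import os
import random
import sys
import time

# Hypothetical log levels; the real generator from the slides may differ.
LEVELS = ["INFO", "WARN", "ERROR"]


def write_log_batch(directory, batch_id, num_lines=10):
    """Write one log file into `directory` and return its final path."""
    os.makedirs(directory, exist_ok=True)
    final_path = os.path.join(directory, "log-%05d.txt" % batch_id)
    # textFileStream ignores file names starting with ".", so the file is
    # written under a hidden name first and renamed once it is complete.
    tmp_path = os.path.join(directory, ".log-%05d.tmp" % batch_id)
    with open(tmp_path, "w") as f:
        for i in range(num_lines):
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            f.write("%s %s request-%d\n" % (stamp, random.choice(LEVELS), i))
    os.replace(tmp_path, final_path)
    return final_path


def parse_level(line):
    """Return the log level: the third whitespace-separated field."""
    return line.split()[2]


def count_levels(directory, batch_seconds=5):
    """Count log lines per level in each micro-batch of new files."""
    # Imported here so the generator can run without Spark installed.
    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext

    sc = SparkContext("local[2]", "LogFileStream")
    ssc = StreamingContext(sc, batch_seconds)
    lines = ssc.textFileStream(directory)  # picks up newly added files
    counts = lines.map(lambda l: (parse_level(l), 1)) \
                  .reduceByKey(lambda a, b: a + b)
    counts.pprint()
    ssc.start()
    ssc.awaitTermination()


if __name__ == "__main__" and len(sys.argv) == 3:
    mode, directory = sys.argv[1], sys.argv[2]
    if mode == "generate":
        batch_id = 0
        while True:  # stop with Ctrl-C
            write_log_batch(directory, batch_id)
            batch_id += 1
            time.sleep(2)
    elif mode == "stream":
        count_levels(directory)
```

Assuming the file is saved as `log_stream.py` (a hypothetical name), one terminal would run `python log_stream.py generate logs` and a second would run `spark-submit log_stream.py stream logs`. The write-then-rename step is there because Spark expects files to appear in the monitored directory complete, not to be appended to while it reads them.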

2. Spark Streaming for TCP Socket:

Write a Spark Streaming word count program that counts the words in text data received from a data server listening on a TCP socket. Hint: for the Netcat utility on Windows, see https://github.com/rsanchez-wsu/jfiles/wiki/Windows-10-Telnet-&-NetCat
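One possible shape for this word count is sketched below in PySpark, modeled on the usual socket word count pattern. The host, the port, the 5-second batch interval, and the names `split_words` and `run_word_count` are assumptions chosen for illustration, not values given in the exercise.

```python
import sys


def split_words(line):
    """Split one line of text into words on whitespace."""
    return line.split()


def run_word_count(host, port, batch_seconds=5):
    """Print word counts for each micro-batch read from a TCP socket."""
    # Imported here so the helper above can be used without Spark installed.
    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext

    # At least two local threads: one for the socket receiver,
    # one for processing the received data.
    sc = SparkContext("local[2]", "NetworkWordCount")
    sc.setLogLevel("WARN")
    ssc = StreamingContext(sc, batch_seconds)

    lines = ssc.socketTextStream(host, port)
    counts = lines.flatMap(split_words) \
                  .map(lambda word: (word, 1)) \
                  .reduceByKey(lambda a, b: a + b)
    counts.pprint()  # prints the first elements of each batch

    ssc.start()
    ssc.awaitTermination()


if __name__ == "__main__" and len(sys.argv) == 3:
    # Expected arguments: <hostname> <port>
    run_word_count(sys.argv[1], int(sys.argv[2]))
```

To try it, start a data server first (for example `nc -lk 9999` on Linux or macOS, or the Windows Netcat build from the hint above), then run `spark-submit network_wordcount.py localhost 9999`, where the script name is hypothetical. Lines typed into the Netcat terminal should then show up as per-batch word counts in the Spark output.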