Lab2 - pamidiaparna/RealTimeBigData GitHub Wiki

LAB2: Spark Programming with Transformations and Actions

A text file is taken as input and processed with three transformations and two actions, listed below.

Text File Format Fields: Names, County, Sex

Use case: Process the text file to find the most common names and to count how many records each name appears in.

Transformations:

  1. map
  2. reduceByKey
  3. sortBy

Actions:

  1. saveAsTextFile
  2. count

The sequence of execution is as follows:

  1. A map transformation is applied to the input text file to split each line using "," as the delimiter.

  2. A reduceByKey transformation is applied to the output of the map step to add up the occurrences of each repeated name, which shows how often a particular name is used. The result of the reduceByKey transformation is then sorted.

  3. The sorted result is stored as a text file as the final output.

  4. The total number of distinct names in the input file is also counted.
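The steps above can be sketched in plain Python to make the data flow concrete. This is not the lab's Spark code: it is a small emulation that assumes the map step emits a (name, 1) pair per record and that the sort is by count in descending order, neither of which is stated in the write-up. The sample records and the helper names (`map_step`, `reduce_by_key`, `sort_by_count`, `run_pipeline`) are hypothetical, and printing stands in for the saveAsTextFile action.

```python
from collections import defaultdict

# Hypothetical sample records in the "Name,County,Sex" layout (not the lab's data).
SAMPLE_LINES = [
    "Emma,Kings,F",
    "Liam,Queens,M",
    "Emma,Bronx,F",
    "Noah,Kings,M",
    "Liam,Kings,M",
    "Emma,Queens,F",
]


def map_step(lines):
    """Emulates map: split each line on "," and emit a (name, 1) pair (assumed shape)."""
    return [(line.split(",")[0].strip(), 1) for line in lines]


def reduce_by_key(pairs):
    """Emulates reduceByKey: add up the values that share the same key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return list(totals.items())


def sort_by_count(pairs):
    """Emulates sortBy: highest count first (assumed order), ties broken by name."""
    return sorted(pairs, key=lambda kv: (-kv[1], kv[0]))


def run_pipeline(lines):
    """Run the three transformations, then count the distinct names."""
    sorted_counts = sort_by_count(reduce_by_key(map_step(lines)))
    distinct_names = len(sorted_counts)  # one entry per distinct name after reduction
    return sorted_counts, distinct_names


sorted_counts, distinct_names = run_pipeline(SAMPLE_LINES)

# Stand-in for the saveAsTextFile action: print one "name,count" line per name.
for name, count in sorted_counts:
    print(f"{name},{count}")

# Stand-in for the count action.
print("distinct names:", distinct_names)
```

With the sample records, the most common name comes first in the printed list, and the distinct-name count equals the number of entries left after the reduction. In Spark itself, the corresponding RDD operations are `map`, `reduceByKey`, `sortBy`, `saveAsTextFile`, and `count`.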

Link to the screenshots:

The following screenshot contains a map-reduce diagram that depicts the transformations and actions applied to produce the final result: https://www.dropbox.com/s/jjx6b6my1ak2yfo/lab2image.PNG?dl=0

Output Screenshots:

https://www.dropbox.com/s/qdyh6904mbt82vh/ACapture.PNG?dl=0

https://www.dropbox.com/s/y08aejyqmqiz3a8/lab2.PNG?dl=0