Lab2 - pamidiaparna/RealTimeBigData GitHub Wiki
LAB2: Spark Programming with Transformations and Actions
A text file is taken as input and processed with three transformations and two actions, listed below.
Text File Format Fields: Names, County, Sex
Use case: Processing the text file to find the most common names and to count how many records each name appears in.
Transformations:
- Map
- ReduceByKey
- SortBy
Actions:
- saveAsTextFile
- count
The sequence of execution is as follows:
- A Map transformation is applied to the input text file to split each line using "," as the delimiter.
- A ReduceByKey transformation is applied to the output of the Map step to add up the occurrences of each repeated name, which shows how often a particular name is used. The result of the ReduceByKey transformation is then sorted.
- The sorted result is stored as a text file as the final output.
- The total number of distinct names in the input file is also counted.
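The sequence above can be sketched in plain Python to show what each step computes. This is not the lab's Spark code: it is a minimal stand-in that uses only the standard library, and the sample records, the function name `name_counts`, and the descending sort order are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical sample records in the "Name,County,Sex" layout described above.
SAMPLE_LINES = [
    "Emma,Kings,F",
    "Liam,Queens,M",
    "Emma,Bronx,F",
    "Noah,Kings,M",
    "Liam,Kings,M",
    "Emma,Queens,F",
]


def name_counts(lines):
    """Return (name, count) pairs, most frequent name first."""
    # Map step: split each line on "," and emit a (name, 1) pair.
    pairs = [(line.split(",")[0].strip(), 1) for line in lines if line.strip()]

    # ReduceByKey step: add up the 1s for every occurrence of the same name.
    totals = Counter()
    for name, one in pairs:
        totals[name] += one

    # SortBy step: order by count, highest first (assumed), ties broken by name.
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    results = name_counts(SAMPLE_LINES)

    # Stand-in for the saveAsTextFile action: print one record per line.
    for name, total in results:
        print(f"{name},{total}")

    # Stand-in for the count action: number of distinct names.
    print("distinct names:", len(results))
```

In Spark, the corresponding RDD operations would be `map`, `reduceByKey`, and `sortBy`, followed by the `saveAsTextFile` and `count` actions. Because the reduce step leaves one pair per name, counting the reduced result gives the number of distinct names.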
Link to the screenshots:
The screenshot below contains a MapReduce diagram that depicts the transformations and actions applied before the final result: https://www.dropbox.com/s/jjx6b6my1ak2yfo/lab2image.PNG?dl=0
Output Screenshots: