Lab 2: Spark Transformations and Actions
09/06/2016
The main goal of this lab is to understand the usage of Apache Spark transformations and actions.
I've created a Spark program in Scala to demonstrate various transformations and actions on RDDs. First, I created sample data in a comma-separated text file.
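A minimal setup sketch, assuming a local Spark context and a hypothetical input file `sample_data.txt` with lines of the form `name,city,count` (the actual file name and column layout used in the lab may differ):

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical setup for illustration; the lab's actual file name and
// column layout (assumed here as name,city,count) may differ.
val conf = new SparkConf().setAppName("Lab2TransformationsActions").setMaster("local[*]")
val sc = new SparkContext(conf)

// Read the comma-separated input file into an RDD of lines
val lines = sc.textFile("sample_data.txt")
```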
filter() Transformation:
I've applied the filter() transformation on the input file to keep only the records containing "New York".
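Continuing the sketch above, the filter step could look like this:

```scala
// Keep only the lines that contain "New York"
val newYorkLines = lines.filter(line => line.contains("New York"))
```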
map() Transformation:
I've applied the map() transformation to map the filtered records into an RDD containing only the name and count fields.
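Assuming the hypothetical `name,city,count` layout from the setup sketch, the mapping step might be written as:

```scala
// Split each filtered line on commas and keep only the name and count fields,
// producing (name, count) pairs (column positions are assumed, not from the lab)
val nameCountPairs = newYorkLines.map { line =>
  val fields = line.split(",")
  (fields(0).trim, fields(2).trim.toInt)
}
```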
reduceByKey() Transformation:
In the next step, I applied the reduceByKey() transformation to get the total count for each name.
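A sketch of the aggregation, continuing from the pairs above:

```scala
// Sum the counts for every occurrence of the same name
val totalsByName = nameCountPairs.reduceByKey(_ + _)
```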
sortByKey() Transformation:
The resulting dataset can be sorted by key using the sortByKey() transformation.
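For example, continuing the sketch:

```scala
// Sort the (name, totalCount) pairs alphabetically by the name key
val sortedTotals = totalsByName.sortByKey()
```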
saveAsTextFile() Action:
The results can be saved to a text file using the saveAsTextFile() action. The output is written in the Hadoop style: a directory containing part files rather than a single file.
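A sketch of the save step; the output path here is hypothetical:

```scala
// Write the result; Spark creates a directory (not a single file)
// containing part-* files, in the usual Hadoop output style
sortedTotals.saveAsTextFile("lab2_output")
```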
collect() and foreach() Actions:
collect() returns the dataset as an array to the driver program, and foreach() runs a function on each element of the dataset. I've used these two actions to display the final results on the console.
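Putting the two actions together, continuing the sketch:

```scala
// Bring the final (name, totalCount) pairs back to the driver as an array,
// then print each pair to the console
sortedTotals.collect().foreach { case (name, total) =>
  println(s"$name\t$total")
}
```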
Output:
Screenshots: Map-Reduce diagram, IntelliJ output, and input and output files.