Lab 2: Spark Transformations and Actions
09/06/2016
The main goal of this lab is to understand the usage of Apache Spark transformations and actions.
I've created a Spark program in the Scala language to demonstrate various transformations and actions on RDDs. First, I've created sample data in a comma-separated text file.
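A minimal sketch of that setup, assuming a local master and a file named sample_data.txt whose lines look like name,city,count; the file name and column layout are placeholders, not the lab's actual data:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Local-mode context for the lab; app name and master are illustrative.
val conf = new SparkConf().setAppName("Lab2TransformationsActions").setMaster("local[*]")
val sc = new SparkContext(conf)

// Each line of the (assumed) sample file looks like: name,city,count
val lines = sc.textFile("sample_data.txt")
```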
filter() Transformation:
I've applied the filter() transformation on the input file to keep only the records containing "New York".
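A sketch of how that step might look, continuing from the `lines` RDD in the setup above:

```scala
// Keep only the lines that mention "New York"; one could also split the line
// and compare the city column exactly instead of using contains().
val newYorkLines = lines.filter(line => line.contains("New York"))
```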
map() Transformation:
I've applied the map() transformation to convert the filtered records into an RDD of (name, count) pairs.
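Roughly, assuming the name is the first field and the count the third (an assumption about the sample file, not taken from the lab data):

```scala
// Project each filtered line down to a (name, count) key-value pair.
// Field positions (0 = name, 2 = count) are assumed column indexes.
val nameCountPairs = newYorkLines.map { line =>
  val fields = line.split(",")
  (fields(0), fields(2).trim.toInt)
}
```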
reduceByKey() Transformation:
In the next step, I've applied the reduceByKey() transformation to get the total count for each name.
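Continuing the sketch with the `nameCountPairs` RDD from the previous step:

```scala
// Sum the counts that share the same name; (_ + _) is the combine function.
val totalsByName = nameCountPairs.reduceByKey(_ + _)
```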
sortByKey() Transformation:
The resulting dataset can be sorted by key values using the sortByKey() transformation method.
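For example, continuing with `totalsByName`:

```scala
// Sort the (name, total) pairs alphabetically by name;
// sortByKey(ascending = false) would give descending order instead.
val sortedTotals = totalsByName.sortByKey()
```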
saveAsTextFile() Action:
The results can be saved to text files using the saveAsTextFile() action. The output is written as a directory of part files, in the same way Hadoop writes its job output.
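A sketch of that call; the output path is a placeholder:

```scala
// Writes a directory (not a single file) containing part-00000, part-00001, ...
// and a _SUCCESS marker, the same layout Hadoop jobs produce.
sortedTotals.saveAsTextFile("output/name_totals")
```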
collect() and foreach() Actions:
collect() returns the dataset as an array at the driver program. Using foreach(), we can run a function on each element of the dataset. I've used these two action methods to display the final results to the console.
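Roughly, with the `sortedTotals` RDD from above:

```scala
// collect() brings the whole result back to the driver as an Array[(String, Int)];
// foreach then prints each pair. Only do this when the result fits in driver memory.
sortedTotals.collect().foreach { case (name, total) =>
  println(s"$name\t$total")
}
```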
Output:
Map-Reduce Diagram:
IntelliJ Output:
Input and Output Files:
