Lab 2 Spark Transformations and Actions

Welcome to the RTBDA_5543 wiki!

09/06/2016

The main goal of this lab is to understand the usage of Apache Spark transformations and actions.

I've created a Spark program in Scala to demonstrate various transformations and actions on RDDs. First, I created sample data in a comma-separated text file.
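A rough sketch of that setup is below; the application name, input path, and field layout are placeholders rather than the exact ones from my lab files.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Standalone setup; in spark-shell the existing `sc` can be used instead.
// "Lab2" and the input path are placeholder names.
val conf = new SparkConf().setAppName("Lab2").setMaster("local[*]")
val sc = new SparkContext(conf)

// Each line of the sample file is assumed to look like: name,city,count
val lines = sc.textFile("input/sample_data.txt")
```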

filter() Transformation:

I've applied the filter() transformation to the input file to keep only the records containing "New York".
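Something along these lines, using the `lines` RDD from the sketch above:

```scala
// Keep only the records that mention "New York" anywhere in the line.
val nyRecords = lines.filter(line => line.contains("New York"))
```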

map() Transformation:

I've applied the map() transformation to turn the filtered records into an RDD of key-value pairs containing only the name and count fields.
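A sketch of that mapping, assuming the name is the first field and the count is the last one:

```scala
// Split each comma-separated record and keep only (name, count) pairs.
val nameCounts = nyRecords.map { line =>
  val fields = line.split(",")
  (fields(0).trim, fields(fields.length - 1).trim.toInt)
}
```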

reduceByKey() Transformation:

In the next step, I applied the reduceByKey() transformation to get the total count for each name.
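With the (name, count) pairs from the previous sketch, this step could look like:

```scala
// Sum the counts for each distinct name.
val totals = nameCounts.reduceByKey(_ + _)
```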

sortByKey() Transformation:

The resulting dataset can be sorted by key values using the sortByKey() transformation method.
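For example:

```scala
// Sort the (name, total) pairs alphabetically by name;
// sortByKey(false) would sort in descending order instead.
val sorted = totals.sortByKey()
```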

saveAsTextFile() Action:

The results can be saved to a text file using the saveAsTextFile() method. The output is written as a directory of part files, in the same way a Hadoop job writes its output.
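The output path below is a placeholder:

```scala
// Writes a directory containing part-0000x files and a _SUCCESS marker,
// the same layout a Hadoop MapReduce job produces.
sorted.saveAsTextFile("output/lab2_results")
```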

collect() and foreach() Actions:

collect() returns the dataset as an array to the driver program. Using foreach() we can run a function on each element of the dataset. I've used these two action methods to display the final results on the console.
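Roughly like this, continuing from the sorted RDD above:

```scala
// collect() returns an Array[(String, Int)] on the driver;
// foreach then prints each (name, total) pair to the console.
sorted.collect().foreach { case (name, total) =>
  println(s"$name\t$total")
}
```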

Output:

Map-Reduce Diagram: (image)

IntelliJ Output: (image)

Input and Output Files: (image)