LAB 2 Report - SAISRIHARSHAS/Big-Data-Analytics-and-Applications-CS5542 GitHub Wiki

Created a Spark program for an interesting use case that performs a series of computations using the MapReduce paradigm.

lab 2 report pdf: 2 Report

MAP REDUCE PROCESS:

Working Model

Used 4 Spark Transformations: flatMap, map, join, sortByKey

Used 2 Spark Actions: reduceByKey, saveAsTextFile (strictly speaking, Spark's API classifies reduceByKey as a transformation, because it returns a new RDD and is evaluated lazily; saveAsTextFile is the action that triggers the job)

The objective of the use case is to combine two values under a single key and to perform computations on both the keys and the multiple values associated with them. flatMap flattens the input text, so one input line can yield several output elements, and map associates each key with its value. join combines the values that two key-value datasets hold for the same key. sortByKey sorts the pairs by key in ascending order.
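
The operations listed above can be chained roughly as follows. This is a minimal sketch, not the lab's actual code: the object name `Lab2Sketch`, the paths `input.txt` and `output`, and the choice of word count and word length as the two values joined under each key are all assumptions made for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Hypothetical example object; names and data are illustrative only.
object Lab2Sketch {

  // Returns (word, (count, length)) pairs sorted by word in ascending order.
  def pipeline(lines: RDD[String]): RDD[(String, (Int, Int))] = {
    // flatMap: flatten each line into individual words.
    // filter is an extra helper (not in the report's list) that drops empty tokens.
    val words = lines.flatMap(_.toLowerCase.split("\\W+")).filter(_.nonEmpty)

    // map: associate each word (key) with the value 1.
    val ones = words.map(w => (w, 1))

    // reduceByKey: sum the values of each key to get a count per word.
    val counts = ones.reduceByKey(_ + _)

    // map: build a second key-value dataset, here the length of each word.
    val lengths = counts.map { case (w, _) => (w, w.length) }

    // join: combine both values under a single key.
    val joined = counts.join(lengths)

    // sortByKey: sort the pairs by key in ascending order.
    joined.sortByKey(ascending = true)
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("Lab2Sketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      val lines = sc.textFile("input.txt")        // assumed input path
      // saveAsTextFile: the action that triggers the job.
      // It fails if the output directory already exists.
      pipeline(lines).saveAsTextFile("output")    // assumed output path
    } finally {
      sc.stop()
    }
  }
}
```

Keeping the transformations in a separate `pipeline` function makes it possible to try them on a small in-memory RDD (for example one built with `sc.parallelize`) before running on a real input file.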

Note: To use Spark transformations and actions, add the Spark library and manage its dependencies in the build.sbt file inside the project folder. IntelliJ IDEA is well suited to adding library files and managing dependencies. Also, Spark 1.6 is not available for Scala 2.12; use Scala 2.10 or 2.11 with it.
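
A build.sbt along the following lines would satisfy the note above. It is a sketch under assumptions: the project name is hypothetical, and the exact Scala and Spark patch versions are examples rather than the versions used in the lab.

```scala
// build.sbt (illustrative; name and patch versions are assumptions)
name := "lab2-spark"

version := "0.1"

// Spark 1.6 artifacts are published for Scala 2.10 and 2.11, not 2.12.
scalaVersion := "2.11.8"

// %% appends the Scala binary version, so this resolves spark-core_2.11.
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.6.3"
```

After editing build.sbt, IntelliJ IDEA prompts to reload the sbt project so that the Spark classes become available to the editor.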

INPUT DATA:

OUTPUT DATA: