M2 ICP 1 - PavankumarManchala/BigDataProgrammingICPs GitHub Wiki

a) Create a Scala project, select SBT as the build tool, and run a sample program.
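For step a), a minimal `build.sbt` that pulls in Spark might look like the sketch below. The project name and the Scala/Spark version pair are assumptions. Match them to the Spark release you actually run, since Spark artifacts are published per Scala binary version (the `%%` operator appends it for you).

```scala
// build.sbt: minimal sketch; name and versions are assumptions, adjust to your setup
name := "M2ICP1"

version := "0.1"

// Spark 3.1.x is built against Scala 2.12
scalaVersion := "2.12.15"

// spark-core provides SparkContext and the RDD API used in this ICP
libraryDependencies += "org.apache.spark" %% "spark-core" % "3.1.2"
```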

b) In the program, set the Hadoop home directory to (Drive_Letter)://winutils.
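A minimal sketch of step b), assuming Windows with `winutils.exe` placed under `C:\winutils\bin`. The drive letter, the `HadoopHome` object and its `configure` helper are hypothetical names for illustration. Hadoop reads the `hadoop.home.dir` system property to find `bin\winutils.exe`, so the property has to be set before the `SparkContext` is created.

```scala
object HadoopHome {
  // Hypothetical default; replace "C:" with the drive where winutils lives.
  // The directory must contain bin\winutils.exe.
  val WinutilsDir: String = "C:\\winutils"

  // Sets the JVM system property that Hadoop checks for its home directory.
  // Call this before constructing a SparkContext.
  def configure(dir: String = WinutilsDir): Unit =
    System.setProperty("hadoop.home.dir", dir)

  def main(args: Array[String]): Unit = {
    configure()
    println(System.getProperty("hadoop.home.dir"))
  }
}
```

Setting the property in code keeps the project self-contained; setting the `HADOOP_HOME` environment variable is an alternative that Hadoop also honors.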

Q1. Spark word count program with two transformations and two actions.

----> Program:

Here we created an RDD with parallelize, then used the transformations map, filter and flatMap and the actions count, reduce and collect.
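The sketch below illustrates the same building blocks on a small in-memory dataset. The sample sentences, the `RddBasics` object and the `summarize` function are hypothetical; on Windows, set `hadoop.home.dir` first as in step b). The three transformations are lazy and only build a lineage; the three actions each trigger a Spark job and bring a result back to the driver.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object RddBasics {
  // Returns (number of words, total characters, words longer than 4 characters).
  def summarize(sc: SparkContext, text: Seq[String]): (Long, Int, Array[String]) = {
    val lines   = sc.parallelize(text)           // local collection -> RDD
    val words   = lines.flatMap(_.split(" "))    // transformation: one element per word
    val lengths = words.map(_.length)            // transformation: word -> its length
    val long    = words.filter(_.length > 4)     // transformation: keep longer words
    // Nothing has run yet; each action below triggers a job.
    (words.count(), lengths.reduce(_ + _), long.collect())
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("RddBasics").setMaster("local[*]"))
    val (n, chars, longWords) =
      summarize(sc, Seq("spark makes big data simple", "scala runs on the jvm"))
    println(s"words=$n characters=$chars long=${longWords.mkString(",")}")
    sc.stop()
  }
}
```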

The word count itself uses the transformations flatMap, map and sortBy and the actions take and foreach.
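A minimal word count sketch along those lines, assuming a plain-text input file (the default path `input.txt` and the `WordCount` object are hypothetical). Besides flatMap, map and sortBy it uses `filter` and `reduceByKey`, which are also transformations, because the per-word counts have to be summed somewhere. Here `foreach` runs on the array that `take` returns; `RDD.foreach` also exists as an action, but it executes on the executors, so its print order is not guaranteed.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object WordCount {
  // Counts words and orders them by descending count, ties broken alphabetically.
  def wordCounts(lines: RDD[String]): RDD[(String, Int)] =
    lines
      .flatMap(_.toLowerCase.split("\\W+"))    // split on runs of non-word characters
      .filter(_.nonEmpty)                      // drop empty tokens (e.g. leading punctuation)
      .map(word => (word, 1))                  // pair each word with a count of 1
      .reduceByKey(_ + _)                      // sum the 1s per word
      .sortBy(pair => (-pair._2, pair._1))     // highest count first

  def main(args: Array[String]): Unit = {
    val input = if (args.nonEmpty) args(0) else "input.txt"   // hypothetical path
    val sc = new SparkContext(new SparkConf().setAppName("WordCount").setMaster("local[*]"))
    // take(10) is an action returning the top 10 pairs to the driver.
    wordCounts(sc.textFile(input)).take(10).foreach { case (w, c) => println(s"$w: $c") }
    sc.stop()
  }
}
```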

-----> Input File:

-----> Output (exit code 0):

Q2. Secondary Sorting:

-----> Code:

Here we split each input line on "," and used map to build key-value pairs; then, in the reducer phase, we used groupByKey and mapValues to group and order the values for each key.
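The original input file is not shown, so the sketch below assumes lines of the form `key,value` with an integer value (a hypothetical format, as are the `SecondarySort` object and the default paths). It follows the same steps: split on ",", map to pairs, then groupByKey and mapValues to sort each key's values, plus sortByKey so the keys themselves come out in order before the result is written with `saveAsTextFile`.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object SecondarySort {
  // Assumes each non-empty line looks like "key,value" with an integer value.
  def sortValuesPerKey(lines: RDD[String]): RDD[(String, List[Int])] = {
    // "Mapper" phase: split on "," and build (key, value) pairs.
    val pairs = lines.filter(_.trim.nonEmpty).map { line =>
      val fields = line.split(",")
      (fields(0).trim, fields(1).trim.toInt)
    }
    // "Reducer" phase: collect each key's values, sort them, then sort the keys.
    pairs.groupByKey().mapValues(_.toList.sorted).sortByKey()
  }

  def main(args: Array[String]): Unit = {
    val input  = if (args.nonEmpty) args(0) else "input.txt"    // hypothetical path
    val output = if (args.length > 1) args(1) else "output"     // must not exist yet
    val sc = new SparkContext(new SparkConf().setAppName("SecondarySort").setMaster("local[*]"))
    sortValuesPerKey(sc.textFile(input)).saveAsTextFile(output)
    sc.stop()
  }
}
```

groupByKey holds all values of a key in memory at once, which is fine for an ICP-sized file; for very large groups, `repartitionAndSortWithinPartitions` on a composite key is the usual scalable alternative.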

----> Input file:

-----> Pairs RDD output:

-----> List RDD output:

-----> Output files:

Bonus:

----> Code and Output:

Here we used Scala collections and a hash map to count the frequency of each character.
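A minimal sketch of that idea with a mutable `HashMap`, assuming whitespace should not be counted (an assumption, as are the `CharFrequency` object and the sample string). Each character either starts at 0 via `getOrElse` or has its existing count incremented.

```scala
import scala.collection.mutable

object CharFrequency {
  // Counts occurrences of each character, skipping whitespace.
  def charCounts(text: String): mutable.HashMap[Char, Int] = {
    val counts = mutable.HashMap.empty[Char, Int]
    for (c <- text if !c.isWhitespace)
      counts(c) = counts.getOrElse(c, 0) + 1
    counts
  }

  def main(args: Array[String]): Unit = {
    val counts = charCounts("hello world")
    // HashMap iteration order is unspecified, so sort by character before printing.
    counts.toSeq.sortBy(_._1).foreach { case (c, n) => println(s"$c -> $n") }
  }
}
```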

M2 ICP1 video explanation: https://drive.google.com/open?id=1Q25KaGNybpWHiICeoOAeNKgk9iE-AqXF

or

Link to all ICP videos: https://drive.google.com/open?id=1racqWkfI10T-CpLYEDYCvJRSRhhLGsWL