M2 ICP 1 - PavankumarManchala/BigDataProgrammingICPs GitHub Wiki

Prerequisites:

  1. IntelliJ IDEA 2019.1.1

  2. Install the Scala plugin while installing IntelliJ IDEA 2019.1

a) Create a new Scala project, select SBT as the build tool, and then run a sample program to check the setup.

b) In the program, set the Hadoop home directory (the hadoop.home.dir system property) to (Drive_Letter):\winutils, the folder whose bin subfolder contains winutils.exe.
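Step b) can be done with a single system property. The sketch below is a minimal version that assumes winutils.exe was unpacked to C:\winutils\bin. The C: drive and the object name HadoopHomeSetup are placeholders for illustration. Hadoop checks the hadoop.home.dir system property first and only then falls back to the HADOOP_HOME environment variable, so the property has to be set before the SparkContext is created.

```scala
object HadoopHomeSetup {
  // Assumption: winutils.exe lives in C:\winutils\bin on this machine.
  // Hadoop expects hadoop.home.dir to point at the parent of that bin folder.
  val HadoopHome: String = "C:\\winutils"

  // Must run before any SparkContext is created in this JVM.
  def configure(): Unit =
    System.setProperty("hadoop.home.dir", HadoopHome)

  def main(args: Array[String]): Unit = {
    configure()
    println(s"hadoop.home.dir = ${System.getProperty("hadoop.home.dir")}")
  }
}
```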

ICP Description and screenshots:

Q1. A Spark word count program that uses two transformations and two actions.

----> Program: we create an RDD with parallelize (a SparkContext method rather than a transformation), apply the transformations map, filter and flatMap, and call the actions count, reduce and collect.
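To show how lazy transformations and eager actions fit together, here is a small sketch. It assumes spark-core is on the classpath and that Spark runs in local mode. The object name, the run helper and the sample numbers are hypothetical and are not taken from the author's program.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object TransformationsActionsSketch {
  // Hypothetical helper: builds the pipeline and returns the results of the
  // three actions so they can be printed or checked.
  def run(sc: SparkContext, numbers: Seq[Int]): (Long, Int, Array[Int]) = {
    val rdd = sc.parallelize(numbers)              // create an RDD from a local collection

    // Transformations are lazy: nothing executes until an action is called
    val squared  = rdd.map(n => n * n)              // map: exactly one output per input
    val evens    = squared.filter(_ % 2 == 0)       // filter: keep elements matching the predicate
    val expanded = evens.flatMap(n => Seq(n, -n))   // flatMap: zero or more outputs per input

    // Actions are eager: each one triggers a Spark job
    val count = expanded.count()                    // number of elements
    val total = evens.reduce(_ + _)                 // combine elements with a binary function
    val items = expanded.collect()                  // bring all elements back to the driver
    (count, total, items)
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("ICP1-Q1").setMaster("local[*]"))
    try {
      val (count, total, items) = run(sc, 1 to 10)
      println(s"count=$count sum=$total items=${items.mkString(", ")}")
    } finally {
      sc.stop()
    }
  }
}
```

With the numbers 1 to 10, the even squares are 4, 16, 36, 64 and 100, so count is 10 (each square plus its negation) and the reduced sum is 220.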

The word count itself uses the transformations flatMap, map and sortBy and the actions take and foreach.
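A word count built from those operators might look like the sketch below. It assumes spark-core is on the classpath. The object and method names are hypothetical, and so is the tokenization (lower-casing and splitting on non-word characters). The original description does not say how counts were aggregated, so reduceByKey is an assumed step here.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object WordCountSketch {
  // Returns the n most frequent words in `lines`, highest count first.
  def topWords(lines: RDD[String], n: Int): Array[(String, Int)] =
    lines
      .flatMap(_.toLowerCase.split("\\W+"))  // transformation: each line -> its words
      .filter(_.nonEmpty)                    // drop empty tokens that split can leave behind
      .map(word => (word, 1))                // transformation: word -> (word, 1)
      .reduceByKey(_ + _)                    // assumed step: sum the 1s for each word
      .sortBy(_._2, ascending = false)       // transformation: order by count, descending
      .take(n)                               // action: first n pairs returned to the driver

  def main(args: Array[String]): Unit = {
    // Assumption: the input file path is passed as the first argument
    val sc = new SparkContext(new SparkConf().setAppName("WordCount").setMaster("local[*]"))
    try {
      // take returns an Array, so this foreach runs on the driver; calling
      // foreach(println) directly on an RDD would run on the executors instead
      topWords(sc.textFile(args(0)), 10).foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```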

-----> Input File:

-----> Output (the program finished with exit code 0):

Q2. Secondary Sorting:

-----> Code:

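Here we split each input line on "," and used map to turn it into a key-value pair; then, in the reducer phase, we used groupByKey to collect the values for each key and mapValues to order the values within each group.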
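The map and reduce phases described above can be sketched as follows. This assumes spark-core is on the classpath and that each input line has the form key,value with an integer value. The real input file appears only as a screenshot, so this format is a guess. The object and method names are hypothetical. sortByKey is an addition so that the keys also come out in order; the original description does not mention it.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object SecondarySortSketch {
  // Assumption: each line looks like "key,value" with an integer value.
  def parse(line: String): (String, Int) = {
    val fields = line.split(",")
    (fields(0).trim, fields(1).trim.toInt)
  }

  // Map phase: line -> (key, value).
  // Reduce phase: groupByKey gathers all values of a key, mapValues sorts them
  // (the "secondary" sort), and sortByKey orders the keys themselves.
  def secondarySort(lines: RDD[String]): RDD[(String, List[Int])] =
    lines
      .map(parse)
      .groupByKey()
      .mapValues(_.toList.sorted)
      .sortByKey()

  def main(args: Array[String]): Unit = {
    // Assumption: args(0) is the input file, args(1) an output directory that does not exist yet
    val sc = new SparkContext(new SparkConf().setAppName("SecondarySort").setMaster("local[*]"))
    try {
      secondarySort(sc.textFile(args(0)))
        .map { case (key, values) => s"$key\t${values.mkString(",")}" }
        .saveAsTextFile(args(1))  // writes one part-NNNNN file per partition
    } finally {
      sc.stop()
    }
  }
}
```

groupByKey holds every value of a key in memory on a single executor, which is fine for small ICP inputs. For very large groups, the usual alternative is a composite key combined with repartitionAndSortWithinPartitions, so that the shuffle itself does the sorting.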

----> Input file:

-----> Pairs RDD output:

-----> List RDD output:

-----> Output files:

Bonus:

----> Code and Output:

Here we used Scala collections and a hash map to count how often each character occurs.
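A character frequency count with a mutable HashMap can be sketched in plain Scala, with no Spark dependency. Skipping whitespace and counting case-sensitively are assumptions, and the author's program may treat them differently. The object and method names are hypothetical.

```scala
import scala.collection.mutable

object CharFrequencySketch {
  // Counts how often each non-whitespace character occurs (case-sensitive).
  def charFrequency(text: String): mutable.HashMap[Char, Int] = {
    val counts = mutable.HashMap.empty[Char, Int]
    for (c <- text if !c.isWhitespace)
      counts(c) = counts.getOrElse(c, 0) + 1  // start at 0 for unseen characters
    counts
  }

  def main(args: Array[String]): Unit = {
    val counts = charFrequency("hello world")
    // HashMap iteration order is unspecified, so sort by character for stable output
    counts.toSeq.sortBy(_._1).foreach { case (c, n) => println(s"$c -> $n") }
  }
}
```

For "hello world", the map holds 10 characters in total, with 'l' counted 3 times and 'o' counted 2 times.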

M2 ICP1 video explanation: https://drive.google.com/open?id=1Q25KaGNybpWHiICeoOAeNKgk9iE-AqXF

     or

https://github.com/PavankumarManchala/BigDataProgrammingICPs/blob/master/Spark/ICP_1/ICP1.mp4

Videos for all ICPs: https://drive.google.com/open?id=1racqWkfI10T-CpLYEDYCvJRSRhhLGsWL