In Class Programming 3 - sirisha1206/Spark GitHub Wiki

Name:Naga Sirisha Sunkara

Class ID:21

Team ID:5

Technical partners details:

Name:Vinay Santhosham

Class ID:17

Source Code:code link

Video Link:video

Objective:

The objective of this in-class programming exercise is to implement matrix-matrix multiplication as a MapReduce job and run it on Hadoop.

The mapper and reducer are written in Python.

Steps to be followed:

Push the input text file to HDFS:

hdfs dfs -copyFromLocal input.txt /input.txt

Check that the input file is now in HDFS:

hdfs dfs -ls /

Run the MapReduce job with the Python mapper and reducer through Hadoop Streaming:

hadoop jar /usr/local/hadoop-2.8.1/share/hadoop/tools/lib/hadoop-streaming-2.8.1.jar -input /input.txt -output /matrixoutput -mapper /home/hdsirisha/Desktop/icp3/mapper.py -reducer /home/hdsirisha/Desktop/icp3/reducer.py

Checking the output directory:

hdfs dfs -ls /matrixoutput

View the output of the map reduce job:

hdfs dfs -cat /matrixoutput/part-00000

Algorithm:

Map:

For each element Aij in A:

generate key (i,k) with value (j, Aij)

k: every column index of B

For each element Bjk in B:

generate key (i,k) with value (j, Bjk)

i: every row index of A
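The map step above can be sketched in Python as follows. This is only an illustration, not the mapper.py used in this exercise: the input record format `matrix,row,col,value`, the constants `A_ROWS` and `B_COLS`, the matrix tag added to each value, and the function names are all assumptions made for the example.

```python
# Hypothetical dimensions: A has A_ROWS rows and B has B_COLS columns.
A_ROWS = 2
B_COLS = 2

def map_line(line):
    """Yield (key, value) pairs for one record 'matrix,row,col,value' (assumed format)."""
    name, row, col, value = line.strip().split(",")
    if name == "A":
        # A[i][j] contributes to every output cell C[i][k].
        for k in range(B_COLS):
            yield f"{row},{k}", f"A,{col},{value}"
    else:
        # B[j][k] contributes to every output cell C[i][k].
        for i in range(A_ROWS):
            yield f"{i},{col}", f"B,{row},{value}"

def run_mapper(lines):
    """Format pairs as tab-separated key/value lines, as Hadoop Streaming expects."""
    return [f"{key}\t{value}"
            for line in lines if line.strip()
            for key, value in map_line(line)]

if __name__ == "__main__":
    # In a streaming job, sys.stdin would be passed here instead of sample data.
    sample = ["A,0,1,5", "B,1,0,7"]
    for out in run_mapper(sample):
        print(out)
```

Each value carries the shared index j so that the reducer can pair the matching elements of A and B.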

Reduce:

For each key (i,k), sort the values from A and from B by j.

Multiply Aij * Bjk for each matching j.

Sum Aij * Bjk over all j to get the output element Cik.
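A minimal Python sketch of the reduce step is shown below. It is not the reducer.py used in this exercise. It assumes that each incoming line has the form `i,k<TAB>matrix,j,value` (a hypothetical format in which the value is tagged with its source matrix) and that the lines arrive sorted by key, which Hadoop Streaming guarantees for reducer input.

```python
from itertools import groupby

def reduce_lines(sorted_lines):
    """Yield 'i,k<TAB>sum' for each output cell from key-sorted mapper lines."""
    parsed = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for key, group in groupby(parsed, key=lambda kv: kv[0]):
        a_vals, b_vals = {}, {}
        for _, value in group:
            name, j, number = value.split(",")
            # Store each element under its shared index j.
            target = a_vals if name == "A" else b_vals
            target[int(j)] = float(number)
        # Multiply the matching A[i][j] and B[j][k], then sum over j.
        total = sum(a_vals[j] * b_vals[j] for j in a_vals if j in b_vals)
        yield f"{key}\t{total}"

if __name__ == "__main__":
    # Values for cell (0,0) of A = [[1,2],[3,4]] times B = [[5,6],[7,8]].
    sample = ["0,0\tA,0,1", "0,0\tA,1,2", "0,0\tB,0,5", "0,0\tB,1,7"]
    for out in reduce_lines(sample):
        print(out)
```

Storing the values in dictionaries keyed by j replaces the explicit sort: an index missing from either matrix is simply skipped, which also handles sparse input.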

Input File:

Mapper:

Reducer:

Output: