In Class Programming 3 - sirisha1206/Spark GitHub Wiki
Name: Naga Sirisha Sunkara
Class ID: 21
Team ID: 5
Technical partner's details:
Name: Vinay Santhosham
Class ID: 17
Source Code: code link
Video Link: video
Objective:
The objective of this in-class programming exercise is to implement matrix-matrix multiplication as a MapReduce job and run it on Hadoop.
The mapper and reducer are written in Python and run through Hadoop Streaming.
Steps to be followed:
Push the input text file to HDFS:
hdfs dfs -copyFromLocal input.txt /input.txt
Check that the input file is now in HDFS:
hdfs dfs -ls /
Run the MapReduce job with the Python mapper and reducer through Hadoop Streaming:
hadoop jar /usr/local/hadoop-2.8.1/share/hadoop/tools/lib/hadoop-streaming-2.8.1.jar -input /input.txt -output /matrixoutput -mapper /home/hdsirisha/Desktop/icp3/mapper.py -reducer /home/hdsirisha/Desktop/icp3/reducer.py
Check the output directory:
hdfs dfs -ls /matrixoutput
View the output of the MapReduce job:
hdfs dfs -cat /matrixoutput/part-00000
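To sanity-check the result, the job's output can be compared with a direct, single-machine product. The following is a minimal sketch, not the code used in this exercise: it assumes each output line has the form "row,col<TAB>value", which the page does not state, and the names multiply, parse_output and matches are hypothetical.

```python
def multiply(a, b):
    """Direct product of two dense matrices given as lists of rows."""
    rows, inner, cols = len(a), len(b), len(b[0])
    return [[sum(a[i][j] * b[j][k] for j in range(inner)) for k in range(cols)]
            for i in range(rows)]


def parse_output(lines):
    """Turn lines like "0,1<TAB>22.0" (assumed format) into {(0, 1): 22.0}."""
    cells = {}
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        key, value = line.strip().split("\t")
        i, k = key.split(",")
        cells[(int(i), int(k))] = float(value)
    return cells


def matches(a, b, lines, tol=1e-9):
    """True if every cell of a*b agrees with the parsed job output.

    Cells missing from the output are treated as 0.
    """
    expected = multiply(a, b)
    cells = parse_output(lines)
    return all(abs(cells.get((i, k), 0.0) - expected[i][k]) <= tol
               for i in range(len(expected))
               for k in range(len(expected[0])))
```

One way to use it is to save the result with hdfs dfs -cat /matrixoutput/part-00000 > result.txt and pass the lines of result.txt to matches together with the two input matrices.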
Algorithm:
Map:
for each element in A:
generate key (row, k) with value (col, value), once for every k
k: the column indices of B
for each element in B:
generate key (k, col) with value (row, value), once for every k
k: the row indices of A
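The map step above could be sketched in Python as follows. This is an illustration under stated assumptions, not the mapper.py used in this exercise: the input line format "matrix,row,col,value" (for example A,0,1,2), the "A"/"B" tag added to each value so the reducer can tell the two matrices apart, the constants A_ROWS and B_COLS, and the function names are all hypothetical.

```python
# Assumed dimensions: A has A_ROWS rows, B has B_COLS columns.
A_ROWS = 2
B_COLS = 2


def map_line(line, a_rows=A_ROWS, b_cols=B_COLS):
    """Return the "key<TAB>value" records emitted for one input line.

    Assumed input format: matrix,row,col,value
    """
    name, row, col, value = line.strip().split(",")
    records = []
    if name == "A":
        # A[row][col] contributes to every output cell (row, k).
        for k in range(b_cols):
            records.append("{},{}\tA,{},{}".format(row, k, col, value))
    elif name == "B":
        # B[row][col] contributes to every output cell (k, col).
        for k in range(a_rows):
            records.append("{},{}\tB,{},{}".format(k, col, row, value))
    return records


def map_lines(lines, a_rows=A_ROWS, b_cols=B_COLS):
    """Apply map_line to every non-empty line and yield all records."""
    for line in lines:
        if line.strip():
            for record in map_line(line, a_rows, b_cols):
                yield record
```

In a Hadoop Streaming script, the records would be written to standard output, for example by iterating over map_lines(sys.stdin) and printing each record. Hadoop then sorts the records by key before they reach the reducer.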
Reduce:
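Sort the values from A and from B by their shared index j.
Multiply each pair Aij * Bjk that has the same j.
Sum the products Aij * Bjk over j to get the output element for key (i, k).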
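The reduce step could be sketched in Python as follows. This is an illustration under stated assumptions, not the reducer.py used in this exercise: it assumes the reducer receives key-sorted lines of the form "i,k<TAB>M,j,value", where M is a hypothetical "A" or "B" tag naming the source matrix, and the function name reduce_lines is made up. Instead of sorting the two value lists and pairing them positionally, the sketch matches values by j through dictionaries, which gives the same sum for dense matrices and also tolerates missing (zero) elements.

```python
from itertools import groupby


def reduce_lines(lines):
    """Yield "i,k<TAB>sum" for each output cell.

    Expects key-sorted records, as Hadoop Streaming delivers them.
    Assumed record format: i,k<TAB>M,j,value with M being "A" or "B".
    """
    records = (line.rstrip("\n").split("\t", 1)
               for line in lines if line.strip())
    # Consecutive records with the same key belong to one output cell.
    for key, group in groupby(records, key=lambda rec: rec[0]):
        a_vals, b_vals = {}, {}
        for _, value in group:
            name, j, number = value.split(",")
            target = a_vals if name == "A" else b_vals
            target[int(j)] = float(number)
        # Multiply Aij * Bjk for matching j and sum over j.
        total = sum(a_vals[j] * b_vals[j] for j in sorted(a_vals) if j in b_vals)
        yield "{}\t{}".format(key, total)
```

For example, the four records for key 0,0 with A values 1 and 2 and B values 5 and 7 reduce to 1*5 + 2*7 = 19. In a streaming script the results would be printed, for instance by iterating over reduce_lines(sys.stdin).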