ICP Assignment 3 - MadhuriSarode/BDP GitHub Wiki
Student ID : 26 Madhuri Sarode
Student ID : 12 Chennupati,Bhargavi Saipoojitha
Student ID : 20 Bhavana Deepti
The advanced Map reduce algorithms such as matrix multiplication, breath first search and depth first search are implemented using the IDE IntelliJ and maven build tool.
Program 1 : Matrix Multiplication
Two input matrices of the order M(2,3) and N (3,2) are considered for multiplication.The matrices values are written onto the input file as follows.
Prefixed with the matrix name, values at each position in the matrix is specified by using row and column numbers,
and each value is separated by ',' .
Example
Matrix M is
1 2 3
4 5 6
Matrix N is
7 8
9 10
11 12
Multiplied Result M*N is
58 64
139 154
Algorithm:
The map function
For each element of M, do
Produce (key,value) pairs as ( (i,k),(M,j,mij)) , for k = 1,2 β¦ up to number of columns of N For each element of n, do
Produce (key,value) pairs as ( (i,k),(N,j,njk)) , for I = 1,2 β¦ up to number of rows of M
Return set of (key,value) pairs that each key ,(i,k) , has a list with values (M,j,mij) and (N,j,njk) for all possible values of j
The Reduce function
For each key(i,k) do
Sort values begin with M by j in listM
Sort values begin with N by j in listN
Multiply mij and nik for βjβth value of each list
Sum up mij * njk
Return (i,k) , Summation of mij * njk
Java Source code execution in Hadoop cluster using HDFS file system
The above operation is carried out using the java code as follows uploaded here -->BDP/ICP3/code/MatrixMultiplier.java
The inputfile with the matix values is created InputMatrices.txt
It is pushed onto HDFS file system using the command
> hdfs dfs -put MatrixInput.txt /user/cloudera/ICP3
The File details are as follows. As seen, the two matrices input is given in the same file, the matrix name at the start of each row indicates which matrix values are being input. The values of the matrices is specified using the coordinates such as (row,column). The Pattern of the input file is therefore
'Matrix Name' , row, column, value at that specific co-ordinate
The project is built using maven build tool
The built project would have generated a target directory holding the jar file. This jar file is executed to run the program on cluster.
Once the program finishes executing, the Hadoop Job can be viewed using the URL obtained during jar execution
The output file is generated which holds the values of the resultant matrix
The depth first search and breadth first search algorithm has been implemented using map reduce.
The following java file has the code for the same.TreeSearchAlgorithm.java
The Tree traversal is recorded for both cases and output is printed.
The following is the input format of the graph. The matrix of vetrices connections from each node to the others is as follows
Node β> 0 1 2 3 4
|
0 0 1 0 0 1
1 1 0 1 1 1
2 0 1 0 1 0
3 0 1 1 0 1
4 1 1 0 1 0