ICP Assignment 3

Student ID : 26 Madhuri Sarode

Student ID : 12 Chennupati, Bhargavi Saipoojitha

Student ID : 20 Bhavana Deepti

Advanced MapReduce algorithms such as matrix multiplication, breadth-first search, and depth-first search are implemented using the IntelliJ IDE and the Maven build tool.

Program 1 : Matrix Multiplication

Two input matrices of order M(2,3) and N(3,2) are considered for multiplication. The matrix values are written to the input file as follows: each line is prefixed with the matrix name, the value at each position is specified by its row and column numbers, and each field is separated by ','.

Example

Matrix M is

1 2 3                              
4 5 6              

Matrix N is

7  8
9  10   
11  12

Multiplied Result M*N is

 58  64
139 154
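For example, the top-left entry of the result is the dot product of the first row of M with the first column of N: 1*7 + 2*9 + 3*11 = 58.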

Algorithm:


The Map function

For each element m_ij of M, do

Produce (key, value) pairs ((i,k), (M, j, m_ij)), for k = 1, 2, … up to the number of columns of N

For each element n_jk of N, do

Produce (key, value) pairs ((i,k), (N, j, n_jk)), for i = 1, 2, … up to the number of rows of M

Return the set of (key, value) pairs such that each key (i,k) has a list with values (M, j, m_ij) and (N, j, n_jk) for all possible values of j

The Reduce function

For each key (i,k), do

Sort the values beginning with M by j into listM

Sort the values beginning with N by j into listN

Multiply m_ij and n_jk for the j-th value of each list

Sum up the products m_ij * n_jk

Return ((i,k), summation of m_ij * n_jk)
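For reference, the sketch below shows one common way to express these map and reduce steps with Hadoop's Mapper/Reducer API, assuming the 'MatrixName,row,column,value' input format described later and the 2x3 / 3x2 dimensions from the example (hard-coded here for simplicity). The actual MatrixMultiplier.java may be organized differently.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Illustrative sketch only; dimensions are assumed from the example above.
public class MatrixMultiplySketch {

    private static final int M_ROWS = 2;   // rows of M
    private static final int N_COLS = 2;   // columns of N

    public static class MatrixMapper extends Mapper<LongWritable, Text, Text, Text> {
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Input line: MatrixName,i,j,value   e.g.  M,0,1,2
            String[] parts = value.toString().split(",");
            String name = parts[0].trim();
            int row = Integer.parseInt(parts[1].trim());
            int col = Integer.parseInt(parts[2].trim());
            String val = parts[3].trim();

            if (name.equals("M")) {
                // m_ij is needed for every column k of the result matrix.
                for (int k = 0; k < N_COLS; k++) {
                    context.write(new Text(row + "," + k), new Text("M," + col + "," + val));
                }
            } else {
                // n_jk is needed for every row i of the result matrix.
                for (int i = 0; i < M_ROWS; i++) {
                    context.write(new Text(i + "," + col), new Text("N," + row + "," + val));
                }
            }
        }
    }

    public static class MatrixReducer extends Reducer<Text, Text, Text, Text> {
        @Override
        protected void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            // Collect m_ij and n_jk indexed by j, then sum the products.
            Map<Integer, Double> mRow = new HashMap<>();
            Map<Integer, Double> nCol = new HashMap<>();
            for (Text t : values) {
                String[] parts = t.toString().split(",");
                int j = Integer.parseInt(parts[1]);
                double v = Double.parseDouble(parts[2]);
                if (parts[0].equals("M")) {
                    mRow.put(j, v);
                } else {
                    nCol.put(j, v);
                }
            }
            double sum = 0.0;
            for (Map.Entry<Integer, Double> e : mRow.entrySet()) {
                Double n = nCol.get(e.getKey());
                if (n != null) {
                    sum += e.getValue() * n;
                }
            }
            // Emit ((i,k), sum of m_ij * n_jk)
            context.write(key, new Text(String.valueOf(sum)));
        }
    }
}
```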

Java source code execution in a Hadoop cluster using the HDFS file system

The above operation is carried out using the Java code uploaded here --> BDP/ICP3/code/MatrixMultiplier.java

The input file with the matrix values is created: InputMatrices.txt

It is pushed onto the HDFS file system using the command

> hdfs dfs -put MatrixInput.txt /user/cloudera/ICP3

The file details are as follows. As seen, the two matrices are given in the same file; the matrix name at the start of each row indicates which matrix the values belong to, and the values are specified using (row, column) coordinates. The pattern of each line of the input file is therefore

'Matrix Name', row, column, value at that specific coordinate
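For illustration, the example matrices M and N above would be encoded as follows under this pattern, assuming 0-based row and column indices (the actual file may use 1-based indices):

```
M,0,0,1
M,0,1,2
M,0,2,3
M,1,0,4
M,1,1,5
M,1,2,6
N,0,0,7
N,0,1,8
N,1,0,9
N,1,1,10
N,2,0,11
N,2,1,12
```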

The project is built using the Maven build tool
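A typical build command from the project root (the exact goals depend on the project's pom.xml) would be

> mvn clean package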

The build generates a target directory holding the jar file. This jar file is executed to run the program on the cluster.
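For example, a run could look like the following; the jar file name and HDFS paths here are placeholders based on the file names mentioned above, not the exact command used:

> hadoop jar target/ICP3-1.0.jar MatrixMultiplier /user/cloudera/ICP3/MatrixInput.txt /user/cloudera/ICP3/output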

Once the program finishes executing, the Hadoop job can be viewed using the URL obtained during jar execution.

An output file is generated that holds the values of the resultant matrix.
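The result can be inspected with a command along these lines; the output path and part file name are assumptions (Hadoop typically names reducer output part-r-00000):

> hdfs dfs -cat /user/cloudera/ICP3/output/part-r-00000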

Program 2 : Breadth First Search and Depth First Search

The depth-first search and breadth-first search algorithms have been implemented using MapReduce. The following Java file has the code for the same: TreeSearchAlgorithm.java. The tree traversal is recorded for both cases and the output is printed (an illustrative traversal sketch is shown after the adjacency matrix below).

The following is the input format of the graph. The matrix of vertex connections from each node to the others is as follows:

   Node -> 0  1  2  3  4
   0       0  1  0  0  1
   1       1  0  1  1  1
   2       0  1  0  1  0
   3       0  1  1  0  1
   4       1  1  0  1  0
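As an illustration of the traversals being recorded, the following standalone Java sketch runs BFS and DFS over the adjacency matrix above. It is not the MapReduce implementation in TreeSearchAlgorithm.java, only a plain in-memory version showing the expected visit orders from node 0.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Illustrative single-machine traversal sketch, not the MapReduce job.
public class TraversalSketch {

    // Adjacency matrix from the input above (5 nodes, undirected graph).
    private static final int[][] GRAPH = {
            {0, 1, 0, 0, 1},
            {1, 0, 1, 1, 1},
            {0, 1, 0, 1, 0},
            {0, 1, 1, 0, 1},
            {1, 1, 0, 1, 0}
    };

    // Breadth-first traversal from 'start' using a FIFO queue.
    static List<Integer> bfs(int start) {
        boolean[] visited = new boolean[GRAPH.length];
        List<Integer> order = new ArrayList<>();
        Deque<Integer> queue = new ArrayDeque<>();
        queue.add(start);
        visited[start] = true;
        while (!queue.isEmpty()) {
            int node = queue.poll();
            order.add(node);
            for (int next = 0; next < GRAPH.length; next++) {
                if (GRAPH[node][next] == 1 && !visited[next]) {
                    visited[next] = true;
                    queue.add(next);
                }
            }
        }
        return order;
    }

    // Depth-first traversal from 'node' using recursion.
    static void dfs(int node, boolean[] visited, List<Integer> order) {
        visited[node] = true;
        order.add(node);
        for (int next = 0; next < GRAPH.length; next++) {
            if (GRAPH[node][next] == 1 && !visited[next]) {
                dfs(next, visited, order);
            }
        }
    }

    public static void main(String[] args) {
        System.out.println("BFS from node 0: " + bfs(0));      // [0, 1, 4, 2, 3]
        List<Integer> dfsOrder = new ArrayList<>();
        dfs(0, new boolean[GRAPH.length], dfsOrder);
        System.out.println("DFS from node 0: " + dfsOrder);    // [0, 1, 2, 3, 4]
    }
}
```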