ICP 3 - Murarishetti-Shiva-Kumar/Big-Data-Programming GitHub Wiki

Lesson Plan 3: Hadoop MapReduce and Hadoop Distributed File System (HDFS)

1. Matrix Multiplication in MapReduce

1. Create a Java project in Eclipse and import the required external JARs from usr/hadoop/lib and usr/hadoop/client.

2. Create the necessary class files (Map, Reduce, MatrixMultiplication).

3. Export the project as a JAR file.

4. Prepare the two matrices that the multiplication takes as input.

5. Execute the JAR file using the following command.

Implementation: Matrix Multiplication using Vector addition

1. The orders of the two matrices, m x n and n x p, are set in the main method and stored in the job configuration.

2. In the mapper, the rows of the first matrix and the columns of the second matrix are read; each entry arrives as a string of values separated by ",".

3. A string that starts with M comes from matrix M, and a string that starts with N comes from matrix N. From each string the mapper extracts the matrix name, row, column, and value: (M,i,j,Mij) for matrix M and (N,j,k,Njk) for matrix N.

4. The reducer receives the values emitted by the mapper and, using the leading tag (M or N), builds two hash maps, one for each matrix.

5. It then multiplies the row of the first matrix with the column of the second and stores the result at the corresponding (i,k) position. If one matrix is of order m x n and the other of order n x p, the result is of order m x p.
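
The mapper and reducer steps above can be sketched without a cluster. The following is a minimal plain-Java simulation, not the project's actual Map and Reduce classes. The class name MatrixMultiplySketch, the in-memory shuffle map, zero-based indices, and the choice to key each record by the output cell (i,k) are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for the Map and Reduce classes; no Hadoop dependency.
class MatrixMultiplySketch {

    // Map phase: send each entry to every output cell (i,k) it contributes to.
    static void map(String line, int m, int p, Map<String, List<String>> shuffle) {
        String[] f = line.split(",");
        if (f[0].equals("M")) {
            // M,i,j,Mij is needed by cell (i,k) for every column k of the result.
            for (int k = 0; k < p; k++) {
                shuffle.computeIfAbsent(f[1] + "," + k, x -> new ArrayList<>())
                       .add("M," + f[2] + "," + f[3]);
            }
        } else {
            // N,j,k,Njk is needed by cell (i,k) for every row i of the result.
            for (int i = 0; i < m; i++) {
                shuffle.computeIfAbsent(i + "," + f[2], x -> new ArrayList<>())
                       .add("N," + f[1] + "," + f[3]);
            }
        }
    }

    // Reduce phase: build one hash map per matrix, then sum Mij * Njk over j.
    static double reduce(List<String> values, int n) {
        Map<Integer, Double> rowOfM = new HashMap<>();
        Map<Integer, Double> colOfN = new HashMap<>();
        for (String v : values) {
            String[] f = v.split(",");
            Map<Integer, Double> target = f[0].equals("M") ? rowOfM : colOfN;
            target.put(Integer.parseInt(f[1]), Double.parseDouble(f[2]));
        }
        double sum = 0.0;
        for (int j = 0; j < n; j++) {
            sum += rowOfM.getOrDefault(j, 0.0) * colOfN.getOrDefault(j, 0.0);
        }
        return sum;
    }

    // Runs map over all lines, groups by key, then reduces each (i,k) cell.
    static Map<String, Double> multiply(List<String> lines, int m, int n, int p) {
        Map<String, List<String>> shuffle = new TreeMap<>();
        for (String line : lines) {
            map(line, m, p, shuffle);
        }
        Map<String, Double> result = new TreeMap<>();
        for (Map.Entry<String, List<String>> e : shuffle.entrySet()) {
            result.put(e.getKey(), reduce(e.getValue(), n));
        }
        return result;
    }

    public static void main(String[] args) {
        // M = [[1,2],[3,4]] and N = [[5,6],[7,8]], both 2 x 2.
        List<String> input = Arrays.asList(
            "M,0,0,1", "M,0,1,2", "M,1,0,3", "M,1,1,4",
            "N,0,0,5", "N,0,1,6", "N,1,0,7", "N,1,1,8");
        multiply(input, 2, 2, 2)
            .forEach((cell, value) -> System.out.println(cell + "," + value));
    }
}
```

In a real job the orders m, n, and p would be read from the job configuration rather than passed as arguments, and the grouping done here by the TreeMap would be performed by Hadoop's shuffle.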

Bonus Question

Breadth-First Search in a Graph using MapReduce

Implementation:

1. In the mapper, each line of the input file is read as a string and split on spaces or tabs into a string array.

2. The line is split by column so that every node is visited, and the mapper emits pairs of nodes and values.

3. In the reducer phase, the key identifies the node, and the reducer picks the least distance value for each node. Once a node is complete, its distance mapping is written out and becomes the input file for the next pass.
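
The passes described above can be sketched in plain Java without Hadoop. This is a minimal simulation under stated assumptions, not the code used in the project. The record layout (node, distance, comma-separated neighbours, separated by tabs), the INF token for unreached nodes, the ADJ: and DIST: value tags, and the class name BfsSketch are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical simulation of iterative BFS passes; no Hadoop dependency.
class BfsSketch {
    static final int INF = Integer.MAX_VALUE; // distance of a node not reached yet

    static int parse(String d) { return d.equals("INF") ? INF : Integer.parseInt(d); }

    static String format(int d) { return d == INF ? "INF" : Integer.toString(d); }

    static void emit(Map<String, List<String>> shuffle, String key, String value) {
        shuffle.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
    }

    // Map phase: split the line on whitespace, pass the node's own record
    // through, and offer distance + 1 to each neighbour of a reached node.
    static void map(String line, Map<String, List<String>> shuffle) {
        String[] f = line.split("\\s+");
        int dist = parse(f[1]);
        String adj = f.length > 2 ? f[2] : "";
        emit(shuffle, f[0], "ADJ:" + adj);
        emit(shuffle, f[0], "DIST:" + f[1]);
        if (dist != INF && !adj.isEmpty()) {
            for (String neighbour : adj.split(",")) {
                emit(shuffle, neighbour, "DIST:" + format(dist + 1));
            }
        }
    }

    // Reduce phase: keep the adjacency list and the least distance seen.
    static String reduce(String node, List<String> values) {
        int best = INF;
        String adj = "";
        for (String v : values) {
            if (v.startsWith("ADJ:")) {
                adj = v.substring(4);
            } else {
                best = Math.min(best, parse(v.substring(5)));
            }
        }
        return (node + "\t" + format(best) + "\t" + adj).trim();
    }

    // One pass: map every line, group by node, reduce each group.
    static List<String> iterate(List<String> lines) {
        Map<String, List<String>> shuffle = new TreeMap<>();
        for (String line : lines) {
            map(line, shuffle);
        }
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : shuffle.entrySet()) {
            out.add(reduce(e.getKey(), e.getValue()));
        }
        return out;
    }

    // Feed each pass's output into the next until nothing changes.
    static List<String> bfs(List<String> lines) {
        List<String> current = lines;
        while (true) {
            List<String> next = iterate(current);
            if (next.equals(current)) {
                return next;
            }
            current = next;
        }
    }

    public static void main(String[] args) {
        // Node 1 is the source; edges: 1->2, 1->3, 2->4, 3->4.
        List<String> graph = Arrays.asList(
            "1\t0\t2,3", "2\tINF\t4", "3\tINF\t4", "4\tINF");
        bfs(graph).forEach(System.out::println);
    }
}
```

On a cluster each call to iterate would correspond to a separate MapReduce job whose output directory is used as the input path of the following job, which matches the hand-off described in step 3.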