ICP 03 : MapReduce - acikgozmehmet/BigDataProgramming GitHub Wiki

ICP 03: Hadoop Distributed File System (HDFS) / MapReduce and Big Data Applications

Objectives

  • Create a MapReduce program that performs matrix multiplication

1. Matrix Multiplication in MapReduce

Suppose we have an I × J matrix M whose element in row i and column j is denoted mij, and a J × K matrix N whose element in row j and column k is denoted njk. Then the product P = MN is an I × K matrix whose element in row i and column k is denoted pik, where

$$p_{ik} = \sum_{j=1}^{J} m_{ij}\, n_{jk}$$

https://github.com/acikgozmehmet/BigDataProgramming/blob/master/ICP-03/Documentation/Slide0.JPG
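As a point of reference, pik (the sum over j of mij * njk) can be computed directly on a single machine. The following is a minimal sketch, not part of the author's code: the class name `MatrixProductReference` and the sample matrices are illustrative rather than taken from the ICP input files, and indices are 0-based. It can be handy for checking the MapReduce job's output on small inputs.

```java
import java.util.Arrays;

// Hypothetical single-machine reference for P = MN (not the author's MatrixMultiply).
public class MatrixProductReference {

    // Computes p[i][k] = sum over j of m[i][j] * n[j][k], using 0-based indices.
    public static double[][] multiply(double[][] m, double[][] n) {
        int rows = m.length;      // I: rows of M
        int inner = n.length;     // J: columns of M, which must equal rows of N
        int cols = n[0].length;   // K: columns of N
        if (m[0].length != inner) {
            throw new IllegalArgumentException("columns of M must equal rows of N");
        }
        double[][] p = new double[rows][cols];
        for (int i = 0; i < rows; i++) {
            for (int k = 0; k < cols; k++) {
                double sum = 0.0;
                for (int j = 0; j < inner; j++) {
                    sum += m[i][j] * n[j][k];
                }
                p[i][k] = sum;
            }
        }
        return p;
    }

    public static void main(String[] args) {
        double[][] m = {{1, 2, 3}, {4, 5, 6}};      // 2 x 3 sample matrix M
        double[][] n = {{7, 8}, {9, 10}, {11, 12}}; // 3 x 2 sample matrix N
        System.out.println(Arrays.deepToString(multiply(m, n)));
    }
}
```

Running `main` prints `[[58.0, 64.0], [139.0, 154.0]]`.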

Create the input directory in HDFS and upload the matrix files:

hadoop fs -mkdir /user/cloudera/icp3

hadoop fs -mkdir /user/cloudera/icp3/input

hadoop fs -put ./input/Matrix*.* /user/cloudera/icp3/input

https://github.com/acikgozmehmet/BigDataProgramming/blob/master/ICP-03/Documentation/VirtualBox_cloudera-quickstart-vm-5.13.0-0-virtualbox_13_02_2020_19_36_38.png

https://github.com/acikgozmehmet/BigDataProgramming/blob/master/ICP-03/Documentation/VirtualBox_cloudera-quickstart-vm-5.13.0-0-virtualbox_13_02_2020_19_36_53.png

https://github.com/acikgozmehmet/BigDataProgramming/blob/master/ICP-03/Documentation/MultiplyMatrix%20%5BD__UMKC___Spring2020_CS5590BDP_Lesson3_MultiplyMatrix%5D%20-%20..._src_main_java_MatrixMultiply.java%20-%20IntelliJ%20IDEA%202_13_2020%207_46_17%20PM.png

Please find the code here: https://github.com/acikgozmehmet/BigDataProgramming/blob/master/ICP-03/SourceCode/MatrixMultiply.java

Map Function

The Map function creates (key, value) pairs from the input data, as described in the following algorithm.

Algorithm 1: The Map Function

  1. For each element (mij) of matrix_M, it will create (key, value) pairs ((i,k), (M,j,mij)) for k = 1, 2, ..., up to the number of columns of N.

  2. For each element (njk) of matrix_N, it will create (key, value) pairs ((i,k), (N,j,njk)) for i = 1, 2, ..., up to the number of rows of M.

https://github.com/acikgozmehmet/BigDataProgramming/blob/master/ICP-03/Documentation/Slide2.JPG

  3. As a result, after the shuffle phase each key (i,k) has a list of values (M,j,mij) and (N,j,njk) for all possible values of j.

https://github.com/acikgozmehmet/BigDataProgramming/blob/master/ICP-03/Documentation/Slide3.JPG
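The Map step and the grouping that follows it can be simulated in plain Java without a Hadoop cluster. This is a minimal sketch under stated assumptions, not the author's `MatrixMultiply` mapper: each input element is modeled as a hypothetical `Element` (matrix name, row, column, value), keys and values are encoded as comma-separated strings, indices are 0-based, and the mapper is assumed to know the number of rows of M (I) and columns of N (K). A `TreeMap` stands in for the shuffle phase that groups values by key.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical simulation of the Map step plus shuffle grouping (not Hadoop code).
public class MapPhaseSketch {

    // One matrix element; the field layout is an assumption for illustration.
    static final class Element {
        final String matrix; // "M" or "N"
        final int row;
        final int col;
        final double value;

        Element(String matrix, int row, int col, double value) {
            this.matrix = matrix;
            this.row = row;
            this.col = col;
            this.value = value;
        }
    }

    // Emits key "i,k" with value "M,j,mij" or "N,j,njk" and groups values by key.
    public static Map<String, List<String>> mapAndGroup(List<Element> input, int rowsOfM, int colsOfN) {
        Map<String, List<String>> grouped = new TreeMap<>();
        for (Element e : input) {
            if (e.matrix.equals("M")) {
                // mij is needed by every pik in row i: k = 0 .. K-1
                for (int k = 0; k < colsOfN; k++) {
                    emit(grouped, e.row + "," + k, "M," + e.col + "," + e.value);
                }
            } else {
                // njk is needed by every pik in column k: i = 0 .. I-1
                for (int i = 0; i < rowsOfM; i++) {
                    emit(grouped, i + "," + e.col, "N," + e.row + "," + e.value);
                }
            }
        }
        return grouped;
    }

    private static void emit(Map<String, List<String>> grouped, String key, String value) {
        grouped.computeIfAbsent(key, unused -> new ArrayList<>()).add(value);
    }

    public static void main(String[] args) {
        // M = [[1,2],[3,4]], N = [[5,6],[7,8]]
        List<Element> input = new ArrayList<>();
        input.add(new Element("M", 0, 0, 1));
        input.add(new Element("M", 0, 1, 2));
        input.add(new Element("M", 1, 0, 3));
        input.add(new Element("M", 1, 1, 4));
        input.add(new Element("N", 0, 0, 5));
        input.add(new Element("N", 0, 1, 6));
        input.add(new Element("N", 1, 0, 7));
        input.add(new Element("N", 1, 1, 8));
        mapAndGroup(input, 2, 2).forEach((key, values) -> System.out.println(key + " -> " + values));
    }
}
```

For the 2 x 2 example in `main`, every key receives 2J = 4 values; for instance, key `0,0` collects `[M,0,1.0, M,1,2.0, N,0,5.0, N,1,7.0]`.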

Reduce Function

The Reduce function takes the grouped output of the Map function, performs the calculations, and produces (key, value) pairs as described in the following algorithm. Note that all outputs are written to HDFS.

Algorithm 2: The Reduce Function

  1. For each key (i,k):

        it will sort the values that begin with M by j into the list List-M;
    
        it will sort the values that begin with N by j into the list List-N;
    
        it will multiply mij and njk for each matching value of j in the two lists;
    
        it will sum up the products mij*njk to obtain pik.
    
  2. It will emit ((i,k), pik) as the result.

https://github.com/acikgozmehmet/BigDataProgramming/blob/master/ICP-03/Documentation/Slide7.JPG
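The Reduce step for a single key can be sketched in the same plain-Java style. Assumptions: the values for one key (i,k) arrive as comma-separated strings of the form `M,j,mij` or `N,j,njk`, and the class and method names are hypothetical rather than taken from the author's code. Instead of sorting two lists and pairing them by position, this sketch stores List-M and List-N in `TreeMap`s keyed by j (which keeps them sorted by j) and multiplies only the entries whose j matches, so an element missing from a sparse input simply contributes zero.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical simulation of the Reduce step for one key (i,k) (not Hadoop code).
public class ReducePhaseSketch {

    // Returns pik = sum over j of mij * njk for the values grouped under one key.
    public static double reduce(List<String> values) {
        // List-M and List-N, keyed (and therefore sorted) by j
        Map<Integer, Double> listM = new TreeMap<>();
        Map<Integer, Double> listN = new TreeMap<>();
        for (String v : values) {
            String[] parts = v.split(",");
            int j = Integer.parseInt(parts[1]);
            double x = Double.parseDouble(parts[2]);
            if (parts[0].equals("M")) {
                listM.put(j, x);
            } else {
                listN.put(j, x);
            }
        }
        // Multiply entries that share the same j and sum the products;
        // a j present in only one list is treated as a zero entry.
        double sum = 0.0;
        for (Map.Entry<Integer, Double> e : listM.entrySet()) {
            Double nValue = listN.get(e.getKey());
            if (nValue != null) {
                sum += e.getValue() * nValue;
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        // Values grouped under key (0,0) for M = [[1,2],[3,4]] and N = [[5,6],[7,8]]
        List<String> values = Arrays.asList("M,0,1.0", "M,1,2.0", "N,0,5.0", "N,1,7.0");
        System.out.println("0,0\t" + reduce(values)); // 1*5 + 2*7 = 19
    }
}
```

The sample in `main` corresponds to key (0,0) with M = [[1,2],[3,4]] and N = [[5,6],[7,8]], so it prints `0,0` followed by a tab and `19.0`.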

https://github.com/acikgozmehmet/BigDataProgramming/blob/master/ICP-03/Documentation/Slide1.JPG

To run the program:

hadoop jar MatrixMultiply-1.0.jar MatrixMultiply /user/cloudera/icp3/input /user/cloudera/icp3/output

https://github.com/acikgozmehmet/BigDataProgramming/blob/master/ICP-03/Documentation/VirtualBox_cloudera-quickstart-vm-5.13.0-0-virtualbox_13_02_2020_19_34_37.png

https://github.com/acikgozmehmet/BigDataProgramming/blob/master/ICP-03/Documentation/VirtualBox_cloudera-quickstart-vm-5.13.0-0-virtualbox_13_02_2020_19_37_20.png

References: