ICP 3 - sheriaravind/BDP-ICP-3 GitHub Wiki
Programming with Big Data ICP-3
Name - Aravind Sheri, Class ID - 18, Team - 6
Problem Statement: Implement matrix multiplication using MapReduce in Hadoop.
Approach: Matrix multiplication needs all of the data in one place. Every entry in a row of Matrix A is multiplied by the matching entry in a column of Matrix B, and those products are then added. The input is a text file in which each line holds the matrix name, the row index, the column index and the value.

In the map phase, each line is read and emitted as a key-value pair. The matrix identifier (A or B) is the key, and the row, column and value together form the value. There are only two keys. As long as the job runs a single reduce task (Hadoop's default), every entry of both matrices reaches the same reducer.

In the reduce phase, the reducer stores each (row, column, value) entry in one of two lists, according to the matrix it came from. It then pairs every entry of Matrix A with every entry of Matrix B. Whenever the column index of the A entry equals the row index of the B entry, the two values are multiplied. The product is recorded under the key (row index of A, column index of B). Several partial products share each key, so the values with identical keys are added together. The result is the product matrix, with (row, column) as the key and the corresponding value.

Below are screenshots of the code, the command and the resulting matrix.

Screenshots:
Mapper: Emits key-value pairs with the matrix name as the key and the row, column and value as the value.
Reducer: Reads the key-value pairs, performs the multiplications and additions, and outputs the resulting product matrix.
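The mapper and reducer logic can be sketched in plain Python. This is a minimal sketch, not the author's code, which appears only in the screenshots. It makes the following assumptions:

- Input fields are comma-separated and the matrices are named A and B.
- Key and value are separated by a tab, Hadoop Streaming's default separator.
- A single reduce task runs, so both keys reach one reducer.
- Output lines take the form `row,col<TAB>value`.

The function names `map_lines` and `reduce_lines` are hypothetical.

```python
def map_lines(lines):
    """Mapper sketch: 'A,0,1,3' becomes 'A<TAB>0,1,3' so the matrix name is the key."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        matrix, row, col, value = line.split(",")  # assumed comma-separated fields
        yield f"{matrix}\t{row},{col},{value}"


def reduce_lines(lines):
    """Reducer sketch: collect A and B entries, multiply, emit 'i,j<TAB>value'."""
    a_entries, b_entries = [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        key, payload = line.split("\t", 1)  # streaming's default key/value separator
        row, col, value = payload.split(",")
        entry = (int(row), int(col), float(value))
        # Store the entry in the list for its matrix (assumed names "A" and "B").
        (a_entries if key == "A" else b_entries).append(entry)

    sums = {}
    for a_row, a_col, a_val in a_entries:
        for b_row, b_col, b_val in b_entries:
            if a_col == b_row:  # A's column index must match B's row index
                cell = (a_row, b_col)
                # Partial products sharing a key are added together.
                sums[cell] = sums.get(cell, 0.0) + a_val * b_val
    for i, j in sorted(sums):
        yield f"{i},{j}\t{sums[(i, j)]}"


if __name__ == "__main__":
    # Local dry run without Hadoop: map, sort (standing in for the shuffle), reduce.
    sample = ["A,0,0,1", "A,0,1,2", "A,1,0,3", "A,1,1,4",
              "B,0,0,5", "B,0,1,6", "B,1,0,7", "B,1,1,8"]
    for out in reduce_lines(sorted(map_lines(sample))):
        print(out)
```

In a real streaming job, mapper.py would print every item of `map_lines(sys.stdin)` and reducer.py every item of `reduce_lines(sys.stdin)`. Keying by matrix name keeps the code short, but it sends both matrices to one reducer. This design therefore only suits matrices that fit in a single node's memory.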
Input: A text file in which each line holds the matrix name, row index, column index and value.
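An input file in this layout can be generated from ordinary nested lists. The sketch below assumes comma-separated fields, 0-based indices and the matrix names A and B. The wiki does not state the exact separator, and `to_input_lines` is a hypothetical helper.

```python
def to_input_lines(name, matrix):
    """Flatten a dense matrix (a list of rows) into 'name,row,col,value' lines.

    Assumption: fields are comma-separated and indices start at 0.
    """
    return [
        f"{name},{i},{j},{value}"
        for i, row in enumerate(matrix)
        for j, value in enumerate(row)
    ]


if __name__ == "__main__":
    a = [[1, 2], [3, 4]]
    b = [[5, 6], [7, 8]]
    for line in to_input_lines("A", a) + to_input_lines("B", b):
        print(line)
```

The printed lines can be redirected to a local file and copied into HDFS, for example with `hdfs dfs -put`, before running the job. Writing every entry, zeros included, keeps the example simple. A sparse matrix could omit zero entries, because they add nothing to any sum.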
Command: $ hadoop jar /usr/local/hadoop-2.9.0/share/hadoop/tools/lib/hadoop-streaming-2.9.0.jar -mapper mapper.py -reducer reducer.py -file /home/aravind/PycharmProjects/MapReduce/mapper.py -file /home/aravind/PycharmProjects/MapReduce/reducer.py -input /matrixinput -output /matrixmul10
Output: The resulting matrix is shown in the screenshot below.
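The job's output can be checked against a plain in-memory product. Streaming jobs typically write it to files such as part-00000 under the output directory, and it can be read with `hdfs dfs -cat`. This is a minimal sketch that assumes each output line has the form `row,col<TAB>value`. The sample lines are hand-computed for a 2x2 example, not the author's actual output, and `parse_output` and `naive_multiply` are hypothetical helpers.

```python
def parse_output(lines, n_rows, n_cols):
    """Rebuild a dense matrix from output lines of the assumed form 'i,j<TAB>value'."""
    result = [[0.0] * n_cols for _ in range(n_rows)]  # cells absent from the output stay 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        key, value = line.split("\t")
        i, j = (int(x) for x in key.split(","))
        result[i][j] = float(value)
    return result


def naive_multiply(a, b):
    """Reference product with the textbook triple loop: C[i][j] = sum over k of A[i][k] * B[k][j]."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


if __name__ == "__main__":
    # Illustrative output for A = [[1, 2], [3, 4]] and B = [[5, 6], [7, 8]].
    output = ["0,0\t19.0", "0,1\t22.0", "1,0\t43.0", "1,1\t50.0"]
    expected = naive_multiply([[1, 2], [3, 4]], [[5, 6], [7, 8]])
    print(parse_output(output, 2, 2) == expected)  # True
```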