ICP 2 - navyagonug/CS5590-BIG-DATA-PROGRAMMING-USING-HADOOP-AND-SPARK GitHub Wiki
ICP 2
PROBLEM STATEMENT
1. Count the frequency of words in the given input with the MapReduce algorithm.
2. Count the frequency of odd and even numbers in the given input with the MapReduce algorithm.
3. (BONUS POINT) Building on use case 1, count the frequency of characters in the given input with the MapReduce algorithm.

FEATURES
The technologies used are IntelliJ IDEA (Maven), Cloudera, VirtualBox, and Java. This in-class programming exercise covers performing a word count on a file and identifying the even and odd numbers along with their counts. The bonus question involves counting every character in the given input file.
CONFIGURATIONS
The **pom.xml** file is modified in IntelliJ IDEA. The XML file is as follows.
<modelVersion>4.0.0</modelVersion>
<groupId>gid</groupId>
<artifactId>aid</artifactId>
<version>1.0-SNAPSHOT</version>
<repositories>
    <repository>
        <id>apache</id>
        <url>http://maven.apache.org</url>
    </repository>
</repositories>
<dependencies>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-core</artifactId>
        <version>1.2.1</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-common</artifactId>
        <version>3.2.0</version>
    </dependency>
</dependencies>
Question: Counting the frequency of words in the given input with MapReduce algorithm.
Approach
There are two classes: a Mapper class and a Reducer class. In the Mapper class, the text from the input file is tokenized into words, and each word is emitted as a key-value pair. The key is the word from the input file and the value is '1'. In the Reduce phase, pairs with the same key are grouped together and their values are added up to find the number of occurrences of that word. It acts as an aggregation phase for the keys generated by the map phase.
**Input:** The input is shown in the following screenshot.
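The map and reduce steps described above can be sketched in plain Java. This is a minimal in-memory illustration, not the Hadoop job itself: it deliberately avoids the Hadoop API so that it runs without a cluster, and the class name `WordCountSketch`, its method names, and the sample input are all assumptions made for the example.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Plain-Java sketch of the word-count flow; it does NOT use the Hadoop API.
class WordCountSketch {

    // Map step: emit a (word, 1) pair for every whitespace-separated token.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        StringTokenizer tokens = new StringTokenizer(line);
        while (tokens.hasMoreTokens()) {
            pairs.add(new SimpleEntry<>(tokens.nextToken(), 1));
        }
        return pairs;
    }

    // Group + reduce step: sum the values of all pairs that share a key.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            counts.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return counts;
    }

    // Runs the map step on each line, then reduces all emitted pairs.
    static Map<String, Integer> countWords(List<String> lines) {
        List<Map.Entry<String, Integer>> emitted = new ArrayList<>();
        for (String line : lines) {
            emitted.addAll(map(line));
        }
        return reduce(emitted);
    }

    public static void main(String[] args) {
        // Hypothetical sample input, not the file from the screenshot.
        List<String> lines = Arrays.asList("hello world", "hello hadoop");
        System.out.println(countWords(lines)); // prints {hadoop=1, hello=2, world=1}
    }
}
```

In a real Hadoop job the grouping by key is done by the framework between the two phases; here a `TreeMap` stands in for that step and also keeps the output sorted by word.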
Output
Question: Counting the frequency of odd and even numbers in the given input with MapReduce algorithm.
Approach
In this program, a number is tested by dividing it by 2: if the remainder is 0, it is categorized as an even number; otherwise it is an odd number. In the Mapper phase, each number is assigned to the 'even' or 'odd' category. In the Reduce phase, the entries in each category are aggregated and the total counts of even and odd numbers are displayed.
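The remainder test and the per-category aggregation can be sketched as follows. This is a plain-Java illustration under stated assumptions, not the Hadoop code from the exercise: the class `OddEvenSketch`, its methods, and the sample numbers are hypothetical, and the input is assumed to contain only whitespace-separated integers.

```java
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Plain-Java sketch of the odd/even count; it does NOT use the Hadoop API.
class OddEvenSketch {

    // Map step: the key is "even" when the remainder of n / 2 is 0, else "odd".
    // Comparing against 0 also handles negative odd numbers, whose remainder is -1.
    static String classify(long n) {
        return (n % 2 == 0) ? "even" : "odd";
    }

    // Reduce step: add 1 to the running total of each number's category.
    // Assumes every token is an integer; anything else throws NumberFormatException.
    static Map<String, Integer> countOddEven(String input) {
        Map<String, Integer> counts = new TreeMap<>();
        StringTokenizer tokens = new StringTokenizer(input);
        while (tokens.hasMoreTokens()) {
            long n = Long.parseLong(tokens.nextToken());
            counts.merge(classify(n), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical sample input, not the file from the screenshot.
        System.out.println(countOddEven("1 2 3 4 5 10 -7")); // prints {even=3, odd=4}
    }
}
```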
Input
Output
Question: Using use case 1, count the frequency of characters in the given input with MapReduce algorithm.
Approach
The code is similar to the code for the first question. However, the input is split character by character rather than word by word, so each character becomes a key.
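A character-wise variant of the word count might look like the sketch below. It is plain Java rather than the Hadoop code from the exercise; the class `CharCountSketch`, its method, and the sample string are hypothetical. It also assumes that whitespace is skipped rather than counted, which the description above does not specify.

```java
import java.util.Map;
import java.util.TreeMap;

// Plain-Java sketch of the character count; it does NOT use the Hadoop API.
class CharCountSketch {

    // Treats every non-whitespace character as a key and sums a 1 per occurrence.
    // Skipping whitespace is an assumption of this sketch.
    static Map<Character, Integer> countChars(String input) {
        Map<Character, Integer> counts = new TreeMap<>();
        for (char c : input.toCharArray()) {
            if (Character.isWhitespace(c)) {
                continue;
            }
            counts.merge(c, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical sample input, not the file from the screenshot.
        System.out.println(countChars("hello hadoop"));
        // prints {a=1, d=1, e=1, h=2, l=2, o=3, p=1}
    }
}
```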
Input
Output