ICP 2 - awais546/Big-Data-Programming-Hadoop-Pyspark GitHub Wiki

Task

The following tasks had to be completed for ICP-2.

  • Set up the development environment in Eclipse/IntelliJ.
  • Download the source code and import it into Eclipse/IntelliJ.
  • Change the source code so that it produces a text file containing only the words starting with "a" and their counts.

Creating the Project

A Java project was created in Eclipse, and all the JAR files needed to run the code were imported. The screenshots below show the project and the imported JAR files.

Changes in the Function

A condition was added so that only the words starting with "a" are kept and all other words are skipped. The screenshot below shows the changes made in the code.
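Since the change itself is only visible in the screenshot, the condition can be sketched in plain Java as follows. This is a minimal illustration and not the project's actual code: the class name `AWordCount` and the method `countWordsStartingWithA` are hypothetical, it uses ordinary Java collections instead of the Hadoop classes, and it assumes the match is case-sensitive (lowercase "a" only).

```java
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical stand-alone sketch; the real job would apply the same
// condition inside its word-count function.
class AWordCount {

    // Splits the text on whitespace and counts only tokens starting with "a".
    static Map<String, Integer> countWordsStartingWithA(String text) {
        Map<String, Integer> counts = new TreeMap<>();
        StringTokenizer tokens = new StringTokenizer(text);
        while (tokens.hasMoreTokens()) {
            String word = tokens.nextToken();
            // The filter: skip every word that does not start with "a".
            // Assumption: case-sensitive, so "Avocado" is not counted.
            if (word.startsWith("a")) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // prints {a=1, an=2, and=1, apple=1, ate=1}
        System.out.println(
            countWordsStartingWithA("an apple and a banana ate an Avocado"));
    }
}
```

If uppercase words should also be counted, the token could be lowercased before the check, for example with `word.toLowerCase().startsWith("a")`.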

Output File

The output file was written to a folder in Hue. It contains every word starting with "a" together with its count.