ICP 1 - awais546/Big-Data-Programming-Hadoop-Pyspark GitHub Wiki

                                      Big Data Programming ICP-1

This ICP consisted of the following parts:

  • Installation of software
  • Using Hue to visualize data
  • Merging two data files
  • Displaying the merged file
  • Displaying the top 5 and bottom 5 lines
  • Writing these 10 lines to a new file and displaying that file

1.Installation: Cloudera

The screenshots below show all the installations.

2.Visualization of data in Hue

The visualization in Hue can be seen in the following screenshot.

4.Merging the two files

The two files, Shakespeare and Word_List, were merged with an HDFS shell command, as shown in the screenshot below.
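Since the exact command only appears in the screenshot, here is a minimal sketch of one way such a merge could be done. The runnable part uses small local stand-in files so it works without a cluster; the file names, sample contents, and the `/user/cloudera/...` paths in the comments are assumptions, not the ones used in this ICP.

```shell
set -eu

# Hypothetical stand-ins for the two input files (contents are made up).
workdir="$(mktemp -d)"
printf 'To be, or not to be\nThat is the question\n' > "$workdir/shakespeare.txt"
printf 'alpha\nbeta\n' > "$workdir/word_list.txt"

# Merge: concatenate both inputs, in order, into a single file.
cat "$workdir/shakespeare.txt" "$workdir/word_list.txt" > "$workdir/merged.txt"

# One possible HDFS shell equivalent (paths are assumptions):
#   hdfs dfs -cat /user/cloudera/shakespeare.txt /user/cloudera/word_list.txt \
#     | hdfs dfs -put - /user/cloudera/merged.txt
# `-put -` reads the data to upload from standard input.

# Count the lines of the merged file as a quick sanity check.
merged_lines="$(wc -l < "$workdir/merged.txt" | tr -d ' ')"
echo "merged lines: $merged_lines"
```

The line count of the merged file should equal the sum of the line counts of the two inputs, which is an easy way to confirm nothing was dropped.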

5.Top5 and Bottom5 lines visualization

This is an interesting part. The task is to display the top 5 and bottom 5 lines of the merged file and save those 10 lines to a new file. The commands and their results are shown in the screenshot below.
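The step described above can be sketched with `head` and `tail`. This is an illustration under assumptions, not the exact commands from the screenshot: the runnable part builds a 12-line local stand-in for the merged file, and the HDFS paths and file names in the comments are hypothetical.

```shell
set -eu

# Hypothetical 12-line stand-in for the merged file.
workdir="$(mktemp -d)"
i=1
while [ "$i" -le 12 ]; do
  echo "line $i"
  i=$((i + 1))
done > "$workdir/merged.txt"

# Write the first 5 lines to a new file, then append the last 5 lines.
head -n 5 "$workdir/merged.txt" > "$workdir/top_bottom.txt"
tail -n 5 "$workdir/merged.txt" >> "$workdir/top_bottom.txt"

# Display the resulting 10-line file.
cat "$workdir/top_bottom.txt"

# One possible HDFS shell equivalent (paths are assumptions):
#   hdfs dfs -cat /user/cloudera/merged.txt | head -n 5 >  top_bottom.txt
#   hdfs dfs -cat /user/cloudera/merged.txt | tail -n 5 >> top_bottom.txt
#   hdfs dfs -put top_bottom.txt /user/cloudera/top_bottom.txt
#   hdfs dfs -cat /user/cloudera/top_bottom.txt
```

Using `>` for the first write and `>>` for the second matters: the first creates or truncates the new file, and the second appends to it so the top 5 lines are not overwritten.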