ICP 1 - awais546/Big-Data-Programming-Hadoop-Pyspark GitHub Wiki
Big Data Programming ICP-1
This ICP consisted of the following parts:
- Installing the software
- Using Hue to visualize data
- Merging two data files
- Displaying the merged file
- Displaying the top 5 and bottom 5 lines
- Writing these 10 lines to a new file and displaying that file
1. Installation: Cloudera
The screenshots below show the installation steps.
2. Visualization of data in Hue
The data visualization in Hue is shown in the following screenshot.
4. Merging the two files
The Shakespeare and Word_List files were merged with an HDFS shell command, as shown in the screenshot below.
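For readers without the screenshot, here is a minimal sketch of one way such a merge can be done. It is not necessarily the exact command used in this ICP. The file names (`shakespeare.txt`, `word_list.txt`, `merged.txt`) and the sample contents are placeholders. The sketch runs on local files so it can be tried without a cluster, and the comments give the analogous HDFS shell pipeline.

```shell
# Placeholder input files with made-up content, standing in for the two data files.
printf 'first line of file one\nsecond line of file one\n' > shakespeare.txt
printf 'first line of file two\nsecond line of file two\n' > word_list.txt

# Concatenate both files into one merged file.
# On HDFS the analogous pipeline would be (paths are assumptions):
#   hdfs dfs -cat <file1> <file2> | hdfs dfs -put - <merged>
# where "-put -" reads the data to upload from standard input.
cat shakespeare.txt word_list.txt > merged.txt

# Show the merged result.
cat merged.txt
```

The merged file contains every line of the first file followed by every line of the second. An alternative on a cluster is `hdfs dfs -getmerge`, which concatenates HDFS files into a single local file instead of a new HDFS file.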
5. Top 5 and bottom 5 lines
This is an interesting part. The top 5 and bottom 5 lines of the merged file are displayed and then saved to a new file. The process and its results are shown in the screenshot below.
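As a rough illustration of this step, the sketch below extracts the first and last 5 lines of a file and writes them to a new file. The file names (`merged.txt`, `top_bottom.txt`) and the 12 numbered sample lines are assumptions made for the example, and the commands may differ from those in the screenshot. It uses local files so it can be run anywhere, with the analogous HDFS commands noted in the comments.

```shell
# Placeholder merged file: 12 numbered lines standing in for the real data.
seq 1 12 | sed 's/^/line /' > merged.txt

# Write the first 5 lines to a new file, then append the last 5 lines.
# On HDFS the analogous commands would be (paths are assumptions):
#   hdfs dfs -cat <merged> | head -n 5 | hdfs dfs -put - <newfile>
#   hdfs dfs -cat <merged> | tail -n 5 | hdfs dfs -appendToFile - <newfile>
head -n 5 merged.txt > top_bottom.txt
tail -n 5 merged.txt >> top_bottom.txt

# Display the 10 saved lines.
cat top_bottom.txt
```

Using `>` for the first write and `>>` for the second keeps both halves in one file in the original order. When piping `hdfs dfs -cat` into `head`, the HDFS client may report that the pipe was closed early, which is harmless because `head` stops reading after 5 lines.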