ICP1 - neerajpadarthi/Big-Data-Programming GitHub Wiki
Name : Neeraj Padarthi
Class ID: 22
ICP : 1
Topic: Installing Cloudera and visualizing Hadoop data with Hue
Task
- Cloudera installation
- Use the given dataset
- Load it into HDFS
- Use the second file
- Append it to the first file
- Visualize the file with Hue
- View the first and last lines (approximately 5) of the merged dataset using appropriate HDFS commands
- Create a new text file, load it into HDFS, and try to append all three datasets.
Features
Question: Loading and Appending Files
- hadoop fs -mkdir -p /user/cloudera/bdp/icp1
- hadoop fs -ls /user/cloudera/bdp/icp1
- hadoop fs -put word_list.txt /user/cloudera/bdp/icp1
- hadoop fs -ls /user/cloudera/bdp/icp1
- hadoop fs -appendToFile shakespeare.txt /user/cloudera/bdp/icp1/word_list.txt
- Creating the icp1 directory in HDFS.
- Listing the files in the directory; no files are present yet.
- Copying word_list.txt from the local file system into the HDFS icp1 folder.
- Listing the files in the directory again; one file is present.
- Appending shakespeare.txt to the existing word_list.txt in HDFS.
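
Because -appendToFile writes the bytes of shakespeare.txt directly after the last byte of the HDFS copy of word_list.txt, the merged file is simply the two files back to back. The sketch below previews that result locally before checking it in HDFS. It assumes both input files are in the current directory and a POSIX shell; preview_merge is a hypothetical helper name, not a Hadoop command. It also flags one easy-to-miss case: if word_list.txt does not end with a newline, its last line and the first line of shakespeare.txt run together, in HDFS as well as in the preview.

```shell
# Hypothetical helper (not part of Hadoop): previews locally what
#   hadoop fs -put FIRST DIR   followed by   hadoop fs -appendToFile SECOND DIR/FIRST
# leaves in HDFS, i.e. FIRST's bytes immediately followed by SECOND's bytes.
preview_merge() {
  local first=$1 second=$2 out=$3
  # $(tail -c 1 FILE) is empty only when FILE ends in a newline (or is empty),
  # because command substitution strips a trailing newline.
  if [ -n "$(tail -c 1 -- "$first")" ]; then
    echo "warning: $first has no trailing newline; its last line will run into the first line of $second" >&2
  fi
  # Same byte order as the put + appendToFile sequence.
  cat -- "$first" "$second" > "$out"
  echo "$out: $(wc -l < "$out" | tr -d ' ') lines"
}
```

For example, preview_merge word_list.txt shakespeare.txt merged_preview.txt (merged_preview.txt is a hypothetical scratch file) prints the expected line count, which can be compared with the output of hadoop fs -cat /user/cloudera/bdp/icp1/word_list.txt | wc -l.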


Question: Visualizing the File with Hue
- Opening Hue, navigating to the File Browser, and opening the respective path (/user/cloudera/bdp/icp1).

Question: Viewing the First and Last Lines of the Merged Dataset
- hadoop fs -cat /user/cloudera/bdp/icp1/word_list.txt | head -n 5
- hadoop fs -cat /user/cloudera/bdp/icp1/word_list.txt | head -n 5 | hadoop fs -put - /user/cloudera/bdp/icp1/head.txt
- hadoop fs -cat /user/cloudera/bdp/icp1/head.txt
- hadoop fs -cat /user/cloudera/bdp/icp1/word_list.txt | tail -n 5 | hadoop fs -put - /user/cloudera/bdp/icp1/tail.txt
- hadoop fs -cat /user/cloudera/bdp/icp1/tail.txt
- hadoop fs -ls /user/cloudera/bdp/icp1
- Using the Hadoop -cat command and piping its output to head to view the first 5 lines.
- Piping the same first 5 lines into hadoop fs -put -, which reads from standard input, to save them as a new HDFS file called head.txt.
- Viewing the contents of head.txt with the -cat command.
- Piping the last 5 lines (selected with tail) into hadoop fs -put - to save them as a new HDFS file called tail.txt.
- Viewing the contents of tail.txt with the -cat command.
- Listing the files in the directory; 3 files are present (word_list.txt, head.txt and tail.txt).
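
The last lines come from piping -cat into the local tail command because hadoop fs -tail prints the last kilobyte of a file rather than a fixed number of lines. To double-check head.txt and tail.txt, the same head/tail logic can be run on a local copy of the merged file. This is a minimal sketch assuming a POSIX shell; show_ends is a hypothetical helper, and 5 is simply the default line count used in this ICP.

```shell
# Hypothetical helper (not part of Hadoop): local equivalent of
#   hadoop fs -cat FILE | head -n N   and   hadoop fs -cat FILE | tail -n N
# N defaults to 5, matching the commands above.
show_ends() {
  local file=$1 n=${2:-5}
  echo "--- first $n lines of $file ---"
  head -n "$n" -- "$file"
  echo "--- last $n lines of $file ---"
  tail -n "$n" -- "$file"
}
```

For example, after hadoop fs -get /user/cloudera/bdp/icp1/word_list.txt merged_local.txt (merged_local.txt is a hypothetical local file name), running show_ends merged_local.txt should print the same lines that hadoop fs -cat shows for head.txt and tail.txt.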


Question: Creating a New File by Appending All the Datasets
- hadoop fs -text '/user/cloudera/bdp/icp1/*.txt' | hadoop fs -put - /user/cloudera/bdp/icp1/consolidated.txt
- hadoop fs -ls /user/cloudera/bdp/icp1
- Using the -text command to output the contents of all three .txt files in the directory (word_list.txt, head.txt and tail.txt) and piping the combined output into a new HDFS file, consolidated.txt, with -put -.
- Listing the files in the directory; 4 files are present.
- Opening Hue, navigating to the file system, and opening consolidated.txt.
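
One pitfall with the *.txt glob is that consolidated.txt itself matches it: if the command is run again, the glob also picks up the previous consolidated.txt, and -put will not overwrite an existing file unless -f is given. The sketch below mirrors the concatenation locally, skips the output file if the glob matched it, and checks that the output size equals the sum of the input sizes. It assumes bash or another shell whose test builtin supports -ef; concat_and_verify is a hypothetical helper, not a Hadoop command.

```shell
# Hypothetical helper (not part of Hadoop): local analogue of
#   hadoop fs -text 'DIR/*.txt' | hadoop fs -put - DIR/consolidated.txt
# Concatenates the inputs into OUT in the order given, skipping OUT itself
# if a glob matched it (e.g. on a rerun), then checks that the sizes add up.
concat_and_verify() {
  local out=$1 f expected=0 count=0 got
  shift
  # First pass: total the input sizes, ignoring the output file.
  for f in "$@"; do
    [ "$f" -ef "$out" ] && continue
    expected=$((expected + $(wc -c < "$f")))
    count=$((count + 1))
  done
  if [ "$count" -eq 0 ]; then
    echo "no input files" >&2
    return 1
  fi
  # Second pass: write every input except OUT, in argument order.
  for f in "$@"; do
    [ "$f" -ef "$out" ] || cat -- "$f"
  done > "$out"
  got=$(wc -c < "$out" | tr -d ' ')
  if [ "$got" -ne "$expected" ]; then
    echo "size mismatch: expected $expected bytes, got $got" >&2
    return 1
  fi
  echo "$out: $got bytes from $count files"
}
```

For example, in a local directory holding copies of the three files fetched with hadoop fs -get, concat_and_verify consolidated_local.txt *.txt (consolidated_local.txt is a hypothetical name) can be rerun safely, because the previous output is skipped even though *.txt matches it.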


Video
https://youtu.be/xu-1LK9eGlw