ICP1 - neerajpadarthi/Big-Data-Programming GitHub Wiki

Name : Neeraj Padarthi

Class ID: 22

ICP : 1

Topic: Installing Cloudera and visualizing Hadoop data with Hue

Task

  • Cloudera installation
  • Use the given dataset
  • Load it into Hadoop HDFS
  • Use the second file
  • Append it to the first file
  • Visualize the file with Hue
  • View the first and last lines (approximately 5) of the merged dataset using appropriate HDFS commands
  • Create a new text file, load it into HDFS, and try to append all three datasets.

Features

  • Cloudera
  • Hadoop
  • Hue

Question Loading and Appending Files

  1. hadoop fs -mkdir /user/cloudera/bdp/icp1
  2. hadoop fs -ls /user/cloudera/bdp/icp1
  3. hadoop fs -put word_list.txt /user/cloudera/bdp/icp1
  4. hadoop fs -ls /user/cloudera/bdp/icp1
  5. hadoop fs -appendToFile shakespeare.txt /user/cloudera/bdp/icp1/word_list.txt
  • Creating the icp1 directory in HDFS.
  • Listing the files in the directory; no files are present yet.
  • Putting word_list.txt from the local file system into the icp1 directory in HDFS.
  • Listing the files in the directory again; one file is present.
  • Appending the local shakespeare.txt file to the word_list.txt file that is already stored in HDFS.
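The steps above can be collected into one re-runnable script. This is only a minimal sketch: the function name load_and_append and the HDFS_DIR variable are made up for illustration, and it assumes word_list.txt and shakespeare.txt sit in the current local directory. The -p and -f flags are additions that were not part of the original commands.

```shell
# Sketch of the load-and-append steps as one re-runnable function.
# Assumption: word_list.txt and shakespeare.txt are in the current local directory.

HDFS_DIR="/user/cloudera/bdp/icp1"   # target directory used in this ICP

load_and_append() {
  # -p creates missing parent directories and does not fail if the directory exists
  hadoop fs -mkdir -p "$HDFS_DIR" || return 1
  # -f overwrites an earlier copy, so a re-run starts from the original word list
  hadoop fs -put -f word_list.txt "$HDFS_DIR" || return 1
  # append the second local file to the end of the file already in HDFS
  hadoop fs -appendToFile shakespeare.txt "$HDFS_DIR/word_list.txt" || return 1
  # list the directory to confirm that one (now larger) file is present
  hadoop fs -ls "$HDFS_DIR"
}

# Only talk to the cluster when the hadoop client is actually installed.
if command -v hadoop >/dev/null 2>&1; then
  load_and_append
else
  echo "hadoop client not found; skipping HDFS steps"
fi
```

Without -f, running -put a second time fails because word_list.txt already exists in HDFS, which is why the sketch overwrites it before appending.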

Question Visualizing file with Hue

  • Opening Hue, going to the file system, and navigating to the ICP1 directory path to view the merged file.

Question Viewing first and last lines of the merged data set

  1. hadoop fs -cat /user/cloudera/bdp/icp1/word_list.txt | head -n 5
  2. hadoop fs -cat /user/cloudera/bdp/icp1/word_list.txt | head -n 5 | hadoop fs -put - /user/cloudera/bdp/icp1/head.txt
  3. hadoop fs -cat /user/cloudera/bdp/icp1/head.txt
  4. hadoop fs -cat /user/cloudera/bdp/icp1/word_list.txt | tail -n 5 | hadoop fs -put - /user/cloudera/bdp/icp1/tail.txt
  5. hadoop fs -cat /user/cloudera/bdp/icp1/tail.txt
  • Streaming the merged file with the Hadoop cat command and piping it to head to view the first 5 lines.
  • Piping the same first 5 lines into the put command, which reads from standard input (-) and writes them to a new HDFS file called head.txt.
  • Viewing the content of head.txt using the Hadoop cat command.
  • Streaming the merged file with the Hadoop cat command, piping it to tail to take the last 5 lines, and writing them to a new HDFS file called tail.txt.
  • Viewing the content of tail.txt using the Hadoop cat command.
  • Listing the files in the directory; 3 files are present.
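The head and tail pipelines above each stream the whole merged file from HDFS. As a rough sketch, a small helper can read the stream once and print both ends. The name preview_lines is hypothetical and not part of Hadoop. It buffers the stream in a local temporary file, so it is only sensible for datasets that fit on the local disk.

```shell
# preview_lines N: read stdin once, then print its first N and last N lines.
preview_lines() {
  pl_n="${1:-5}"        # number of lines to show from each end (default 5)
  pl_tmp="$(mktemp)"    # local scratch copy so the stream is read only once
  cat > "$pl_tmp"
  head -n "$pl_n" "$pl_tmp"
  echo "..."
  tail -n "$pl_n" "$pl_tmp"
  rm -f "$pl_tmp"
}

# Intended use against the merged file (needs a running cluster):
#   hadoop fs -cat /user/cloudera/bdp/icp1/word_list.txt | preview_lines 5

# Local demonstration on twenty numbered lines, no Hadoop required.
seq 1 20 | preview_lines 3
```

The local demonstration prints lines 1 to 3, a separator, and lines 18 to 20.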

Question Creating a new file by appending all the data sets.

  1. hadoop fs -text /user/cloudera/bdp/icp1/*.txt | hadoop fs -put - /user/cloudera/bdp/icp1/consolidated.txt
  2. hadoop fs -ls /user/cloudera/bdp/icp1
  • Using the text command to output the content of all 3 files and piping it to the put command, which writes it to a new HDFS file called consolidated.txt.
  • Listing the files in the directory; 4 files are present.
  • Opening Hue, going to the file system, and opening consolidated.txt.
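One way to check the consolidation is to compare line counts: the consolidated file should have as many lines as the three source files together. The sketch below shows the idea on local stand-in files with made-up contents. The helper count_lines and the .out suffix are illustrative choices, and the equivalent HDFS check appears only as a comment.

```shell
# Local stand-ins for the three HDFS files; names mirror the ICP, contents are made up.
work="$(mktemp -d)"
printf 'a\nb\nc\n' > "$work/word_list.txt"
printf 'a\nb\n'    > "$work/head.txt"
printf 'c\n'       > "$work/tail.txt"

# count_lines FILE: print the number of lines in FILE without padding.
count_lines() { wc -l < "$1" | tr -d ' '; }

# Same shape as the HDFS pipeline: concatenate every *.txt into one new file.
# The .out suffix keeps the result from matching *.txt on a later run.
cat "$work"/*.txt > "$work/consolidated.out"

parts=$(( $(count_lines "$work/word_list.txt") + $(count_lines "$work/head.txt") + $(count_lines "$work/tail.txt") ))
total=$(count_lines "$work/consolidated.out")
echo "parts=$parts consolidated=$total"

# HDFS equivalent of the count (needs a running cluster):
#   hadoop fs -cat /user/cloudera/bdp/icp1/consolidated.txt | wc -l

rm -rf "$work"
```

On the cluster, note that once consolidated.txt exists it also matches *.txt, and put refuses to overwrite an existing file unless -f is given, so the original pipeline is best run once or pointed at a different output name.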

Video

https://youtu.be/xu-1LK9eGlw