ICP 01 : Cloudera and Hue - acikgozmehmet/BigDataProgramming GitHub Wiki

ICP-01: Installing Cloudera and Visualizing Data with Hue

Objectives

  • VirtualBox and Cloudera Installation
  • Use the given dataset
  • Load it into Hadoop HDFS
  • Use the second file
  • Append it to the first file
  • Visualize the file with Hue
  • View the first and last lines (approximately 5) of the merged dataset using appropriate HDFS commands
  • Create a new text file, load it into HDFS, and try to append all three datasets

VirtualBox and Cloudera Installation

Use the following link to download and install Oracle VirtualBox:

https://www.virtualbox.org/wiki/Downloads

Use the following link to download the Cloudera QuickStart VM for VirtualBox (e.g. cloudera-quickstart-vm-5.13.0-0-virtualbox):

https://www.cloudera.com/downloads/quickstart_vms/5-13.html

Extract the downloaded Cloudera archive to a directory.

In VirtualBox, go to File and select Import Appliance, then choose the VM image from your hard drive.

Select the imported virtual machine (cloudera-quickstart-vm-5.13.0-0-virtualbox) and start it to boot the VM image in VirtualBox.

It will take several minutes for the Cloudera VM's operating system to start.
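
The same import can also be done from the command line with VirtualBox's VBoxManage tool. The following is a minimal sketch and not part of the original lab: the function name import_quickstart_vm is hypothetical, and the .ovf path and VM name in the example call are assumptions that depend on where the archive was extracted.

```shell
# Import an appliance and boot it, as an alternative to File > Import Appliance.
# $1: path to the extracted .ovf file (assumed location)
# $2: name to give the imported VM (any name you like)
import_quickstart_vm() {
    ovf_path=$1
    vm_name=$2
    # Register the appliance under the chosen name.
    VBoxManage import "$ovf_path" --vsys 0 --vmname "$vm_name" || return 1
    # Boot the VM with its normal GUI window.
    VBoxManage startvm "$vm_name" --type gui
}

# Example call (file name and VM name are assumptions):
# import_quickstart_vm ./cloudera-quickstart-vm-5.13.0-0-virtualbox.ovf cloudera-quickstart
```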


File Manipulations

  • Creating a directory (icp1) in HDFS

    hadoop fs -mkdir /user/cloudera/icp1

  • Putting a file (shakespeare.txt) from the local file system into HDFS

    hadoop fs -put shakespeare.txt /user/cloudera/icp1

https://github.com/acikgozmehmet/BigDataProgramming/blob/master/ICP-01/Documentation/VirtualBox_cloudera-quickstart-vm-5.13.0-0-virtualbox_27_01_2020_19_02_09.png

  • Listing the files in HDFS

    hadoop fs -ls /user/cloudera/icp1
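
The create, upload, and list steps above can be combined into one helper that is safe to rerun. This is a sketch under assumptions: the function name upload_to_hdfs_dir is hypothetical. It relies on the -p option of -mkdir (no error if the directory already exists) and the -f option of -put (overwrite an existing destination).

```shell
# Create an HDFS directory if needed, upload a local file into it, then list it.
# $1: local file to upload
# $2: target HDFS directory
upload_to_hdfs_dir() {
    local_file=$1
    hdfs_dir=$2
    # -p: create parent directories and do not fail if the directory exists.
    hadoop fs -mkdir -p "$hdfs_dir" || return 1
    # -f: overwrite the destination if a file of the same name is already there.
    hadoop fs -put -f "$local_file" "$hdfs_dir" || return 1
    # Show the result.
    hadoop fs -ls "$hdfs_dir"
}

# Example call:
# upload_to_hdfs_dir shakespeare.txt /user/cloudera/icp1
```

Because -f overwrites, rerunning this helper after the append step would replace the merged shakespeare.txt with the original local copy.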

  • Appending a file from the local file system to a file in HDFS

    hadoop fs -appendToFile word_list.txt /user/cloudera/icp1/shakespeare.txt
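
One way to confirm that the append worked is to compare line counts before and after. A minimal sketch, assuming a hypothetical helper named hdfs_line_count; it streams the file through hadoop fs -cat and counts the lines locally with wc.

```shell
# Print the number of lines in an HDFS file.
# $1: HDFS path
hdfs_line_count() {
    # tr strips the padding that some wc implementations add around the number.
    hadoop fs -cat "$1" | wc -l | tr -d '[:space:]'
}

# Usage around the append step:
# before=$(hdfs_line_count /user/cloudera/icp1/shakespeare.txt)
# hadoop fs -appendToFile word_list.txt /user/cloudera/icp1/shakespeare.txt
# after=$(hdfs_line_count /user/cloudera/icp1/shakespeare.txt)
# echo "lines added: $((after - before))"
```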

  • Viewing the first and last lines (approximately 5) of the merged dataset using appropriate HDFS commands

    hadoop fs -cat /user/cloudera/icp1/shakespeare.txt | head -n 5

    hadoop fs -cat /user/cloudera/icp1/shakespeare.txt | tail -n 5
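
The two commands above can be wrapped in a single helper. This is a sketch: the name hdfs_peek is hypothetical. Error output is discarded in the head case on the assumption that hadoop fs -cat may report a write error once head has read its lines and closed the pipe.

```shell
# Print the first and last N lines of an HDFS file.
# $1: HDFS path
# $2: number of lines to show from each end (defaults to 5)
hdfs_peek() {
    path=$1
    n=${2:-5}
    echo "== first $n lines of $path =="
    # head exits after n lines; hide any resulting pipe error from hadoop.
    hadoop fs -cat "$path" 2>/dev/null | head -n "$n"
    echo "== last $n lines of $path =="
    # tail needs the whole stream, so the entire file is read here.
    hadoop fs -cat "$path" | tail -n "$n"
}

# Example call:
# hdfs_peek /user/cloudera/icp1/shakespeare.txt 5
```

For large files, hadoop fs -tail reads only the last kilobyte of the file, which is cheaper than streaming everything through tail, although it is byte-based rather than line-based.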

  • Viewing the first and last lines (approximately 5) of the merged dataset using appropriate HDFS commands and saving them to different files

    hadoop fs -cat /user/cloudera/icp1/shakespeare.txt | head -n 5 | hadoop fs -put - /user/cloudera/icp1/head5lines.txt

https://github.com/acikgozmehmet/BigDataProgramming/blob/master/ICP-01/Documentation/VirtualBox_cloudera-quickstart-vm-5.13.0-0-virtualbox_27_01_2020_20_05_31.png

    hadoop fs -cat /user/cloudera/icp1/shakespeare.txt | tail -n 5 | hadoop fs -put - /user/cloudera/icp1/tail5lines.txt

https://github.com/acikgozmehmet/BigDataProgramming/blob/master/ICP-01/Documentation/VirtualBox_cloudera-quickstart-vm-5.13.0-0-virtualbox_27_01_2020_20_05_08.png

  • Creating a new text file, loading it into HDFS, and appending all three datasets to it

    hadoop fs -text /user/cloudera/icp1/*.txt | hadoop fs -put - /user/cloudera/icp1/all_in_one.txt

https://github.com/acikgozmehmet/BigDataProgramming/blob/master/ICP-01/Documentation/VirtualBox_cloudera-quickstart-vm-5.13.0-0-virtualbox_27_01_2020_20_02_26.png
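
The *.txt glob in the merge command matches every .txt file in the directory, which includes head5lines.txt and tail5lines.txt if they were created in the earlier step. As an alternative sketch, the source files can be listed explicitly. The function name hdfs_concat and the file name new_file.txt are hypothetical, and -cat is used in place of -text on the assumption that all inputs are plain text.

```shell
# Concatenate one or more HDFS files into a new HDFS file.
# $1: destination path (must not already exist; -put does not overwrite by default)
# remaining arguments: source paths, concatenated in the order given
hdfs_concat() {
    dest=$1
    shift
    # "-" tells -put to read the data from standard input.
    hadoop fs -cat "$@" | hadoop fs -put - "$dest"
}

# Example call (new_file.txt stands in for the third dataset):
# hdfs_concat /user/cloudera/icp1/all_in_one.txt \
#     /user/cloudera/icp1/shakespeare.txt \
#     /user/cloudera/icp1/new_file.txt
```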