ICP 01 : Cloudera and Hue - acikgozmehmet/BigDataProgramming GitHub Wiki

ICP-01: Installing Cloudera and Visualizing Data with Hue

VirtualBox and Cloudera Installation
Use the given dataset
Load it into Hadoop HDFS
Use the second file
Append it to the first file
Visualize the file with Hue
View the first and last lines (approximately 5) of the merged dataset using appropriate HDFS commands
Create a new text file, load it into HDFS, and try to append all three datasets.

Use the following link to download and install Oracle VirtualBox

Use the following link to download and install the Cloudera QuickStart VM for VirtualBox (i.e., cloudera-quickstart-vm-5.13.0-0-virtualbox)

Extract the downloaded Cloudera archive to a directory

In VirtualBox, go to File > Import Appliance and select the VM image from your hard drive.

Select the imported virtual machine (cloudera-quickstart-vm-5.13.0-0-virtualbox) and start it in VirtualBox.

It will take several minutes for the Cloudera operating system to start.
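
The GUI import can also be scripted. A minimal sketch, assuming `VBoxManage` (VirtualBox's command-line tool) is on your PATH and that the extracted folder contains the appliance's `.ovf` descriptor; the helper name `import_quickstart_vm` is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: import the QuickStart appliance without the GUI.
# Assumes the extracted directory holds the appliance's .ovf descriptor.
import_quickstart_vm() {
  vm_dir=$1
  ovf=
  for candidate in "$vm_dir"/*.ovf; do
    # An unmatched glob stays literal, so confirm the file really exists.
    if [ -f "$candidate" ]; then
      ovf=$candidate
      break
    fi
  done
  if [ -z "$ovf" ]; then
    echo "no .ovf file found in $vm_dir" >&2
    return 1
  fi
  VBoxManage import "$ovf"
}
```

For example, `import_quickstart_vm ~/Downloads/cloudera-quickstart-vm-5.13.0-0-virtualbox` (the path is illustrative). Afterwards, `VBoxManage list vms` shows the name VirtualBox assigned, which can be passed to `VBoxManage startvm`.
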

Cloudera Installation

Creating a directory (icp1) in HDFS

hadoop fs -mkdir /user/cloudera/icp1
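
Running the command above a second time fails because icp1 already exists. A minimal sketch for Hadoop 2.x (assumed to be the version on the QuickStart VM), using a hypothetical helper `ensure_hdfs_dir` that adds `-p` and then checks the result with `-test -d`:

```shell
#!/bin/sh
# Hypothetical helper: create an HDFS directory only if it is missing.
# "-mkdir -p" creates missing parents and does not fail if the directory
# already exists; "-test -d" exits 0 only if the path is a directory.
ensure_hdfs_dir() {
  hadoop fs -mkdir -p "$1" && hadoop fs -test -d "$1"
}
```

On the VM: `ensure_hdfs_dir /user/cloudera/icp1`. It returns 0 only if the directory exists afterwards, so it is safe to call at the top of a script.
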
Copying a file (shakespeare.txt) from the local file system to HDFS

hadoop fs -put shakespeare.txt /user/cloudera/icp1
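
`-put` refuses to overwrite a file that already exists in HDFS. The sketch below, with a hypothetical helper `upload_to_hdfs`, passes `-f` to overwrite and then checks the upload with `-test -e`; it assumes the destination is an existing HDFS directory:

```shell
#!/bin/sh
# Hypothetical helper: upload a local file into an existing HDFS directory,
# overwriting any previous copy (-f), then confirm the file is there.
upload_to_hdfs() {
  src=$1
  dest_dir=$2
  if [ ! -f "$src" ]; then
    echo "local file not found: $src" >&2
    return 1
  fi
  hadoop fs -put -f "$src" "$dest_dir" &&
    hadoop fs -test -e "$dest_dir/$(basename "$src")"
}
```

Usage: `upload_to_hdfs shakespeare.txt /user/cloudera/icp1`.
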

Listing the files in HDFS

hadoop fs -ls /user/cloudera/icp1
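
Besides `-ls`, the `-du` and `-count` subcommands summarize a directory. A small sketch (the helper name `show_hdfs_dir` is hypothetical) that runs all three on one path:

```shell
#!/bin/sh
# Hypothetical helper: describe an HDFS directory three ways.
#   -ls        : one line per entry (permissions, size, date, path)
#   -du -s -h  : total space used, in human-readable units
#   -count     : directory count, file count, content size, path
show_hdfs_dir() {
  hadoop fs -ls "$1" &&
    hadoop fs -du -s -h "$1" &&
    hadoop fs -count "$1"
}
```
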
Appending a local file (word_list.txt) to an existing file in HDFS

hadoop fs -appendToFile word_list.txt /user/cloudera/icp1/shakespeare.txt
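
`-appendToFile` takes one or more local sources followed by a single HDFS destination, and it modifies the destination in place. A minimal sketch (hypothetical helper `append_and_check`) that performs the append and uses `-stat %b` (file size in bytes) to confirm the file grew by exactly the appended amount:

```shell
#!/bin/sh
# Hypothetical helper: append local files to an HDFS file and verify sizes.
# Usage: append_and_check HDFS_FILE LOCAL_FILE...
append_and_check() {
  dst=$1
  shift
  before=$(hadoop fs -stat %b "$dst") || return 1
  added=0
  for f in "$@"; do
    [ -f "$f" ] || { echo "local file not found: $f" >&2; return 1; }
    # tr strips the padding some wc implementations add.
    size=$(wc -c < "$f" | tr -d ' ')
    added=$((added + size))
  done
  hadoop fs -appendToFile "$@" "$dst" || return 1
  after=$(hadoop fs -stat %b "$dst") || return 1
  # Succeed only if the HDFS file grew by the total local size.
  [ "$after" -eq $((before + added)) ]
}
```

Usage: `append_and_check /user/cloudera/icp1/shakespeare.txt word_list.txt`. A non-zero status means a command failed or the sizes do not match.
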
Viewing the first and last lines (approximately 5) of the merged dataset using HDFS commands

hadoop fs -cat /user/cloudera/icp1/shakespeare.txt | head -n 5

hadoop fs -cat /user/cloudera/icp1/shakespeare.txt | tail -n 5
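
The two commands can be wrapped in one helper that takes the line count as a parameter. This is a sketch with a hypothetical name, `peek_hdfs_file`. It deliberately avoids `hadoop fs -tail`, which prints the last kilobyte of a file rather than a fixed number of lines:

```shell
#!/bin/sh
# Hypothetical helper: print the first and last N lines (default 5) of an
# HDFS file. The file is streamed twice; the tail pass reads all of it.
peek_hdfs_file() {
  file=$1
  n=${2:-5}
  echo "== first $n lines =="
  hadoop fs -cat "$file" | head -n "$n"
  echo "== last $n lines =="
  hadoop fs -cat "$file" | tail -n "$n"
}
```

Usage: `peek_hdfs_file /user/cloudera/icp1/shakespeare.txt` or, for ten lines, `peek_hdfs_file /user/cloudera/icp1/shakespeare.txt 10`.
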
Viewing the first and last lines (approximately 5) of the merged dataset and saving them to separate HDFS files

hadoop fs -cat /user/cloudera/icp1/shakespeare.txt | head -n 5 | hadoop fs -put - /user/cloudera/icp1/head5lines.txt

hadoop fs -cat /user/cloudera/icp1/shakespeare.txt | tail -n 5 | hadoop fs -put - /user/cloudera/icp1/tail5lines.txt
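
`-put -` reads the data to store from standard input, which is what lets the pipelines above write straight into HDFS. Re-running them fails if head5lines.txt or tail5lines.txt already exists, so this sketch (hypothetical helper `save_lines`) removes any previous copy with `-rm -f` first:

```shell
#!/bin/sh
# Hypothetical helper: save the first or last N lines of an HDFS file
# as a new HDFS file. Usage: save_lines head|tail N SRC DST
save_lines() {
  mode=$1 n=$2 src=$3 dst=$4
  case "$mode" in
    head|tail) ;;
    *) echo "mode must be head or tail" >&2; return 1 ;;
  esac
  # -rm -f does not fail when DST is absent, so re-runs start clean.
  hadoop fs -rm -f "$dst" >/dev/null || return 1
  # "-put -" stores whatever arrives on standard input.
  hadoop fs -cat "$src" | "$mode" -n "$n" | hadoop fs -put - "$dst"
}
```

Usage: `save_lines head 5 /user/cloudera/icp1/shakespeare.txt /user/cloudera/icp1/head5lines.txt`, and likewise with `tail` for tail5lines.txt.
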

Creating a new text file (all_in_one.txt) in HDFS that contains all three datasets appended together

hadoop fs -text '/user/cloudera/icp1/*.txt' | hadoop fs -put - /user/cloudera/icp1/all_in_one.txt
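
With a `*.txt` glob the concatenation order depends on how the names are matched, and a second run would also pick up all_in_one.txt itself. A sketch that lists the inputs explicitly (hypothetical helper `concat_hdfs`). It uses `-cat`, which for plain text gives the same output as `-text`; `-text` additionally decodes compressed files and SequenceFiles:

```shell
#!/bin/sh
# Hypothetical helper: concatenate HDFS files, in the order given,
# into one new HDFS file. Usage: concat_hdfs DST SRC...
concat_hdfs() {
  dst=$1
  shift
  # Clear any earlier output so "-put -" does not fail with "File exists".
  hadoop fs -rm -f "$dst" >/dev/null || return 1
  # -cat accepts several paths and streams them one after another.
  hadoop fs -cat "$@" | hadoop fs -put - "$dst"
}
```

Usage: `concat_hdfs /user/cloudera/icp1/all_in_one.txt /user/cloudera/icp1/shakespeare.txt /user/cloudera/icp1/head5lines.txt /user/cloudera/icp1/tail5lines.txt` (the order shown is only an example).
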