BDP _ ICP1 - aaackc/Big-data-programing GitHub Wiki

Title

Installation/ Cloudera/ HUE

Description

Objectives:

  1. Virtual box and Cloudera installation
  2. Use the given dataset
  3. Load it in Hadoop hdfs
  4. Use the second file
  5. Append it to the first file
  6. Visualize file with Hue
  7. View the first and last lines (approximately 5) of the merged dataset using appropriate HDFS commands.
  8. Create a new text file, load it into HDFS, and try to append all three datasets.

Cloudera Installation

Cloudera is:

  1. A software platform for data engineering, data warehousing, machine learning, and analytics that runs in the cloud or on premises.
  2. Cloudera started as a hybrid open-source Apache Hadoop distribution, CDH (Cloudera Distribution Including Apache Hadoop), which targeted enterprise-class deployments of that technology.

Screenshots of the Cloudera download directory and virtual machine import:

Commands with comments:

[cloudera@quickstart ~]$ hadoop fs -mkdir /user/cloudera/bdp/icp1
Creating a directory in HDFS.

[cloudera@quickstart ~]$ hadoop fs -ls /user/cloudera/bdp/icp1
Listing the files in the directory; no files are present yet.

[cloudera@quickstart ~]$ hadoop fs -put word_list.txt /user/cloudera/bdp/icp1
Copying the file from the local file system to the ICP1 folder in HDFS.

[cloudera@quickstart ~]$ hadoop fs -ls /user/cloudera/bdp/icp1
Listing the files in the directory; one file is now present.

[cloudera@quickstart ~]$ hadoop fs -appendToFile shakespear.txt /user/cloudera/bdp/icp1/word_list.txt
Appending the local file shakespear.txt to the existing word_list.txt in HDFS.
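Objective 8 calls for a new text file to be created and loaded into HDFS, but the commands for that step are not listed on this page. A minimal sketch, assuming the file is named new_text.txt, is created in the current local directory, and holds placeholder content (the content shown here is hypothetical):

```shell
# Assumed HDFS directory, taken from the commands on this page.
HDFS_DIR=/user/cloudera/bdp/icp1

# Create a small local text file; the content is only a placeholder.
printf 'This is a new text file for ICP1.\n' > new_text.txt

if command -v hadoop >/dev/null 2>&1; then
    # Upload the new file; -put fails if the destination already exists.
    hadoop fs -put new_text.txt "$HDFS_DIR"
    # Confirm that the file is now listed in the directory.
    hadoop fs -ls "$HDFS_DIR"
else
    echo "hadoop not found on PATH; run this inside the Cloudera QuickStart VM" >&2
fi
```

The `command -v` guard only keeps the sketch from failing outside the virtual machine; inside the QuickStart VM the two `hadoop fs` commands can be run directly.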

[cloudera@quickstart ~]$ hadoop fs -appendToFile shakespear.txt word_list.txt /user/cloudera/bdp/icp1/new_text.txt
Appending the local files shakespear.txt and word_list.txt to new_text.txt in HDFS, so that all three datasets are merged into one file.
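Objective 7 asks for the first and last lines of a merged dataset, but the corresponding commands are not listed here. One possible approach, assuming the merged file is /user/cloudera/bdp/icp1/word_list.txt from the first append step, is to combine `hadoop fs -cat` and `hadoop fs -tail` with the local `head` and `tail` utilities:

```shell
# Assumed location of the merged file, based on the commands on this page.
MERGED=/user/cloudera/bdp/icp1/word_list.txt
LINES=5

if command -v hadoop >/dev/null 2>&1; then
    # First lines: stream the file and stop reading after $LINES lines.
    hadoop fs -cat "$MERGED" | head -n "$LINES"
    # Last lines: -tail prints the final kilobyte of the file,
    # so the local tail trims that output down to $LINES lines.
    hadoop fs -tail "$MERGED" | tail -n "$LINES"
else
    echo "hadoop not found on PATH; run this inside the Cloudera QuickStart VM" >&2
fi
```

Because `hadoop fs -tail` returns only the last kilobyte, very long lines may leave fewer than five complete lines; for this word-list dataset that is unlikely to matter.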

Viewing the output file in Hue

Hue (Hadoop User Experience):

  1. Hue is an open-source web interface that supports Apache Hadoop and its ecosystem, licensed under the Apache v2 license.
  2. Hue is an open-source analytics workbench for browsing, querying, and visualizing data.

Learnings from the Lesson:

We learned how to install and set up a Cloudera environment to run Hadoop. We also learned how to copy files from the local file system to the Hadoop file system (HDFS) and view them in the browser with Hue (Hadoop User Experience).

Limitations:

There were no notable limitations in this ICP, as the instructions were easy to follow.

References:

https://hadoop.apache.org/docs/r2.4.1/hadoop-project-dist/hadoop-common/FileSystemShell.html

http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/CommandsManual.html

Demo Video:

BDP _ ICP1