Module 1: ICP #1 - VidyullathaKaza/BigData_Programming_Spring2020 GitHub Wiki

ICP-1

Cloudera

Description:

As part of this exercise, we installed the following tools for Big Data programming:

  • Cloudera setup
  • IntelliJ IDEA Community Edition

Downloaded in Cloudera

Downloaded in LocalRepository

Exercise

The following steps were performed to complete the exercise:

  1. Downloaded the datasets provided.

  2. Loaded one of the datasets into HDFS using HDFS shell commands.

  3. Appended the other dataset to the first file in HDFS.

  4. Finally, displayed the data using Hue.
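The original commands are not reproduced on this page, so the following is only a minimal sketch of how steps 2 and 3 could look, not the exact commands we ran. The file names (`dataset1.txt`, `dataset2.txt`) and the HDFS directory (`/user/cloudera/icp1`) are hypothetical placeholders. The sketch builds the standard `hdfs dfs` commands (`-mkdir -p`, `-put`, `-appendToFile`, `-cat`) and runs them only if an `hdfs` client is on the PATH; otherwise it just prints them. The final `-cat` is a command-line check and does not replace viewing the file in Hue.

```python
import shutil
import subprocess

# Hypothetical names: the page does not give the real file names or HDFS paths.
LOCAL_FIRST = "dataset1.txt"
LOCAL_SECOND = "dataset2.txt"
HDFS_DIR = "/user/cloudera/icp1"
HDFS_TARGET = HDFS_DIR + "/dataset1.txt"


def build_commands(first, second, hdfs_dir, hdfs_target):
    """Return the HDFS shell commands for steps 2 and 3 as argument lists."""
    return [
        # Create the target directory in HDFS (no error if it already exists).
        ["hdfs", "dfs", "-mkdir", "-p", hdfs_dir],
        # Step 2: copy the first local dataset into HDFS.
        ["hdfs", "dfs", "-put", first, hdfs_target],
        # Step 3: append the second local dataset to the file already in HDFS.
        ["hdfs", "dfs", "-appendToFile", second, hdfs_target],
        # Optional check: print the combined file to the terminal.
        ["hdfs", "dfs", "-cat", hdfs_target],
    ]


def run(commands):
    """Print each command; execute it only if the hdfs client is installed."""
    have_hdfs = shutil.which("hdfs") is not None
    for cmd in commands:
        print(" ".join(cmd))
        if have_hdfs:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run(build_commands(LOCAL_FIRST, LOCAL_SECOND, HDFS_DIR, HDFS_TARGET))
```

On the Cloudera VM the same commands can be typed directly into a terminal. Wrapping them in a script only makes the sequence repeatable.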

Learning Outcomes

  • Understood the importance of Cloudera.

  • Learned about Hue, a technology that was new to us in this exercise.

  • Executed HDFS commands.