ICP 1 - Murarishetti-Shiva-Kumar/Big-Data-Programming GitHub Wiki

Lesson 1: Cloudera (configuration/Hue)

Use the given datasets (shakespeare.txt, word_list.txt)

image

Load them into Hadoop HDFS

Creating directory:

image image

Copying files from the local file system to Hadoop HDFS:

image image image
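
The two steps above (creating a directory and copying the datasets) can be sketched with the standard `hdfs dfs` shell. This is a minimal sketch and not necessarily the exact commands shown in the screenshots; the helper name `upload_datasets` and the target directory `/user/cloudera/icp1` are assumptions chosen for illustration.

```shell
# Hypothetical helper: create an HDFS directory and copy local files into it.
# Usage: upload_datasets <hdfs_dir> <local_file>...
upload_datasets() {
  target_dir="$1"
  shift
  # -p creates missing parent directories and does not fail if the directory exists
  hdfs dfs -mkdir -p "$target_dir" || return 1
  # -put copies one or more local files into the HDFS directory
  hdfs dfs -put "$@" "$target_dir" || return 1
  # List the directory to confirm the files arrived
  hdfs dfs -ls "$target_dir"
}
```

For example, `upload_datasets /user/cloudera/icp1 shakespeare.txt word_list.txt` would upload both datasets, assuming they sit in the current local directory.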

Use the second file and append it to the first file

image
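
Appending a local file to a file that already lives in HDFS is typically done with `-appendToFile`. The sketch below assumes that approach; the helper name `append_to_hdfs_file` and the HDFS path used in the example are hypothetical.

```shell
# Hypothetical helper: append a local file to the end of an existing HDFS file.
# Usage: append_to_hdfs_file <local_file> <hdfs_file>
append_to_hdfs_file() {
  # -appendToFile reads the local source and appends its bytes to the HDFS destination
  hdfs dfs -appendToFile "$1" "$2"
}
```

For example, `append_to_hdfs_file word_list.txt /user/cloudera/icp1/shakespeare.txt` would append the second dataset to the first.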

Visualize the file with Hue

First lines of the file:

image

Last lines of the file:

image

View the first and last lines (approximately 5) of the merged dataset using appropriate hdfs commands

First lines of the file:

image

Last lines of the file:

image
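
One way to view the first and last lines from the command line is to pipe HDFS output through the local `head` and `tail` tools. This is a sketch under that assumption, not necessarily what the screenshots show; the helper names `hdfs_first_lines` and `hdfs_last_lines` are hypothetical.

```shell
# Hypothetical helpers: print the first or last N lines (default 5) of an HDFS file.
# Usage: hdfs_first_lines <hdfs_file> [N]
hdfs_first_lines() {
  # -cat streams the file; head stops after N lines
  hdfs dfs -cat "$1" | head -n "${2:-5}"
}

# Usage: hdfs_last_lines <hdfs_file> [N]
hdfs_last_lines() {
  # -tail prints only the last kilobyte of the file; tail trims that to N lines
  hdfs dfs -tail "$1" | tail -n "${2:-5}"
}
```

Because `-tail` returns only the last kilobyte, `hdfs_last_lines` can show a truncated first line when the requested lines are longer than that; piping `hdfs dfs -cat` into `tail` avoids this at the cost of reading the whole file.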

Renaming the appended file:

image image
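
Renaming inside HDFS is usually a move with `-mv`. The sketch below assumes that command; the helper name `rename_hdfs_file` and the file names in the example are hypothetical, since the new name is not stated in the text.

```shell
# Hypothetical helper: rename (move) a file within HDFS and show the result.
# Usage: rename_hdfs_file <old_hdfs_path> <new_hdfs_path>
rename_hdfs_file() {
  # -mv moves the file to the new path inside HDFS, which acts as a rename
  hdfs dfs -mv "$1" "$2" || return 1
  # List the new path to confirm the rename
  hdfs dfs -ls "$2"
}
```

For example, `rename_hdfs_file /user/cloudera/icp1/shakespeare.txt /user/cloudera/icp1/merged.txt` would free the original name so the untouched first file can be uploaded again.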

Copying the original first file to Hadoop HDFS:

image image

Create a new text file, load it into HDFS, and try to append all three datasets

Merging the 3 files into a single file:

image

getmerge - Takes a source directory and a destination file as input and concatenates files in src into the destination local file.

Destination local file:

image
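
The merge described above can be sketched with `-getmerge`. The helper name `merge_hdfs_dir` and the paths in the example are assumptions for illustration; the source directory is assumed to contain only the three files to be merged.

```shell
# Hypothetical helper: merge every file in an HDFS directory into one local file.
# Usage: merge_hdfs_dir <hdfs_dir> <local_destination_file>
merge_hdfs_dir() {
  # -getmerge concatenates the files under the HDFS directory into a local file
  hdfs dfs -getmerge "$1" "$2" || return 1
  # Inspect the result on the local file system
  ls -l "$2"
}
```

For example, `merge_hdfs_dir /user/cloudera/icp1 merged_all.txt` would write the concatenation to `merged_all.txt` in the current local directory. Adding the `-nl` option to `-getmerge` inserts a newline at the end of each source file, which helps when a file does not end with one.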

Because the concatenated destination file is generated in the local file system, we have to move it from the local file system to Hadoop HDFS.

image
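
Moving the merged file back into HDFS could be done with `-moveFromLocal`, as sketched below. The screenshot may instead have used `-put` or `-copyFromLocal`, which keep the local copy. The helper name `move_local_to_hdfs` and the paths in the example are hypothetical.

```shell
# Hypothetical helper: move a local file into HDFS and confirm it arrived.
# Usage: move_local_to_hdfs <local_file> <hdfs_destination>
move_local_to_hdfs() {
  # -moveFromLocal works like -put but deletes the local copy after the upload
  hdfs dfs -moveFromLocal "$1" "$2" || return 1
  # List the destination so the file can then be opened in Hue
  hdfs dfs -ls "$2"
}
```

For example, `move_local_to_hdfs merged_all.txt /user/cloudera/icp1/` would place the concatenated file next to the original datasets.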

Final concatenated file in Hue:

image