ICP 1 - bhargavi1411/BigDataProgramming GitHub Wiki
Name: Bhargavi Saipoojitha Chennupati
Class ID: 12
Topic: Cloudera (Configuration / Hue)
Task: Cloudera installation and visualization of Hadoop files with Hue
Datasets: shakespeare.txt, word_list.txt
This ICP task was carried out on Cloudera using the given datasets, shakespeare.txt and word_list.txt. First, we created an HDFS directory named hadoopfiles using the following command
hdfs dfs -mkdir /hadoopfiles/
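Running a plain `-mkdir` again on a directory that already exists reports an error, so a guarded version can be convenient when repeating the exercise. The following is a minimal sketch, assuming the `hdfs` client is on the PATH; the helper name `ensure_hdfs_dir` is hypothetical and not part of the original task. It uses `-test -d` to check for the directory and `-mkdir -p` to create it together with any missing parents.

```shell
# Create an HDFS directory only if it does not already exist.
# ensure_hdfs_dir is a hypothetical helper name used for illustration.
ensure_hdfs_dir() {
    dir=$1
    if hdfs dfs -test -d "$dir"; then
        echo "exists: $dir"
    else
        hdfs dfs -mkdir -p "$dir" && echo "created: $dir"
    fi
}

# Example call (requires a running HDFS):
# ensure_hdfs_dir /hadoopfiles
```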
To load the files shakespeare.txt and word_list.txt into this directory, we used the following commands
hdfs dfs -copyFromLocal Desktop/ICP1/shakespeare.txt /hadoopfiles/
hdfs dfs -copyFromLocal Desktop/ICP1/word_list.txt /hadoopfiles/
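Since `-copyFromLocal` accepts several local sources followed by one destination, both uploads can also be done in a single call. The sketch below is one possible way to wrap that, assuming the `hdfs` client is available; `upload_to_hdfs` is a hypothetical helper name, and the local existence check is an addition for illustration rather than something the task requires.

```shell
# Upload several local files to one HDFS directory in a single call.
# upload_to_hdfs is a hypothetical helper: the first argument is the
# HDFS destination, the remaining arguments are local file paths.
upload_to_hdfs() {
    dest=$1
    shift
    for f in "$@"; do
        if [ ! -f "$f" ]; then
            echo "missing local file: $f" >&2
            return 1
        fi
    done
    # -f overwrites files that already exist at the destination.
    hdfs dfs -copyFromLocal -f "$@" "$dest"
}

# Example call (requires a running HDFS):
# upload_to_hdfs /hadoopfiles/ Desktop/ICP1/shakespeare.txt Desktop/ICP1/word_list.txt
```

Checking the local paths first gives a clearer message than letting the upload fail partway through.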
Next, we concatenated the two files and wrote the result to a new file using the command
hdfs dfs -cat /hadoopfiles/shakespeare.txt /hadoopfiles/word_list.txt | hdfs dfs -put - /hadoopfiles/output.txt
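The pipeline works because `-cat` streams the files in the order listed and `-put -` reads from standard input. A plain `-put` fails if the destination already exists, so rerunning the step needs either a new file name or the `-f` flag. The sketch below wraps the same pipeline in a reusable function; `hdfs_concat` is a hypothetical name, and adding `-f` is an assumption about what is wanted on a rerun.

```shell
# Concatenate HDFS files, in the order given, into a new HDFS file.
# hdfs_concat is a hypothetical helper: the first argument is the
# destination, the remaining arguments are the source files.
hdfs_concat() {
    dest=$1
    shift
    # "-put -f -" reads from standard input and overwrites dest if present.
    hdfs dfs -cat "$@" | hdfs dfs -put -f - "$dest"
}

# Example call (requires a running HDFS):
# hdfs_concat /hadoopfiles/output.txt /hadoopfiles/shakespeare.txt /hadoopfiles/word_list.txt
```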
To read the first 10 lines of the output file produced by concatenating the two files, we use the following command
hdfs dfs -cat /hadoopfiles/output.txt | head -10
Similarly, to read the last 10 lines we use the following command
hdfs dfs -cat /hadoopfiles/output.txt | tail -10
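Because concatenation preserves order, the head of the output comes from the first file and the tail from the last one. The following local sketch illustrates this without HDFS; the file contents are made-up stand-ins, not the real datasets, and it uses the `-n` form of `head` and `tail`.

```shell
# Local stand-ins for the two HDFS files (hypothetical contents).
workdir=$(mktemp -d)
printf 'play line %s\n' 1 2 3 > "$workdir/shakespeare.txt"
printf 'word %s\n' 1 2 3 > "$workdir/word_list.txt"

# Same idea as the HDFS pipeline: concatenate in order into one file.
cat "$workdir/shakespeare.txt" "$workdir/word_list.txt" > "$workdir/output.txt"

# The first line comes from the first file, the last line from the second.
first=$(head -n 1 "$workdir/output.txt")
last=$(tail -n 1 "$workdir/output.txt")
echo "first: $first"   # first: play line 1
echo "last: $last"     # last: word 3
```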
Next, we create a third file and append it to the first two. For this, we created a file sample.txt and placed some text inside it. We loaded sample.txt into the hadoopfiles directory using the following command
hdfs dfs -copyFromLocal Desktop/ICP1/sample.txt /hadoopfiles
Now we concatenate sample.txt with the other two files and save the result into a new file using the following command
hdfs dfs -cat /hadoopfiles/shakespeare.txt /hadoopfiles/word_list.txt /hadoopfiles/sample.txt | hdfs dfs -put - /hadoopfiles/output2.txt
We can display the first 5 and last 5 lines of output2.txt to check that the three files were concatenated successfully, using the commands:
hdfs dfs -cat /hadoopfiles/output2.txt | head -5
hdfs dfs -cat /hadoopfiles/output2.txt | tail -5
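Looking at the first and last lines only confirms the two ends of the file. A slightly stronger check is to compare line counts: the merged file should have as many lines as its parts combined. The sketch below is one way to do that, assuming the `hdfs` client is available; `hdfs_line_count` and `check_concat` are hypothetical helper names.

```shell
# Count the lines of an HDFS file by streaming it through wc.
hdfs_line_count() {
    hdfs dfs -cat "$1" | wc -l | tr -d ' '
}

# Succeed only if the merged file has as many lines as its parts combined.
# First argument is the merged file, the rest are the source files.
check_concat() {
    merged=$1
    shift
    expected=0
    for part in "$@"; do
        expected=$((expected + $(hdfs_line_count "$part")))
    done
    actual=$(hdfs_line_count "$merged")
    [ "$actual" -eq "$expected" ]
}

# Example call (requires a running HDFS):
# check_concat /hadoopfiles/output2.txt /hadoopfiles/shakespeare.txt \
#     /hadoopfiles/word_list.txt /hadoopfiles/sample.txt && echo "line counts match"
```

Matching counts do not prove the contents are identical, but a mismatch is a quick sign that a source file was missed.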
Finally, we can visualize the result using Hue.
In Hue, we can see all the files in our /hadoopfiles folder.