ICP1 - acsenvila/-CSEE5590-490-Big-Data-Programming- GitHub Wiki

This is the wiki for ICP1, the implementation of ICP-1 for Big Data Programming. For this ICP, Cloudera running in Oracle VM VirtualBox was used as the OS environment. Cloudera integrates other big data software, such as Hadoop, Hive, HBase, and Hue.

Objective for In Class Exercise

Visualization of Hadoop file with Hue

You are required to follow the steps below to complete your ICP today:

  • Use the given dataset
  • Load it into Hadoop HDFS
  • Use the second file
  • Append it to the first file
  • Visualize the file with Hue
  • View the first and last lines (approximately 5) of the merged dataset using appropriate HDFS commands
  • Create a new text file, load it into HDFS, and try to append all three datasets
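The steps above map onto a handful of HDFS shell commands. The sketch below only assembles those command lines in Python so they can be reviewed before anything is run. The file names (`dataset1.txt`, `dataset2.txt`, `notes.txt`) and the HDFS directory `/user/cloudera/icp1` are hypothetical placeholders, since the exercise does not name them. It also assumes that the second and third files are still on the local file system when they are appended, and that the cluster permits appends.

```python
import shlex

# Hypothetical names: the exercise does not specify the files or the HDFS directory.
HDFS_DIR = "/user/cloudera/icp1"
FIRST, SECOND, THIRD = "dataset1.txt", "dataset2.txt", "notes.txt"


def build_commands(hdfs_dir=HDFS_DIR, first=FIRST, second=SECOND, third=THIRD):
    """Return the hdfs shell commands for the exercise, in order, as argv lists."""
    target = f"{hdfs_dir}/{first}"
    return [
        # Create the working directory in HDFS (no error if it already exists).
        ["hdfs", "dfs", "-mkdir", "-p", hdfs_dir],
        # Load the first dataset from the local file system into HDFS.
        ["hdfs", "dfs", "-put", first, target],
        # Append the local second file to the first file already in HDFS.
        ["hdfs", "dfs", "-appendToFile", second, target],
        # Append the newly created local text file, so all three are merged.
        ["hdfs", "dfs", "-appendToFile", third, target],
        # List the directory to confirm the merged file is there.
        ["hdfs", "dfs", "-ls", hdfs_dir],
    ]


def as_shell_lines(commands):
    """Render argv lists as copy-pasteable shell lines."""
    return [" ".join(shlex.quote(arg) for arg in cmd) for cmd in commands]


if __name__ == "__main__":
    for line in as_shell_lines(build_commands()):
        print(line)
```

To actually run the commands, each list could be passed to `subprocess.run(cmd, check=True)` on a machine where the `hdfs` client is on the `PATH`. The merged file can then be opened in the Hue file browser.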

Cloudera OS used for Big Data Programming

### File Manipulation and Input

[Screenshots 2, 3, and 4]

### Output

[Screenshot 5]
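One way to view roughly the first and last five lines of the merged dataset is to stream the file with `hdfs dfs -cat` and trim the stream on the client side. The sketch below illustrates only that trimming step in Python; `first_and_last` is a hypothetical helper name, and the sample rows stand in for the real dataset, which is not reproduced here.

```python
from collections import deque
from itertools import islice


def first_and_last(lines, n=5):
    """Return (first n lines, last n lines) from any iterable of lines.

    The iterable is read once and at most 2 * n lines are kept in memory,
    so this also works on a large file streamed from HDFS.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    it = iter(lines)
    head = list(islice(it, n))
    # A deque with maxlen keeps only the most recent n lines of the remainder.
    rest_tail = deque(it, maxlen=n)
    # For short inputs the last n lines overlap with the head.
    tail = (head + list(rest_tail))[-n:]
    return head, tail


if __name__ == "__main__":
    # Stand-in data; in practice the lines would come from `hdfs dfs -cat`.
    sample = [f"row {i}\n" for i in range(1, 13)]
    head, tail = first_and_last(sample)
    print("".join(head), end="")
    print("...")
    print("".join(tail), end="")
```

Directly in a shell, the same effect can be had by piping `hdfs dfs -cat <path>` into `head -n 5` or `tail -n 5`. `hdfs dfs -tail <path>` is an alternative for the end of the file, but it prints the last kilobyte rather than a fixed number of lines.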

References

  1. https://www.cloudera.com/
  2. https://www.virtualbox.org/
  3. https://hadoop.apache.org/docs/r2.4.1/hadoop-project-dist/hadoop-common/FileSystemShell.html
  4. https://hortonworks.com/tutorial/manage-files-on-hdfs-via-cli-ambari-files-view/section/1