ICP 1 - Gnkhakimova/CS5590-BigData GitHub Wiki
ICP 1
Installation / Cloudera / Hue
Tasks
- Install Cloudera and VirtualBox
- Load two .txt files into HDFS and append them into a third file
- Display the appended file in the terminal
- Visualize the appended file using Hue
- Append three existing files into a fourth file in HDFS
Configuration
- Cloudera 5.13.0
- Oracle VM VirtualBox 6.0
- Hue account
Features
In this ICP we used the Cloudera terminal to create, load, and append files in HDFS. For the visualization part we used Hue to display the appended file.
1. Installation
Installed Cloudera and Oracle VM VirtualBox, then ran Cloudera inside the VM.
2. Download dataset
Created a localfile directory on the desktop and downloaded the two datasets from the given links.
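The download step can also be scripted. The sketch below is one possible way to do it, assuming `wget` is available inside the VM; the function name `fetch_datasets` is hypothetical, and the dataset links themselves are not reproduced here, so they are passed in as arguments.

```shell
#!/bin/sh
# Hypothetical helper: download each URL given as an argument into a local folder.
# Usage: fetch_datasets DIR URL...
fetch_datasets() {
    dir="$1"; shift
    # Create the target folder if it does not exist yet.
    mkdir -p "$dir"
    for url in "$@"; do
        # -P tells wget which directory to save the file in.
        wget -P "$dir" "$url"
    done
}
```

For example, `fetch_datasets "$HOME/Desktop/localfile" "$URL1" "$URL2"`, where `URL1` and `URL2` stand for the two given dataset links.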
3. HDFS
Copied the files from the local machine to HDFS. Appended the two files into a newly created third file called "out.txt". Displayed the head and tail of that file in the Cloudera terminal. Appended the three existing files into a new file called "final.txt".
- Created a directory called ICP1 in HDFS
- Copied the two text files from the local machine to the HDFS directory
- Appended the two copied files into a new third file called out.txt
- Displayed the head of the third file in the terminal
- Displayed the tail of the third file in the terminal
- Appended the three files into a newly created fourth file called final.txt
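The steps above can be sketched with the `hdfs dfs` command line. This is a minimal sketch, not necessarily the exact commands used in the ICP: the input names `file1.txt` and `file2.txt`, the HDFS location `/user/cloudera/ICP1`, and the `icp1_*` function names are all assumptions. The concatenation pipes `-cat` into `-put -`, which reads from standard input; `-appendToFile` would be an alternative when the sources are local files.

```shell
#!/bin/sh
# Sketch of the HDFS steps. Paths and input file names are placeholders.
LOCAL_DIR="${LOCAL_DIR:-$HOME/Desktop/localfile}"   # local download folder
HDFS_DIR="${HDFS_DIR:-/user/cloudera/ICP1}"          # assumed HDFS location

# Create the ICP1 directory in HDFS and copy the two local files into it.
icp1_upload() {
    hdfs dfs -mkdir -p "$HDFS_DIR"
    hdfs dfs -put "$LOCAL_DIR/file1.txt" "$LOCAL_DIR/file2.txt" "$HDFS_DIR/"
}

# Concatenate the given HDFS files into a new HDFS file.
# Usage: icp1_concat DEST SRC...
icp1_concat() {
    dest="$1"; shift
    # -cat streams the sources in order; "-put -" writes stdin to DEST.
    hdfs dfs -cat "$@" | hdfs dfs -put - "$dest"
}

# Show the first lines and the tail of an HDFS file on the terminal.
icp1_peek() {
    hdfs dfs -cat "$1" | head -n 10
    hdfs dfs -tail "$1"
}
```

After `icp1_upload`, calling `icp1_concat "$HDFS_DIR/out.txt" "$HDFS_DIR/file1.txt" "$HDFS_DIR/file2.txt"` would build out.txt, and `icp1_peek "$HDFS_DIR/out.txt"` would display it. The fourth file could be built the same way by passing three sources, for instance the two inputs plus out.txt (which three files were combined is an assumption here).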
4. Hue
Visualized the appended file in Hue by browsing to the HDFS folder and selecting the path to the file.
Limitations
- The download took a very long time.
- Had issues opening Cloudera in the VM, which required freeing up some space on my local machine.