ICP 1 - PavankumarManchala/BigDataProgrammingICPs GitHub Wiki

Submitted by:

Pavankumar Manchala, Class ID: 17

Installations:

Oracle VM VirtualBox, Cloudera, IntelliJ

Datasets:

  1. Shakespeare.txt https://umkc.box.com/s/208ehts7vn8ls5yhsea0x0ht6rgkrnnp

  2. Word_list.txt https://umkc.box.com/s/bcurc4qjbpx5hpb7pni8950os78enf0e

Task:

Load both datasets into Hadoop HDFS, append the second file (Word_list.txt) to the first file (Shakespeare.txt), and display the result.
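The task can be sketched with the standard `hdfs dfs` shell commands (`-mkdir`, `-put`, `-appendToFile`, `-cat`). This page does not record the exact commands that were run, so the Python snippet below is only an illustration: the HDFS directory `/user/cloudera/icp1` and the helper names `build_commands` and `run` are assumptions, and both text files are assumed to sit in the current local directory. By default it only prints the commands (a dry run) and does not touch a cluster.

```python
import posixpath
import shlex
import subprocess


def build_commands(local_first, local_second, hdfs_dir):
    """Return the HDFS shell commands as argument lists, in execution order."""
    # HDFS path of the first file once it has been uploaded.
    first_in_hdfs = posixpath.join(hdfs_dir, posixpath.basename(local_first))
    return [
        # Create the target directory (hypothetical location) if it is missing.
        ["hdfs", "dfs", "-mkdir", "-p", hdfs_dir],
        # Upload both local files into HDFS.
        ["hdfs", "dfs", "-put", local_first, local_second, hdfs_dir],
        # Append the local copy of the second file to the first file in HDFS.
        ["hdfs", "dfs", "-appendToFile", local_second, first_in_hdfs],
        # Display the combined file.
        ["hdfs", "dfs", "-cat", first_in_hdfs],
    ]


def run(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


# Assumed paths: adjust to your own local files and HDFS home directory.
commands = build_commands("Shakespeare.txt", "Word_list.txt", "/user/cloudera/icp1")
run(commands)  # dry run: prints the commands without contacting HDFS
```

Calling `run(commands, dry_run=False)` on the Cloudera VM would execute the same commands. The combined file is large, so replacing `-cat` with `-tail` may be more practical for checking that the word list landed at the end.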

Shakespeare.txt as shown in Hue after the contents of word_list.txt were appended to it.

The end of the word_list.txt dataset as shown in Hue.

ICP video explanation: https://drive.google.com/open?id=19zCcfC0hrZxVMLsL7a3o5hvfs-NiQ20w

Link to all ICP videos: https://drive.google.com/open?id=1racqWkfI10T-CpLYEDYCvJRSRhhLGsWL