Big_Data_Programming_ICP_1 - kusamdinesh/Big-Data-and-Hadoop GitHub Wiki


Cloudera is a software platform for data engineering, data warehousing, machine learning, and analytics that runs in the cloud or on-premises. It began as a hybrid open-source Apache Hadoop distribution, CDH (Cloudera Distribution Including Apache Hadoop), which targeted enterprise-class deployments of that technology.

Cloudera Setup file

Cloudera Environment in VirtualBox


Hue Visualization

Shakespeare Text File

Word List Text File

First 5 lines of the merged file

Last 5 lines of the merged file


File combining the first 5 and last 5 lines of the merged file (Shakespeare and word list files)
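
The captions above describe three steps: merging the Shakespeare and word list files, taking the first and last 5 lines of the merged file, and writing those 10 lines to a new file. The commands actually used in the Cloudera environment are not shown on this page, so the following is only a minimal local sketch in Python of the same sequence, not the method used in the exercise. The function names (`merge_files`, `head`, `tail`, `write_head_and_tail`) are hypothetical and chosen for illustration.

```python
from collections import deque
from itertools import islice
from pathlib import Path


def merge_files(sources, merged_path):
    """Concatenate the source text files, in order, into merged_path."""
    with open(merged_path, "w", encoding="utf-8") as out:
        for src in sources:
            text = Path(src).read_text(encoding="utf-8")
            out.write(text)
            # Ensure the next file starts on a fresh line.
            if text and not text.endswith("\n"):
                out.write("\n")


def head(path, n=5):
    """Return the first n lines of a text file."""
    with open(path, encoding="utf-8") as f:
        return list(islice(f, n))


def tail(path, n=5):
    """Return the last n lines of a text file."""
    with open(path, encoding="utf-8") as f:
        # A bounded deque keeps only the n most recent lines read.
        return list(deque(f, maxlen=n))


def write_head_and_tail(merged_path, out_path, n=5):
    """Write the first n and last n lines of merged_path to out_path.

    If the merged file has fewer than 2 * n lines, the two slices overlap
    and some lines appear twice.
    """
    lines = head(merged_path, n) + tail(merged_path, n)
    with open(out_path, "w", encoding="utf-8") as out:
        out.writelines(lines)
    return lines
```

Assuming local copies named `shakespeare.txt` and `word_list.txt` (placeholder names, not taken from this page), calling `merge_files(["shakespeare.txt", "word_list.txt"], "merged.txt")` followed by `write_head_and_tail("merged.txt", "head_tail.txt")` would produce the final 10-line file.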