ICP 4 - navyagonug/CS5590-BIG-DATA-PROGRAMMING-USING-HADOOP-AND-SPARK GitHub Wiki

PROBLEM STATEMENT

1. Create Hive tables and perform queries for a use case based on the petrol data.
2. Create Hive tables and perform queries for a use case based on the Olympics data.
3. Split the petrol or Olympics data and perform a meaningful join operation on it.
4. Use a WHERE clause as a condition in queries on both datasets.

DATASETS

Two datasets are given: petrol.txt, a text file, and olympic_data.csv, a comma-separated values (CSV) file.

FEATURES

Hive commands are used to create tables, split the data, and query it.

APPROACH

The following screenshots show a set of commands performed on the petrol.txt dataset.
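This page does not give the layout of petrol.txt. The sketch below assumes comma-separated columns `distributer_id, distributer_name, amt_in, amt_out, vol_in, vol_out, year` and uses a few hypothetical sample rows. It uses Python's built-in `sqlite3` as a stand-in for Hive, so it covers only the logic of creating the table, loading it, and running a WHERE-filtered aggregate. In Hive itself, the table would be declared with `ROW FORMAT DELIMITED FIELDS TERMINATED BY ','` and filled with `LOAD DATA LOCAL INPATH ... INTO TABLE petrol`.

```python
import csv
import io
import sqlite3

# Hypothetical sample rows in the ASSUMED petrol.txt layout:
# distributer_id, distributer_name, amt_in, amt_out, vol_in, vol_out, year
SAMPLE_PETROL = """\
1,Alpha,1200,1100,300,280,2010
2,Beta,900,850,250,240,2011
1,Alpha,1500,1400,350,330,2012
3,Gamma,700,690,200,195,2012
"""


def load_petrol(conn: sqlite3.Connection, text: str) -> None:
    """Create the petrol table and bulk-load comma-delimited rows into it."""
    conn.execute(
        """CREATE TABLE petrol (
               distributer_id INTEGER,
               distributer_name TEXT,
               amt_in INTEGER,
               amt_out INTEGER,
               vol_in INTEGER,
               vol_out INTEGER,
               year INTEGER)"""
    )
    # INTEGER-affinity columns convert the numeric strings from csv to integers.
    rows = list(csv.reader(io.StringIO(text)))
    conn.executemany("INSERT INTO petrol VALUES (?, ?, ?, ?, ?, ?, ?)", rows)


def total_vol_in_since(conn: sqlite3.Connection, year: int) -> list[tuple[str, int]]:
    """Sum vol_in per distributor, keeping only rows from `year` onwards."""
    cur = conn.execute(
        """SELECT distributer_name, SUM(vol_in)
           FROM petrol
           WHERE year >= ?
           GROUP BY distributer_name
           ORDER BY distributer_name""",
        (year,),
    )
    return cur.fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load_petrol(conn, SAMPLE_PETROL)
    print(total_vol_in_since(conn, 2011))
```

The WHERE clause filters individual rows before GROUP BY aggregates them. A condition on the aggregated sum would go in HAVING instead.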

The following screenshots show the basic commands performed on the olympic_data.csv file.
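The sketch below applies the same pattern to the Olympics data. The column layout `athlete, age, country, year, closing_date, sport, gold, silver, bronze, total` is an assumption, and the athlete rows are made-up placeholders rather than real results. As above, `sqlite3` stands in for Hive. The query uses a WHERE clause to restrict medal totals to one sport before grouping by country.

```python
import csv
import io
import sqlite3

# Hypothetical placeholder rows in the ASSUMED olympic_data.csv layout:
# athlete, age, country, year, closing_date, sport, gold, silver, bronze, total
SAMPLE_OLYMPIC = """\
Athlete A,23,United States,2008,08-24-08,Swimming,8,0,0,8
Athlete B,25,Australia,2008,08-24-08,Swimming,1,2,0,3
Athlete C,27,United States,2012,08-12-12,Athletics,1,0,1,2
Athlete D,21,United States,2012,08-12-12,Swimming,2,1,0,3
"""


def load_olympic(conn: sqlite3.Connection, text: str) -> None:
    """Create the olympic table and load the CSV rows into it."""
    conn.execute(
        """CREATE TABLE olympic (
               athlete TEXT, age INTEGER, country TEXT, year INTEGER,
               closing_date TEXT, sport TEXT,
               gold INTEGER, silver INTEGER, bronze INTEGER, total INTEGER)"""
    )
    rows = list(csv.reader(io.StringIO(text)))
    conn.executemany(
        "INSERT INTO olympic VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)", rows
    )


def medals_by_country(conn: sqlite3.Connection, sport: str) -> list[tuple[str, int]]:
    """Total medals per country, counting only rows for the given sport."""
    cur = conn.execute(
        """SELECT country, SUM(total)
           FROM olympic
           WHERE sport = ?
           GROUP BY country
           ORDER BY country""",
        (sport,),
    )
    return cur.fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load_olympic(conn, SAMPLE_OLYMPIC)
    print(medals_by_country(conn, "Swimming"))
```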

BONUS

In the bonus question, the petrol.txt file is split into two tables (split2 and split3), and a left join is performed on them with a WHERE clause.
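The page does not show which columns went into split2 and split3, so the split below is a hypothetical one:

- split2 holds one row per distributor (id and name).
- split3 holds the amounts for a single chosen year.

A LEFT JOIN keeps every split2 row and fills in NULL where split3 has no match. A WHERE clause testing for that NULL then lists the distributors with no record in the chosen year. The code again uses `sqlite3` rather than Hive, with the same assumed petrol layout and hypothetical rows as before. In HiveQL the split would typically be written with `CREATE TABLE ... AS SELECT` and the join with `LEFT OUTER JOIN`.

```python
import csv
import io
import sqlite3

# Hypothetical sample rows in the ASSUMED petrol.txt layout:
# distributer_id, distributer_name, amt_in, amt_out, vol_in, vol_out, year
SAMPLE_PETROL = """\
1,Alpha,1200,1100,300,280,2010
2,Beta,900,850,250,240,2011
1,Alpha,1500,1400,350,330,2012
3,Gamma,700,690,200,195,2012
"""


def build_split_tables(conn: sqlite3.Connection, text: str, year: int) -> None:
    """Load petrol rows, then split them into split2 (distributors) and
    split3 (amounts for `year` only). The split itself is hypothetical."""
    conn.execute(
        "CREATE TABLE petrol (distributer_id INTEGER, distributer_name TEXT, "
        "amt_in INTEGER, amt_out INTEGER, vol_in INTEGER, vol_out INTEGER, "
        "year INTEGER)"
    )
    conn.executemany(
        "INSERT INTO petrol VALUES (?, ?, ?, ?, ?, ?, ?)",
        list(csv.reader(io.StringIO(text))),
    )
    conn.execute("CREATE TABLE split2 (distributer_id INTEGER, distributer_name TEXT)")
    conn.execute(
        "INSERT INTO split2 SELECT DISTINCT distributer_id, distributer_name FROM petrol"
    )
    conn.execute("CREATE TABLE split3 (distributer_id INTEGER, amt_in INTEGER, amt_out INTEGER)")
    conn.execute(
        "INSERT INTO split3 SELECT distributer_id, amt_in, amt_out "
        "FROM petrol WHERE year = ?",
        (year,),
    )


def left_join_rows(conn: sqlite3.Connection) -> list[tuple]:
    """Every distributor with its amt_in for the chosen year (None if absent)."""
    return conn.execute(
        """SELECT s2.distributer_name, s3.amt_in
           FROM split2 AS s2
           LEFT JOIN split3 AS s3 ON s2.distributer_id = s3.distributer_id
           ORDER BY s2.distributer_id"""
    ).fetchall()


def distributors_missing_year(conn: sqlite3.Connection) -> list[str]:
    """LEFT JOIN plus WHERE ... IS NULL: distributors with no split3 match."""
    cur = conn.execute(
        """SELECT s2.distributer_name
           FROM split2 AS s2
           LEFT JOIN split3 AS s3 ON s2.distributer_id = s3.distributer_id
           WHERE s3.distributer_id IS NULL
           ORDER BY s2.distributer_id"""
    )
    return [name for (name,) in cur.fetchall()]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_split_tables(conn, SAMPLE_PETROL, 2012)
    print(left_join_rows(conn))
    print(distributors_missing_year(conn))
```

The year filter is applied when split3 is built, not in the join query. A condition such as `WHERE s3.amt_in > 1000` placed after a LEFT JOIN would discard the NULL rows and turn the result into an inner join.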