ICP 4 - navyagonug/CS5590-BIG-DATA-PROGRAMMING-USING-HADOOP-AND-SPARK GitHub Wiki
PROBLEM STATEMENT
1. Create Hive tables and perform queries for a use case based on the petrol data.
2. Create Hive tables and perform queries for a use case based on the Olympics data.
3. Split the petrol or Olympics data and perform a meaningful join operation on the split data.
4. Use a WHERE clause as a condition on both datasets.
DATASETS
Two datasets are given: petrol.txt, a plain text file, and olympic_data.csv, a comma-separated values file.
FEATURES
Hive commands are used for tasks such as creating tables and splitting the values.
APPROACH
The following screenshots show the set of commands run on the petrol.txt dataset.
The following screenshots show the basic commands run on the olympic_data.csv file.
BONUS
In the bonus question, the petrol.txt file is split into two tables (split2 and split3), and a left join is performed on them with a WHERE clause.
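The split-and-join step can be sketched as follows. This is only an illustration, not the commands from the lab. It uses Python's standard-library sqlite3 module as a stand-in for Hive, so the query is plain SQL rather than HiveQL. The table names split2 and split3 come from the text above. The column names, the sample rows, and the helper split_and_join are hypothetical, since the layout of petrol.txt is not shown on this page.

```python
import sqlite3

# Hypothetical rows: (district_id, distributor, buy_rate, sell_rate, year).
# The real petrol.txt columns are not shown on this page; these are assumptions.
ROWS = [
    ("D1", "shell", 2.50, 2.90, 2010),
    ("D2", "bharat", 2.40, 2.75, 2011),
    ("D3", "reliance", 2.60, 3.10, 2012),
]


def split_and_join(rows, min_year):
    """Split rows into two tables, then LEFT JOIN them with a WHERE filter."""
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()

    # split2 keeps the descriptive columns, split3 keeps the price columns.
    cur.execute("CREATE TABLE split2 (district_id TEXT, distributor TEXT, year INTEGER)")
    cur.execute("CREATE TABLE split3 (district_id TEXT, buy_rate REAL, sell_rate REAL)")
    cur.executemany(
        "INSERT INTO split2 VALUES (?, ?, ?)",
        [(r[0], r[1], r[4]) for r in rows],
    )
    # The last row is left out of split3 on purpose, so the LEFT JOIN
    # has one row on the left side with no match on the right.
    cur.executemany(
        "INSERT INTO split3 VALUES (?, ?, ?)",
        [(r[0], r[2], r[3]) for r in rows[:-1]],
    )

    # LEFT JOIN keeps every split2 row; the WHERE clause then filters by year.
    cur.execute(
        """
        SELECT a.district_id, a.distributor, b.sell_rate
        FROM split2 a
        LEFT JOIN split3 b ON a.district_id = b.district_id
        WHERE a.year >= ?
        ORDER BY a.district_id
        """,
        (min_year,),
    )
    result = cur.fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    print(split_and_join(ROWS, 2011))
```

Because the join is a left join, a row from split2 with no match in split3 still appears in the result, with an empty (None) value for the split3 column. The WHERE condition is applied to the left table, so it does not discard those unmatched rows.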