Module 1: ICP #5 - SnehaMishra28/BigData_Programming_Summer2018 GitHub Wiki
Team: 12
Professor: Yugyung Lee
Name: Sneha Mishra
Class ID: 11
Email: [email protected]
MyGitHub
Technical Partner:
Name: Aditya Soman
Class ID: 19
Email: [email protected]
GitHub
Objective
Introduction to Hive.
Features
- Install Hive.
- Hive is a data warehousing system that stores structured data on the Hadoop file system (HDFS) and provides an easy way to query that data by executing Hadoop MapReduce plans.
- Basics of Hive QL.
Steps:
Step 1: Configure MySQL (first-time users only)


Step 2: Download and install Hive





Step 3: Run Hive

Step 4: Create Tables
Petrol Table
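A minimal HiveQL sketch of what the table definition might look like. The column names are taken from the queries used later on this page; the column types, the comma delimiter, and the text storage format are assumptions. The actual data file may contain further columns, which would have to be declared in the same order as they appear in the file.

```sql
-- Sketch only: column names come from the queries on this page;
-- the types, delimiter, and storage format are assumptions.
CREATE TABLE IF NOT EXISTS petrol (
  distributer_id   STRING,
  distributer_name STRING,
  vol_IN           INT,
  vol_OUT          INT,
  year             INT
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE;
```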

Load Data in Petrol Table
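Loading a local file into the table could look roughly like the statement below. The file path is a placeholder, not the path used in class; `LOCAL` assumes the file sits on the local file system rather than in HDFS.

```sql
-- '/path/to/petrol.txt' is a hypothetical placeholder path.
LOAD DATA LOCAL INPATH '/path/to/petrol.txt' INTO TABLE petrol;

-- Quick check that rows were parsed into the expected columns.
SELECT * FROM petrol LIMIT 5;
```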

Olympic Table

In Class Exercise:
Question 1: Create Hive Tables and Perform Queries for Use Case based on Petrol Data. See the Slides for details.
Query 1: SELECT distributer_name,SUM(vol_OUT) FROM petrol GROUP BY distributer_name;

Query 2: SELECT distributer_id, vol_OUT FROM petrol order by vol_OUT desc limit 10;

Query 3: SELECT distributer_id, vol_OUT FROM petrol order by vol_OUT limit 10;

Query 4: List all distributors for which the difference between vol_IN and vol_OUT exceeds 500, along with the year and the difference for that year. Hint: (vol_IN - vol_OUT) > 500
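One possible way to write Query 4 in HiveQL, following the hint. This is a sketch that assumes the `petrol` table has the columns `distributer_name`, `year`, `vol_IN`, and `vol_OUT`; the alias `difference` is introduced here for readability only.

```sql
-- The expression is repeated in WHERE because Hive does not
-- allow a SELECT alias to be referenced in the WHERE clause.
SELECT distributer_name,
       year,
       (vol_IN - vol_OUT) AS difference
FROM petrol
WHERE (vol_IN - vol_OUT) > 500;
```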

Question 2: Create Hive Tables and Perform Queries for Use Case based on OlympicsData. See the Slides for details.
Query 1: select country,SUM(total) from olympic where sport = "Swimming" GROUP BY country;

Query 2: select year,SUM(total) from olympic where country = "India" GROUP BY year;

Query 3: select country,SUM(total) from olympic GROUP BY country;

Query 4: select country,SUM(gold) from olympic GROUP BY country;

Query 5: Which countries won medals in Shooting, classified by year?
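A hedged sketch of how Query 5 might be expressed, reusing the `olympic` columns that appear in the other queries (`country`, `year`, `sport`, `total`). It assumes every row in the table records at least one medal; the alias `medals` and the sort order are choices made here for illustration.

```sql
-- Medal totals for Shooting, broken down by country and year.
SELECT country,
       year,
       SUM(total) AS medals
FROM olympic
WHERE sport = 'Shooting'
GROUP BY country, year
ORDER BY year, country;
```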

References:
https://gist.github.com/giwa/ed13ac177c1e1a97fba0
https://askubuntu.com/questions/280768/how-to-rename-a-file-in-terminal
http://hadooptutorial.info/partitioning-in-hive/
https://stackoverflow.com/questions/34678597/add-partitions-on-existing-hive-table