Module 1: ICP #5 - SnehaMishra28/BigData_Programming_Summer2018 GitHub Wiki

Team: 12
Professor: Yugyung Lee

Name: Sneha Mishra
Class ID: 11
Email: [email protected]
MyGitHub

Technical Partner:
Name: Aditya Soman
Class ID: 19
Email: [email protected]
GitHub

Objective

Introduction to Hive.

Features

  1. Install Hive.
  2. Hive is a data warehousing system for structured data stored in the Hadoop Distributed File System (HDFS). It provides an easy, SQL-like way to query that data by executing the queries as Hadoop MapReduce jobs.
  3. Basics of HiveQL.

Steps:

Step 1: Configure MySQL (first-time users only)
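When Hive uses MySQL as its metastore, the first-time setup usually comes down to creating a database and a user for Hive. The sketch below is meant to be run in the mysql client as an administrative user. The database name metastore, the user hiveuser and the password are hypothetical placeholders. Afterwards, hive-site.xml has to point the javax.jdo.option.ConnectionURL (for example jdbc:mysql://localhost/metastore), javax.jdo.option.ConnectionDriverName, javax.jdo.option.ConnectionUserName and javax.jdo.option.ConnectionPassword properties at them. The MySQL Connector/J jar also needs to be on Hive's classpath, typically in $HIVE_HOME/lib.

```sql
-- Run inside the mysql client as root or another admin account.
-- metastore, hiveuser and hivepassword are hypothetical placeholders.
CREATE DATABASE IF NOT EXISTS metastore;

-- Account that Hive will use to read and write its metadata.
CREATE USER 'hiveuser'@'localhost' IDENTIFIED BY 'hivepassword';
GRANT ALL PRIVILEGES ON metastore.* TO 'hiveuser'@'localhost';
```

Depending on the Hive version, the metastore schema may also need to be initialized once, for example with Hive's schematool (`schematool -dbType mysql -initSchema`).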

Step 2: Download and install Hive

Step 3: Run Hive
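After the `hive>` prompt appears, a quick sanity check is to list the databases and switch to a working database. This is a sketch only. The database name icp5 is hypothetical, and the tables could just as well be created in the built-in `default` database.

```sql
-- Optional: print column names above query results in the Hive CLI.
SET hive.cli.print.header=true;

-- Confirm the metastore connection works by listing databases.
SHOW DATABASES;

-- Hypothetical working database for this exercise.
CREATE DATABASE IF NOT EXISTS icp5;
USE icp5;
SHOW TABLES;
```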

Step 4: Create Tables

Petrol Table
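A minimal sketch of the petrol table definition follows. It assumes the data is a comma-separated text file, one record per line. It declares only the columns the queries below use (distributer_id, distributer_name, vol_IN, vol_OUT) plus a year column, which Query 4 needs. The real file may contain additional fields or a different column order. In that case, declare every field in file order. Hive applies the schema when the table is read (schema-on-read), so a mismatch shows up as wrong values or NULLs rather than as a load error.

```sql
-- Assumed layout: comma-separated text, columns in exactly this order.
-- `year` is quoted with backticks in case the Hive version treats it as a keyword.
CREATE TABLE IF NOT EXISTS petrol (
  distributer_id   STRING,
  distributer_name STRING,
  vol_IN           INT,
  vol_OUT          INT,
  `year`           INT
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE;
```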

Load Data into the Petrol Table
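Loading a local file into the table can be sketched as follows. The path is a hypothetical placeholder. With LOCAL, Hive copies the file from the local filesystem. Without LOCAL, the path refers to HDFS and the file is moved into the table's directory. INTO TABLE appends, while OVERWRITE INTO TABLE replaces the existing data.

```sql
-- Hypothetical path; point it at your copy of the petrol data.
LOAD DATA LOCAL INPATH '/path/to/petrol.txt' INTO TABLE petrol;

-- Check that the fields landed in the expected columns.
DESCRIBE petrol;
SELECT * FROM petrol LIMIT 5;
```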

Olympic Table
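A similar sketch for the Olympic data declares only the columns referenced by the queries below (country, year, sport, gold, total). The delimiter and column order are assumptions. The actual dataset probably has more fields, such as athlete details or silver and bronze counts, and these would all need to be declared in file order. If the file is tab-separated, change the delimiter to '\t'. The path is a hypothetical placeholder.

```sql
-- Assumed layout: comma-separated text, columns in exactly this order.
CREATE TABLE IF NOT EXISTS olympic (
  country STRING,
  `year`  INT,
  sport   STRING,
  gold    INT,
  total   INT
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

-- Hypothetical path; point it at your copy of the Olympic data.
LOAD DATA LOCAL INPATH '/path/to/olympic_data.csv' INTO TABLE olympic;
```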

In-Class Exercise:

Question 1: Create Hive tables and perform queries for the use case based on the petrol data. See the slides for details.

Query 1 (total vol_OUT per distributor): SELECT distributer_name, SUM(vol_OUT) FROM petrol GROUP BY distributer_name;

Query 2 (the 10 records with the highest vol_OUT): SELECT distributer_id, vol_OUT FROM petrol ORDER BY vol_OUT DESC LIMIT 10;

Query 3 (the 10 records with the lowest vol_OUT): SELECT distributer_id, vol_OUT FROM petrol ORDER BY vol_OUT LIMIT 10;

Query 4: List all distributors whose vol_IN exceeds vol_OUT by more than 500, along with the year and the size of the difference in that year. Hint: (vol_IN - vol_OUT) > 500
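Here is a minimal HiveQL sketch of Query 4 that follows the hint row by row. It assumes the petrol table has a year column. The alias diff is a hypothetical name. Hive does not allow a SELECT alias to be referenced in WHERE, so the expression is repeated.

```sql
-- One output row per petrol record where vol_IN - vol_OUT exceeds 500.
SELECT distributer_name,
       `year`,
       (vol_IN - vol_OUT) AS diff
FROM petrol
WHERE (vol_IN - vol_OUT) > 500;
```

If the data can hold several rows for the same distributor and year, an alternative reading is to aggregate first: GROUP BY distributer_name, `year` with HAVING SUM(vol_IN) - SUM(vol_OUT) > 500.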

Question 2: Create Hive tables and perform queries for the use case based on the Olympics data. See the slides for details.

Query 1 (total medals per country in Swimming): SELECT country, SUM(total) FROM olympic WHERE sport = 'Swimming' GROUP BY country;

Query 2 (India's total medals per year): SELECT year, SUM(total) FROM olympic WHERE country = 'India' GROUP BY year;

Query 3 (total medals per country): SELECT country, SUM(total) FROM olympic GROUP BY country;

Query 4 (gold medals per country): SELECT country, SUM(gold) FROM olympic GROUP BY country;

Query 5: Which countries won medals in Shooting? Break the result down by year.
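One way to write Query 5 is sketched below. It assumes the olympic table has country, year, sport and total columns, and that the sport is stored as the exact string 'Shooting'. The alias medals is a hypothetical name.

```sql
-- Shooting medal totals per country, one row per country and year.
SELECT country, `year`, SUM(total) AS medals
FROM olympic
WHERE sport = 'Shooting'
GROUP BY country, `year`
ORDER BY country, `year`;
```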

References:

  1. https://gist.github.com/giwa/ed13ac177c1e1a97fba0
  2. https://askubuntu.com/questions/280768/how-to-rename-a-file-in-terminal
  3. http://hadooptutorial.info/partitioning-in-hive/
  4. https://stackoverflow.com/questions/34678597/add-partitions-on-existing-hive-table