ICP 4 - PallaviArikatla/Big-Data-Programming GitHub Wiki
OBJECTIVE: To perform queries using Hive.
### QUESTION 1: Petrol use case
Create a table called petrol using the given dataset, and list all distributors who have this difference, along with the year and the difference they have in that year.
Start by launching Hive from the terminal.
Create a table named petrol.
Load the petrol dataset into it.
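In Hive, these steps usually come down to a `CREATE TABLE ... ROW FORMAT DELIMITED FIELDS TERMINATED BY ','` statement followed by `LOAD DATA LOCAL INPATH`. The sketch below imitates both steps with Python's built-in `sqlite3` and `csv` modules so the idea can be tried without a Hive installation. The column names (`distributor_id`, `distributor_name`, `vol_in`, `vol_out`, `year`) and the sample rows are hypothetical, not the real petrol dataset.

```python
import csv
import io
import sqlite3

# Hypothetical comma-delimited sample; the real dataset's columns may differ.
PETROL_TEXT = """D1,Alpha,900,300,2010
D1,Alpha,800,700,2011
D2,Beta,600,550,2010
D3,Gamma,1200,400,2011
D4,Delta,300,250,2010
"""

conn = sqlite3.connect(":memory:")
# Stand-in for Hive's CREATE TABLE petrol (...) statement.
conn.execute(
    "CREATE TABLE petrol (distributor_id TEXT, distributor_name TEXT, "
    "vol_in INTEGER, vol_out INTEGER, year INTEGER)"
)
# Stand-in for LOAD DATA: split each line on commas and insert it as one row.
# SQLite's INTEGER column affinity converts numeric strings such as "900" to integers.
rows = list(csv.reader(io.StringIO(PETROL_TEXT)))
conn.executemany("INSERT INTO petrol VALUES (?, ?, ?, ?, ?)", rows)
conn.commit()

row_count = conn.execute("SELECT COUNT(*) FROM petrol").fetchone()[0]
print("rows loaded:", row_count)
```

Declaring the volume and year columns as integers matters: the later questions sum and compare them.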
The created table is as follows:
a) What is the total volume of petrol sold by each distributor?
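One way to answer (a) is a GROUP BY over the distributor. The minimal sketch below runs that query through Python's `sqlite3` module on hypothetical rows, assuming the volume sold is stored in a `vol_out` column; the SELECT itself is plain SQL that HiveQL also accepts.

```python
import sqlite3

# Hypothetical rows: (distributor_id, distributor_name, vol_in, vol_out, year).
ROWS = [
    ("D1", "Alpha", 900, 300, 2010),
    ("D1", "Alpha", 800, 700, 2011),
    ("D2", "Beta", 600, 550, 2010),
    ("D3", "Gamma", 1200, 400, 2011),
    ("D4", "Delta", 300, 250, 2010),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE petrol (distributor_id TEXT, distributor_name TEXT, "
    "vol_in INTEGER, vol_out INTEGER, year INTEGER)"
)
conn.executemany("INSERT INTO petrol VALUES (?, ?, ?, ?, ?)", ROWS)

# Assumption: "volume sold" is the vol_out column, summed per distributor.
QUERY = """
SELECT distributor_id, SUM(vol_out) AS total_sold
FROM petrol
GROUP BY distributor_id
"""
totals = dict(conn.execute(QUERY).fetchall())
for distributor_id, total_sold in sorted(totals.items()):
    print(distributor_id, total_sold)
```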
b) Which are the top 10 distributor IDs by petrol sold, and what volume of petrol did each of them sell?
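For (b), the same aggregation can be sorted in descending order and cut to ten rows with `LIMIT`. This is a sketch on hypothetical rows in an in-memory SQLite database standing in for Hive; the `vol_out` column name is an assumption.

```python
import sqlite3

# Hypothetical rows: (distributor_id, distributor_name, vol_in, vol_out, year).
ROWS = [
    ("D1", "Alpha", 900, 300, 2010),
    ("D1", "Alpha", 800, 700, 2011),
    ("D2", "Beta", 600, 550, 2010),
    ("D3", "Gamma", 1200, 400, 2011),
    ("D4", "Delta", 300, 250, 2010),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE petrol (distributor_id TEXT, distributor_name TEXT, "
    "vol_in INTEGER, vol_out INTEGER, year INTEGER)"
)
conn.executemany("INSERT INTO petrol VALUES (?, ?, ?, ?, ?)", ROWS)

# Rank distributors by total volume sold (assumed to be vol_out) and keep the top 10.
QUERY = """
SELECT distributor_id, SUM(vol_out) AS total_sold
FROM petrol
GROUP BY distributor_id
ORDER BY total_sold DESC
LIMIT 10
"""
top = conn.execute(QUERY).fetchall()
for distributor_id, total_sold in top:
    print(distributor_id, total_sold)
```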
c) Find the names of the 10 distributors who sold the least petrol.
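Question (c) reverses the sort order and groups by name instead of ID. Again a hypothetical sketch, using SQLite as a stand-in for Hive and assumed column names.

```python
import sqlite3

# Hypothetical rows: (distributor_id, distributor_name, vol_in, vol_out, year).
ROWS = [
    ("D1", "Alpha", 900, 300, 2010),
    ("D1", "Alpha", 800, 700, 2011),
    ("D2", "Beta", 600, 550, 2010),
    ("D3", "Gamma", 1200, 400, 2011),
    ("D4", "Delta", 300, 250, 2010),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE petrol (distributor_id TEXT, distributor_name TEXT, "
    "vol_in INTEGER, vol_out INTEGER, year INTEGER)"
)
conn.executemany("INSERT INTO petrol VALUES (?, ?, ?, ?, ?)", ROWS)

# Smallest totals first: ascending sort on the summed volume, at most 10 names.
QUERY = """
SELECT distributor_name, SUM(vol_out) AS total_sold
FROM petrol
GROUP BY distributor_name
ORDER BY total_sold ASC
LIMIT 10
"""
least = conn.execute(QUERY).fetchall()
for name, total_sold in least:
    print(name, total_sold)
```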
d) List all distributors who have this difference, along with the year and the difference they have in that year.
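The page does not say which difference (d) refers to. Purely as an assumption, this sketch treats it as `vol_in - vol_out` and keeps rows above a hypothetical threshold of 500, using SQLite in place of Hive and hypothetical rows. In HiveQL the `?` placeholder would be replaced by a literal value.

```python
import sqlite3

# Hypothetical rows: (distributor_id, distributor_name, vol_in, vol_out, year).
ROWS = [
    ("D1", "Alpha", 900, 300, 2010),
    ("D1", "Alpha", 800, 700, 2011),
    ("D2", "Beta", 600, 550, 2010),
    ("D3", "Gamma", 1200, 400, 2011),
    ("D4", "Delta", 300, 250, 2010),
]
THRESHOLD = 500  # hypothetical cut-off, not taken from the assignment

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE petrol (distributor_id TEXT, distributor_name TEXT, "
    "vol_in INTEGER, vol_out INTEGER, year INTEGER)"
)
conn.executemany("INSERT INTO petrol VALUES (?, ?, ?, ?, ?)", ROWS)

# The expression is repeated in WHERE because a SELECT alias cannot be used there.
QUERY = """
SELECT distributor_name, year, vol_in - vol_out AS difference
FROM petrol
WHERE vol_in - vol_out > ?
ORDER BY distributor_name, year
"""
diffs = conn.execute(QUERY, (THRESHOLD,)).fetchall()
for name, year, difference in diffs:
    print(name, year, difference)
```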
### QUESTION 2: Olympics use case
Start by creating a table with the required data types and loading the given dataset into it.
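A sketch of the create-and-load step for the Olympics data, using Python's `sqlite3` and `csv` modules as a stand-in for Hive's `CREATE TABLE` and `LOAD DATA`. The eight columns and six sample rows are hypothetical; the point is that medal counts are declared as integers so they can be summed later.

```python
import csv
import io
import sqlite3

# Hypothetical comma-delimited sample:
# athlete, country, year, sport, gold, silver, bronze, total
OLYMPICS_TEXT = """Athlete A,United States,2008,Swimming,2,1,0,3
Athlete B,Australia,2008,Swimming,1,0,1,2
Athlete C,India,2008,Shooting,1,0,0,1
Athlete D,India,2012,Shooting,0,1,1,2
Athlete E,United States,2012,Swimming,0,2,0,2
Athlete F,China,2012,Shooting,1,0,0,1
"""

conn = sqlite3.connect(":memory:")
# Numeric columns are INTEGER so medal counts can be aggregated.
conn.execute(
    "CREATE TABLE olympics (athlete TEXT, country TEXT, year INTEGER, sport TEXT, "
    "gold INTEGER, silver INTEGER, bronze INTEGER, total INTEGER)"
)
rows = list(csv.reader(io.StringIO(OLYMPICS_TEXT)))
conn.executemany("INSERT INTO olympics VALUES (?, ?, ?, ?, ?, ?, ?, ?)", rows)
conn.commit()

row_count = conn.execute("SELECT COUNT(*) FROM olympics").fetchone()[0]
print("rows loaded:", row_count)
```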
a) Using the dataset, list the total number of medals won by each country in swimming.
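For (a), filter on the sport before grouping by country. A minimal sketch on hypothetical rows in SQLite (standing in for Hive), assuming a `sport` column and a per-row `total` medal count:

```python
import sqlite3

# Hypothetical rows: (athlete, country, year, sport, gold, silver, bronze, total).
ROWS = [
    ("Athlete A", "United States", 2008, "Swimming", 2, 1, 0, 3),
    ("Athlete B", "Australia", 2008, "Swimming", 1, 0, 1, 2),
    ("Athlete C", "India", 2008, "Shooting", 1, 0, 0, 1),
    ("Athlete D", "India", 2012, "Shooting", 0, 1, 1, 2),
    ("Athlete E", "United States", 2012, "Swimming", 0, 2, 0, 2),
    ("Athlete F", "China", 2012, "Shooting", 1, 0, 0, 1),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE olympics (athlete TEXT, country TEXT, year INTEGER, sport TEXT, "
    "gold INTEGER, silver INTEGER, bronze INTEGER, total INTEGER)"
)
conn.executemany("INSERT INTO olympics VALUES (?, ?, ?, ?, ?, ?, ?, ?)", ROWS)

# Keep only swimming rows, then add up each country's medal totals.
QUERY = """
SELECT country, SUM(total) AS medals
FROM olympics
WHERE sport = 'Swimming'
GROUP BY country
ORDER BY country
"""
swimming = conn.execute(QUERY).fetchall()
for country, medals in swimming:
    print(country, medals)
```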
b) Display the number of medals India won, year-wise.
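For (b), restrict the rows to India and group by year. The rows and column names are hypothetical, and the query runs through SQLite rather than Hive:

```python
import sqlite3

# Hypothetical rows: (athlete, country, year, sport, gold, silver, bronze, total).
ROWS = [
    ("Athlete A", "United States", 2008, "Swimming", 2, 1, 0, 3),
    ("Athlete B", "Australia", 2008, "Swimming", 1, 0, 1, 2),
    ("Athlete C", "India", 2008, "Shooting", 1, 0, 0, 1),
    ("Athlete D", "India", 2012, "Shooting", 0, 1, 1, 2),
    ("Athlete E", "United States", 2012, "Swimming", 0, 2, 0, 2),
    ("Athlete F", "China", 2012, "Shooting", 1, 0, 0, 1),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE olympics (athlete TEXT, country TEXT, year INTEGER, sport TEXT, "
    "gold INTEGER, silver INTEGER, bronze INTEGER, total INTEGER)"
)
conn.executemany("INSERT INTO olympics VALUES (?, ?, ?, ?, ?, ?, ?, ?)", ROWS)

# One output row per year in which India appears in the data.
QUERY = """
SELECT year, SUM(total) AS medals
FROM olympics
WHERE country = 'India'
GROUP BY year
ORDER BY year
"""
india_by_year = conn.execute(QUERY).fetchall()
for year, medals in india_by_year:
    print(year, medals)
```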
c) Find the total number of medals each country won, and display each country's name along with its total.
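For (c), drop the sport filter and sum every row's `total` per country; sorting by the total makes the result easier to read. The data and column names below are assumptions, and SQLite stands in for Hive:

```python
import sqlite3

# Hypothetical rows: (athlete, country, year, sport, gold, silver, bronze, total).
ROWS = [
    ("Athlete A", "United States", 2008, "Swimming", 2, 1, 0, 3),
    ("Athlete B", "Australia", 2008, "Swimming", 1, 0, 1, 2),
    ("Athlete C", "India", 2008, "Shooting", 1, 0, 0, 1),
    ("Athlete D", "India", 2012, "Shooting", 0, 1, 1, 2),
    ("Athlete E", "United States", 2012, "Swimming", 0, 2, 0, 2),
    ("Athlete F", "China", 2012, "Shooting", 1, 0, 0, 1),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE olympics (athlete TEXT, country TEXT, year INTEGER, sport TEXT, "
    "gold INTEGER, silver INTEGER, bronze INTEGER, total INTEGER)"
)
conn.executemany("INSERT INTO olympics VALUES (?, ?, ?, ?, ?, ?, ?, ?)", ROWS)

# Highest totals first; ties are broken alphabetically by country name.
QUERY = """
SELECT country, SUM(total) AS medals
FROM olympics
GROUP BY country
ORDER BY medals DESC, country
"""
medal_table = conn.execute(QUERY).fetchall()
for country, medals in medal_table:
    print(country, medals)
```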
d) Find the number of gold medals each country won.
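For (d), the grouping is the same, but only the `gold` column is summed. This hypothetical SQLite sketch assumes separate `gold`, `silver` and `bronze` columns:

```python
import sqlite3

# Hypothetical rows: (athlete, country, year, sport, gold, silver, bronze, total).
ROWS = [
    ("Athlete A", "United States", 2008, "Swimming", 2, 1, 0, 3),
    ("Athlete B", "Australia", 2008, "Swimming", 1, 0, 1, 2),
    ("Athlete C", "India", 2008, "Shooting", 1, 0, 0, 1),
    ("Athlete D", "India", 2012, "Shooting", 0, 1, 1, 2),
    ("Athlete E", "United States", 2012, "Swimming", 0, 2, 0, 2),
    ("Athlete F", "China", 2012, "Shooting", 1, 0, 0, 1),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE olympics (athlete TEXT, country TEXT, year INTEGER, sport TEXT, "
    "gold INTEGER, silver INTEGER, bronze INTEGER, total INTEGER)"
)
conn.executemany("INSERT INTO olympics VALUES (?, ?, ?, ?, ?, ?, ?, ?)", ROWS)

# Sum only gold medals per country.
QUERY = """
SELECT country, SUM(gold) AS golds
FROM olympics
GROUP BY country
ORDER BY country
"""
golds = conn.execute(QUERY).fetchall()
for country, count in golds:
    print(country, count)
```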
e) Which countries won medals in shooting? Classify the results by year.
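For (e), one reading of "year-wise classification" is to group shooting medals by both year and country. That reading is an assumption, sketched here with SQLite on hypothetical rows:

```python
import sqlite3

# Hypothetical rows: (athlete, country, year, sport, gold, silver, bronze, total).
ROWS = [
    ("Athlete A", "United States", 2008, "Swimming", 2, 1, 0, 3),
    ("Athlete B", "Australia", 2008, "Swimming", 1, 0, 1, 2),
    ("Athlete C", "India", 2008, "Shooting", 1, 0, 0, 1),
    ("Athlete D", "India", 2012, "Shooting", 0, 1, 1, 2),
    ("Athlete E", "United States", 2012, "Swimming", 0, 2, 0, 2),
    ("Athlete F", "China", 2012, "Shooting", 1, 0, 0, 1),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE olympics (athlete TEXT, country TEXT, year INTEGER, sport TEXT, "
    "gold INTEGER, silver INTEGER, bronze INTEGER, total INTEGER)"
)
conn.executemany("INSERT INTO olympics VALUES (?, ?, ?, ?, ?, ?, ?, ?)", ROWS)

# Shooting medals per (year, country) pair, listed year by year.
QUERY = """
SELECT year, country, SUM(total) AS medals
FROM olympics
WHERE sport = 'Shooting'
GROUP BY year, country
ORDER BY year, country
"""
shooting = conn.execute(QUERY).fetchall()
for year, country, medals in shooting:
    print(year, country, medals)
```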
### QUESTION 3: Movies use case
a) Create three tables called movies, ratings and users, and load the data into them.
Start by creating three separate tables for movies, users and ratings.
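A sketch of creating and loading the three tables, with Python's `sqlite3` standing in for Hive. The schemas (`movies(movie_id, title, genres)`, `ratings(user_id, movie_id, rating)`, `users(user_id, gender, age)`) and all rows are hypothetical; the sketch assumes genres are stored as one pipe-separated string per movie.

```python
import sqlite3

# Hypothetical sample rows for the three tables.
MOVIES = [
    (1, "Movie One", "Action|Drama"),
    (2, "Movie Two", "Action|Thriller"),
    (3, "Movie Three", "Drama"),
    (4, "Movie Four", "Action|Adventure|Drama"),
    (5, "Movie Five", "Comedy"),
]
RATINGS = [
    (10, 1, 5), (11, 1, 4), (10, 2, 5), (12, 2, 3),
    (11, 4, 4), (12, 4, 5), (10, 3, 2), (11, 5, 5),
]
USERS = [(10, "F", 25), (11, "M", 35), (12, "F", 45)]

conn = sqlite3.connect(":memory:")
# One CREATE TABLE per entity; movie_id and user_id link the tables together.
conn.execute("CREATE TABLE movies (movie_id INTEGER, title TEXT, genres TEXT)")
conn.execute("CREATE TABLE ratings (user_id INTEGER, movie_id INTEGER, rating INTEGER)")
conn.execute("CREATE TABLE users (user_id INTEGER, gender TEXT, age INTEGER)")
conn.executemany("INSERT INTO movies VALUES (?, ?, ?)", MOVIES)
conn.executemany("INSERT INTO ratings VALUES (?, ?, ?)", RATINGS)
conn.executemany("INSERT INTO users VALUES (?, ?, ?)", USERS)
conn.commit()

# Row counts confirm each load went into the intended table.
counts = {
    table: conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    for table in ("movies", "ratings", "users")
}
print(counts)
```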
b) For the movies table:
– List all movies whose genre is “Action” and “Drama”.
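Assuming genres are stored as a pipe-separated string such as `Action|Drama`, and reading the question as "both genres", two `LIKE` conditions pick out the matching movies. A hypothetical SQLite sketch:

```python
import sqlite3

# Hypothetical rows: (movie_id, title, genres).
MOVIES = [
    (1, "Movie One", "Action|Drama"),
    (2, "Movie Two", "Action|Thriller"),
    (3, "Movie Three", "Drama"),
    (4, "Movie Four", "Action|Adventure|Drama"),
    (5, "Movie Five", "Comedy"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movies (movie_id INTEGER, title TEXT, genres TEXT)")
conn.executemany("INSERT INTO movies VALUES (?, ?, ?)", MOVIES)

# Both substrings must appear somewhere in the genres string.
QUERY = """
SELECT movie_id, title
FROM movies
WHERE genres LIKE '%Action%' AND genres LIKE '%Drama%'
ORDER BY movie_id
"""
action_drama = conn.execute(QUERY).fetchall()
for movie_id, title in action_drama:
    print(movie_id, title)
```

`LIKE` matches substrings, so a genre whose name contains another would also match. In Hive, `array_contains(split(genres, '\\|'), 'Action')` compares whole genre names instead.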
c) For the ratings table: – List the movie IDs of all movies with a rating equal to 5.
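Selecting `DISTINCT` movie IDs avoids listing a movie once for every five-star rating it received. A sketch on hypothetical ratings, with SQLite standing in for Hive:

```python
import sqlite3

# Hypothetical rows: (user_id, movie_id, rating).
RATINGS = [
    (10, 1, 5), (11, 1, 4), (10, 2, 5), (12, 2, 3),
    (11, 4, 4), (12, 4, 5), (10, 3, 2), (11, 5, 5),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ratings (user_id INTEGER, movie_id INTEGER, rating INTEGER)")
conn.executemany("INSERT INTO ratings VALUES (?, ?, ?)", RATINGS)

# Each movie appears once, however many users gave it a 5.
QUERY = """
SELECT DISTINCT movie_id
FROM ratings
WHERE rating = 5
ORDER BY movie_id
"""
five_star_ids = [row[0] for row in conn.execute(QUERY)]
print(five_star_ids)
```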
d) Find the top 11 average-rated "Action" movies, in descending order of average rating.
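For (d), join ratings to movies, keep Action titles, average per movie, and sort descending with `LIMIT 11`. This is a hypothetical SQLite sketch; the tie-break on `movie_id` is an added assumption to make the order deterministic, and the `ORDER BY` clause may need small adjustments in HiveQL.

```python
import sqlite3

# Hypothetical rows: movies (movie_id, title, genres), ratings (user_id, movie_id, rating).
MOVIES = [
    (1, "Movie One", "Action|Drama"),
    (2, "Movie Two", "Action|Thriller"),
    (3, "Movie Three", "Drama"),
    (4, "Movie Four", "Action|Adventure|Drama"),
    (5, "Movie Five", "Comedy"),
]
RATINGS = [
    (10, 1, 5), (11, 1, 4), (10, 2, 5), (12, 2, 3),
    (11, 4, 4), (12, 4, 5), (10, 3, 2), (11, 5, 5),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movies (movie_id INTEGER, title TEXT, genres TEXT)")
conn.execute("CREATE TABLE ratings (user_id INTEGER, movie_id INTEGER, rating INTEGER)")
conn.executemany("INSERT INTO movies VALUES (?, ?, ?)", MOVIES)
conn.executemany("INSERT INTO ratings VALUES (?, ?, ?)", RATINGS)

# Average rating per Action movie, best first, at most 11 movies.
QUERY = """
SELECT m.movie_id, m.title, AVG(r.rating) AS avg_rating
FROM movies m
JOIN ratings r ON m.movie_id = r.movie_id
WHERE m.genres LIKE '%Action%'
GROUP BY m.movie_id, m.title
ORDER BY avg_rating DESC, m.movie_id
LIMIT 11
"""
top_action = conn.execute(QUERY).fetchall()
for movie_id, title, avg_rating in top_action:
    print(movie_id, title, avg_rating)
```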
Link to all the commands used: https://github.com/PallaviArikatla/Big-Data-Programming/tree/master/ICP_4/commands