ICP 4 - bhargavi1411/BigDataProgramming GitHub Wiki

Name : Bhargavi Saipoojitha Chennupati

Class ID : 4

Topic :

Hadoop-Dependent, Query-Based NoSQL Database: Hive

Task 1 :

Create Hive tables and perform queries for a use case based on petrol data.

First, we created a table to store the petrol data using the Hive 'create' command.

Then we loaded the data file 'petrol.txt' into the petrol table using the 'load' command, providing the local file path.
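
The two steps above might look roughly like the following HiveQL. This is only a sketch: the column names, their types, the comma delimiter, and the file path are assumptions made for illustration, since the layout of 'petrol.txt' is not given here.

```sql
-- Assumed schema: every column name and type below is hypothetical.
CREATE TABLE IF NOT EXISTS petrol (
  distributer_id   STRING,
  distributer_name STRING,
  amt_in           STRING,
  amt_out          STRING,
  vol_in           INT,
  vol_out          INT,
  sale_year        INT
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','   -- assumes a comma-separated file
STORED AS TEXTFILE;

-- '/path/to/petrol.txt' is a placeholder for the local path.
LOAD DATA LOCAL INPATH '/path/to/petrol.txt' INTO TABLE petrol;
```

The LOCAL keyword tells Hive to read the file from the local file system rather than from HDFS.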

Question 1 :

What is the total volume of petrol sold by each distributor?
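
One possible query for this question is sketched below. It assumes a petrol table with hypothetical columns distributer_name and vol_out (volume sold); adjust the names to the actual schema.

```sql
-- Sum the volume sold per distributor (column names are assumed).
SELECT distributer_name,
       SUM(vol_out) AS total_volume_sold
FROM petrol
GROUP BY distributer_name;
```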

Question 2 :

Which are the top 10 distributor IDs by petrol sold? Also display the volume of petrol each of them sold.
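
A sketch of one way to answer this, assuming hypothetical columns distributer_id and vol_out in the petrol table:

```sql
-- Rank distributors by total volume sold and keep the 10 largest.
SELECT distributer_id,
       SUM(vol_out) AS total_volume_sold
FROM petrol
GROUP BY distributer_id
ORDER BY total_volume_sold DESC
LIMIT 10;
```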

Question 3 :

Find the names of the 10 distributors who sold the least petrol.
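
This is the mirror image of a top-10 query: sort ascending instead of descending. The sketch assumes hypothetical columns distributer_name and vol_out in the petrol table.

```sql
-- Keep the 10 distributors with the smallest total volume sold.
SELECT distributer_name,
       SUM(vol_out) AS total_volume_sold
FROM petrol
GROUP BY distributer_name
ORDER BY total_volume_sold ASC
LIMIT 10;
```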

Question 4 :

The constraint for this query is that a difference between volumeIN and volumeOUT greater than 500 is illegal in real life, since all distributors receive petrol on every next cycle. List all distributors who have this difference, along with the year and the difference they had in that year.
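
A minimal sketch of this filter follows. It assumes hypothetical columns vol_in, vol_out, and sale_year, and it reads "difference" as vol_in minus vol_out; if the difference should be taken in either direction, wrapping the expression in ABS() would be one option.

```sql
-- List rows where the in/out volume gap exceeds 500 (column names are assumed).
SELECT distributer_id,
       distributer_name,
       sale_year,
       (vol_in - vol_out) AS volume_diff
FROM petrol
WHERE (vol_in - vol_out) > 500;
```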

Task 2 :

Question 1 : Using the dataset, list the total number of medals won by each country in swimming.
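
A possible query, assuming the data has been loaded into a hypothetical table named olympics with columns country, sport, and total_medals (none of these names come from the source):

```sql
-- Total medals per country, restricted to swimming.
SELECT country,
       SUM(total_medals) AS swimming_medals
FROM olympics
WHERE sport = 'Swimming'
GROUP BY country;
```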

Question 2 : Display the number of medals India won, year-wise.
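
A sketch under the assumption of a hypothetical olympics table with columns country, event_year, and total_medals:

```sql
-- Medals won by India, grouped by year.
SELECT event_year,
       SUM(total_medals) AS medals_won
FROM olympics
WHERE country = 'India'
GROUP BY event_year;
```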

Question 3 : Find the total number of medals each country won, and display the country name along with its total.
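
One way to write this, again assuming a hypothetical olympics table with columns country and total_medals:

```sql
-- Total medals per country across all sports and years.
SELECT country,
       SUM(total_medals) AS medals_won
FROM olympics
GROUP BY country;
```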

Question 4 : Find the number of gold medals each country won.
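
A sketch assuming a hypothetical olympics table in which a numeric column named gold holds the gold-medal count per row:

```sql
-- Gold medals per country (the gold column is an assumed name).
SELECT country,
       SUM(gold) AS gold_medals
FROM olympics
GROUP BY country;
```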

Question 5 : Which countries won medals in Shooting? Classify the results year-wise.
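
A possible query, assuming a hypothetical olympics table with columns country, sport, event_year, and total_medals:

```sql
-- Countries with Shooting medals, broken down by year.
SELECT country,
       event_year,
       SUM(total_medals) AS shooting_medals
FROM olympics
WHERE sport = 'Shooting'
GROUP BY country, event_year;
```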

Task 3 :

Question 1 :

Create three tables called movies, ratings and users. Load the data into the tables.
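
The table definitions could be sketched as follows. All column names and types, the comma delimiter, and the file paths are assumptions for illustration; the actual data files are not described here. The name rating_ts is used instead of timestamp because TIMESTAMP is a reserved word in Hive.

```sql
-- Hypothetical schemas; adjust to the real files.
CREATE TABLE IF NOT EXISTS movies (
  movie_id INT,
  title    STRING,
  genres   STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

CREATE TABLE IF NOT EXISTS ratings (
  user_id   INT,
  movie_id  INT,
  rating    DOUBLE,
  rating_ts BIGINT
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

CREATE TABLE IF NOT EXISTS users (
  user_id    INT,
  gender     STRING,
  occupation STRING,
  zip_code   STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

-- Placeholder local paths.
LOAD DATA LOCAL INPATH '/path/to/movies.csv'  INTO TABLE movies;
LOAD DATA LOCAL INPATH '/path/to/ratings.csv' INTO TABLE ratings;
LOAD DATA LOCAL INPATH '/path/to/users.csv'   INTO TABLE users;
```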

Question 2 :

For the movies table, list all movies whose genre is “Action” and “Drama”.
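
A sketch of this filter, assuming a hypothetical genres column that stores all of a movie's genres in a single string, and reading the question as requiring both genres on the same movie:

```sql
-- Movies tagged with both Action and Drama (genres column is assumed).
SELECT movie_id, title, genres
FROM movies
WHERE genres LIKE '%Action%'
  AND genres LIKE '%Drama%';
```

If the intent is movies in either genre, replacing AND with OR would give that reading.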

Question 3 :

For the ratings table, list the movie IDs of all movies with a rating equal to 5.
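
One possible query, assuming hypothetical columns movie_id and rating in the ratings table. DISTINCT is used here so that each movie ID appears once even if several users gave it a 5.

```sql
-- Movie IDs that received at least one rating of 5.
SELECT DISTINCT movie_id
FROM ratings
WHERE rating = 5;
```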

Question 4 :

Find the top 11 "Action" movies by average rating, in descending order of rating.
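
This question needs a join between movies and ratings. The sketch below assumes hypothetical columns movie_id, title, and genres in movies, and movie_id and rating in ratings.

```sql
-- Average rating per Action movie, highest first, top 11 only.
SELECT m.movie_id,
       m.title,
       AVG(r.rating) AS avg_rating
FROM movies m
JOIN ratings r
  ON m.movie_id = r.movie_id
WHERE m.genres LIKE '%Action%'
GROUP BY m.movie_id, m.title
ORDER BY avg_rating DESC
LIMIT 11;
```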