Module 1: ICP #4 - VidyullathaKaza/BigData_Programming_Spring2020 GitHub Wiki

Problem Statement:

Create Hive Tables and Perform Queries.

Procedure:

Part 1: Based on petrol Data Set

Run the Hive user interface using the following command.

Create the Table using following command

Load the data using following command

We use the following query for total amount of petrol in volume sold by every distributor. The total for all the volume sold is calculated based on distributor name.

In the next query we are displaying the top 10 distributors ID’s for selling petrol and also display the amount of petrol sold in volume by them individually. We are filtering out the top 10 distributors ID based on the vol_Out.

In the next query we are displaying real life 10 distributor name who sold petrol in the least amount.We are filtering out the top 10 distributors ID based on the vol_Out.

The constraint to this query is the difference between volume IN and volume Out is illegal in real life if greater than 500. As we see all distributors are receiving patrols on every next cycle.We are filtering out the Distributor ID based on the difference between Vol_IN and Vol_Out.

Part 2: Based on Olympic Data set

We use the following commands to create the table and load the data from the given data set for Olympics data.

The first query is used to list the total number of medals won by each country in swimming. We are taking the data whose sport is swimming and calculating the sum of medals won by country.

The second query is used to display real life number of medals India won year wise.We are taking the medals won by India and arranging them year by year and showing the total umber of medals won in that year.

The third query is used to show the total number of medals each country won display the name along with total medals. We are taking medals won by each country and calculating the total.

The fourth query is used to give us the real life number of gold medals each country won. We are taking Gold medal won by each country and display them country wise.

The last query is used to show country that got medals for Shooting by year wise classification. We take out the country who participated in shooting and display the year it won the medal in.

Part 3: Based on Movie Data Set

We created 3 tables Movies, Ratings, Users and load data into them.

We display the names of the movies which come under the genre Action and Drama. We showing the movies whose genre consists of both Action and Drama.

We give the Movie Id's of the movies whose rating is 5.

We Give the top 11 Action Movie names whose ratings are average based. By using the JOIN operater we are joining movies table on ratings table to give the movie names.

Bonus Question

Executed the bonus question using the following query

Output is as follows

Learning Outcomes

  1. Understood the importance of Hive.
  2. HiveQL is very similar to SQL