Big_Data_Programming_ICP_4 - kusamdinesh/Big-Data-and-Hadoop GitHub Wiki

Hive is a data-warehousing system that stores structured data on HDFS and makes that data easy to query by compiling each query into Hadoop MapReduce jobs.

  • Create a table named petrol and load data into it.

1.1. Query the total volume of petrol sold by each distributor.
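A minimal sketch of this per-distributor total, using Python's built-in `sqlite3` as a stand-in for Hive so that it runs without a cluster. The column names (`distributor_id`, `distributor_name`, `year`, `volume_out`) and the sample rows are assumptions made for illustration; the page does not show the petrol table's schema. The `SELECT` itself relies only on `SUM`, `GROUP BY` and `ORDER BY`, which HiveQL also supports.

```python
import sqlite3

# In-memory SQLite database standing in for the Hive warehouse.
conn = sqlite3.connect(":memory:")

# Hypothetical schema: the real petrol table's columns are not shown on the page.
conn.execute(
    "CREATE TABLE petrol ("
    "distributor_id TEXT, distributor_name TEXT, year INTEGER, volume_out INTEGER)"
)

# Made-up sample rows, for illustration only.
conn.executemany(
    "INSERT INTO petrol VALUES (?, ?, ?, ?)",
    [
        ("D1", "alpha", 2019, 500),
        ("D1", "alpha", 2020, 700),
        ("D2", "beta", 2019, 300),
        ("D3", "gamma", 2020, 900),
    ],
)

# Sum the volume sold over all rows belonging to each distributor.
total_volume_query = """
    SELECT distributor_name, SUM(volume_out) AS total_volume
    FROM petrol
    GROUP BY distributor_name
    ORDER BY distributor_name
"""
totals = conn.execute(total_volume_query).fetchall()
for name, volume in totals:
    print(name, volume)
```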

1.2. Query the IDs of the top 10 distributors by petrol sold, along with the volume each of them sold.

1.3. Query the names of the 10 distributors who sold the least petrol.
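The two ranking queries (largest and smallest sellers) differ only in sort direction, which the sketch below illustrates. It again uses `sqlite3` in place of Hive; the helper `rank_distributors`, the column names and the sample rows are hypothetical. In Hive the same statement would be written with a literal `LIMIT 10`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical schema and made-up rows; the page does not show the real data.
conn.execute(
    "CREATE TABLE petrol ("
    "distributor_id TEXT, distributor_name TEXT, volume_out INTEGER)"
)
conn.executemany(
    "INSERT INTO petrol VALUES (?, ?, ?)",
    [
        ("D1", "alpha", 500),
        ("D1", "alpha", 700),
        ("D2", "beta", 300),
        ("D3", "gamma", 900),
        ("D4", "delta", 100),
    ],
)


def rank_distributors(connection, limit, largest_first):
    """Return up to `limit` distributors ordered by total volume sold."""
    direction = "DESC" if largest_first else "ASC"
    query = f"""
        SELECT distributor_id, distributor_name, SUM(volume_out) AS total_volume
        FROM petrol
        GROUP BY distributor_id, distributor_name
        ORDER BY total_volume {direction}
        LIMIT {int(limit)}
    """
    return connection.execute(query).fetchall()


# Top sellers: the sample has fewer than 10 distributors, so all are returned.
top_sellers = rank_distributors(conn, 10, largest_first=True)
# Lowest sellers: a limit of 2 keeps the illustration small.
lowest_sellers = rank_distributors(conn, 2, largest_first=False)
print(top_sellers)
print(lowest_sellers)
```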

1.4. List all distributors who have this difference, along with the year and the difference they have in that year (own query).

  • Create a table named olympic and load data into it.

2.1. List the total number of medals won by each country in swimming.
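A filtered aggregation of this kind can be sketched as follows, with `sqlite3` standing in for Hive. The schema (`athlete`, `country`, `year`, `sport`, `gold`, `silver`, `bronze`, `total`) is an assumption, and every row is invented for illustration. The same pattern, with a different `WHERE` filter and `GROUP BY` column, would also cover a per-year count for a single country.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical schema: one row per athlete per Games, with medal counts.
conn.execute(
    "CREATE TABLE olympic ("
    "athlete TEXT, country TEXT, year INTEGER, sport TEXT, "
    "gold INTEGER, silver INTEGER, bronze INTEGER, total INTEGER)"
)

# Made-up rows, not real Olympic results.
conn.executemany(
    "INSERT INTO olympic VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    [
        ("Athlete 1", "Australia", 2008, "Swimming", 1, 1, 0, 2),
        ("Athlete 2", "Australia", 2012, "Swimming", 0, 0, 1, 1),
        ("Athlete 3", "India", 2012, "Shooting", 0, 1, 0, 1),
        ("Athlete 4", "Japan", 2008, "Swimming", 0, 0, 1, 1),
    ],
)

# Keep only swimming rows, then add up the medal totals per country.
swimming_query = """
    SELECT country, SUM(total) AS medals
    FROM olympic
    WHERE sport = 'Swimming'
    GROUP BY country
    ORDER BY country
"""
swimming_medals = conn.execute(swimming_query).fetchall()
print(swimming_medals)
```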

2.2. Query the number of medals India won in each year.

2.3. Query the total number of medals each country won, displaying the country name alongside its total.

2.4. Query the number of gold medals each country won.

2.5. Query the countries that won a medal in shooting, grouped by year (own query).

  • 3.1. Create tables for movies, rating and users; load them with data.

Movies

Rating

Users

  • 3.2. For the movies table: list all movies whose genre is “Action” and “Drama”.
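One possible reading of this task is sketched below, with `sqlite3` standing in for Hive. It assumes a `movies` table whose `genres` column holds a pipe-separated string (for example `Action|Drama`); that layout, the column names and the sample titles are all assumptions, since the page does not show the table. If the task is meant to list movies in either genre, the `AND` becomes an `OR`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical schema with a pipe-separated genres column.
conn.execute("CREATE TABLE movies (movie_id INTEGER, title TEXT, genres TEXT)")
conn.executemany(
    "INSERT INTO movies VALUES (?, ?, ?)",
    [
        (1, "Movie A", "Action|Drama"),
        (2, "Movie B", "Comedy"),
        (3, "Movie C", "Action|Thriller"),
        (4, "Movie D", "Drama"),
    ],
)

# Movies tagged with both genres.
both_query = """
    SELECT movie_id, title, genres
    FROM movies
    WHERE genres LIKE '%Action%' AND genres LIKE '%Drama%'
    ORDER BY movie_id
"""
# Movies tagged with at least one of the two genres.
either_query = """
    SELECT movie_id, title, genres
    FROM movies
    WHERE genres LIKE '%Action%' OR genres LIKE '%Drama%'
    ORDER BY movie_id
"""
both_genres = conn.execute(both_query).fetchall()
either_genre = conn.execute(either_query).fetchall()
print(both_genres)
print(either_genre)
```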

  • 3.3. For the ratings table: list the movie IDs of all movies with a rating equal to 5.

  • 3.4. Find the top 11 "Action" movies by average rating, in descending order of rating.
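Ranking movies by average rating needs a join between the movies and ratings tables, followed by an aggregate and a sort. The sketch below shows one way to write it, with `sqlite3` in place of Hive. The table layouts (`movies(movie_id, title, genres)` and `ratings(user_id, movie_id, rating)`) and all rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical schemas; the page does not show the real column names.
conn.execute("CREATE TABLE movies (movie_id INTEGER, title TEXT, genres TEXT)")
conn.execute("CREATE TABLE ratings (user_id INTEGER, movie_id INTEGER, rating INTEGER)")

# Made-up rows, for illustration only.
conn.executemany(
    "INSERT INTO movies VALUES (?, ?, ?)",
    [
        (1, "Movie A", "Action|Drama"),
        (2, "Movie B", "Comedy"),
        (3, "Movie C", "Action"),
    ],
)
conn.executemany(
    "INSERT INTO ratings VALUES (?, ?, ?)",
    [(1, 1, 5), (2, 1, 4), (1, 2, 5), (1, 3, 3), (2, 3, 2)],
)

# Join ratings to movies, keep Action titles, average per movie, sort, cut at 11.
top_action_query = """
    SELECT m.title, AVG(r.rating) AS avg_rating
    FROM movies m
    JOIN ratings r ON m.movie_id = r.movie_id
    WHERE m.genres LIKE '%Action%'
    GROUP BY m.movie_id, m.title
    ORDER BY avg_rating DESC
    LIMIT 11
"""
top_action = conn.execute(top_action_query).fetchall()
print(top_action)
```

Grouping by `movie_id` as well as `title` keeps two different movies that share a title from being averaged together.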

  • Bonus.