ICP 4 - Gnkhakimova/CS5590-BigData GitHub Wiki
- Create Hive Tables and Perform Queries for Use Case based on Petrol Data
- Create Hive Tables and Perform Queries for Use Case based on Olympics Data
- Create Hive Tables and Perform Queries for Use Case based on Movielens
- Oracle VirtualBox
- Cloudera
- Terminal
- Hive
For this task we had to perform different operations on tables using Hive.
1. Input files
Downloaded the input files and loaded them into newly created Hive tables: Petrol, Olympics, Movies, Users, and Ratings.
2. Implementation
Task 1
Perform operations on the Petrol dataset. List all distributors who have this difference, along with the year and the difference they have in that year.
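The task text does not say which difference is meant, and the table layout is not shown here. As a minimal sketch, assume a `petrol` table with hypothetical columns `distributor_name`, `yr`, `vol_in`, and `vol_out`, and assume the difference is `vol_in - vol_out` with a hypothetical cutoff of 500. The rows are made up, and the query runs on Python's built-in `sqlite3` rather than Hive; it uses only a filter and an `ORDER BY`, so it should need little change as HiveQL.

```python
import sqlite3

# Hypothetical schema and made-up rows; the real Petrol columns may differ.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE petrol (distributor_name TEXT, yr INTEGER, vol_in INTEGER, vol_out INTEGER)"
)
conn.executemany(
    "INSERT INTO petrol VALUES (?, ?, ?, ?)",
    [
        ("shell", 2001, 1500, 900),      # difference 600
        ("bharat", 2001, 1000, 800),     # difference 200
        ("reliance", 2002, 2000, 1200),  # difference 800
    ],
)

# Assumed meaning of "difference": vol_in - vol_out, kept when above 500.
PETROL_QUERY = """
SELECT distributor_name, yr, vol_in - vol_out AS diff
FROM petrol
WHERE vol_in - vol_out > 500
ORDER BY yr, distributor_name
"""
petrol_rows = conn.execute(PETROL_QUERY).fetchall()
for row in petrol_rows:
    print(row)
```

With these sample rows only the distributors whose difference exceeds the assumed cutoff are listed, each with its year and difference.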
Task 2
List the countries that won medals in Shooting, classified year-wise.
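One way to read this task is as a grouped count of Shooting medals per country and year. The sketch below assumes a hypothetical `olympics` table with columns `athlete`, `country`, `yr`, `sport`, and `total` (medals won by that athlete); the rows are made up. It runs on Python's built-in `sqlite3`, not Hive, but the `WHERE` / `GROUP BY` / `SUM` shape is the part intended to carry over.

```python
import sqlite3

# Hypothetical schema and made-up rows; the real Olympics columns may differ.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE olympics (athlete TEXT, country TEXT, yr INTEGER, sport TEXT, total INTEGER)"
)
conn.executemany(
    "INSERT INTO olympics VALUES (?, ?, ?, ?, ?)",
    [
        ("A", "China", 2008, "Shooting", 2),
        ("B", "China", 2008, "Shooting", 1),
        ("C", "India", 2008, "Shooting", 1),
        ("D", "China", 2012, "Shooting", 1),
        ("E", "USA", 2012, "Swimming", 4),  # filtered out: not Shooting
    ],
)

# Sum the Shooting medals for each (country, year) pair.
SHOOTING_QUERY = """
SELECT country, yr, SUM(total) AS medals
FROM olympics
WHERE sport = 'Shooting'
GROUP BY country, yr
ORDER BY yr, country
"""
shooting_rows = conn.execute(SHOOTING_QUERY).fetchall()
for row in shooting_rows:
    print(row)
```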
Task 3
1. Create three tables called movies, ratings, and users, and load the data into them.
2. For the movies table: list all movies whose genre is "Action" and "Drama".
3. For the ratings table: list the movie IDs of all movies with a rating equal to 5.
4. Find the top 11 "Action" movies by average rating, in descending order of rating.
(Hint: this requires a join between the movies and ratings tables.)
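The three queries in this task can be sketched as follows. The schema is an assumption: `movies(movie_id, title, genres)` with genres stored as a pipe-separated string, and `ratings(user_id, movie_id, rating)`. The genre filter takes "Action" and "Drama" literally, so a movie must carry both. The rows are made up, and the code runs on Python's built-in `sqlite3` rather than Hive.

```python
import sqlite3

# Hypothetical schema and made-up rows; the real MovieLens tables may differ.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movies (movie_id INTEGER, title TEXT, genres TEXT)")
conn.execute("CREATE TABLE ratings (user_id INTEGER, movie_id INTEGER, rating INTEGER)")
conn.executemany(
    "INSERT INTO movies VALUES (?, ?, ?)",
    [
        (1, "Heat", "Action|Crime|Drama"),
        (2, "Speed", "Action|Thriller"),
        (3, "Emma", "Drama|Romance"),
    ],
)
conn.executemany(
    "INSERT INTO ratings VALUES (?, ?, ?)",
    [(10, 1, 5), (11, 1, 4), (10, 2, 3), (11, 2, 4), (10, 3, 5)],
)

# Movies tagged with both Action and Drama.
action_drama = conn.execute(
    "SELECT title FROM movies "
    "WHERE genres LIKE '%Action%' AND genres LIKE '%Drama%'"
).fetchall()

# Movie ids that received at least one rating of 5.
five_star_ids = conn.execute(
    "SELECT DISTINCT movie_id FROM ratings WHERE rating = 5 ORDER BY movie_id"
).fetchall()

# Top 11 Action movies by average rating: join movies to ratings, then aggregate.
top_action = conn.execute(
    """
    SELECT m.title, AVG(r.rating) AS avg_rating
    FROM movies m
    JOIN ratings r ON m.movie_id = r.movie_id
    WHERE m.genres LIKE '%Action%'
    GROUP BY m.movie_id, m.title
    ORDER BY avg_rating DESC
    LIMIT 11
    """
).fetchall()

print(action_drama, five_star_ids, top_action)
```

Grouping by both `movie_id` and `title` keeps every selected non-aggregated column in the `GROUP BY` clause, which Hive requires.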
3. Bonus
List all movies with their genre where the genre is Action or Drama, the average movie rating is between 4.4 and 4.9, and only male users rated the movie.
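A possible shape for the bonus query is a three-way join with a `HAVING` filter on the average. This sketch assumes the condition on male users means that only ratings from male users are counted, and it assumes hypothetical tables `movies(movie_id, title, genres)`, `ratings(user_id, movie_id, rating)`, and `users(user_id, gender)`. The rows are made up, and the code runs on Python's built-in `sqlite3` rather than Hive.

```python
import sqlite3

# Hypothetical schema and made-up rows; the real MovieLens tables may differ.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movies (movie_id INTEGER, title TEXT, genres TEXT)")
conn.execute("CREATE TABLE ratings (user_id INTEGER, movie_id INTEGER, rating INTEGER)")
conn.execute("CREATE TABLE users (user_id INTEGER, gender TEXT)")
conn.executemany(
    "INSERT INTO movies VALUES (?, ?, ?)",
    [
        (1, "Heat", "Action|Crime|Drama"),
        (2, "Speed", "Action|Thriller"),
        (3, "Emma", "Drama|Romance"),
        (4, "Toy Story", "Animation|Comedy"),
    ],
)
conn.executemany("INSERT INTO users VALUES (?, ?)", [(10, "M"), (11, "F"), (12, "M")])
conn.executemany(
    "INSERT INTO ratings VALUES (?, ?, ?)",
    [
        (10, 1, 5), (12, 1, 4), (11, 1, 1),  # male average 4.5 (female rating ignored)
        (10, 2, 3), (12, 2, 4),              # male average 3.5, below the range
        (10, 3, 5), (12, 3, 5),              # male average 5.0, above the range
        (10, 4, 5), (12, 4, 4),              # average 4.5, but wrong genre
    ],
)

# Keep male ratings only, restrict to Action or Drama, then filter on the average.
BONUS_QUERY = """
SELECT m.title, m.genres, AVG(r.rating) AS avg_rating
FROM movies m
JOIN ratings r ON m.movie_id = r.movie_id
JOIN users u ON r.user_id = u.user_id
WHERE u.gender = 'M'
  AND (m.genres LIKE '%Action%' OR m.genres LIKE '%Drama%')
GROUP BY m.movie_id, m.title, m.genres
HAVING AVG(r.rating) BETWEEN 4.4 AND 4.9
ORDER BY avg_rating DESC
"""
bonus_rows = conn.execute(BONUS_QUERY).fetchall()
for row in bonus_rows:
    print(row)
```

The gender and genre conditions sit in `WHERE` because they apply to individual rows, while the rating range sits in `HAVING` because it applies to the per-movie average.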