Module 1: Lab #2 - SnehaMishra28/BigData_Programming_Summer2018 GitHub Wiki
Team: 12
Professor: Yugyung Lee
Name: Sneha Mishra
Class ID: 11
Email: [email protected]
MyGitHub
Technical Partner 1:
Name: Aditya Soman
Class ID: 19
Email: [email protected]
GitHub
Technical Partner 2:
Name: Bilal Mustafa
Class ID: 13
Email: [email protected]
[GitHub]
YouTube Link explaining the Lab work can be found here
The report for the Lab work is here
Objective
Understanding Hive and solr better and perform complex intuitive queries on the created database as well as collection
Features
- Create a Hive Table including Complex Data Types
- Create a Solr Collection including our own Field Types
- Use built-in functions in the queries
- Perform 10 intuitive queries in Database
- Record the time execution for the queries
Steps:
Part 1: HIVE USECASE
Hive Dataset:
here
The given Dataset needs to be modified and divided into appropriate columns. The updated dataset can be foundTable created:
Query 1: Get City with max "Excellent" Rating
Query:
Select max(address.city) from tomato where rating.text = "Excellent";
Output:
Query 2: Get count of cities that use Indian Rupees
Query:
Select count(address.city) from tomato where cost.currency = "Indian Rupees(Rs.)";
Output:
Query 3: Get the Costliest Restaurant Details
Query: Select name, address.address, address.city, cost.avg_cost from zomato sort by avg_cost desc limit 1;
Output:
Query 4: Get the restaurant with the most number of votes
Query:
Select name, address.address, address.city, rating.votes, cost.avg_cost from zomato sort by votes desc limit 1;
Output:
Query 5: Get the restaurant with the best customer satisfaction
Query:
Select name, addresss.address, address.city, cost.avg_cost, rating.text, cuisines from zomato where policy.table_booking = "Yes" and policy.online_delivery="Yes" and policy.delv_now="Yes";
Output:
Query 6: Get the Top 10 most recurring restaurants
Query:
Select name, count(*) as count from zomato group by name order by count desc limit 10;
Output:
Query 7: Get all restaurants that offer French Cuisine
Query:
Select name, address.address, address.city, cost.avg_cost, rating.text from zomato where locate("French", cuisines)>0;
Output:
Query 8: Get Top 10 restaurants with "Excellent" rating and sort them by descending order of votes
Query:
Select name, address.address, address.city, cuisines, rating.votes from zomato where rating.text = "Excellent" sort by votes desc limit 10;
Output:
Query 9: Get Top 10 restaurants with high price range and "Excellent" rating and sort them by descending order of average cost
Query:
Select name, address.address, address.city, cuisines, cost.avg_cost from zomato where cost.price_range > 3 and rating.text = "Excellent" order by avg_cost desc limit 10;
Output:
Query 10: Get Top 10 restaurants with high price range and "Poor" rating and sort them by descending order of average cost
Query:
Select name, address.address, address.city, cuisines, cost.avg_cost from zomato where cost.price_range > 3 and rating.text = "Poor" order by avg_cost desc limit 10;
Output:
Part 2: SOLR USECASE
here
The dataset needs to be updated. Updated dataset can be foundUpdate the Schema.xml file:
The schema needed to be updated to be used in Solr. The updated schema can be found here
Document Created:
The document created after importing the dataset is
Query 1: Get all restaurants with Indian Cuisine
Query 2: Get the restaurant with the best customer satisfaction
Query 3: Get the restaurants with "Poor" ratings but are expensive
Query 4: Get the restaurant with "Excellent" ratings sorted in descending order of Votes
Query 5: Get the Top 10 restaurants which don'92t have "Excellent" rating but are expensive
Query 6: Get the Top 10 costliest restaurants in the Northern Hemisphere
Query 7: Get the Top 10 most popular restaurants
Query:
Output:
Query 8: Get Top 10 most commonly used currencies
Query:
Output:
Query 9: Get the cities with the most number of restaurants
Query:
Output:
Query 10: Restaurants with the same locality
Query:
Output:
References:
Zomato Restaurants Data
https://www.kaggle.com/shrutimehta/zomato-restaurants-data
Super Heros Dataset
https://www.kaggle.com/claudiodavi/superhero-set/data
Google Job Skills Dataset
https://www.kaggle.com/niyamatalmass/google-job-skills/data