Module 1: Lab #2 - SnehaMishra28/BigData_Programming_Summer2018 GitHub Wiki

Team: 12
Professor: Yugyung Lee

Name: Sneha Mishra
Class ID: 11
Email: [email protected]
MyGitHub

Technical Partner 1:
Name: Aditya Soman
Class ID: 19
Email: [email protected]
GitHub

Technical Partner 2:
Name: Bilal Mustafa
Class ID: 13
Email: [email protected]
[GitHub]

YouTube Link explaining the Lab work can be found here
The report for the Lab work is here

Objective

Understanding Hive and solr better and perform complex intuitive queries on the created database as well as collection

Features

  1. Create a Hive Table including Complex Data Types
  2. Create a Solr Collection including our own Field Types
  3. Use built-in functions in the queries
  4. Perform 10 intuitive queries in Database
  5. Record the time execution for the queries

Steps:

Part 1: HIVE USECASE

Hive Dataset:

The given Dataset needs to be modified and divided into appropriate columns. The updated dataset can be found here

Table created:

Query 1: Get City with max "Excellent" Rating

Query:

Select max(address.city) from tomato where rating.text = "Excellent";

Output:

Query 2: Get count of cities that use Indian Rupees

Query:

Select count(address.city) from tomato where cost.currency = "Indian Rupees(Rs.)";

Output:

Query 3: Get the Costliest Restaurant Details

Query: Select name, address.address, address.city, cost.avg_cost from zomato sort by avg_cost desc limit 1;

Output:

Query 4: Get the restaurant with the most number of votes

Query:

Select name, address.address, address.city, rating.votes, cost.avg_cost from zomato sort by votes desc limit 1;

Output:

Query 5: Get the restaurant with the best customer satisfaction

Query:

Select name, addresss.address, address.city, cost.avg_cost, rating.text, cuisines from zomato where policy.table_booking = "Yes" and policy.online_delivery="Yes" and policy.delv_now="Yes";

Output:

Query 6: Get the Top 10 most recurring restaurants

Query:

Select name, count(*) as count from zomato group by name order by count desc limit 10;

Output:

Query 7: Get all restaurants that offer French Cuisine

Query:

Select name, address.address, address.city, cost.avg_cost, rating.text from zomato where locate("French", cuisines)>0;

Output:

Query 8: Get Top 10 restaurants with "Excellent" rating and sort them by descending order of votes

Query:

Select name, address.address, address.city, cuisines, rating.votes from zomato where rating.text = "Excellent" sort by votes desc limit 10;

Output:

Query 9: Get Top 10 restaurants with high price range and "Excellent" rating and sort them by descending order of average cost

Query:

Select name, address.address, address.city, cuisines, cost.avg_cost from zomato where cost.price_range > 3 and rating.text = "Excellent" order by avg_cost desc limit 10;

Output:

Query 10: Get Top 10 restaurants with high price range and "Poor" rating and sort them by descending order of average cost

Query:

Select name, address.address, address.city, cuisines, cost.avg_cost from zomato where cost.price_range > 3 and rating.text = "Poor" order by avg_cost desc limit 10;

Output:

Part 2: SOLR USECASE

The dataset needs to be updated. Updated dataset can be found here

Update the Schema.xml file:

The schema needed to be updated to be used in Solr. The updated schema can be found here

Document Created:

The document created after importing the dataset is

Query 1: Get all restaurants with Indian Cuisine

Query 2: Get the restaurant with the best customer satisfaction

Query 3: Get the restaurants with "Poor" ratings but are expensive

Query 4: Get the restaurant with "Excellent" ratings sorted in descending order of Votes

Query 5: Get the Top 10 restaurants which don'92t have "Excellent" rating but are expensive

Query 6: Get the Top 10 costliest restaurants in the Northern Hemisphere

Query 7: Get the Top 10 most popular restaurants

Query:

Output:

Query 8: Get Top 10 most commonly used currencies

Query:

Output:

Query 9: Get the cities with the most number of restaurants

Query:

Output:

Query 10: Restaurants with the same locality

Query:

Output:

References:

Zomato Restaurants Data

https://www.kaggle.com/shrutimehta/zomato-restaurants-data

Super Heros Dataset

https://www.kaggle.com/claudiodavi/superhero-set/data

Google Job Skills Dataset

https://www.kaggle.com/niyamatalmass/google-job-skills/data

Seinfeld Chronicles Dataset

https://www.kaggle.com/thec03u5/seinfeld-chronicles/data