This task implements a MapReduce algorithm to find the top 10 rated videos on YouTube. I worked on the mapper class for this task.
Mapper Class:
The mapper class holds the video ID and the video rating in separate variables.
The overridden map method runs once for every input line. Each line is converted to a string and split into fields on the tab character.
If the record has more than six fields, the seventh column, which is the rating, is read, and the pair (videoID, rating) is sent to the reducer class as the key and value.
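The parsing step described above can be sketched in plain Java, without the Hadoop classes, so that it runs on its own. This is only a minimal sketch and not the project's actual mapper: the class name RatingMapperSketch, the method mapLine, the sample record, and the assumption that the video ID sits in the first column are all hypothetical.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-in for the mapper logic; it does not use the Hadoop API.
class RatingMapperSketch {

    // The rating is the seventh tab-separated field, i.e. index 6.
    static final int RATING_COLUMN = 6;

    /**
     * Parses one tab-separated record and returns (videoID, rating).
     * Returns an empty Optional when the record has fewer than seven
     * fields or when the rating field is not a number.
     */
    static Optional<Map.Entry<String, Float>> mapLine(String line) {
        String[] fields = line.split("\t");
        if (fields.length <= RATING_COLUMN) {
            return Optional.empty(); // too few fields: skip the record
        }
        try {
            Float rating = Float.parseFloat(fields[RATING_COLUMN].trim());
            // Assumption: the video ID is the first field of the record.
            Map.Entry<String, Float> pair = new SimpleEntry<>(fields[0], rating);
            return Optional.of(pair);
        } catch (NumberFormatException e) {
            return Optional.empty(); // malformed rating: skip the record
        }
    }

    public static void main(String[] args) {
        // Made-up record, used only to show the call.
        String record = "vid001\tuploaderA\t700\tMusic\t215\t1000\t4.5\t120\t30";
        mapLine(record).ifPresent(
                kv -> System.out.println(kv.getKey() + "\t" + kv.getValue()));
    }
}
```

In a real Hadoop mapper the same check and parse would sit inside the map method, and the pair would be written to the job context instead of being returned.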
Anmisha worked on the reducer part. We integrated both parts and created a jar file to run the job.
This is the driver class, where the mapper and reducer classes are set.
Output
List of top 10 rated videos
Hive Use Case
A Hive table is created first so that queries can be run against it. The following command creates the table:
create table Zomato(Restaurant_ID INT, Restaurant_Name STRING, Country_Code SMALLINT, City STRING, Address ARRAY<STRING>, Locality ARRAY<STRING>, Locality_Verbose ARRAY<STRING>, Longitude FLOAT, Latitude FLOAT, Cuisines ARRAY<STRING>, AverageCost_2 INT, Currency STRING, Has_Tablebooking STRING, Has_Onlinedelivery STRING, Isdelivering_now STRING, Switchtoorder_menu STRING, Price_range TINYINT, Aggregate_rating FLOAT, Rating_color STRING, Rating_text STRING, Votes INT) row format delimited fields terminated by ',' collection items terminated by '#' stored as textfile tblproperties ("skip.header.line.count"="1");
Loading Data into Table :
load data local inpath '/home/cloudera/Desktop/zomato.csv' into table Zomato;
The queries executed on the table are shown below.
Query -1 : To find restaurants whose currency is the Indian currency
select Restaurant_ID,Restaurant_Name,Currency from Zomato where Currency like 'Indian%';
Query -2 : To find the number of restaurants for each country code
select Country_Code, count(Country_Code) from Zomato group by Country_Code;
Query -3 : To get the top 10 restaurants that offer online delivery and have the highest ratings
select Restaurant_ID,Restaurant_Name,Aggregate_rating from Zomato where Has_Onlinedelivery="Yes" order by Aggregate_rating desc limit 10;
Query -4 : Concatenate latitude and longitude for country code 166
select Restaurant_Name,concat(Latitude,",",Longitude) as location from Zomato where Country_Code=166 limit 10;
Query -5 : To find restaurants which offer more than 6 cuisines
select Restaurant_Name, size(Cuisines) as number_of_cuisines from Zomato where size(Cuisines)>6;
Query -6 : To find restaurants with both American and Italian cuisines
select Restaurant_Name,AverageCost_2 from Zomato where array_contains(Cuisines,"American") and array_contains(Cuisines,"Italian") limit 10;
Query -7 : To find the average rating for each country
select Country_Code,avg(Aggregate_rating) as Avg_country_rating from Zomato group by Country_Code;
Query -8 : To find the word count for the rating text
select word,count(1) as cnt from (select explode(split(Rating_text,' ')) as word from Zomato) w group by word order by cnt;
Query -9 : To find the top 2 restaurants with the highest votes for each of the Excellent and Very Good rating texts
select Restaurant_ID,City from (select * from (select Restaurant_ID,City,Votes from Zomato where Rating_text="Excellent" order by Votes desc limit 2) a UNION ALL select * from (select Restaurant_ID,City,Votes from Zomato where Rating_text="Very Good" order by Votes desc limit 2) b) as D;
Query -10 : To Display all cities for country code 1
select City from Zomato where Country_Code=1 sort by City;
Questions
The main idea of the project is to get the top ten rated YouTube videos using the MapReduce algorithm. MapReduce works with key-value pairs: the mapper splits the input data, emits each videoID with its video rating as a key-value pair, and passes these pairs to the reducer class, which sorts all the inputs and produces the final output. The main idea of the Hive use case is to collect information about Zomato restaurants using Hive queries.
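As an illustration of the reduce-side idea, sorting the (videoID, rating) pairs and keeping the best ones, here is a minimal sketch in plain Java. It is not the reducer written for the project. The class name TopRatedSketch, the method topN, and the sample ratings are hypothetical, and the sketch assumes each videoID arrives with a single rating.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the "sort and keep the top N" step; no Hadoop API is used.
class TopRatedSketch {

    /** Returns up to n entries with the highest rating, best first. */
    static List<Map.Entry<String, Float>> topN(Map<String, Float> ratings, int n) {
        List<Map.Entry<String, Float>> entries = new ArrayList<>(ratings.entrySet());
        // Sort by rating in descending order; ties keep their input order.
        entries.sort(Map.Entry.<String, Float>comparingByValue().reversed());
        // Copy the first n entries (or all of them if there are fewer than n).
        return new ArrayList<>(entries.subList(0, Math.min(n, entries.size())));
    }

    public static void main(String[] args) {
        // Made-up ratings, used only to show the call.
        Map<String, Float> ratings = new LinkedHashMap<>();
        ratings.put("vid001", 4.5f);
        ratings.put("vid002", 3.2f);
        ratings.put("vid003", 4.9f);
        for (Map.Entry<String, Float> e : topN(ratings, 2)) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

Sorting everything is the simplest way to match the description above. For very large inputs, keeping only the current best ten in a bounded priority queue would avoid holding the full sorted list.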
This approach can be used in real-world settings to analyze very large datasets, such as finding the top ten rated videos among all YouTube videos. The Hive queries can likewise be used to collect useful information, such as which restaurants have the highest ratings and which restaurants offer the most cuisines.
I worked on the mapper class function in the second part of task 2 and on the Hive queries.
I did not face many challenges, as the project work was similar to the ICPs we did earlier.
The Hive queries were individual work, so there was little to integrate with other team members. For the MapReduce task, Anmisha and I implemented it together: I worked on the mapper class and she worked on the reducer class.