Exam 1 Part 2: YouTube - gabriellawillis/BigData GitHub Wiki

2. Use Case: Implement a MapReduce algorithm to perform analysis on a YouTube dataset.

Technologies Used:

MapReduce:

Find the top 5 categories with the maximum number of videos uploaded.

Create the Mapper and Reducer code to find the top 5 categories in the dataset. This is done by iterating over the records and counting the videos in each category.
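The counting logic can be illustrated with a minimal plain-Java sketch. It is not the Hadoop Mapper and Reducer used for this exam, and it uses no Hadoop classes. It assumes that youtubedata.txt is tab-separated with the category in the fourth field (index 3), which this page does not state. The class name Top5CategoriesSketch and its method names are hypothetical, and the sample rows in main are made up for illustration. Java 9 or later is assumed for List.of.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java stand-in for the map and reduce steps (no Hadoop dependency).
class Top5CategoriesSketch {

    // Assumption: tab-separated records with the category in field index 3.
    static final int CATEGORY_INDEX = 3;

    // "Map" step: return the category of one record, or null if the record is too short.
    static String mapCategory(String line) {
        String[] fields = line.split("\t");
        return fields.length > CATEGORY_INDEX ? fields[CATEGORY_INDEX].trim() : null;
    }

    // "Reduce" step: add 1 to the category's total for every record seen.
    static Map<String, Integer> countByCategory(List<String> lines) {
        Map<String, Integer> counts = new HashMap<>();
        for (String line : lines) {
            String category = mapCategory(line);
            if (category != null && !category.isEmpty()) {
                counts.merge(category, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Keep the n categories with the highest counts; ties are ordered by name.
    static List<Map.Entry<String, Integer>> top(Map<String, Integer> counts, int n) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort(Map.Entry.<String, Integer>comparingByValue().reversed()
                .thenComparing(Map.Entry.<String, Integer>comparingByKey()));
        return entries.subList(0, Math.min(n, entries.size()));
    }

    public static void main(String[] args) {
        // Made-up sample rows in the assumed column layout.
        List<String> sample = List.of(
                "id1\tuserA\t100\tMusic\t200\t1000\t4.5",
                "id2\tuserB\t101\tComedy\t150\t500\t3.9",
                "id3\tuserC\t102\tMusic\t300\t2500\t4.8");
        for (Map.Entry<String, Integer> e : top(countByCategory(sample), 5)) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

In a real job the per-category sum would live in the Reducer, and the top-5 selection would happen either in a cleanup step or in a second pass over the reducer output.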

Execution after turning the Java file into a JAR file:

Command: hadoop jar /home/cloudera/Desktop/top5.jar Top5_categories input/youtubedata.txt output

Output for Top 5

Command: hadoop fs -cat output/part-r-00000

Find the top 10 rated videos on YouTube.

Create the Mapper and Reducer code.
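As a rough illustration of the rating logic, the following plain-Java sketch averages the rating per video and keeps the highest 10. It is not the Video_rating class run below, and it uses no Hadoop classes. It assumes tab-separated records with the video ID in the first field (index 0) and a numeric rating in the seventh field (index 6); neither is stated on this page. Averaging per video ID is also an assumption about how "top rated" is defined. The class name VideoRatingSketch and its method names are hypothetical, and the sample rows are made up. Java 9 or later is assumed for List.of and Map.entry.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java stand-in for the map and reduce steps (no Hadoop dependency).
class VideoRatingSketch {

    // Assumptions: video ID in field index 0, rating in field index 6.
    static final int VIDEO_ID_INDEX = 0;
    static final int RATING_INDEX = 6;

    // "Map" step: parse (videoId, rating); null for short or malformed records.
    static Map.Entry<String, Double> mapRating(String line) {
        String[] fields = line.split("\t");
        if (fields.length <= RATING_INDEX) {
            return null;
        }
        try {
            return Map.entry(fields[VIDEO_ID_INDEX], Double.parseDouble(fields[RATING_INDEX]));
        } catch (NumberFormatException e) {
            return null;
        }
    }

    // "Reduce" step: average all ratings seen for each video ID.
    static Map<String, Double> averageRatings(List<String> lines) {
        Map<String, double[]> sums = new HashMap<>(); // value holds {sum, count}
        for (String line : lines) {
            Map.Entry<String, Double> kv = mapRating(line);
            if (kv == null) {
                continue;
            }
            double[] acc = sums.computeIfAbsent(kv.getKey(), k -> new double[2]);
            acc[0] += kv.getValue();
            acc[1] += 1;
        }
        Map<String, Double> averages = new HashMap<>();
        sums.forEach((id, acc) -> averages.put(id, acc[0] / acc[1]));
        return averages;
    }

    // Keep the n videos with the highest average rating; ties are ordered by ID.
    static List<Map.Entry<String, Double>> top(Map<String, Double> averages, int n) {
        List<Map.Entry<String, Double>> entries = new ArrayList<>(averages.entrySet());
        entries.sort(Map.Entry.<String, Double>comparingByValue().reversed()
                .thenComparing(Map.Entry.<String, Double>comparingByKey()));
        return entries.subList(0, Math.min(n, entries.size()));
    }

    public static void main(String[] args) {
        // Made-up sample rows in the assumed column layout.
        List<String> sample = List.of(
                "id1\tuserA\t100\tMusic\t200\t1000\t4.5",
                "id2\tuserB\t101\tComedy\t150\t500\t3.9",
                "id3\tuserC\t102\tMusic\t300\t2500\t4.8");
        for (Map.Entry<String, Double> e : top(averageRatings(sample), 10)) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

Skipping short or non-numeric records keeps a single malformed line from failing the whole run, which is a common precaution with scraped datasets.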

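Execution after turning the Java file into a JAR file. Hadoop will not overwrite an existing output directory, so if the output from the previous job is still present, remove it first (for example with hadoop fs -rm -r output) or choose a different output path: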

Command: hadoop jar /home/cloudera/Desktop/Video_rating.jar Video_rating input/youtubedata.txt output

Output for Video Rating

Command: hadoop fs -cat output/part-r-00000