Lab Assignment I - ROHITHKUMARN/CS5542-Bigdata-LabAssignments GitHub Wiki
Lab 1 Assignment
Name: Nagulapati Rohith Kumar, Class ID: 16
Name: Nageswara Rao Nandigam, Class ID: 17
source code : source code
Question 1
1. Spark Programming
Download the MovieLens dataset from the following link:
https://grouplens.org/datasets/movielens/100k/
Please use the u.data file in the dataset as your input file. The input file contains 100,000 movie ratings by 943 users on 1,682 items. The data is randomly ordered. Each line is a tab-separated record of "user id | item id | rating | timestamp". Write Spark transformations and actions to find the users who have rated more than 25 items.
Input: u.data file from the MovieLens dataset.
Output: List of (user ID of each user who has rated more than 25 items, number of items that user has rated).
Source Code:
Output:
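As an illustration of the task in Question 1, here is a minimal PySpark sketch. It is not the submitted source code; the function names (`parse_user`, `users_over_threshold`, `run`), the `local[*]` master, and the application name are assumptions made for this example. It maps each line to a (user id, 1) pair, sums the pairs per user, and keeps users whose count exceeds 25.

```python
def parse_user(line):
    """Turn one tab-separated u.data line into a (user_id, 1) pair."""
    user_id = line.split("\t")[0]
    return (int(user_id), 1)


def users_over_threshold(lines_rdd, threshold=25):
    """Return an RDD of (user_id, rating_count) for users above the threshold.

    lines_rdd is expected to behave like a Spark RDD of text lines.
    """
    return (
        lines_rdd
        .map(parse_user)                              # transformation: (user_id, 1)
        .reduceByKey(lambda a, b: a + b)              # transformation: count per user
        .filter(lambda pair: pair[1] > threshold)     # transformation: keep count > 25
    )


def run(path="u.data", threshold=25):
    """Run the pipeline locally; the master and app name are assumptions."""
    from pyspark import SparkContext  # imported lazily so the helpers load without Spark

    sc = SparkContext(master="local[*]", appName="UsersOver25Ratings")
    try:
        # collect() is the action that triggers the computation
        return users_over_threshold(sc.textFile(path), threshold).sortByKey().collect()
    finally:
        sc.stop()
```

Calling `run("u.data")` from a directory that contains the file would return the list of (user id, count) pairs sorted by user id. Because the ratings file has one row per (user, item) rating, counting rows per user is taken here as the number of items rated.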
Questions 2 and 3:
- Create a GitHub account. Create a repository on remote GitHub. Clone it to the local machine.
Steps:
- Created the GitHub account.
- Created a repository named "CS5542-Bigdata-LabAssignments" and cloned it to the local machine.
- Create 2 directories (Source and Documentation) in the local GitHub repository.
Steps:
- Created two folders, source and documentation.
- Pushed them to the remote repository.
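The folder-creation and push steps can be sketched in Python as follows. This is only an illustration, not the procedure the authors used: the helper names `create_tracked_dirs` and `commit_and_push` and the commit message are hypothetical. Git does not track empty directories, so the sketch drops an empty `.gitkeep` placeholder (a common convention, not a Git feature) into each folder so that it can be committed.

```python
import subprocess
from pathlib import Path


def create_tracked_dirs(repo_root, names=("source", "documentation")):
    """Create each directory with an empty .gitkeep so Git has a file to track."""
    created = []
    for name in names:
        directory = Path(repo_root) / name
        directory.mkdir(parents=True, exist_ok=True)
        (directory / ".gitkeep").touch()
        created.append(directory)
    return created


def commit_and_push(repo_root, message="Add source and documentation directories"):
    """Stage, commit, and push; assumes repo_root is a clone with a configured remote."""
    for args in (
        ["git", "add", "."],
        ["git", "commit", "-m", message],
        ["git", "push"],
    ):
        # check=True raises CalledProcessError if any git command fails
        subprocess.run(args, cwd=repo_root, check=True)
```

Running `create_tracked_dirs(".")` followed by `commit_and_push(".")` inside the cloned repository would mirror the two steps listed above.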
- Create a ZenHub account. Create a board, 3 milestones, and at least 5 issues, and show the analytics graph.
Steps:
- Created five issues.
- Created 3 milestones.
- Assigned the five issues to the 3 milestones.
- Drew the burndown chart.