Module 2: ICP #6 - SnehaMishra28/BigData_Programming_Summer2018 GitHub Wiki

Team: 12
Professor: Yugyung Lee

Name: Sneha Mishra
Class ID: 11
Email: [email protected]
MyGitHub

Technical Partner:
Name: Aditya Soman
Class ID: 19
Email: [email protected]
GitHub

Objective

Understand Apache Spark MLlib, Apache Spark's scalable machine learning library, with APIs in Java, Scala, Python, and R. Gain a basic understanding of clustering, classification, regression, and recommendation.

Features

Use of Algorithms such as:

Naïve Bayes.
Decision Tree.
Random Forest.
KMeans.
Linear Regression.
Logistic Regression.
Collaborative Filtering: Alternating Least Squares.

Steps:

Part 1: Clustering

Part 2: Classification

This task involves working with three algorithms, namely:

1. Naïve Bayes:

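It is a classification technique based on Bayes’ theorem. The Naïve Bayes classifier assumes that, within a given class, the presence of a particular feature is unrelated to the presence of any other feature. This independence assumption lets the classifier score a class by multiplying the class prior by each feature's probability within that class.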
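The independence assumption can be illustrated with a minimal sketch in plain Python. This is not the Spark MLlib API and not the code used in this ICP. The function names (`train_naive_bayes`, `predict`), the Laplace smoothing parameter `alpha`, and the toy weather data are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels, alpha=1.0):
    """Count class and per-feature value frequencies (hypothetical helper)."""
    class_counts = Counter(labels)
    # feature_counts[(label, feature_index)][value] -> number of occurrences
    feature_counts = defaultdict(Counter)
    # distinct values seen per feature index, used for Laplace smoothing
    feature_values = defaultdict(set)
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            feature_counts[(label, i)][value] += 1
            feature_values[i].add(value)
    return class_counts, feature_counts, feature_values, alpha


def predict(model, row):
    """Return the label with the highest log prior plus summed log likelihoods."""
    class_counts, feature_counts, feature_values, alpha = model
    total = sum(class_counts.values())
    best_label, best_score = None, float("-inf")
    for label, count in class_counts.items():
        score = math.log(count / total)  # log P(class)
        for i, value in enumerate(row):
            # Each feature contributes independently: log P(value | class)
            numerator = feature_counts[(label, i)][value] + alpha
            denominator = count + alpha * len(feature_values[i])
            score += math.log(numerator / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy data (made up for illustration): (outlook, temperature) -> play?
rows = [
    ("sunny", "hot"), ("sunny", "mild"),
    ("rainy", "mild"), ("rainy", "cool"),
    ("overcast", "hot"), ("overcast", "cool"),
]
labels = ["no", "no", "yes", "yes", "yes", "yes"]

model = train_naive_bayes(rows, labels)
print(predict(model, ("rainy", "cool")))
print(predict(model, ("sunny", "hot")))
```

Working in log space avoids numeric underflow when many feature probabilities are multiplied, and the smoothing term keeps a feature value unseen in a class from forcing that class's probability to zero.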