Spark ICP7 - neerajpadarthi/Big-Data-Programming GitHub Wiki

`Name : Neeraj Padarthi`

`Class ID: 19`

`Spark ICP : 7`

Objective

Perform classification using Naïve Bayes
Perform classification using Decision Tree
Perform classification using Random Forest
Perform clustering using K-means
Perform regression using Linear Regression
Perform regression using Logistic Regression

Approach for Classification Models

Load the data into a dataset and select the feature and label columns

Loading the libraries and the file
Casting the column datatypes
Creating the features and output columns

Fitting the Naïve Bayes model, transforming the data to get predictions, and printing the model's accuracy

Fitting the Decision Tree model, transforming the data to get predictions, and printing the model's accuracy

Fitting the Random Forest model, transforming the data to get predictions, and printing the model's accuracy

Approach for Clustering Method - K-means

Load the data into a dataset and select the feature columns

Loading the libraries and the file
Casting the column datatypes
Creating the features and output columns
Fitting the K-means model and transforming the data to assign each row to a cluster

Approach for Regression Methods - Linear Regression

Load the data into a dataset and select the feature and label columns

Loading the libraries and the file
Casting the column datatypes
Creating the features and output columns

Fitting the Linear Regression model, transforming the data to get predictions, and printing the model's accuracy

Approach for Regression Methods - Logistic Regression

Load the data into a dataset and select the feature and label columns

Loading the libraries and the file
Casting the column datatypes
Creating the features and output columns

Fitting the Logistic Regression model, transforming the data to get predictions, and printing the model's accuracy