Spark ICP7 - neerajpadarthi/Big-Data-Programming GitHub Wiki
Name : Neeraj Padarthi
Class ID: 19
Spark ICP : 7
Objective
- Perform classification using Naïve Bayes
- Perform classification using Decision Tree
- Perform classification using Random Forest
- Perform Clustering using K-means
- Perform Regression using Linear Regression
- Perform Regression using Logistic Regression
Approach for Classification Models
Load the data into a dataset and select the label and class columns
- Loading the libraries and the input file
- Casting the columns to the required data types
- Creating the features column and the output column



- Fitting the Naïve Bayes model, transforming the data to get predictions, and printing the model's accuracy

- Fitting the Decision Tree model, transforming the data to get predictions, and printing the model's accuracy

- Fitting the Random Forest model, transforming the data to get predictions, and printing the model's accuracy
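The preparation and evaluation steps shared by the three classifiers can be sketched without Spark. This is a minimal, library-free illustration and not the code used in this ICP: the sample rows, the column names (`age`, `hours`, `label`), the helper names, and the threshold rule standing in for a fitted model are all hypothetical. In Spark ML, the features column is typically built with `VectorAssembler` and accuracy is typically computed with `MulticlassClassificationEvaluator`.

```python
# Hypothetical raw rows, as they might look right after reading a CSV (all strings).
raw_rows = [
    {"age": "39", "hours": "40", "label": "0"},
    {"age": "50", "hours": "13", "label": "1"},
    {"age": "38", "hours": "40", "label": "1"},
    {"age": "53", "hours": "60", "label": "1"},
]

FEATURE_COLUMNS = ["age", "hours"]  # assumed feature columns
LABEL_COLUMN = "label"              # assumed output column


def cast_and_assemble(row):
    """Cast the string columns to numbers and return (features, label)."""
    features = [float(row[name]) for name in FEATURE_COLUMNS]
    label = int(row[LABEL_COLUMN])
    return features, label


def accuracy(labels, predictions):
    """Fraction of predictions that match the true labels."""
    if len(labels) != len(predictions):
        raise ValueError("labels and predictions must have the same length")
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    return correct / len(labels)


dataset = [cast_and_assemble(row) for row in raw_rows]
labels = [label for _, label in dataset]

# Stand-in for a fitted model's predictions: a hypothetical threshold on age,
# not a real Naive Bayes, Decision Tree, or Random Forest.
predictions = [1 if features[0] >= 45 else 0 for features, _ in dataset]

print("Accuracy:", accuracy(labels, predictions))
```

The same `accuracy` function applies to any of the three classifiers, since it only compares the label column with the prediction column.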

Approach for Clustering Method - K-means
Load the data into a dataset and select the label and class columns
- Loading the libraries and the input file
- Casting the columns to the required data types
- Creating the features column and the output column
- Fitting the K-means model and transforming the data to assign each row to a cluster
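What fitting and transforming mean for K-means can be sketched in plain Python. This is a minimal version of Lloyd's algorithm under stated assumptions, not the Spark implementation used in this ICP: the sample points, the function names, and the naive "first k points" initialization are hypothetical choices made to keep the example deterministic.

```python
import math


def closest_centroid(point, centroids):
    """Index of the centroid nearest to `point` (Euclidean distance)."""
    distances = [math.dist(point, c) for c in centroids]
    return distances.index(min(distances))


def fit_kmeans(points, k, iterations=20):
    """Fit: alternate between assigning points and recomputing centroids."""
    centroids = [list(p) for p in points[:k]]  # naive init: first k points
    for _ in range(iterations):
        assignments = [closest_centroid(p, centroids) for p in points]
        for j in range(k):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids


def transform(points, centroids):
    """Transform: label every point with the index of its nearest centroid."""
    return [closest_centroid(p, centroids) for p in points]


# Hypothetical two-dimensional feature vectors.
points = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 9.5), (1.0, 0.5), (8.5, 9.0)]

centroids = fit_kmeans(points, k=2)
clusters = transform(points, centroids)
print("Centroids:", centroids)
print("Cluster per point:", clusters)
```

Because K-means is unsupervised, the output of the transform step is a cluster index per row and not a predicted label to compare against a known class.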


Approach for Regression Methods - Linear Regression
Load the data into a dataset and select the label and class columns
- Loading the libraries and the input file
- Casting the columns to the required data types
- Creating the features column and the output column

- Fitting the Linear Regression model, transforming the data to get predictions, and printing the model's accuracy
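The fit and evaluate steps for linear regression can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the Spark code used in this ICP: it fits a single hypothetical feature with the closed-form least-squares solution, and the data and function names are invented. Regression models are usually scored with an error metric such as RMSE, so that is what the sketch prints; which metric the original run reported is not stated in the text.

```python
import math


def fit_simple_linear_regression(xs, ys):
    """Least-squares slope and intercept for a single feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def rmse(ys, predictions):
    """Root mean squared error between true values and predictions."""
    squared_errors = [(y - p) ** 2 for y, p in zip(ys, predictions)]
    return math.sqrt(sum(squared_errors) / len(ys))


# Hypothetical feature values and labels lying on the line y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_simple_linear_regression(xs, ys)
predictions = [slope * x + intercept for x in xs]

print("slope:", slope, "intercept:", intercept)
print("RMSE:", rmse(ys, predictions))
```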


Approach for Regression Methods - Logistic Regression
Load the data into a dataset and select the label and class columns
- Loading the libraries and the input file
- Casting the columns to the required data types
- Creating the features column and the output column

- Fitting the Logistic Regression model, transforming the data to get predictions, and printing the model's accuracy
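Logistic regression outputs a probability that is thresholded into a class, which is why an accuracy can be printed for it. The sketch below shows that idea with batch gradient descent in plain Python. It is an illustration under stated assumptions, not the Spark code used in this ICP: the centered one-feature data set, the learning rate, the epoch count, and the function names are all hypothetical.

```python
import math


def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic_regression(rows, labels, learning_rate=0.1, epochs=500):
    """Batch gradient descent on the average log-loss."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(rows, labels):
            score = sum(w * xi for w, xi in zip(weights, x)) + bias
            error = sigmoid(score) - y  # gradient of the log-loss w.r.t. the score
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error
        weights = [w - learning_rate * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / len(rows)
    return weights, bias


def predict(x, weights, bias):
    """Threshold the predicted probability at 0.5 to get a class."""
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if sigmoid(score) >= 0.5 else 0


# Hypothetical, already centered single-feature data with binary labels.
rows = [[-2.0], [-1.5], [-1.0], [1.0], [1.5], [2.0]]
labels = [0, 0, 0, 1, 1, 1]

weights, bias = fit_logistic_regression(rows, labels)
predictions = [predict(x, weights, bias) for x in rows]
correct = sum(1 for y, p in zip(labels, predictions) if y == p)
print("Accuracy:", correct / len(labels))
```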

