Spark ICP7 - neerajpadarthi/Big-Data-Programming GitHub Wiki

Name: Neeraj Padarthi

Class ID: 19

Spark ICP: 7

Objective

  • Perform classification using Naïve Bayes
  • Perform classification using Decision Tree
  • Perform classification using Random Forest
  • Perform Clustering using K-means
  • Perform Regression using Linear Regression
  • Perform Regression using Logistic Regression

Approach for Classification Models

Load the data into a dataset and select the features and the class label

  • Loading the libraries and file
  • Casting the column datatypes
  • Creating the features and output columns

  • Fitting the Naïve Bayes model, transforming the data with it, and printing the model's accuracy

  • Fitting the Decision Tree model, transforming the data with it, and printing the model's accuracy

  • Fitting the Random Forest model, transforming the data with it, and printing the model's accuracy

Approach for Clustering Method - K-means

Load the data into a dataset and select the feature columns

  • Loading the libraries and file
  • Casting the column datatypes
  • Creating the features and output columns
  • Fitting the K-means model and transforming the data to assign each row to a cluster

Approach for Regression Methods - Linear Regression

Load the data into a dataset and select the features and the label

  • Loading the libraries and file
  • Casting the column datatypes
  • Creating the features and output columns

  • Fitting the Linear Regression model, transforming the data with it, and printing the model's accuracy

Approach for Regression Methods - Logistic Regression

Load the data into a dataset and select the features and the class label

  • Loading the libraries and file
  • Casting the column datatypes
  • Creating the features and output columns

  • Fitting the Logistic Regression model, transforming the data with it, and printing the model's accuracy