ICP2_7 - Hiresh12/Big-Data-Programming GitHub Wiki

Spark ICP: 7

Objective

  • Perform classification using Naïve Bayes
  • Perform classification using Decision Tree
  • Perform classification using Random Forest
  • Perform Clustering using K-means
  • Perform Regression using Linear Regression
  • Perform Regression using Logistic Regression

Approach for Classification Models

  • Load the data into a dataset and select the label and class columns
  • Load the libraries and the input file
  • Cast the columns to the required data types
  • Create the features column and the output column

Fit the Naïve Bayes model, transform the data with it, and print the model's accuracy

Fit the Decision Tree model, transform the data with it, and print the model's accuracy

Fit the Random Forest model, transform the data with it, and print the model's accuracy

Approach for Clustering Method - K-means

  • Load the data into a dataset and select the label and class columns
  • Load the libraries and the input file
  • Cast the columns to the required data types
  • Create the features column and the output column
  • Fit the model and transform the data with it

Approach for Regression Methods - Linear Regression

  • Load the data into a dataset and select the label and class columns
  • Load the libraries and the input file
  • Cast the columns to the required data types
  • Create the features column and the output column

Approach for Regression Methods - Logistic Regression

  • Load the data into a dataset and select the label and class columns
  • Load the libraries and the input file
  • Cast the columns to the required data types
  • Create the features column and the output column