ICP 6 - Saiaishwaryapuppala/CSEE5590_python_Icp GitHub Wiki

Python and Deep Learning: Special Topics

Rajeshwari Sai Aishwarya Puppala

Student ID: 16298162

Class ID: 35

In-class programming: 6

Objectives:

1. Apply K-means clustering to the data set provided below:

https://umkc.box.com/s/a9lzu9qoqfkbhjwk5nz9m6dyybhl1wqy

Replace any null values with the mean of the column. Use the elbow method to find a good number of clusters for the KMeans algorithm.

Calculate the silhouette score for the above clustering.

Apply PCA on the same dataset.
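
For reference, the silhouette score named in the objectives compares, for each point, the mean distance to its own cluster (a) with the mean distance to the nearest other cluster (b), and averages (b - a) / max(a, b) over all points. The sketch below is a minimal pure-Python illustration of that definition, not the code used in this ICP; the function name `silhouette_score_manual` and the four sample points are made up for the example.

```python
from math import dist


def silhouette_score_manual(points, labels):
    """Mean silhouette coefficient over all points (illustrative sketch)."""
    scores = []
    for i, (p, lab) in enumerate(zip(points, labels)):
        # a: mean distance from p to the other members of its own cluster
        same = [dist(p, q) for j, (q, l) in enumerate(zip(points, labels))
                if l == lab and j != i]
        if not same:
            scores.append(0.0)  # common convention for single-point clusters
            continue
        a = sum(same) / len(same)
        # b: smallest mean distance from p to the members of any other cluster
        b = min(
            sum(dist(p, q) for q, l in zip(points, labels) if l == other)
            / labels.count(other)
            for other in set(labels) if other != lab
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Hypothetical toy data: two tight groups that are far apart.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good_labels = [0, 0, 1, 1]   # matches the visible grouping
bad_labels = [0, 1, 0, 1]    # mixes the two groups

good_score = silhouette_score_manual(points, good_labels)
bad_score = silhouette_score_manual(points, bad_labels)
print(good_score, bad_score)
```

A score near 1 means points sit much closer to their own cluster than to any other, while a score near 0 or below suggests overlapping or badly assigned clusters, which is why the labelling that mixes the two groups scores lower here.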

Clustering

  • Import the required libraries, such as seaborn, pandas, matplotlib, KMeans, and pathlib.
  • Load the data set into a data frame, using pathlib to build the file path.
  • Print the tenure counts to get a better feel for the data.
  • Find the null counts in the data set.
  • The null counts show that Minimum_Payments and Credit_limit contain null values.
  • Replace each null value with the mean of its feature.
  • Split the data set into train and test sets.
  • Use the elbow method to find the optimal number of clusters.
  • As the elbow plot shows, 3 is the optimal number of clusters.
  • Run KMeans with 3 clusters and calculate the silhouette_score for the result.
  • Standardize the features, then apply PCA to the standardized data.
  • Cluster the PCA output and check whether the silhouette score improves.
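
The steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the source code submitted for this ICP: it builds a small synthetic data frame instead of loading the course data set, the column names `f1` and `f2` and all values are made up, and it skips the train/test split and the tenure counts. Only the library calls (pandas `fillna`, scikit-learn `KMeans`, `silhouette_score`, `StandardScaler`, `PCA`) are the standard ones.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the course data set (hypothetical values):
# three groups of 100 rows each, with a few nulls injected by hand.
rng = np.random.default_rng(0)
centers = np.array([[0, 0, 0, 0], [8, 8, 0, 0], [0, 8, 8, 8]])
X_raw = np.vstack([rng.normal(c, 1.0, size=(100, 4)) for c in centers])
df = pd.DataFrame(X_raw, columns=["f1", "f2", "Minimum_Payments", "Credit_limit"])
df.loc[[3, 50, 120], "Minimum_Payments"] = np.nan
df.loc[[7], "Credit_limit"] = np.nan

# Null counts per column, then mean imputation per feature.
print(df.isnull().sum())
df = df.fillna(df.mean())

# Elbow method: record the within-cluster sum of squares for each k.
inertias = []
for k in range(1, 8):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(df)
    inertias.append(km.inertia_)
print(inertias)

# Silhouette score for 3 clusters on the unscaled data.
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(df)
score_raw = silhouette_score(df, labels)

# Standardize, reduce to 2 principal components, cluster again.
X_scaled = StandardScaler().fit_transform(df)
X_pca = PCA(n_components=2).fit_transform(X_scaled)
labels_pca = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_pca)
score_pca = silhouette_score(X_pca, labels_pca)

print("silhouette before standardization/PCA:", score_raw)
print("silhouette after standardization/PCA:", score_pca)
```

Plotting `inertias` against `range(1, 8)` with matplotlib gives the elbow plot; the bend where the curve flattens is the suggested number of clusters. Whether the score improves after PCA depends on the data, so the two printed values should be compared rather than assumed.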

Source Code

Output

Elbow Plot

Null counts and Silhouette_score before standardization

After PCA and Silhouette_score after standardization