ICP 6 - Saiaishwaryapuppala/CSEE5590_python_Icp GitHub Wiki
Python and Deep Learning: Special Topics
Rajeshwari Sai Aishwarya Puppala
Student ID: 16298162
Class ID: 35
In class programming: 6
Objectives:
1. Apply K-means clustering to the data set provided below:
https://umkc.box.com/s/a9lzu9qoqfkbhjwk5nz9m6dyybhl1wqy
Replace any null values with the mean of the respective feature. Use the elbow method to find a good number of clusters for the KMeans algorithm.
Calculate the silhouette score for the above clustering.
Apply PCA on the same dataset.
Clustering
- Import the required libraries: seaborn, pandas, matplotlib, pathlib, and KMeans from scikit-learn.
- Load the data set into a data frame, using pathlib to build the file path.
- To get a better overview of the data, print the tenure counts.
- Next, find the null counts in the data set.
- The null counts show that Minimum_Payments and Credit_limit contain null values.
- Replace each null value with the mean of its respective feature.
- Split the data set into train and test sets.
- Use the elbow method to find the optimal number of clusters.
- As the elbow plot shows, 3 is the optimal number of clusters.
- Run KMeans with 3 clusters and calculate the silhouette_score for the resulting clustering.
- Standardize the features, then apply PCA to the standardized data.
- Cluster the PCA output and check whether the silhouette score improves.
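The steps above can be sketched roughly as follows. This is a minimal illustration, not the code used for this ICP: the original data set is not loaded here, so `make_demo_frame` builds a small synthetic stand-in with three blobs and a few nulls. The helper names, the `feature_a` and `feature_b` columns, `n_components=2`, and the fixed `random_state` are all assumptions made for the example. The train/test split is omitted for brevity, and the elbow values are printed rather than plotted.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler


def make_demo_frame(seed=0):
    """Synthetic stand-in for the real data: 3 blobs, 4 numeric columns, 4 nulls."""
    rng = np.random.default_rng(seed)
    centers = np.array([[0, 0, 0, 0], [8, 8, 0, 0], [0, 8, 8, 0]], dtype=float)
    rows = np.vstack([c + rng.normal(scale=1.0, size=(100, 4)) for c in centers])
    # feature_a and feature_b are hypothetical column names.
    cols = ["feature_a", "feature_b", "Credit_limit", "Minimum_Payments"]
    df = pd.DataFrame(rows, columns=cols)
    df.loc[[3, 50, 120], "Minimum_Payments"] = np.nan
    df.loc[[7], "Credit_limit"] = np.nan
    return df


def fill_nulls_with_mean(df):
    """Replace each null with the mean of its own column."""
    return df.fillna(df.mean())


def elbow_inertias(X, k_values):
    """Fit KMeans for each k and return {k: inertia}; look for the 'elbow' in these values."""
    return {
        k: KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
        for k in k_values
    }


def kmeans_silhouette(X, k):
    """Cluster X into k groups and return the silhouette score of that clustering."""
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    return silhouette_score(X, labels)


def pca_after_scaling(X, n_components=2):
    """Standardize the features, then project them onto the first principal components."""
    X_scaled = StandardScaler().fit_transform(X)
    return PCA(n_components=n_components).fit_transform(X_scaled)


df = make_demo_frame()
print(df.isnull().sum())                      # null counts per column

X = fill_nulls_with_mean(df).to_numpy()
for k, inertia in elbow_inertias(X, range(1, 8)).items():
    print(k, round(inertia, 1))               # inertia should flatten out after the elbow

score_raw = kmeans_silhouette(X, 3)
score_pca = kmeans_silhouette(pca_after_scaling(X), 3)
print("silhouette (raw):", score_raw)
print("silhouette (scaled + PCA):", score_pca)
```

With the real data, `make_demo_frame()` would be replaced by reading the CSV into a data frame, and the comparison of the two silhouette scores answers the last step's question of whether PCA helps.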