ICP6 - PardhaSaradhi74/Python GitHub Wiki
NAME
Ramineni, Pardha Saradhi
Class ID
38
Programming elements:
KMeans Clustering and Data Analysis
In class programming:
1. Apply K-means clustering to the data set provided below:
https://umkc.box.com/s/a9lzu9qoqfkbhjwk5nz9m6dyybhl1wqy
Replace any null values with the mean.
Imported the basic required libraries and read the dataset using pandas `read_csv`. All columns were then read into X, and null values were filled with each column's mean by applying a lambda function across all columns. A screenshot is attached below.
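A minimal sketch of this preprocessing step, assuming the downloaded file is a CSV whose columns are all numeric. Because the dataset is not reproduced on this page, the code builds a small hypothetical DataFrame in place of the `pd.read_csv(...)` call; the column names `feature_a` and `feature_b` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for: df = pd.read_csv("<dataset file>")
# (the real file name and columns are not given on this page)
df = pd.DataFrame({
    "feature_a": [1.0, 2.0, np.nan, 4.0],
    "feature_b": [10.0, np.nan, 30.0, 50.0],
})

# Take all columns as the feature matrix X
X = df.iloc[:, :]

# Replace each column's nulls with that column's mean, one lambda call per column
X = X.apply(lambda col: col.fillna(col.mean()), axis=0)

print(X)
print("Remaining nulls:", int(X.isnull().sum().sum()))
```

The lambda form mirrors the description above; for an all-numeric frame, `df.fillna(df.mean())` gives the same result without an explicit `apply`.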
Use the elbow method to find a good number of clusters with the KMeans algorithm
We used the elbow method to determine the number of clusters. As the diagram below shows, the curve becomes roughly linear from k = 3 onward, so we chose 3 as the number of clusters.
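A minimal sketch of the elbow method with scikit-learn's `KMeans`, assuming numeric features. Since the assignment's dataset is not reproduced here, `make_blobs` generates hypothetical data with three groups. `inertia_` is the within-cluster sum of squares (WCSS), the quantity an elbow plot puts on the y-axis; the range k = 1..10 is an arbitrary choice.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Hypothetical synthetic data with 3 groups (stands in for the assignment's X)
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=1.0, random_state=42)

# Within-cluster sum of squares (inertia) for k = 1..10
wcss = []
for k in range(1, 11):
    km = KMeans(n_clusters=k, n_init=10, random_state=0)
    km.fit(X)
    wcss.append(km.inertia_)

for k, w in zip(range(1, 11), wcss):
    print(f"k={k:2d}  WCSS={w:10.1f}")

# Optional: plot k against WCSS and look for the "elbow"
# import matplotlib.pyplot as plt
# plt.plot(range(1, 11), wcss, marker="o")
# plt.xlabel("k"); plt.ylabel("WCSS"); plt.show()
```

The elbow is the k after which adding clusters only reduces WCSS slowly (the curve flattens into a near-straight line).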
2. Calculate the silhouette score for the above clustering.
Answer:
Here we applied the K-means algorithm to the data and calculated the silhouette score, which came out to 0.46. The silhouette score lies in the range [-1, 1]; values closer to 1 indicate better-separated clusters.
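A sketch of how the score can be computed with scikit-learn, again on hypothetical `make_blobs` data rather than the assignment's file, using k = 3 as chosen from the elbow curve. For each sample i, let a(i) be its mean distance to the other members of its own cluster and b(i) its mean distance to the members of the nearest other cluster; then s(i) = (b(i) - a(i)) / max(a(i), b(i)), and the silhouette score is the mean of s(i), which is why it lies in [-1, 1]. The code also checks this formula by hand for one sample against `silhouette_samples`.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_samples, silhouette_score

# Hypothetical synthetic data standing in for the assignment's X
X, _ = make_blobs(n_samples=300, centers=3, random_state=42)

# Cluster with k = 3, the value suggested by the elbow curve
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

score = silhouette_score(X, labels)
print(f"Silhouette score: {score:.2f}")

# Manual check of s(i) for the first sample
i = 0
dists = np.linalg.norm(X - X[i], axis=1)
own = labels == labels[i]
# a(i): mean distance to the other members of its cluster (its own distance is 0, so divide by n - 1)
a = dists[own].sum() / (own.sum() - 1)
# b(i): smallest mean distance to the members of any other cluster
b = min(dists[labels == c].mean() for c in set(labels) if c != labels[i])
s_manual = (b - a) / max(a, b)
print(f"s(0) manual={s_manual:.4f}  sklearn={silhouette_samples(X, labels)[i]:.4f}")
```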
We then applied feature scaling so that all features are on the same range and no variable dominates the others. Running K-means again on the scaled data and checking the silhouette score, we observe that the score decreased after scaling. A screenshot is attached below.
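A minimal sketch of the scaled-versus-unscaled comparison. The page does not say which scaler was used, so `StandardScaler` (zero mean, unit variance per feature) is an assumption here, as is the hypothetical `make_blobs` data; on this synthetic data the direction of the change need not match the decrease reported above. `kmeans_silhouette` is a hypothetical helper written for this example.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Hypothetical synthetic data standing in for the assignment's X
X, _ = make_blobs(n_samples=300, centers=3, random_state=42)


def kmeans_silhouette(data, k=3):
    """Fit K-means with k clusters and return the silhouette score of the result."""
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(data)
    return silhouette_score(data, labels)


# Standardize each feature to zero mean and unit variance (assumed scaler)
X_scaled = StandardScaler().fit_transform(X)

before = kmeans_silhouette(X)
after = kmeans_silhouette(X_scaled)
print(f"Silhouette before scaling: {before:.2f}")
print(f"Silhouette after scaling:  {after:.2f}")
```

Note that each score is computed in its own feature space, so the two numbers measure cluster separation under different distance geometries rather than the same clustering on a common scale.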