ICP6 - PardhaSaradhi74/Python GitHub Wiki

NAME

Ramineni,Pardha Saradhi

Class ID

38

Programming elements:

KMeans Clustering and Data Analysis

In class programming:

1. Apply K-means clustering to the data set provided below:

https://umkc.box.com/s/a9lzu9qoqfkbhjwk5nz9m6dyybhl1wqy

Replace any null values with the mean.

Imported the required basic libraries and read the dataset with pandas read_csv. All columns were loaded into X, and null values were filled with each column's mean by applying a lambda function across all columns. A screenshot is attached below.
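The imputation step described above can be sketched as follows. This is a minimal illustration, not the original notebook code: the small DataFrame and its column names (feature_a, feature_b) are hypothetical stand-ins for the course dataset, which would normally be loaded with pd.read_csv.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the course dataset (normally loaded with pd.read_csv).
df = pd.DataFrame({
    "feature_a": [40.0, np.nan, 2500.0, 1600.0],
    "feature_b": [95.0, 0.0, np.nan, 1500.0],
})

# Fill the nulls in every column with that column's own mean.
X = df.apply(lambda col: col.fillna(col.mean()), axis=0)

# Count of remaining nulls across the whole frame.
remaining_nulls = int(X.isnull().sum().sum())
print(remaining_nulls)
```

Because col.mean() skips NaN by default, each filled value is the mean of the observed entries in that column only.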

Use the elbow method to find a good number of clusters with the KMeans algorithm.

The elbow method was used to determine the number of clusters. As the diagram below shows, the curve becomes roughly linear from k = 3 onward, so 3 was chosen as the number of clusters.
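A minimal sketch of the elbow method with scikit-learn is shown here. It assumes synthetic data from make_blobs with three hand-picked centers in place of the course dataset, and the range of k values (1 to 7) is an arbitrary choice for illustration.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic stand-in data: three well-separated blobs (an assumption for the demo).
X, _ = make_blobs(
    n_samples=300,
    centers=[[0, 0], [6, 6], [-6, 6]],
    cluster_std=1.0,
    random_state=0,
)

# Fit KMeans for each candidate k and record the within-cluster sum of squares.
k_values = list(range(1, 8))
wcss = []
for k in k_values:
    km = KMeans(n_clusters=k, n_init=10, random_state=0)
    km.fit(X)
    wcss.append(km.inertia_)

for k, value in zip(k_values, wcss):
    print(k, round(value, 1))
```

Plotting wcss against k_values (for example with matplotlib) gives the elbow curve; the k at which the drop flattens out is taken as the number of clusters.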

2. Calculate the silhouette score for the above clustering.

Answer:

K-means was applied to the data and the silhouette score was calculated, giving 0.46. The silhouette score lies in the range [-1, 1]; values closer to 1 indicate better-separated clusters.
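Computing the silhouette score for a fitted clustering might look like the sketch below. The data is again synthetic (make_blobs with assumed centers), so the printed value will not match the 0.46 reported for the course dataset.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic stand-in data, not the course dataset.
X, _ = make_blobs(
    n_samples=300,
    centers=[[0, 0], [6, 6], [-6, 6]],
    cluster_std=1.0,
    random_state=0,
)

# Cluster with k = 3 and get one label per sample.
km = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = km.fit_predict(X)

# Mean silhouette coefficient over all samples, always within [-1, 1].
score = silhouette_score(X, labels)
print(round(score, 2))
```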

Feature scaling was then applied so that all values fall in a comparable range and no variable dominates the others. K-means was applied again to the scaled data and the silhouette score rechecked: the score decreased after scaling. A screenshot is attached below.
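The before-and-after comparison can be sketched as follows. The source does not say which scaler was used, so StandardScaler is an assumption here, as are the synthetic data and the hypothetical helper kmeans_silhouette. The second feature is deliberately inflated to mimic columns measured on very different scales.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler


def kmeans_silhouette(data, k):
    """Hypothetical helper: fit KMeans with k clusters and return the silhouette score."""
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(data)
    return silhouette_score(data, labels)


# Synthetic stand-in data; the second feature is put on a much larger scale.
X_raw, _ = make_blobs(
    n_samples=300,
    centers=[[0, 0], [6, 6], [-6, 6]],
    cluster_std=1.0,
    random_state=0,
)
X_raw[:, 1] *= 1000.0

# Standardize each feature to zero mean and unit variance (assumed scaler).
X_scaled = StandardScaler().fit_transform(X_raw)

score_raw = kmeans_silhouette(X_raw, 3)
score_scaled = kmeans_silhouette(X_scaled, 3)
print("raw:", round(score_raw, 2), "scaled:", round(score_scaled, 2))
```

Whether the score rises or falls after scaling depends on the data: scaling changes which features drive the distances, so the two scores describe different geometries and a lower scaled score does not by itself mean a worse clustering.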