K Means Clustering - utkaln/machine-learning GitHub Wiki
- An unsupervised algorithm that works with only the input features (x) of the data; no labels (y) are given
- Choose the number of clusters that makes the most sense for the business decision (such as demographics, type of disease, or retail group)
- K-means works in the following steps:
- Step 1: Randomly initialize the Cluster Centroids
- Step 2: Assign each data point to its closest Cluster Centroid (the point's Cluster Index)
- Step 3: Calculate the Mean of the data points assigned to each cluster and move that Cluster Centroid to the mean value
- Repeat Steps 2 and 3 for a number of iterations to reduce the cost function (preferably staying under 100 iterations)
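The steps above can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the function name `k_means`, its parameters (`n_iters`, `seed`), and the choice to initialize the centroids from randomly picked data points are all assumptions made for this example.

```python
import numpy as np

def k_means(X, k, n_iters=100, seed=0):
    """Illustrative K-means sketch; names and defaults are assumptions."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)

    # Step 1: randomly initialize centroids (here: k distinct data points)
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()

    for _ in range(n_iters):
        # Step 2: assign each point to the index of its closest centroid
        sq_dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = sq_dists.argmin(axis=1)

        # Step 3: move each centroid to the mean of its assigned points
        new_centroids = centroids.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # leave an empty cluster's centroid in place
                new_centroids[j] = members.mean(axis=0)

        # Stop early once the centroids no longer move
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids

    # Final assignment so the labels match the returned centroids
    sq_dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    labels = sq_dists.argmin(axis=1)
    return centroids, labels


# Toy data: two well-separated groups of three points each
X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]])
centroids, labels = k_means(X, k=2)
print(centroids)
print(labels)
```

Because the result depends on the random starting centroids, a common practice is to run the procedure several times with different seeds and keep the run with the lowest cost.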
Cost Function aka Distortion
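- K-means does not need to compute a cost function explicitly, because each iteration naturally reduces the cost (or leaves it unchanged). The cost, or distortion, is the average squared distance between each data point and its assigned Cluster Centroid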
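Although the algorithm does not need it to run, the distortion is easy to compute for monitoring or for comparing runs. The sketch below assumes the commonly used definition, the mean over all points of the squared Euclidean distance to the assigned centroid; the function name `distortion` and the toy arrays are hypothetical and exist only for this example.

```python
import numpy as np

def distortion(X, centroids, labels):
    """Average squared distance from each point to its assigned centroid."""
    X = np.asarray(X, dtype=float)
    centroids = np.asarray(centroids, dtype=float)
    labels = np.asarray(labels)
    # centroids[labels] picks, for every point, the centroid it is assigned to
    diffs = X - centroids[labels]
    return float((diffs ** 2).sum(axis=1).mean())


# Toy example: three 2-D points, two centroids, fixed assignments
X = np.array([[0.0, 0.0], [2.0, 0.0], [10.0, 0.0]])
labels = np.array([0, 0, 1])

# Cost with a poorly placed first centroid
before = distortion(X, np.array([[0.0, 0.0], [10.0, 0.0]]), labels)

# Cost after moving the first centroid to the mean of its two points
after = distortion(X, np.array([[1.0, 0.0], [10.0, 0.0]]), labels)

print(before, after)
```

Moving a centroid to the mean of its assigned points never increases this value, which is why the cost goes down (or stays flat) from one iteration to the next.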