K Means Clustering - utkaln/machine-learning GitHub Wiki

  • An unsupervised algorithm that works with only the input features x; no labels y are given
  • Choose a number of clusters K that makes the most sense for the business decision (for example: demographics, types of disease, retail groups)
  • K-means works as follows:
    • Step 1: Randomly initialize the K cluster centroids
    • Step 2: Assign each data point to its closest cluster centroid (recording that centroid's cluster index)
    • Step 3: Compute the mean of the data points assigned to each cluster and move that cluster's centroid to the mean
    • Repeat Steps 2 and 3 for a number of iterations to reduce the cost function (preferably under 100 iterations)
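
The steps above can be sketched in plain Python. This is a minimal illustration, not a reference implementation: the names `k_means`, `squared_distance`, `max_iters`, and `seed` are hypothetical, centroids are initialized by sampling K distinct data points (one common choice among several), and the loop stops early once assignments stop changing.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def k_means(points, k, max_iters=100, seed=0):
    """Return (centroids, assignments) for a list of equal-length tuples."""
    rng = random.Random(seed)
    # Step 1: randomly initialize centroids by picking k distinct data points
    # (an assumption; other initialization schemes exist).
    centroids = [tuple(p) for p in rng.sample(points, k)]
    assignments = None
    for _ in range(max_iters):
        # Step 2: assign each point to the index of its closest centroid.
        new_assignments = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        # Stop early once the assignments no longer change.
        if new_assignments == assignments:
            break
        assignments = new_assignments
        # Step 3: move each centroid to the mean of its assigned points.
        for j in range(k):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, assignments


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    centroids, assignments = k_means(data, k=2)
    print(centroids, assignments)
```

Because the starting centroids are random, different seeds can lead to different final clusters on harder data sets; a common remedy is to run the algorithm several times and keep the run with the lowest cost.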

Cost Function aka Distortion

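  • K-means does not need to compute the cost function explicitly, because each iteration (assigning points to the closest centroid, then moving each centroid to the mean of its points) naturally reduces the cost. The cost, or distortion, is the average squared distance between each data point and its assigned centroid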
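
Although the algorithm does not need it, the distortion can still be computed to monitor convergence or to compare runs. The sketch below assumes points and centroids are tuples of numbers and that `assignments[i]` holds the centroid index of `points[i]`; the function name `distortion` and its parameters are hypothetical.

```python
def distortion(points, centroids, assignments):
    """Average squared distance from each point to its assigned centroid."""
    total = 0.0
    for p, j in zip(points, assignments):
        # Squared Euclidean distance between point p and centroid j.
        total += sum((pi - ci) ** 2 for pi, ci in zip(p, centroids[j]))
    return total / len(points)


if __name__ == "__main__":
    pts = [(0, 0), (2, 0)]
    # Centroid sitting on the first point: (0 + 4) / 2
    print(distortion(pts, [(0, 0)], [0, 0]))  # 2.0
    # Centroid moved to the mean of both points: (1 + 1) / 2
    print(distortion(pts, [(1, 0)], [0, 0]))  # 1.0
```

The two calls illustrate why Step 3 lowers the cost: moving a centroid to the mean of its assigned points minimizes the sum of squared distances to those points.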