Pseudo code - MatteoDJon/CloudProgrammingTonellotto GitHub Wiki

We will write a MapReduce program that performs the k-means clustering algorithm. The map and reduce functions we will implement are the following:

Algorithm 1: The Map Function

input: a point and the list of current centroids
minDistance = infinity

for each centroid in centroids
    compute the distance between the point and the centroid
    if the distance is smaller than minDistance, update minDistance and clusterIdMinDistance

return the (key, value) pair where the key is clusterIdMinDistance and the value is the couple (point, 1)
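The map step can be sketched in Python as follows. This is a minimal, self-contained illustration, not the project's code: the function name `kmeans_map`, the use of Euclidean distance via `math.dist` (Python 3.8+), and the convention that a cluster id is the centroid's position in the `centroids` list are all assumptions.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, ...]


def kmeans_map(point: Point, centroids: Sequence[Point]) -> Tuple[int, Tuple[Point, int]]:
    """Algorithm 1: emit (clusterIdMinDistance, (point, 1)) for the nearest centroid."""
    min_distance = math.inf        # no centroid examined yet
    cluster_id_min_distance = -1   # assumption: cluster id = position in `centroids`
    for cluster_id, centroid in enumerate(centroids):
        # Assumption: Euclidean distance between the point and the centroid
        distance = math.dist(point, centroid)
        if distance < min_distance:  # strict '<': on ties the first centroid wins
            min_distance = distance
            cluster_id_min_distance = cluster_id
    return cluster_id_min_distance, (point, 1)
```

For example, `kmeans_map((1.0, 2.0), [(0.0, 0.0), (10.0, 10.0)])` returns `(0, ((1.0, 2.0), 1))`, because the point is closer to the first centroid.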

Algorithm 2: The Reduce Function

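input: a key (clusterId) and the list of values (point, 1) emitted for it by the map function

sum up all the points, component by component, obtaining sumPoints
sum up all the ones, obtaining numberPoints, the number of points assigned to the cluster

return (clusterId, (sumPoints, numberPoints))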
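A matching sketch of the reduce step, again only an illustration under assumptions: the name `kmeans_reduce` is hypothetical, and the values are assumed to arrive as an iterable of `(point, count)` pairs for a single key.

```python
from typing import Iterable, List, Tuple

Point = Tuple[float, ...]


def kmeans_reduce(cluster_id: int,
                  values: Iterable[Tuple[Point, int]]) -> Tuple[int, Tuple[Point, int]]:
    """Algorithm 2: return (clusterId, (sumPoints, numberPoints))."""
    sum_points: List[float] = []
    number_points = 0
    for point, count in values:
        if not sum_points:
            # size the accumulator from the first point's dimensionality
            sum_points = [0.0] * len(point)
        for i, component in enumerate(point):
            sum_points[i] += component  # component-by-component sum
        number_points += count          # adding up the 1s counts the points
    return cluster_id, (tuple(sum_points), number_points)
```

With the values `((1.0, 2.0), 1)` and `((3.0, 4.0), 1)` this returns `(0, ((4.0, 6.0), 2))` for cluster 0; dividing sumPoints by numberPoints gives `(2.0, 3.0)`, which is the usual k-means centroid update. Because the function adds up the counts instead of assuming each is 1, it also accepts partial sums, so a function of this shape could plausibly double as a combiner.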

The map function produces (key, value) pairs from the input data as described in Algorithm 1. The framework then groups these pairs by key, so the reduce function is invoked once for each clusterId with all the values emitted for that cluster; it performs the sums and produces the (key, value) pairs described in Algorithm 2.
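To make the data flow concrete, here is an in-memory simulation of one k-means iteration: map every point, group the output by key (the shuffle the framework performs), reduce each group, and divide to obtain the new centroids. It is a hedged sketch, not the project's implementation: all function names are hypothetical, Euclidean distance is assumed, and where the original job performs the division by numberPoints is not stated on this page. Keeping the old centroid for a cluster that receives no points is also an assumption.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def map_point(point: Point, centroids: Sequence[Point]) -> Tuple[int, Tuple[Point, int]]:
    # Algorithm 1: key = index of the nearest centroid (Euclidean), value = (point, 1)
    nearest = min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))
    return nearest, (point, 1)


def reduce_cluster(cluster_id: int, values: List[Tuple[Point, int]]) -> Tuple[int, Tuple[Point, int]]:
    # Algorithm 2: component-wise sum of the points and sum of the counts
    sum_points = tuple(sum(components) for components in zip(*(p for p, _ in values)))
    number_points = sum(count for _, count in values)
    return cluster_id, (sum_points, number_points)


def kmeans_iteration(points: Sequence[Point], centroids: Sequence[Point]) -> List[Point]:
    # Map phase + shuffle: group every emitted value under its key (clusterId)
    groups: Dict[int, List[Tuple[Point, int]]] = defaultdict(list)
    for point in points:
        key, value = map_point(point, centroids)
        groups[key].append(value)
    # Reduce phase: one call per key, then divide to get the new centroid
    new_centroids = list(centroids)  # assumption: an empty cluster keeps its old centroid
    for cluster_id, values in groups.items():
        _, (sum_points, number_points) = reduce_cluster(cluster_id, values)
        new_centroids[cluster_id] = tuple(s / number_points for s in sum_points)
    return new_centroids
```

For the points `(1.0, 2.0)`, `(3.0, 4.0)`, `(9.0, 8.0)`, `(11.0, 12.0)` and the starting centroids `(0.0, 0.0)` and `(10.0, 10.0)`, one iteration yields `[(2.0, 3.0), (10.0, 10.0)]`. Repeating the step until the centroids stop changing is the usual k-means loop; how the original project drives those iterations is not covered here.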
