Pseudo code - MatteoDJon/CloudProgrammingTonellotto GitHub Wiki
We will write a program to perform the k-means algorithm. The map and reduce functions we will implement are the following:
Algorithm 1: map(point, centroids)
    minDistance = +infinity
    for each centroid in centroids
        compute the distance between the point and the centroid
        update minDistance and clusterIdMinDistance if this distance is smaller than minDistance
    return the (key, value) pair where the key is clusterIdMinDistance and the value is the pair (point, 1)
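
The map step above can be sketched in Python as follows. This is a minimal illustration, not the project's actual code: the function name `kmeans_map`, the use of Euclidean distance, the representation of points as tuples of floats, and the use of the centroid's list index as the cluster id are all assumptions made for the example.

```python
import math


def kmeans_map(point, centroids):
    """Assign one point to its nearest centroid (sketch of the map step).

    Assumptions: points and centroids are equal-length tuples of floats,
    the distance is Euclidean, and a cluster id is the centroid's index.
    """
    min_distance = math.inf
    cluster_id_min_distance = None
    for cluster_id, centroid in enumerate(centroids):
        distance = math.dist(point, centroid)  # Euclidean distance (assumed metric)
        if distance < min_distance:
            min_distance = distance
            cluster_id_min_distance = cluster_id
    # Key: id of the nearest centroid. Value: the point paired with a count of 1.
    return cluster_id_min_distance, (point, 1)
```

For example, with centroids `[(0.0, 0.0), (5.0, 5.0)]`, the point `(1.0, 1.0)` is closer to the first centroid, so the call returns `(0, ((1.0, 1.0), 1))`.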
Algorithm 2: reduce(clusterId, values)
    for each key (clusterId)
        sum up all the points, component by component
        sum up all the ones, obtaining the number of points assigned to the cluster
        return (clusterId, (sumPoints, numberPoints))
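
A matching Python sketch of the reduce step is shown below. The function name `kmeans_reduce` and the tuple-based data layout are assumptions for illustration; the sketch also assumes the reducer is called only for keys that have at least one value, as is usual in MapReduce.

```python
def kmeans_reduce(cluster_id, values):
    """Sum the points and the counts for one cluster (sketch of the reduce step).

    `values` is an iterable of (point, count) pairs that share the same key.
    Assumption: it is non-empty, since a reducer only sees keys with values.
    """
    sum_points = None
    number_points = 0
    for point, count in values:
        if sum_points is None:
            sum_points = list(point)
        else:
            # Component-by-component sum of the point coordinates.
            sum_points = [s + x for s, x in zip(sum_points, point)]
        number_points += count
    return cluster_id, (tuple(sum_points), number_points)
```

For example, `kmeans_reduce(0, [((1.0, 2.0), 1), ((3.0, 4.0), 1)])` returns `(0, ((4.0, 6.0), 2))`. In standard k-means, dividing `sumPoints` by `numberPoints` gives the mean of the cluster, which becomes its new centroid; the pseudocode does not say where that division takes place. Because the value carries a count rather than an implicit 1, the same function would also accept partial sums, which is one plausible reason for emitting `(point, 1)` from the map step.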
The map function produces (key, value) pairs from the input data, as described in Algorithm 1. The reduce function takes the output of the map function, grouped by key, performs the sums, and produces (key, value) pairs as described in Algorithm 2.