Eight Simple Rules for Designing Multithreaded Applications - fmxy/k-means GitHub Wiki

Following Clay Breshears' 'The Art of Concurrency'

Rule 1: Identify Truly Independent Computations

Centroid Computation: Once all data points of a cluster are known, their sum can be computed (as a parallel reduction) and the centroid follows by division, so each cluster's centroid can be calculated separately
Distance Calculation: Each point's distance to cluster centroids can be calculated separately
Point Assignment: Each point can be assigned to the cluster containing the closest cluster centroid separately
BUT: a synchronization step is needed between point assignment and centroid update
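As a minimal sketch of why these three computations are independent, the helpers below only read their inputs and share no mutable state. The class and method names are hypothetical and not taken from the actual implementation; points are assumed to be `double[]` coordinates and squared Euclidean distance is assumed as the metric.

```java
import java.util.List;

/** Hypothetical helpers; names and data layout are assumptions for illustration. */
final class IndependentSteps {

    /** Distance calculation: depends only on one point and one centroid. */
    static double squaredDistance(double[] p, double[] c) {
        double sum = 0.0;
        for (int d = 0; d < p.length; d++) {
            double diff = p[d] - c[d];
            sum += diff * diff;
        }
        return sum;
    }

    /** Point assignment: depends only on one point and the (read-only) centroids. */
    static int nearestCentroid(double[] p, double[][] centroids) {
        int best = 0;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int k = 0; k < centroids.length; k++) {
            double dist = squaredDistance(p, centroids[k]);
            if (dist < bestDist) {
                bestDist = dist;
                best = k;
            }
        }
        return best;
    }

    /** Centroid computation: sum (a reduction) followed by division.
     *  Assumes the cluster has at least one member. */
    static double[] centroid(List<double[]> members, int dims) {
        double[] sum = new double[dims];
        for (double[] p : members) {
            for (int d = 0; d < dims; d++) {
                sum[d] += p[d];
            }
        }
        for (int d = 0; d < dims; d++) {
            sum[d] /= members.size();
        }
        return sum;
    }
}
```

`centroid` needs the complete member list of its cluster, which is exactly why the assignment of all points has to be finished first.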

For problems, see chapter 11.1, 'Structured Parallel Programming' by McCool, Robison, Reinders

bottom-up/top-down
computational hotspots: distance calculation
Highest level: We want separate test runs, so each run considers one data points file and one run strategy and should take advantage of all cores by itself. We therefore do not parallelize at this level.
Next highest level could be to split the list of points into n sub-lists (e.g. by number of cores). Not splitting the list at all equals the sequential version.
Lowest level would be to start a thread for each point in the list (as is done in the first version of my implementation)

-> points should be split into n equally sized sub-lists and processed by n threads -> sync step needed in order to update centroids
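The split into n sub-lists with a sync step could be sketched as follows. This is an assumption about how such a step might look, not the wiki author's code: `ChunkedAssignment`, `boundaries`, `assign` and `nearest` are hypothetical names, and `ExecutorService.invokeAll` is used as the synchronization point because it returns only after all submitted tasks have completed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical sketch: one assignment step over n index ranges. */
final class ChunkedAssignment {

    /** Splits [0, size) into n nearly equal ranges; returns n + 1 boundaries. */
    static int[] boundaries(int size, int n) {
        int[] b = new int[n + 1];
        for (int i = 0; i <= n; i++) {
            b[i] = (int) ((long) size * i / n);
        }
        return b;
    }

    /** Assigns every point to its nearest centroid using n worker threads. */
    static int[] assign(double[][] points, double[][] centroids, int n) {
        int[] assignment = new int[points.length];
        int[] b = boundaries(points.length, n);
        List<Callable<Void>> tasks = new ArrayList<>();
        for (int t = 0; t < n; t++) {
            final int from = b[t];
            final int to = b[t + 1];
            tasks.add(() -> {
                // Each task writes only to its own index range.
                for (int i = from; i < to; i++) {
                    assignment[i] = nearest(points[i], centroids);
                }
                return null;
            });
        }
        ExecutorService pool = Executors.newFixedThreadPool(n);
        try {
            // Sync step: invokeAll blocks until every sub-list is processed.
            for (Future<Void> f : pool.invokeAll(tasks)) {
                f.get(); // surfaces any exception thrown by a worker
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("assignment step failed", e);
        } finally {
            pool.shutdown();
        }
        return assignment;
    }

    /** Index of the centroid with the smallest squared Euclidean distance. */
    static int nearest(double[] p, double[][] centroids) {
        int best = 0;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int k = 0; k < centroids.length; k++) {
            double dist = 0.0;
            for (int d = 0; d < p.length; d++) {
                double diff = p[d] - centroids[k][d];
                dist += diff * diff;
            }
            if (dist < bestDist) {
                bestDist = dist;
                best = k;
            }
        }
        return best;
    }
}
```

Creating a new pool per call keeps the sketch short; in an iterative k-means loop one would more likely reuse a single pool across iterations.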

plan for the use of extra threads on extra cores
If the number of sub-lists (see rule 2) is decided by the number of available cores, the number of threads is also decided by that number.
If the number of sub-lists is decided by the number of clusters (k), the number of threads cannot be adjusted for workstations with more cores.

-> Number of available cores seems to be the best solution to this problem
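A minimal sketch of deriving the thread count from the machine, assuming the standard `Runtime.getRuntime().availableProcessors()` call is an acceptable way to query the number of cores. The class and method names are hypothetical.

```java
/** Hypothetical helper: picks the number of sub-lists / worker threads. */
final class ThreadCount {

    /** One worker per available core, but never more workers than points
     *  and never fewer than one. */
    static int workerCount(int pointCount) {
        int cores = Runtime.getRuntime().availableProcessors();
        return Math.max(1, Math.min(cores, pointCount));
    }
}
```

With this choice the same program uses more threads on a workstation with more cores, without any change to k.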

'it's never a good idea to "reinvent the wheel"' -> Use opencsv CSVParser
check thread-safety of used libraries
- -> reading the csv file happens before the concurrent algorithm starts and is not planned to be done in parallel
- -> libraries used in the calculation of the clusters have to be checked (mostly set-based libraries)
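To illustrate the point about set-based libraries: `java.util.HashSet` is documented as not synchronized, so concurrent `add` calls on a shared instance are unsafe. The sketch below assumes that a shared set of point indices is needed at all and uses the JDK's `ConcurrentHashMap.newKeySet()` as one thread-safe alternative; `SharedMembers` and `collect` are hypothetical names.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch: several threads adding to one shared, thread-safe set. */
final class SharedMembers {

    /** Adds the indices 0..pointCount-1 to a shared set from several threads. */
    static Set<Integer> collect(int pointCount, int threads) {
        // Thread-safe set view backed by a ConcurrentHashMap.
        Set<Integer> members = ConcurrentHashMap.newKeySet();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int offset = t;
            workers[t] = new Thread(() -> {
                // Thread t handles indices t, t + threads, t + 2 * threads, ...
                for (int i = offset; i < pointCount; i += threads) {
                    members.add(i);
                }
            });
            workers[t].start();
        }
        try {
            for (Thread w : workers) {
                w.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return members;
    }
}
```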

'execution order of threads is non-deterministic' !
'Don't try to enforce a particular order of execution unless it is absolutely necessary.'

'Synchronization is a necessary evil'
- -> keep it to a minimum by using thread-local storage and performing shared updates as infrequently as possible (centroid update: is necessary after every iteration)
- -> thread-local storage APIs
- -> Be careful with locks
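One way to keep shared updates infrequent is sketched below: each worker accumulates coordinate sums and counts in its own private arrays and the results are merged once per iteration, so no lock is needed during accumulation. Note that this uses task-local objects rather than the `java.lang.ThreadLocal` API. All names are hypothetical, and leaving an empty cluster at the origin is an arbitrary choice made only for this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical sketch: per-worker partial sums, merged once per iteration. */
final class LocalAccumulation {

    /** Private result of one worker: sums and counts per cluster. */
    static final class Partial {
        final double[][] sums;
        final int[] counts;

        Partial(int k, int dims) {
            sums = new double[k][dims];
            counts = new int[k];
        }
    }

    /** Computes new centroids from points and their cluster assignment. */
    static double[][] updateCentroids(double[][] points, int[] assignment, int k, int n) {
        final int dims = points[0].length; // assumes at least one point
        ExecutorService pool = Executors.newFixedThreadPool(n);
        try {
            List<Future<Partial>> futures = new ArrayList<>();
            for (int t = 0; t < n; t++) {
                final int from = (int) ((long) points.length * t / n);
                final int to = (int) ((long) points.length * (t + 1) / n);
                Callable<Partial> task = () -> {
                    Partial local = new Partial(k, dims); // visible to this task only
                    for (int i = from; i < to; i++) {
                        int c = assignment[i];
                        local.counts[c]++;
                        for (int d = 0; d < dims; d++) {
                            local.sums[c][d] += points[i][d];
                        }
                    }
                    return local;
                };
                futures.add(pool.submit(task));
            }
            // The only shared update: merge all partial results in the caller.
            Partial total = new Partial(k, dims);
            for (Future<Partial> f : futures) {
                Partial p = f.get();
                for (int c = 0; c < k; c++) {
                    total.counts[c] += p.counts[c];
                    for (int d = 0; d < dims; d++) {
                        total.sums[c][d] += p.sums[c][d];
                    }
                }
            }
            for (int c = 0; c < k; c++) {
                if (total.counts[c] == 0) {
                    continue; // empty cluster stays at the origin in this sketch
                }
                for (int d = 0; d < dims; d++) {
                    total.sums[c][d] /= total.counts[c];
                }
            }
            return total.sums;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("centroid update failed", e);
        } finally {
            pool.shutdown();
        }
    }
}
```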

-> interesting example of Matrix Multiplication Divide & Conquer Method 'When the sub-matrices achieve a given size, switch to a serial algorithm.'
other approach: divide by rows/columns/blocks

-> divide list of points; note that k-means requires intermediate synchronization and matrix multiplication only has one merge step
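The divide-and-conquer idea with a serial cutoff, applied to the list of points instead of sub-matrices, might look like the sketch below. It sums point coordinates (the reduction part of a centroid computation) with the JDK's fork/join framework. `PointSum` and the threshold value are assumptions; a suitable cutoff would have to be found by measurement.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

/** Hypothetical sketch: recursive splitting with a switch to a serial loop. */
final class PointSum extends RecursiveTask<double[]> {

    /** Assumed cutoff: ranges of this size or smaller are summed serially. */
    static final int THRESHOLD = 1_000;

    private final double[][] points;
    private final int from;
    private final int to;

    PointSum(double[][] points, int from, int to) {
        this.points = points;
        this.from = from;
        this.to = to;
    }

    @Override
    protected double[] compute() {
        int dims = points[0].length; // assumes at least one point
        if (to - from <= THRESHOLD) {
            // Small enough: switch to the serial algorithm.
            double[] sum = new double[dims];
            for (int i = from; i < to; i++) {
                for (int d = 0; d < dims; d++) {
                    sum[d] += points[i][d];
                }
            }
            return sum;
        }
        int mid = from + (to - from) / 2;
        PointSum left = new PointSum(points, from, mid);
        PointSum right = new PointSum(points, mid, to);
        left.fork();                     // run the left half asynchronously
        double[] r = right.compute();    // compute the right half in this thread
        double[] l = left.join();        // merge step: wait for the left half
        for (int d = 0; d < dims; d++) {
            l[d] += r[d];
        }
        return l;
    }

    /** Sums all points coordinate-wise using the common fork/join pool. */
    static double[] sum(double[][] points) {
        return ForkJoinPool.commonPool().invoke(new PointSum(points, 0, points.length));
    }
}
```

Unlike matrix multiplication, this reduction has to be repeated in every k-means iteration, since each centroid update depends on the latest point assignment.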