Eight Simple Rules for Designing Multithreaded Applications - fmxy/k-means GitHub Wiki
Following Clay Breshears 'The Art of Concurrency'
Rule 1: Identify Truly Independent Computations
-
Centroid Computatation: If all data points of a cluster are known, the sum (parallel reduction) can and therefore the centroid of a cluster can be calculated separately (by division)
-
Distance Calculation: Each point's distance to cluster centroids can be calculated separately
-
Point Assignment: Each point can be assigned to the cluster containing the closest cluster centroid separately
-
BUT: Synchronization step
For problems, see chapter 11.1, 'Structured Parallel Programming' by McCool, Robison, Reinders
Rule 2: Implement Concurrency at the Highest Level Possible
- bottom-up/top-down
- computational hotspots: distance calculation
- Highest level: We want to have separate test runs, so we always consider one data points file and run strategy to take advantage of all cores, so we won't parallelize here already.
- Next highest level could be to split the list of points into n sub-lists(e.g. by number of cores). Not splitting the list at all equal the sequential version.
- Least highest level would be to open up a thread for each point in the list (as it is done in the first version of my implementation)
-> points should be split into n equally sized sub-lists and processed by n threads -> sync step needed in order to update centroids
Rule 3: Plan Early for Scalability to Take Advantage of Increasing Numbers of Cores
-
plan for the use of extra threads on extra cores
-
If the amount of sub-lists (see rule 2) is to be decided by the amount available cores, the amount of threads is also decided by that number.
-
If the amount of sub-lists is to be decided by the amount of clusters (k), there can't be an adjustment in number of threads for workstations with more cores.
-> Number of available cores seems to be the best solution to this problem
Rule 4: Make Use of Thread-Safe Libraries Whereever Possible
-
'it's never a good idea to "reinvent the wheel"' -> Use opencsv CSVParser
-
check thread-safety of used libraries
- -> reading the csv file happens before the concurrent algorithm starts and is not planned to be done in parallel
- -> libraries used in the calculation of the clusters have to be checked (mostly set-based libraries)
Rule 5: Use the Right Threading Model
- OpenMP recommendation
- akka.io
Rule 6: Never Assume a Particular Order of Execution
- 'execution order of threads is non-deterministic' !
- 'Don't try to enforce a particular order of execution unless it its absolutely necessary.'
Rule 7: Use Thread-Local Storage Whenever Possible or Associate Locks to Specific Data
- 'Synchronization is a neccessary evil'
- -> keep it to a minimum by using thread-local storage shared updates as infrequently as possible (centroid update: is necessary after every iteration)
- -> thread-local storage APIs Be careful with locks
Rule 8: Dare to Change the Algorithm for a Better Chance of Concurrency
-
-> interesting example of Matrix Multiplication Divide & Conquer Method 'When the sub-matrices achieve a given size, switch to a serial algorithm.'
-
other approach: divide by rows/columns/blocks
-> divide list of points; note that k-means requires intermediate synchronization and matrix multiplication only has one merge step