Eight Simple Rules for Designing Multithreaded Applications - fmxy/k-means GitHub Wiki

Following Clay Breshears 'The Art of Concurrency'

Rule 1: Identify Truly Independent Computations

  • Centroid Computatation: If all data points of a cluster are known, the sum (parallel reduction) can and therefore the centroid of a cluster can be calculated separately (by division)

  • Distance Calculation: Each point's distance to cluster centroids can be calculated separately

  • Point Assignment: Each point can be assigned to the cluster containing the closest cluster centroid separately

  • BUT: Synchronization step

For problems, see chapter 11.1, 'Structured Parallel Programming' by McCool, Robison, Reinders

Rule 2: Implement Concurrency at the Highest Level Possible

  • bottom-up/top-down
  • computational hotspots: distance calculation
  • Highest level: We want to have separate test runs, so we always consider one data points file and run strategy to take advantage of all cores, so we won't parallelize here already.
  • Next highest level could be to split the list of points into n sub-lists(e.g. by number of cores). Not splitting the list at all equal the sequential version.
  • Least highest level would be to open up a thread for each point in the list (as it is done in the first version of my implementation)

-> points should be split into n equally sized sub-lists and processed by n threads -> sync step needed in order to update centroids

Rule 3: Plan Early for Scalability to Take Advantage of Increasing Numbers of Cores

  • plan for the use of extra threads on extra cores

  • If the amount of sub-lists (see rule 2) is to be decided by the amount available cores, the amount of threads is also decided by that number.

  • If the amount of sub-lists is to be decided by the amount of clusters (k), there can't be an adjustment in number of threads for workstations with more cores.

-> Number of available cores seems to be the best solution to this problem

Rule 4: Make Use of Thread-Safe Libraries Whereever Possible

  • 'it's never a good idea to "reinvent the wheel"' -> Use opencsv CSVParser

  • check thread-safety of used libraries

    • -> reading the csv file happens before the concurrent algorithm starts and is not planned to be done in parallel
    • -> libraries used in the calculation of the clusters have to be checked (mostly set-based libraries)

Rule 5: Use the Right Threading Model

  • OpenMP recommendation
  • akka.io

Rule 6: Never Assume a Particular Order of Execution

  • 'execution order of threads is non-deterministic' !
  • 'Don't try to enforce a particular order of execution unless it its absolutely necessary.'

Rule 7: Use Thread-Local Storage Whenever Possible or Associate Locks to Specific Data

  • 'Synchronization is a neccessary evil'
    • -> keep it to a minimum by using thread-local storage shared updates as infrequently as possible (centroid update: is necessary after every iteration)
    • -> thread-local storage APIs Be careful with locks

Rule 8: Dare to Change the Algorithm for a Better Chance of Concurrency

  • -> interesting example of Matrix Multiplication Divide & Conquer Method 'When the sub-matrices achieve a given size, switch to a serial algorithm.'

  • other approach: divide by rows/columns/blocks

-> divide list of points; note that k-means requires intermediate synchronization and matrix multiplication only has one merge step