Running on a Cluster - biologyguy/RD-MCL GitHub Wiki

Distributing RD-MCL on a cluster

RD-MCL parallelizes the creation of all-by-all graphs while it searches MCL parameter space. Once a graph has been created it is saved in a database, so the 'hard' work is not repeated if the same cluster is identified again later. As a result, the computational burden of a given run tends to be high at the beginning and to decrease with time.

To spread the work out across multiple nodes during the 'hard' part, launch workers with the launch_worker script bundled with RD-MCL:

$: launch_worker --workdb <path/to/desired/directory>

By default, launch_worker will use all of the cores it can find, so either reserve the entire node or pass the --max_cpus flag to restrict it. I have run as many as 100 workers at a time, but be aware that this sort of pressure can lead to some instability (i.e., lost jobs from the queue and frozen master threads). Twenty workers is usually safe.
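As a rough sketch of the second option, the snippet below wraps launch_worker in a small function that caps the core count. It assumes that --max_cpus takes an integer argument (the flag is named above, but its exact syntax is not shown on this page, so check the script's help output). The `./workdb` path, the `start_worker` function, and the `DRY_RUN` switch are hypothetical names used only for illustration. With `DRY_RUN=1` (the default here) the command is printed rather than executed.

```shell
# Sketch: start one worker capped at a fixed number of cores.
# Assumption: --max_cpus takes an integer argument (syntax not confirmed here).
WORKDB="${WORKDB:-./workdb}"    # hypothetical directory; use your own path
MAX_CPUS="${MAX_CPUS:-4}"       # number of cores this worker may use
DRY_RUN="${DRY_RUN:-1}"         # 1 = only print the command, anything else = run it

start_worker() {
    # $1 = work directory, $2 = core limit
    if [ "$DRY_RUN" = "1" ]; then
        echo "launch_worker --workdb $1 --max_cpus $2"
    else
        launch_worker --workdb "$1" --max_cpus "$2"
    fi
}

start_worker "$WORKDB" "$MAX_CPUS"
```

Set `DRY_RUN=0` once the printed command looks right for your node.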

Next, launch RD-MCL with the --workdb flag set to the same path you specified for launch_worker:

$: rdmcl --workdb <path/to/same/directory/as/launch_worker>

RD-MCL will now send its expensive all-by-all work to a queue and wait around for one of the workers to do the calculations. You can keep track of how busy the workers are by running the monitor script in the same directory as the workers:

$: monitor_dbs

Press return to terminate.
#Master  AveMhb   #Worker  AveWhb   #queue   #subq   #proc   #subp   #comp   #HashWait #IdWait  ConnectTime
29       19.0     16       51.0     1        362     22      12      29      25        25       0.01
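If you copy a snapshot of this output into a log, a short awk sketch like the one below can pair each column name with its value. It assumes the two-line layout shown above (a header line beginning with #Master, followed by one line of values) and works on saved text only; the `pair_fields` function is a hypothetical helper, not part of RD-MCL, and it makes no attempt to interpret what the columns mean.

```shell
# Sample text copied from the monitor output above.
sample='#Master  AveMhb   #Worker  AveWhb   #queue   #subq   #proc   #subp   #comp   #HashWait #IdWait  ConnectTime
29       19.0     16       51.0     1        362     22      12      29      25        25       0.01'

pair_fields() {
    # Read a header line and a value line from stdin; print one name=value pair per line.
    awk '
        /^#Master/ { n = split($0, names, " "); next }   # remember the column names
        n > 0 && NF == n {                                # a value line with the same column count
            for (i = 1; i <= n; i++) printf "%s=%s\n", names[i], $i
        }'
}

printf '%s\n' "$sample" | pair_fields
```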

You can also send an arbitrary number of RD-MCL jobs to the same worker pool without any extra setup.