launch_worker - biologyguy/RD-MCL GitHub Wiki

Sets up a queue and launches a worker

Workers will monitor a database for incoming information from RD-MCL. They will then use that information to calculate all-by-all similarity graphs, which is the most computationally expensive step in RD-MCL. These jobs are also split up into sub-jobs to maximize the utilization of a pool of workers.

Generalized usage

$: launch_worker <args>

If no arguments are passed in, the current working directory will be set up to administer the queue.

args: All flagged arguments are explained in detail below.

Arguments

-wdb, --workdb   ( path )

Specify the directory where SQLite queue databases will be set up and fed by RD-MCL (default=current working directory).

$: launch_worker -wdb '/home/rdmcl_workers'

-hr, --heart_rate   ( int )

Workers intermittently connect to the queue database to state that they are still alive. You can adjust how often this occurs in seconds (default=60).

$: launch_worker -hr 30

-mw, --max_wait   ( int )

RD-MCL also checks in with the queue database to state it is still alive. If a Worker does not receive a 'heartbeat' from an RD-MCL run after a set amount of time (in seconds), it will terminate to free up the node (default=600).

$: launch_worker -mw 1200
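
The timeout check described above can be sketched in a few lines of Python. The function name `should_terminate` and the strict "greater than" comparison are assumptions made for illustration; the wiki does not say how RD-MCL handles the boundary case.

```python
import time


def should_terminate(last_master_pulse, max_wait=600, now=None):
    """Return True when no RD-MCL heartbeat has been seen for more than max_wait seconds.

    last_master_pulse is the timestamp (in seconds) of the most recent heartbeat.
    """
    now = time.time() if now is None else now
    return (now - last_master_pulse) > max_wait


if __name__ == "__main__":
    # A heartbeat seen 700 seconds ago exceeds the default 600-second limit.
    print(should_terminate(last_master_pulse=0.0, now=700.0))
```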

-dtw, --dead_thread_wait   ( int )

If an RD-MCL thread fails to check in with the queue database after a set amount of time (in seconds), the Workers will assume it is dead and remove it from the database (default=120).

$: launch_worker -dtw 240
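
A minimal sketch of this kind of cleanup, assuming a hypothetical `threads` table with a `pulse` timestamp column (the real RD-MCL schema is not described on this page):

```python
import sqlite3
import time


def remove_dead_threads(conn, dead_thread_wait=120, now=None):
    # Delete every thread whose last check-in is older than the cutoff
    # and report how many rows were removed.
    now = time.time() if now is None else now
    cutoff = now - dead_thread_wait
    cur = conn.execute("DELETE FROM threads WHERE pulse < ?", (cutoff,))
    conn.commit()
    return cur.rowcount


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # Hypothetical table: one row per RD-MCL thread.
    conn.execute("CREATE TABLE threads (thread_id TEXT PRIMARY KEY, pulse REAL)")
    conn.executemany(
        "INSERT INTO threads VALUES (?, ?)",
        [("stale", 0.0), ("fresh", 900.0)],
    )
    print(remove_dead_threads(conn, dead_thread_wait=120, now=1000.0))
```

Raising the value, as with `-dtw 240`, gives slow threads more time before they are treated as dead.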

-cpu, --max_cpus   ( int )

The number of cores a Worker is allowed to access can be throttled.

$: launch_worker -cpu 32
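
One common way to apply such a cap is to take the smaller of the requested value and the number of cores the machine reports. The sketch below is illustrative only: the function name `usable_cpus` is hypothetical, and falling back to all available cores when no value is given is an assumption, since this page does not state the default.

```python
import os


def usable_cpus(max_cpus=None):
    # os.cpu_count() may return None, so fall back to a single core.
    available = os.cpu_count() or 1
    if max_cpus is None:
        # Assumed default for this sketch: use everything available.
        return available
    # Never exceed the hardware, and never drop below one core.
    return max(1, min(max_cpus, available))


if __name__ == "__main__":
    print(usable_cpus(32))
```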

-js, --job_size   ( int )

When RD-MCL places a large job in the queue, a Worker will split it into sub-jobs. The size of these sub-jobs can be controlled by specifying a cofactor. Larger values will reduce the number of sub-jobs by increasing their size (default=300).

$: launch_worker -js 200
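
To show the general trade-off between sub-job size and sub-job count, the sketch below splits an all-by-all comparison into chunks. It treats `job_size` as the maximum number of pairwise comparisons per sub-job, which is an assumption made for illustration; this page does not document how RD-MCL maps the `-js` value to the actual sub-job size, and the function name is hypothetical.

```python
from itertools import combinations
from math import ceil


def split_into_subjobs(seq_ids, job_size):
    # Every unordered pair of sequences is one comparison.
    if job_size < 1:
        raise ValueError("job_size must be at least 1")
    pairs = list(combinations(seq_ids, 2))
    num_subjobs = ceil(len(pairs) / job_size)
    # Slice the full pair list into consecutive chunks of at most job_size.
    return [pairs[i * job_size:(i + 1) * job_size] for i in range(num_subjobs)]


if __name__ == "__main__":
    subjobs = split_into_subjobs(["s1", "s2", "s3", "s4", "s5"], job_size=4)
    print([len(job) for job in subjobs])
```

In this sketch, a larger `job_size` yields fewer, larger chunks, matching the behavior described for `-js`.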

-log, --log

Workers print information about the jobs they are running to the console. By default, this information is dynamically updated on a single line. If you want to keep track of logging data, then set this flag to inject a line break between each bit of information printed.

$: launch_worker -log
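
The difference between the two output modes can be sketched as follows. This is not the RD-MCL implementation; the `report` function and the padding width are hypothetical. Without logging, a carriage return moves the cursor back to the start of the line so each message overwrites the previous one. With logging, each message ends in a line break and stays in the output.

```python
import sys


def report(message, log=False, stream=sys.stdout):
    if log:
        # One message per line, so the output can be saved and reviewed.
        stream.write(message + "\n")
    else:
        # Return to the start of the line and pad with spaces so a shorter
        # message fully overwrites a longer one.
        stream.write("\r" + message.ljust(40))
    stream.flush()


if __name__ == "__main__":
    for status in ("Idle 87.36%", "Idle 96.46%", "Idle 98.16%"):
        report(status, log=True)
```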

-q, --quiet

Suppress all output.

$: launch_worker -q

Example with expected output

$: launch_worker -wdb '/home/rdmcl_workers' -log

Starting Worker_14

Idle 87.36%
Idle 96.46%
Idle 98.16%
Running 508b1bba0757990f77b7618b69458531
Creating MSA (476 seqs)
Preparing 476 psipred dataframes
Trimal (476 seqs)
Updating 476 psipred dataframes
Preparing all-by-all data
Running all-by-all data (56 comparisons)
Processing final results
Running 8_37_508b1bba0757990f77b7618b69458531
Reading MSA (476 seqs)
Preparing 476 psipred dataframes
Preparing all-by-all data
Running all-by-all data (3056 comparisons)
Processing final results
Running 15_37_508b1bba0757990f77b7618b69458531
Reading MSA (476 seqs)
Preparing 476 psipred dataframes
Preparing all-by-all data
Running all-by-all data (3056 comparisons)
Processing final results
Running 22_37_508b1bba0757990f77b7618b69458531
Reading MSA (476 seqs)
Preparing 476 psipred dataframes
Preparing all-by-all data
Running all-by-all data (3056 comparisons)
Processing final results ...