Autocycler helper - rrwick/Autocycler GitHub Wiki
Autocycler helper is a wrapper for various long-read assemblers, providing a consistent command-line interface and standardised outputs. It is an optional tool: not required for running Autocycler, but helpful for scripting and automation. By standardising how each assembler is invoked, it simplifies the process of generating input assemblies.
Autocycler helper replaces the Bash helper scripts used in Autocycler v0.4.0 and earlier. If you prefer to use those scripts (e.g. because they're easier to modify), you can find them in this earlier commit from the Autocycler repository.
This is the only Autocycler subcommand that relies on external tools, making it somewhat fragile. It may fail to run if the assemblers it depends on are not the versions it was developed for. I will do my best to keep it compatible with current tool versions. As of its initial release (June 2025), it works with the latest versions of all supported tools.
Each assembler is run in the same way:
autocycler helper canu --reads subsampled_reads/sample_01.fastq --out_prefix canu_01 --threads 16 --genome_size 5.5m
autocycler helper flye --reads subsampled_reads/sample_01.fastq --out_prefix flye_01 --threads 16 --genome_size 5.5m
autocycler helper raven --reads subsampled_reads/sample_01.fastq --out_prefix raven_01 --threads 16 --genome_size 5.5m
This allows a Bash loop over assemblers:
for assembler in canu flye metamdbg miniasm necat nextdenovo plassembler raven; do
    autocycler helper "$assembler" --reads subsampled_reads/sample_01.fastq --out_prefix "$assembler"_01 --threads 16 --genome_size 5.5m
done
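The same loop extends naturally to several subsampled read sets. The following is a minimal sketch, not part of Autocycler itself: it assumes four read sets named `sample_01.fastq` to `sample_04.fastq`, writes to a hypothetical `assemblies` directory, and wraps everything in an illustrative function called `run_assemblies`. Because Autocycler helper depends on external tools, a failed run is reported and the loop carries on.

```bash
# Sketch: run several assemblers on several subsampled read sets.
# Assumptions (not from the text above): four read sets named sample_01..04,
# an output directory called "assemblies", and the function name itself.
run_assemblies() {
    local genome_size="$1"   # e.g. 5.5m
    local threads="$2"       # e.g. 16
    local assembler i
    for assembler in canu flye metamdbg miniasm necat nextdenovo plassembler raven; do
        for i in 01 02 03 04; do
            # --out_prefix may include a directory; it is created if missing.
            autocycler helper "$assembler" \
                --reads "subsampled_reads/sample_$i.fastq" \
                --out_prefix "assemblies/${assembler}_$i" \
                --threads "$threads" --genome_size "$genome_size" \
                || echo "warning: $assembler failed on sample_$i" >&2
        done
    done
}
```

Calling `run_assemblies 5.5m 16` would then attempt every assembler on every read set.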
Autocycler helper can also estimate genome size via Raven:
autocycler helper genome_size --reads ont.fastq.gz --threads 16
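Since the estimate is printed to stdout, it can be captured with command substitution and reused. A minimal sketch, assuming an illustrative wrapper function `estimate_genome_size`; the empty-output check is an added precaution rather than documented behaviour.

```bash
# Sketch: capture the estimate printed by the genome_size task.
# The function name and the error handling are illustrative assumptions.
estimate_genome_size() {
    local reads="$1" threads="$2" size
    # The genome_size task writes no files; its result goes to stdout.
    size=$(autocycler helper genome_size --reads "$reads" --threads "$threads") || return 1
    # Guard against an empty result before handing it to an assembler.
    [ -n "$size" ] || return 1
    printf '%s\n' "$size"
}
```

For example, `genome_size=$(estimate_genome_size ont.fastq.gz 16)` stores the value, which can then be passed to an assembler task as `--genome_size "$genome_size"`.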
Usage: autocycler helper [OPTIONS] --reads <READS> <TASK>

Arguments:
  <TASK>  Task (required positional) [possible values: genome_size, canu, flye, lja, metamdbg, miniasm,
          myloasm, necat, nextdenovo, plassembler, raven, redbean]

Options:
  -r, --reads <READS>                  Input long reads in FASTQ format (required)
  -o, --out_prefix <OUT_PREFIX>        Output prefix (required for all tasks except genome_size)
  -g, --genome_size <GENOME_SIZE>      Estimated genome size (required for some tasks)
  -t, --threads <THREADS>              Number of CPU threads [default: 8]
  -d, --dir <DIR>                      Working directory [default: use a temporary directory]
      --read_type <READ_TYPE>          Read type [default: ont_r10] [possible values: ont_r9, ont_r10,
                                       pacbio_clr, pacbio_hifi]
      --min_depth_abs <MIN_DEPTH_ABS>  Exclude contigs with read depth less than this absolute value
      --min_depth_rel <MIN_DEPTH_REL>  Exclude contigs with read depth less than this fraction of the
                                       longest contig's depth
      --args <ARGS>...                 Additional arguments for the task
  -h, --help                           Print help
  -V, --version                        Print version
- The `genome_size` task is unique: it produces no output files (therefore does not require `--out_prefix`) and prints a number to stdout.
- All assemblers produce a FASTA file named `<prefix>.fasta`. If available, GFA and log files are saved as `<prefix>.gfa` and `<prefix>.log`.
- The `--out_prefix` value can include directories, which will be created if they don't exist.
- By default, a temporary working directory is used and deleted when Autocycler helper finishes. If you provide `--dir`, it will be used instead and not deleted.
- Some assemblers require a genome size estimate to be provided via `--genome_size` (see table below). For assemblers that do not need a genome size estimate, this parameter will be ignored.
- Similarly, not all assemblers use `--read_type` (see table below), and it is ignored by those that don't.
- Assemblers that report per-contig read depth (see table below) support filtering with `--min_depth_abs` and `--min_depth_rel`. This is especially useful for metagenome assemblers like metaMDBG and Myloasm.
- All assemblers work with ONT R10 and PacBio HiFi reads. Some (LJA, Myloasm and metaMDBG) may work poorly or not at all with ONT R9 or PacBio CLR reads due to their lower accuracy.
- Extra arguments can be passed via `--args`. E.g. `autocycler helper flye --args '-i 2 -m 1000' ...`. For multi-command workflows, only the first command receives them.
- For Canu assemblies, additional processing is applied: contigs labelled as 'bubble' or 'repeat' are removed, and circular contigs are trimmed to remove their start-end overlap.
- Plassembler requires a database which can be in `$CONDA_PREFIX/plassembler_db` or set with `$PLASSEMBLER_DB`.
- Circular Plassembler contigs are randomly reoriented because Autocycler benefits when circular sequences have different starting points, unlike Plassembler which standardises them.
- stderr is printed to the terminal. stdout is printed too, unless used for file output.
- Output files are overwritten if they already exist.
- If an assembler fails or produces an empty assembly, Autocycler helper will delete the empty FASTA file.
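Because an empty or failed assembly leaves no FASTA file behind, a script can test for `<prefix>.fasta` to decide whether a run produced usable output. A minimal sketch for a task that reports per-contig depth and does not need a genome size (such as flye); the function name `assemble_and_check` and the `--min_depth_rel` value of 0.1 are illustrative assumptions, not recommendations from this page.

```bash
# Sketch: run one assembly task and report whether it produced a FASTA file.
# Assumptions: the function name, and 0.1 as an example --min_depth_rel value.
assemble_and_check() {
    local task="$1" reads="$2" prefix="$3"
    autocycler helper "$task" --reads "$reads" --out_prefix "$prefix" \
        --min_depth_rel 0.1
    # The helper deletes an empty FASTA, so a non-empty file at
    # <prefix>.fasta is taken here as the sign of a usable result.
    if [ -s "$prefix.fasta" ]; then
        echo "ok: $prefix.fasta"
    else
        echo "no assembly for $prefix" >&2
        return 1
    fi
}
```

A call such as `assemble_and_check flye subsampled_reads/sample_01.fastq flye_01` returns a non-zero status when no assembly is available, which lets a surrounding loop skip that result.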
| Task | Assembly method | Outputs | Requires `--genome_size`? | Uses `--read_type`? | Per-contig depths? |
|---|---|---|---|---|---|
| genome_size | Raven | none | n/a | no | n/a |
| canu | Canu | FASTA, log | yes | yes | yes |
| flye | Flye | FASTA, GFA, log | no | yes | yes |
| lja | LJA | FASTA, GFA, log | no | no | no |
| metamdbg | metaMDBG | FASTA, log | no | yes | yes |
| miniasm | miniasm + Minipolish | FASTA, GFA | no | yes | yes |
| myloasm | Myloasm | FASTA, GFA, log | no | yes | yes |
| necat | NECAT | FASTA | yes | no | no |
| nextdenovo | NextDenovo + NextPolish | FASTA, log | yes | yes | no |
| plassembler | Plassembler | FASTA, GFA, log | no | yes | no |
| raven | Raven | FASTA, GFA | no | no | no |
| redbean | wtdbg2 + wtpoa-cns | FASTA | yes | yes | no |
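For scripting, two of the table's columns can be encoded as small lookups. This is a sketch only: the function names `requires_genome_size` and `reports_depth` are hypothetical, and the task lists are copied from the table as it stands, so they would need updating if the supported tasks change.

```bash
# Sketch: encode the "Requires --genome_size?" and "Per-contig depths?"
# columns as lookups. Function names are illustrative assumptions.
requires_genome_size() {
    case "$1" in
        canu|necat|nextdenovo|redbean) return 0 ;;
        *) return 1 ;;
    esac
}

reports_depth() {
    case "$1" in
        canu|flye|metamdbg|miniasm|myloasm) return 0 ;;
        *) return 1 ;;
    esac
}
```

Since an unneeded `--genome_size` is ignored, `requires_genome_size` is mainly useful for deciding whether the genome size estimate has to be computed at all, while `reports_depth` indicates when the depth filters can take effect.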