Autocycler helper - rrwick/Autocycler GitHub Wiki

Basics

Autocycler helper is a wrapper for various long-read assemblers, providing a consistent command-line interface and standardised outputs. It is an optional tool: not required for running Autocycler, but helpful for scripting and automation. By standardising how each assembler is invoked, it simplifies the process of generating input assemblies.

Autocycler helper replaces the Bash helper scripts used in Autocycler v0.4.0 and earlier. If you prefer to use those scripts (e.g. because they're easier to modify), you can find them in this earlier commit from the Autocycler repository.

This is the only Autocycler subcommand that relies on external tools, making it somewhat fragile. It may fail to run if the assemblers it depends on are not the versions it was developed for. I will do my best to keep it compatible with current tool versions. As of its initial release (June 2025), it works with the latest versions of all supported tools.

Example commands

Each assembler is run in the same way:

autocycler helper canu --reads subsampled_reads/sample_01.fastq --out_prefix canu_01 --threads 16 --genome_size 5.5m
autocycler helper flye --reads subsampled_reads/sample_01.fastq --out_prefix flye_01 --threads 16 --genome_size 5.5m
autocycler helper raven --reads subsampled_reads/sample_01.fastq --out_prefix raven_01 --threads 16 --genome_size 5.5m

Because every assembler is invoked the same way, you can loop over them in Bash:

for assembler in canu flye metamdbg miniasm necat nextdenovo plassembler raven; do
    autocycler helper "$assembler" --reads subsampled_reads/sample_01.fastq --out_prefix "$assembler"_01 --threads 16 --genome_size 5.5m
done
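
The same loop can also be driven from Python, which may be convenient inside a larger pipeline. The sketch below builds one argument list per assembler using only the options shown in the examples above. By default it prints the commands rather than running them. The function names `helper_command` and `run_all` are hypothetical and not part of Autocycler.

```python
import shlex
import subprocess

# Assemblers taken from the Bash loop above.
ASSEMBLERS = ["canu", "flye", "metamdbg", "miniasm", "necat",
              "nextdenovo", "plassembler", "raven"]


def helper_command(assembler: str, reads: str, out_prefix: str,
                   threads: int = 16, genome_size: str = "5.5m") -> list[str]:
    """Build one `autocycler helper` argument list (hypothetical helper)."""
    return ["autocycler", "helper", assembler,
            "--reads", reads,
            "--out_prefix", out_prefix,
            "--threads", str(threads),
            "--genome_size", genome_size]


def run_all(reads: str, sample: str, dry_run: bool = True) -> list[list[str]]:
    """Build (and optionally run) one command per assembler."""
    commands = [helper_command(a, reads, f"{a}_{sample}") for a in ASSEMBLERS]
    for cmd in commands:
        if dry_run:
            print(shlex.join(cmd))  # show the command without running it
        else:
            # check=False so one failed assembler does not stop the loop
            subprocess.run(cmd, check=False)
    return commands


if __name__ == "__main__":
    run_all("subsampled_reads/sample_01.fastq", "01")
```

Setting `dry_run=False` runs the assemblers one after another. With `check=False`, the loop continues past a failed assembler, as the Bash loop does.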

Autocycler helper can also estimate genome size via Raven:

autocycler helper genome_size --reads ont.fastq.gz --threads 16
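
In a script, the number printed by the genome_size task can be captured and reused as the --genome_size value for later assemblies. Below is a minimal Python sketch. It assumes the output is a single integer (a base count) on stdout. `estimate_genome_size` and `parse_genome_size` are hypothetical names.

```python
import subprocess


def parse_genome_size(stdout: str) -> int:
    """Parse the number printed by the genome_size task.

    Assumes stdout holds a single integer, possibly followed by a newline.
    """
    text = stdout.strip()
    if not text.isdigit():
        raise ValueError(f"unexpected genome_size output: {stdout!r}")
    return int(text)


def estimate_genome_size(reads: str, threads: int = 16) -> int:
    """Run the genome_size task and return its estimate."""
    # Only stdout is captured; stderr stays attached to the terminal.
    result = subprocess.run(
        ["autocycler", "helper", "genome_size",
         "--reads", reads, "--threads", str(threads)],
        stdout=subprocess.PIPE, text=True, check=True)
    return parse_genome_size(result.stdout)
```

Because only stdout is captured, any progress messages written to stderr still reach the terminal.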

Full usage

Usage: autocycler helper [OPTIONS] --reads <READS> <TASK>

Arguments:
  <TASK>  Task (required positional) [possible values: genome_size, canu, flye, lja, metamdbg, miniasm,
          myloasm, necat, nextdenovo, plassembler, raven, redbean]

Options:
  -r, --reads <READS>                  Input long reads in FASTQ format (required)
  -o, --out_prefix <OUT_PREFIX>        Output prefix (required for all tasks except genome_size)
  -g, --genome_size <GENOME_SIZE>      Estimated genome size (required for some tasks)
  -t, --threads <THREADS>              Number of CPU threads [default: 8]
  -d, --dir <DIR>                      Working directory [default: use a temporary directory]
      --read_type <READ_TYPE>          Read type [default: ont_r10] [possible values: ont_r9, ont_r10,
                                       pacbio_clr, pacbio_hifi]
      --min_depth_abs <MIN_DEPTH_ABS>  Exclude contigs with read depth less than this absolute value
      --min_depth_rel <MIN_DEPTH_REL>  Exclude contigs with read depth less than this fraction of the
                                       longest contig's depth
      --args <ARGS>...                 Additional arguments for the task
  -h, --help                           Print help
  -V, --version                        Print version

Notes

  • The genome_size task is unique: it produces no output files (so it does not require --out_prefix) and instead prints the estimated genome size to stdout.
  • All assemblers produce a FASTA file named <prefix>.fasta. If available, GFA and log files are saved as <prefix>.gfa and <prefix>.log.
  • The --out_prefix value can include directories, which will be created if they don't exist.
  • By default, a temporary working directory is used and deleted when Autocycler helper finishes. If you provide --dir, it will be used instead and not deleted.
  • Some assemblers require a genome size estimate via --genome_size (see table below). The other assemblers ignore this parameter.
  • Similarly, not all assemblers use --read_type (see table below), and it is ignored by those that don't.
  • Assemblers that report per-contig read depth (see table below) support filtering with --min_depth_abs and --min_depth_rel. This is especially useful for metagenome assemblers like metaMDBG and Myloasm.
  • All assemblers work with ONT R10 and PacBio HiFi reads. Some (LJA, Myloasm and metaMDBG) may work poorly or not at all with ONT R9 or PacBio CLR reads due to their lower accuracy.
  • Extra arguments can be passed to the assembler via --args, e.g. autocycler helper flye --args '-i 2 -m 1000' ... For workflows that run more than one command, only the first command receives these arguments.
  • For Canu assemblies, additional processing is applied: contigs labelled as 'bubble' or 'repeat' are removed, and circular contigs are trimmed to remove their start-end overlap.
  • Plassembler requires a database, which can be placed at $CONDA_PREFIX/plassembler_db or specified with the $PLASSEMBLER_DB environment variable.
  • Plassembler standardises the starting position of circular contigs. Autocycler benefits when circular sequences have different starting points, so Autocycler helper randomly reorients circular Plassembler contigs.
  • stderr from the underlying tools is printed to the terminal. Their stdout is printed too, unless it is used for file output.
  • Output files are overwritten if they already exist.
  • If an assembler fails or produces an empty assembly, Autocycler helper deletes the resulting empty FASTA file, so no empty <prefix>.fasta is left behind.
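
The output-file rules in the notes above can be sketched in Python. In this sketch, the parent directories of --out_prefix are created if needed, and a FASTA file with no records is treated as empty and removed. `output_paths` and `remove_if_empty` are hypothetical. Treating "empty" as "no `>` header lines" is also an assumption and may differ from what Autocycler helper actually checks.

```python
import tempfile
from pathlib import Path


def output_paths(out_prefix: str) -> dict[str, Path]:
    """Return <prefix>.fasta/.gfa/.log paths, creating parent directories."""
    prefix = Path(out_prefix)
    prefix.parent.mkdir(parents=True, exist_ok=True)
    # with_name (not with_suffix) keeps prefixes that contain dots intact
    return {ext: prefix.with_name(prefix.name + "." + ext)
            for ext in ("fasta", "gfa", "log")}


def remove_if_empty(fasta: Path) -> bool:
    """Delete the FASTA file if it exists but holds no sequence records."""
    if not fasta.exists():
        return False
    with fasta.open() as f:
        has_record = any(line.startswith(">") for line in f)
    if not has_record:
        fasta.unlink()
        return True
    return False


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        paths = output_paths(f"{tmp}/assemblies/flye_01")
        paths["fasta"].write_text("")  # simulate an empty assembly
        print(remove_if_empty(paths["fasta"]))  # True: the empty file is deleted
```

`with_name` is used instead of `with_suffix` so that a prefix such as `sample.1` is not truncated to `sample.fasta`.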
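
The two depth options can be read as simple filters. The sketch below infers their semantics from the option descriptions. A contig is kept only if its depth is at least --min_depth_abs and at least --min_depth_rel times the depth of the longest contig. When both options are given, both conditions must hold. This is an assumption, not Autocycler's source code. The `Contig` type and `filter_by_depth` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Contig:
    name: str
    length: int
    depth: float


def filter_by_depth(contigs: list[Contig],
                    min_depth_abs: Optional[float] = None,
                    min_depth_rel: Optional[float] = None) -> list[Contig]:
    """Drop contigs whose depth is below either threshold (if set)."""
    if not contigs:
        return []
    # The relative threshold is a fraction of the longest contig's depth.
    longest = max(contigs, key=lambda c: c.length)
    kept = []
    for c in contigs:
        if min_depth_abs is not None and c.depth < min_depth_abs:
            continue
        if min_depth_rel is not None and c.depth < min_depth_rel * longest.depth:
            continue
        kept.append(c)
    return kept


contigs = [Contig("chromosome", 5_000_000, 50.0),
           Contig("plasmid", 80_000, 120.0),
           Contig("junk", 3_000, 2.0)]
print([c.name for c in filter_by_depth(contigs, min_depth_rel=0.1)])
# ['chromosome', 'plasmid']
```

Anchoring the relative threshold to the longest contig means a high-depth plasmid is kept, while a low-depth fragment is removed.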
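
Because --args takes a single quoted string, a wrapper has to split it into separate arguments before appending them to a command. The sketch below splits the string with Python's `shlex.split` and gives the extra arguments only to the first command of a multi-command workflow, following the note above. The use of shell-like splitting is an assumption. `add_extra_args` and the placeholder commands are hypothetical, not the commands Autocycler helper actually runs.

```python
import shlex


def add_extra_args(commands: list[list[str]], extra: str) -> list[list[str]]:
    """Append user-supplied extra arguments to the first command only.

    `extra` is a single string such as '-i 2 -m 1000'; shlex.split breaks it
    into tokens using shell-like quoting rules (an assumption).
    """
    if not commands or not extra:
        return [list(cmd) for cmd in commands]
    first, *rest = commands
    return [list(first) + shlex.split(extra)] + [list(cmd) for cmd in rest]


# Placeholder two-step workflow (hypothetical commands).
workflow = [["assemble", "reads.fastq"], ["polish", "draft.fasta"]]
print(add_extra_args(workflow, "-i 2 -m 1000"))
# [['assemble', 'reads.fastq', '-i', '2', '-m', '1000'], ['polish', 'draft.fasta']]
```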
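
The Canu note above mentions trimming circular contigs to remove their start-end overlap. The exact method Autocycler helper uses is not described here. The sketch below substitutes a simple technique to show what the trimming means: it finds the longest suffix that exactly matches a prefix and drops it. Real Canu overlaps can be long and inexact, so this is illustrative only. `trim_overlap` and `min_overlap` are hypothetical.

```python
def trim_overlap(seq: str, min_overlap: int = 10) -> str:
    """Remove a duplicated start-end overlap from a circular sequence.

    Finds the longest suffix of `seq` that exactly equals a prefix of `seq`
    (at least min_overlap long and at most half the sequence), then drops it.
    """
    min_overlap = max(min_overlap, 1)  # an empty overlap is meaningless
    for k in range(len(seq) // 2, min_overlap - 1, -1):
        if seq[-k:] == seq[:k]:
            return seq[:-k]
    return seq  # no overlap found: leave the sequence unchanged


# The last 10 bases repeat the first 10.
circular = "ACGTTGCAAGGCTTACGATCG" + "ACGTTGCAAG"
print(trim_overlap(circular))  # ACGTTGCAAGGCTTACGATCG
```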
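
Random reorientation of a circular contig, as described for Plassembler above, can be pictured as rotating the sequence to a random start position. The notes do not say whether the strand is also randomised, so the optional reverse-complement step below is an assumption. The function names are hypothetical. A seeded `random.Random` is passed in so that results are reproducible.

```python
import random

_COMPLEMENT = str.maketrans("ACGTacgtNn", "TGCAtgcaNn")


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


def random_reorient(seq: str, rng: random.Random,
                    flip_strand: bool = False) -> str:
    """Rotate a circular sequence to a random start position.

    If flip_strand is True, also reverse-complement it with probability 0.5
    (an assumption, not documented behaviour).
    """
    if not seq:
        return seq
    start = rng.randrange(len(seq))
    rotated = seq[start:] + seq[:start]
    if flip_strand and rng.random() < 0.5:
        rotated = reverse_complement(rotated)
    return rotated


rng = random.Random(42)
print(random_reorient("AACCGGTTAC", rng))
```

Rotation preserves the circular sequence itself. Only the arbitrary starting point changes, and that starting point is what Autocycler benefits from varying.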

Task         Assembly method        Outputs          Requires --genome_size?  Uses --read_type?  Per-contig depths?
genome_size  Raven                  none             n/a                      no                 n/a
canu         Canu                   FASTA, log       yes                      yes                yes
flye         Flye                   FASTA, GFA, log  no                       yes                yes
lja          LJA                    FASTA, GFA, log  no                       no                 no
metamdbg     metaMDBG               FASTA, log       no                       yes                yes
miniasm      miniasm+Minipolish     FASTA, GFA       no                       yes                yes
myloasm      Myloasm                FASTA, GFA, log  no                       yes                yes
necat        NECAT                  FASTA            yes                      no                 no
nextdenovo   NextDenovo+NextPolish  FASTA, log       yes                      yes                no
plassembler  Plassembler            FASTA, GFA, log  no                       yes                no
raven        Raven                  FASTA, GFA       no                       no                 no
redbean      wtdbg2+wtpoa-cns       FASTA            yes                      yes                no
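
The requirements in the table and usage text can be checked before launching a long assembly. Below is a minimal pre-flight check, assuming a Python wrapper script. The function and set names are hypothetical. The task list is copied from the usage text, and the genome-size requirement is copied from the table.

```python
from typing import Optional

# Task names from the usage text.
TASKS = {"genome_size", "canu", "flye", "lja", "metamdbg", "miniasm",
         "myloasm", "necat", "nextdenovo", "plassembler", "raven", "redbean"}

# Tasks marked "yes" under "Requires --genome_size?" in the table.
NEEDS_GENOME_SIZE = {"canu", "necat", "nextdenovo", "redbean"}


def missing_options(task: str, out_prefix: Optional[str] = None,
                    genome_size: Optional[str] = None) -> list[str]:
    """List required options that were not supplied for this task."""
    if task not in TASKS:
        raise ValueError(f"unknown task: {task}")
    missing = []
    # --out_prefix is required for every task except genome_size.
    if task != "genome_size" and out_prefix is None:
        missing.append("--out_prefix")
    if task in NEEDS_GENOME_SIZE and genome_size is None:
        missing.append("--genome_size")
    return missing


print(missing_options("canu", out_prefix="canu_01"))  # ['--genome_size']
```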