Genome size estimation - rrwick/Autocycler GitHub Wiki

Both the Autocycler subsample step and the Generating input assemblies step require a genome size estimate. If you don't already know the size of your genome, here are two reliable methods to estimate it:

Methods

  1. Assemble your reads and measure the assembly size. Raven is a fast assembler well suited to this purpose. To streamline the process, use the Autocycler helper command.

  2. Use the LRGE tool. LRGE estimates genome size directly from reads and is straightforward to run on its own.

Example commands

reads=ont.fastq.gz  # your read set goes here
threads=16  # set as appropriate for your system

genome_size=$(autocycler helper genome_size --reads "$reads" --threads "$threads")
# OR
genome_size=$(lrge -t "$threads" "$reads")
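
Because the estimate is captured from standard output, a failed or noisy run can leave the variable empty or malformed. The following is a minimal sketch of a sanity check before passing the value on, assuming the estimate is printed as a plain positive integer number of bases. The function name is_valid_genome_size is hypothetical and not part of Autocycler or LRGE, and the value shown is a placeholder.

```shell
# Hypothetical helper: succeed only if the argument is a positive integer.
is_valid_genome_size() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;   # empty, or contains a non-digit character
    esac
    [ "$1" -gt 0 ]                 # reject zero
}

genome_size=5500000  # placeholder; in practice this comes from one of the estimation commands

if is_valid_genome_size "$genome_size"; then
    echo "genome size estimate: $genome_size bp"
else
    echo "genome size estimate missing or malformed: '$genome_size'" >&2
fi
```

If the check fails, it is worth rerunning the estimation command on its own to see any error messages it prints.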

Comparison of Methods

Method   Time to run    Error rate
Raven    1-10 minutes   <3%
LRGE     <1 minute      <10%

Raven tends to produce more accurate results, while LRGE is faster. Both methods are accurate enough for Autocycler's requirements. For more details on genome size estimation and comparative performance, see the LRGE paper.
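
If you run both methods on the same read set, comparing the two estimates is a quick consistency check. The following is a minimal sketch, assuming both values are plain base-pair counts and that the difference is expressed relative to the second value. The function name percent_diff and the numbers are made up for illustration and do not come from Autocycler or LRGE.

```shell
# Hypothetical helper: absolute difference between $1 and $2, as a percentage of $2.
percent_diff() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        d = a - b
        if (d < 0) d = -d            # absolute difference
        printf "%.1f\n", 100 * d / b
    }'
}

# Made-up example values (bp): first estimate, second estimate.
percent_diff 4850000 5000000   # prints 3.0
```

A large disagreement between the two estimates may be a reason to inspect the read set before continuing.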