Genome size estimation - rrwick/Autocycler GitHub Wiki
When running Autocycler subsample and generating input assemblies, you'll need a genome size estimate. If you don't already know the size of your genome, here are two reliable ways to estimate it:
Methods
- Assemble your reads and measure the assembly size: Raven is a fast assembler well-suited for this purpose. To streamline the process, use the Autocycler helper command.
- Use the LRGE tool: LRGE estimates genome size directly from reads and is straightforward to run on its own.
Example commands
reads=ont.fastq.gz # your read set goes here
threads=16 # set as appropriate for your system
genome_size=$(autocycler helper genome_size --reads "$reads" --threads "$threads")
# OR
genome_size=$(lrge -t "$threads" "$reads")
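Before passing the estimate on, it can be worth checking that the command substitution actually captured a usable number, since a failed run may leave `genome_size` empty. The following is a minimal sketch, assuming the estimate is a plain whole number of bases; `is_valid_genome_size` is a hypothetical helper, not part of Autocycler or LRGE, and the value shown is a placeholder.

```shell
# Hypothetical helper (not part of Autocycler or LRGE): succeeds only if its
# argument is a positive whole number of bases.
is_valid_genome_size() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;   # empty, or contains a non-digit character
    esac
    [ "$1" -gt 0 ]                 # reject zero
}

genome_size=5500000  # placeholder for illustration; use your own estimate here
if is_valid_genome_size "$genome_size"; then
    echo "genome size estimate: $genome_size bp"
else
    echo "genome size estimate looks wrong: '$genome_size'" >&2
fi
```

If your tool versions print the estimate in another format, adjust the check accordingly.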
Comparison of Methods
| Method | Time to run | Error rate |
|---|---|---|
| Raven | 1-10 minutes | <3% |
| LRGE | <1 minute | <10% |
Raven tends to produce more accurate results, while LRGE is faster. Both methods are accurate enough for Autocycler's requirements.
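If you run both methods, one optional sanity check is to see how far apart the two estimates are. The sketch below is an illustration only: `percent_diff` is a hypothetical helper, and the two sizes are made-up example values rather than real results.

```shell
# Hypothetical helper: percent difference of estimate $2 relative to estimate $1.
percent_diff() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        d = a - b
        if (d < 0) d = -d              # absolute difference
        printf "%.2f\n", 100 * d / a   # expressed as a percentage of $1
    }'
}

raven_size=5000000   # made-up example value, not a real result
lrge_size=5150000    # made-up example value, not a real result
echo "estimates differ by $(percent_diff "$raven_size" "$lrge_size")%"
```

A large gap between the two estimates may be a hint to take a closer look at the read set before continuing.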
For more on genome size estimation (including a detailed comparison of Raven and LRGE), check out this paper: Hall MB, Zhou C, Coin LJM (2025). Genome size estimation from long read overlaps. Bioinformatics. doi:10.1093/bioinformatics/btaf593.