Genome size estimation - rrwick/Autocycler GitHub Wiki
Both the Autocycler subsample command and the Generating input assemblies step require a genome size estimate. If you don't already know the size of your genome, either of the following two methods can estimate it:
Methods
- Assemble your reads and measure the assembly size. Raven is a fast assembler well-suited for this purpose. To streamline the process, use the genome_size_raven.sh helper script, located in Autocycler's scripts directory.
- Use the LRGE tool. LRGE estimates genome size directly from reads and is straightforward to run without requiring a helper script.
Example commands
reads=ont.fastq.gz # your read set goes here
threads=16 # set as appropriate for your system
genome_size=$(genome_size_raven.sh "$reads" "$threads")
# OR
genome_size=$(lrge -t "$threads" "$reads")
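Before passing the estimate to later steps, it can be worth checking that the variable actually holds a usable number, since a failed run may leave it empty. The following is a minimal sketch, assuming both commands print a bare integer number of base pairs to stdout. The function name check_genome_size is hypothetical and not part of Autocycler, and the value assigned below is a placeholder, not a real estimate.

```shell
# Hypothetical helper (not part of Autocycler): succeeds only if its argument
# is a positive whole number, which this sketch assumes the estimators print.
check_genome_size() {
    case "$1" in
        # Reject an empty string or anything containing a non-digit character.
        ''|*[!0-9]*)
            echo "invalid genome size: '$1'" >&2
            return 1
            ;;
    esac
    # Reject zero (all digits, but not a usable genome size).
    if [ "$1" -gt 0 ]; then
        return 0
    fi
    echo "genome size must be greater than zero" >&2
    return 1
}

genome_size=5500000  # placeholder standing in for a real estimate
if check_genome_size "$genome_size"; then
    echo "genome size OK: $genome_size bp"
fi
```

If the check fails, rerunning the estimation command on its own (outside the command substitution) should show any error message it produced.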
Comparison of methods
Method | Time to run | Error rate |
---|---|---|
Raven | 1-10 minutes | <3% |
LRGE | <1 minute | <10% |
Raven tends to produce more accurate results, while LRGE is faster. Both methods are accurate enough for Autocycler's requirements. For more details on genome size estimation and comparative performance, see the LRGE paper.
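To put the error rates in concrete terms, the sketch below computes the range of estimates implied by a given percentage error, treating the table's figures as upper bounds. The function name estimate_range and the 5 Mbp genome size are illustrative assumptions, not values from Autocycler or the LRGE paper.

```shell
# Hypothetical helper: print the lowest and highest estimate (in bp) that stay
# within a given percentage error of a true genome size.
# Usage: estimate_range TRUE_SIZE_BP PERCENT_ERROR
estimate_range() {
    true_size=$1
    pct=$2
    # Integer arithmetic is enough here; results are rounded down to whole bp.
    low=$(( true_size * (100 - pct) / 100 ))
    high=$(( true_size * (100 + pct) / 100 ))
    echo "$low $high"
}

# Illustrative 5 Mbp genome with the error bounds from the table.
echo "Raven (<3%):  $(estimate_range 5000000 3)"
echo "LRGE  (<10%): $(estimate_range 5000000 10)"
```

For this hypothetical genome, a 3% bound corresponds to estimates between 4,850,000 and 5,150,000 bp, and a 10% bound to estimates between 4,500,000 and 5,500,000 bp.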