Sequencing Depth - uic-ric/ Wiki

These are general recommendations for sequencing depth for different types of sequencing experiments. Unless otherwise noted, recommendations below are for typical mammalian systems, with ~3 Gb genomes and ~20,000 protein coding genes. For other types of organisms you will need to scale estimates based on the genome size and/or number of genes. Please contact the Research Informatics Core (RIC) at [email protected] with any questions.
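
As a rough illustration of scaling by genome size, the sketch below converts a target coverage into a number of clusters. The function name `clusters_for_coverage` and its parameters are hypothetical, not part of any RIC tool. It assumes every sequenced base contributes to coverage, so it ignores duplicates, unmappable reads, and off-target reads; real experiments usually need some margin on top of the result.

```python
import math


def clusters_for_coverage(genome_size_bp, target_coverage, read_length_bp, paired_end=True):
    """Estimate the clusters needed to reach a target average coverage.

    Assumption: all sequenced bases count toward coverage (no duplicates,
    no unmapped or off-target reads), so treat the result as a lower bound.
    """
    # A paired-end cluster yields two reads; a single-end cluster yields one.
    reads_per_cluster = 2 if paired_end else 1
    bases_per_cluster = read_length_bp * reads_per_cluster
    return math.ceil(genome_size_bp * target_coverage / bases_per_cluster)


# ~3 Gb mammalian genome, 30x coverage, PE 150
print(clusters_for_coverage(3_000_000_000, 30, 150))  # 300000000

# 5 Mb prokaryotic genome, 100x coverage, PE 150
print(clusters_for_coverage(5_000_000, 100, 150))  # 1666667
```

The two calls reproduce the whole-genome (300M) and small-genome (~2M) variant calling figures in the table; for a different organism, substitute its genome size.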

| Omics type | Experiment | Recommended depth (number of clusters) | Paired end (PE) or single end (SE) | Minimum recommended read length (bp) | Notes |
| --- | --- | --- | --- | --- | --- |
| Transcriptomics | RNA-seq (gene expression) | 20-30M | SE | 50 | PE data is not necessary if only gene-level expression is needed. |
| | RNA-seq (isoform expression) | 40-60M | PE | 75-100 | PE data is essential. Longer reads can be helpful for finding splice junctions. |
| | miRNA-seq | 5-10M | SE | 50 | No benefit to longer or PE reads for short RNAs. |
| Epigenomics | ATAC-seq | 40-60M | PE | 50 | PE data helps in identifying peaks and resolving PCR duplicates. |
| | ChIP-seq (narrow marks) | 40-60M | PE | 50 | PE data helps in identifying peaks and resolving PCR duplicates. Should be paired with an input sample sequenced the same way, to at least the same depth. You may need to aim slightly higher or lower in depth depending on the prevalence of the mark across the genome (e.g., transcription factors = less prevalent, histone marks = more prevalent), as well as the efficiency of the antibody. |
| | ChIP-seq (broad marks) | 70-100M | PE | 50 | Higher depth is important for broader marks, which show a smaller percentage enrichment over input. This recommendation also applies to methylation pulldown experiments (MeDIP-seq or MBD-seq). |
| | Bisulfite (RRBS) | 20-40M | PE or SE with UMI | 150 | Some library options have UMIs to identify PCR duplicates, in which case SE data is sufficient. |
| | Bisulfite (whole genome) | 300-500M | PE or SE with UMI | 150 | Longer, paired-end reads are needed for accurate alignment to lower-complexity bisulfite-converted genomes. Aim for ~30-50x coverage. Anticipate that ~25-40% of reads will not be mappable to the reference. |
| Genomics | Variant calling (whole genome) | 300M | PE | 150 | Aim for ~30x coverage. This recommendation is for germline variant calling; for somatic variant calling, aim for ~50-100x coverage. |
| | Variant calling (germline, exome) | 10-15M | PE | 150 | Typical target size is ~40 Mb for human/mouse; aim for ~50-100x coverage. For somatic variant calling, aim for 100-150x coverage. |
| | Variant calling (prokaryotic/small genome) | ~2M | PE | 150 | Recommended depth is based on ~100x coverage for a 5 Mb genome. Scale up or down as needed for larger or smaller genomes. |
| | De novo genome assembly (prokaryotic/small genome) | Illumina + long read | PE | 150 | We recommend ~100x coverage from both Illumina and long-read sequencing (PacBio or Nanopore). |
| Metagenomics | Shotgun metagenomics | 1-20M | PE | 150, 250 preferred | For short-read annotation approaches only, overlapping paired-end reads are recommended. For combined short-read annotation and de novo assembly, larger inserts are recommended. Even longer reads are becoming possible with PacBio, Oxford Nanopore, and synthetic long-read technologies. |
| | Amplicon metagenomics | 10-100k | PE | 150-300, depending on amplicon length | PE reads should overlap by at least 20 bp so that forward and reverse reads can be merged. Some amplicons are too long to be fully sequenced by Illumina sequencers. Long-read sequencing can be achieved with PacBio, Oxford Nanopore, and synthetic long-read sequencing (Loop Genomics). |
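
The 20 bp overlap requirement for amplicon sequencing can be checked with simple arithmetic. The sketch below uses hypothetical helper names (`pe_overlap_bp`, `can_merge`) and an illustrative 460 bp amplicon. It assumes both reads are full length, start at opposite ends of the amplicon, and are not shortened by quality trimming, so in practice the usable overlap is often smaller than computed here.

```python
def pe_overlap_bp(amplicon_length_bp, read_length_bp):
    """Expected overlap between forward and reverse reads, in bp.

    Assumption: both reads are full length and start at opposite ends of
    the amplicon. A negative value means the reads leave a gap.
    """
    overlap = 2 * read_length_bp - amplicon_length_bp
    # The overlap cannot exceed the amplicon itself (reads longer than the amplicon).
    return min(overlap, amplicon_length_bp)


def can_merge(amplicon_length_bp, read_length_bp, min_overlap_bp=20):
    """True if the expected overlap meets the minimum needed for merging."""
    return pe_overlap_bp(amplicon_length_bp, read_length_bp) >= min_overlap_bp


# Illustrative 460 bp amplicon at three read lengths
for read_length in (150, 250, 300):
    print(read_length, pe_overlap_bp(460, read_length), can_merge(460, read_length))
# 150 -160 False
# 250 40 True
# 300 140 True
```

When the check fails at the longest available read length, the amplicon falls into the "too long to be fully sequenced" case noted above, and a long-read platform is the alternative.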