Sequencing Depth - uic-ric/uic-ric.github.io GitHub Wiki
These are general recommendations for sequencing depth for different types of sequencing experiments. Unless otherwise noted, recommendations below are for typical mammalian systems, with ~3 Gb genomes and ~20,000 protein coding genes. For other types of organisms you will need to scale estimates based on the genome size and/or number of genes. Please contact the Research Informatics Core (RIC) at [email protected] with any questions.
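
As a rough illustration of the scaling mentioned above, the sketch below assumes that depth scales linearly with genome size (or gene count) relative to the ~3 Gb, ~20,000-gene mammalian baseline. The function name `scale_depth` and the linear-scaling rule are illustrative assumptions, not an RIC tool or policy; treat the result as a starting point for discussion.

```python
# Hypothetical helper: proportional scaling of a recommended depth.
# Assumption: depth scales linearly with genome size or gene count.

MAMMALIAN_GENOME_BP = 3_000_000_000  # ~3 Gb baseline used in this page
MAMMALIAN_GENE_COUNT = 20_000        # ~20,000 protein-coding genes


def scale_depth(recommended_clusters, target_units, reference_units):
    """Scale a mammalian recommendation to another organism.

    target_units and reference_units must be the same kind of quantity
    (both genome sizes in bp, or both gene counts).
    """
    return round(recommended_clusters * target_units / reference_units)


# Whole-genome variant calling (300M clusters) for a hypothetical 1.5 Gb genome
wgs_clusters = scale_depth(300_000_000, 1_500_000_000, MAMMALIAN_GENOME_BP)

# Gene-level RNA-seq (20M clusters) for a hypothetical organism with 10,000 genes
rnaseq_clusters = scale_depth(20_000_000, 10_000, MAMMALIAN_GENE_COUNT)

print(wgs_clusters, rnaseq_clusters)
```
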
Omics type | Experiment | Recommended depth (number of clusters) | Paired End (PE) or Single End (SE) | Minimum recommended sequencing length (bp) | Notes |
---|---|---|---|---|---|
Transcriptomics | RNA-seq (gene expression) | 20-30M | SE | 50 | PE data not necessary if only gene-level expression is needed. |
Transcriptomics | RNA-seq (isoform expression) | 40-60M | PE | 75-100 | PE data is essential. Longer reads can be helpful for finding splice junctions. |
Transcriptomics | miRNA-seq | 5-10M | SE | 50 | No benefit to longer or PE reads for short RNAs. |
Epigenomics | ATAC-seq | 40-60M | PE | 50 | PE data provides benefit in identifying peaks and resolving PCR duplicates. |
Epigenomics | ChIP-seq (narrow marks) | 40-60M | PE | 50 | PE data provides benefit in identifying peaks and resolving PCR duplicates. Should be paired with an input sample sequenced the same way, to at least the same depth. You may need to aim slightly higher or lower in depth depending on the prevalence of the mark across the genome (e.g., transcription factors = less prevalent, histone marks = more prevalent), as well as the efficiency of the antibody. |
Epigenomics | ChIP-seq (broad marks) | 70-100M | PE | 50 | Higher depth is important for broader marks with lower percentage enrichment over input. This recommendation also applies to methylation pulldown experiments (MeDIP-seq or MBD-seq). |
Epigenomics | Bisulfite (RRBS) | 20-40M | PE or SE with UMI | 150 | Some library options have UMIs to identify PCR duplicates, in which case SE data is sufficient. |
Epigenomics | Bisulfite (whole genome) | 300-500M | PE or SE with UMI | 150 | Longer and paired-end reads are needed for accurate alignments in lower-complexity bisulfite-converted genomes. Aiming for ~30-50x coverage. Anticipate that ~25-40% of reads will not be mappable to the reference. |
Genomics | Variant calling (whole genome) | 300M | PE | 150 | Aiming for ~30x coverage. This recommendation is for germline variant calling; for somatic variant calling, aim for ~50-100x coverage. |
Genomics | Variant calling (germline, exome) | 10-15M | PE | 150 | Typical target size is ~40 Mb for human/mouse, aiming for ~50-100x coverage. For somatic variant calling, aim for 100-150x coverage. |
Genomics | Variant calling (prokaryotic/small genome) | ~2M | PE | 150 | Recommended depth is based on ~100x coverage for a 5 Mb genome. Scale up or down as needed for bigger or smaller genomes. |
Genomics | De novo genome assembly (prokaryotic/small genome) | Illumina + Long read | PE | 150 | We recommend ~100x coverage from both Illumina and long-read sequencing (PacBio or Nanopore). |
Metagenomics | Shotgun metagenomics | 10-20M | PE | 150, 250 preferred | For short-read annotation approaches only, overlapping paired-end reads are recommended. For combined short-read annotation and de novo assembly, larger inserts are recommended. Even longer reads are becoming possible with PacBio, Oxford Nanopore, and synthetic long-read technologies. |
Metagenomics | Amplicon metagenomics | 10-50k | PE | 150-300, depending on amplicon length | There should be a minimum of 20 bp of overlap between PE reads so that forward and reverse reads can be merged. Some amplicons are too long to be fully sequenced by Illumina sequencers. Long-read sequencing can be achieved with PacBio, Oxford Nanopore, and synthetic long-read sequencing (Loop Genomics). |
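
The coverage figures quoted in the table follow from simple arithmetic: raw coverage is the number of clusters, times the reads per cluster (2 for PE, 1 for SE), times the read length, divided by the genome or target size. The sketch below is a minimal illustration of that arithmetic and of the paired-end overlap check for amplicons. The function names are hypothetical, and the calculation is an idealized upper bound that ignores duplicates, unmappable reads, adapter trimming, and off-target reads.

```python
import math


def expected_coverage(clusters, read_length, paired_end, target_size_bp):
    """Raw fold-coverage for a given number of clusters."""
    reads_per_cluster = 2 if paired_end else 1
    return clusters * reads_per_cluster * read_length / target_size_bp


def clusters_for_coverage(coverage, read_length, paired_end, target_size_bp):
    """Clusters needed to reach a raw fold-coverage (rounded up)."""
    reads_per_cluster = 2 if paired_end else 1
    return math.ceil(coverage * target_size_bp / (reads_per_cluster * read_length))


def paired_read_overlap(read_length, amplicon_length):
    """Bases shared by forward and reverse reads; negative means a gap."""
    return 2 * read_length - amplicon_length


# Whole-genome variant calling: 300M clusters, PE 150, ~3 Gb genome
print(expected_coverage(300_000_000, 150, True, 3_000_000_000))  # 30.0

# Small genome: ~100x coverage of a 5 Mb genome with PE 150
print(clusters_for_coverage(100, 150, True, 5_000_000))  # 1666667

# Amplicon check: a hypothetical 460 bp amplicon with PE 250 reads
print(paired_read_overlap(250, 460) >= 20)  # True
```

The small-genome result (about 1.7M clusters) is consistent with the ~2M recommendation in the table, which leaves some margin for reads lost to filtering.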