Input Data Sources - Illumina/Polaris GitHub Wiki

Summary

Input for variants characterized on Polaris cohorts can be derived from any number of public or private datasets.

For public datasets, Polaris provides additional validation that increases confidence in calls that are either common or found in a characterized pedigree. For private datasets, Polaris also provides a means of sharing variants and supporting alignments that might otherwise have been difficult or impossible to share. Candidates may also be provided as input without any prior evidence from an SV dataset, based on, for instance, population genetic signals or other novel means of variant detection.

Structural and copy number variant inputs

Private SV / CNV inputs

PG-pop

A set of deletion and compound deletion/insertion calls that were initially identified as pedigree-consistent in the Platinum Genomes by a variety of SV callers. Reads from these regions were then extracted from a larger population and reassembled using SPAdes [1]. Any SVs that produced a single unique breakpoint contig within the cohort were taken as input candidates.
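The final filtering step can be illustrated with a minimal sketch. It is not the Polaris implementation: the tuple layout, the function name `select_single_contig_candidates`, and the reading of "single unique breakpoint contig" as "every sample's assembly of that SV yields the same contig sequence" are all assumptions made for illustration. Exact string comparison is also a simplification, since a real pipeline would likely need to handle reverse complements and near-identical contigs.

```python
from collections import defaultdict


def select_single_contig_candidates(assemblies):
    """Return SV ids whose assemblies yield exactly one distinct contig.

    `assemblies` is an iterable of (sv_id, sample_id, contig_sequence)
    tuples. This layout is hypothetical and chosen only for the example.
    """
    contigs_by_sv = defaultdict(set)
    for sv_id, _sample_id, contig in assemblies:
        # Normalise case so that soft-masked sequence compares equal.
        contigs_by_sv[sv_id].add(contig.upper())

    # Keep only SVs with a single distinct contig across the cohort.
    return sorted(
        sv_id for sv_id, contigs in contigs_by_sv.items() if len(contigs) == 1
    )


# Toy cohort: sv1 assembles identically in both samples, sv2 does not.
example = [
    ("sv1", "sampleA", "ACGTTGCA"),
    ("sv1", "sampleB", "ACGTTGCA"),
    ("sv2", "sampleA", "GGGTTTAA"),
    ("sv2", "sampleB", "GGGTTAAA"),
]
candidates = select_single_contig_candidates(example)
```

In this toy cohort only `sv1` would be kept, because `sv2` produces two different contigs.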

Pop-Manta

Population calls identified by Manta [2] that were re-identified in a Polaris cohort.

Public SV / CNV inputs

PopIns Icelandic insertions

Insertions described in Kehr, et al [5], identified in a population of Icelandic individuals using PopIns [4]. Candidates were limited to those that could be converted to a graph input and contained flanking sequence matching the reference.
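As a rough illustration of the flank criterion, the sketch below checks whether the sequence on either side of an insertion agrees with the reference. The function name, the 0-based coordinate convention, and the requirement of an exact match are assumptions for this example. They are not taken from the Polaris or PopIns tooling.

```python
def flanks_match_reference(reference, position, left_flank, right_flank):
    """Check that an insertion's flanks agree with the reference.

    `position` is the 0-based index of the first reference base after the
    insertion point, so the left flank should end at `position` and the
    right flank should start there. (Hypothetical convention.)
    """
    left_start = position - len(left_flank)
    if left_start < 0:
        # The left flank would extend past the start of the reference.
        return False

    ref = reference.upper()
    left_ok = ref[left_start:position] == left_flank.upper()
    # A right flank running past the end of the reference yields a shorter
    # slice, so the comparison fails as intended.
    right_ok = ref[position:position + len(right_flank)] == right_flank.upper()
    return left_ok and right_ok


reference = "ACGTACGTAC"
matching = flanks_match_reference(reference, 4, "ACGT", "ACG")
mismatching = flanks_match_reference(reference, 4, "ACGA", "ACG")
```

Here `matching` is true because both flanks agree with the reference around position 4, while `mismatching` is false because the left flank differs by one base.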

Parliament insertions

Insertions described in English, et al [3], identified using Parliament, which incorporates multiple data types and SV calling methods to call SVs in a single subject.

References

  1. Bankevich, et al (2012) SPAdes: a new genome assembly algorithm and its applications to single-cell sequencing. J Comput Biol. 19(5):455-77. doi:10.1089/cmb.2012.0021
  2. Chen, et al (2016) Manta: rapid detection of structural variants and indels for germline and cancer sequencing applications. Bioinformatics 32(8):1220-1222. doi:10.1093/bioinformatics/btv710
  3. English, et al (2015) Assessing structural variation in a personal genome-towards a human reference diploid genome. BMC Genomics. 16:286. doi:10.1186/s12864-015-1479-3
  4. Kehr, et al (2016) PopIns: population-scale detection of novel sequence insertions. Bioinformatics. 32(7):961-7. doi:10.1093/bioinformatics/btv273
  5. Kehr, et al (2017) Diversity in non-repetitive human sequences not found in the reference genome. Nat Genet. 49(4):588-593. doi:10.1038/ng.3801