Sequel II System Data Release: HG002 SV and SNVs (HiFi Reads powered by CCS) - PacificBiosciences/DevNet GitHub Wiki

SAMPLE

GIAB HG002 extracted DNA

METHODS

  • Shearing: 15 kb with Megaruptor
  • Library prep: TPK 1.0
  • Size selection: Fraction 4 (11 kb) with Sage ELF
  • Sequencing: Sequel II System with "Early Access" binding kit (101-490-800) and chemistry (101-490-900)
  • Run time: 12-hour pre-extension; 30-hour movie
  • CCS: SMRT Link v6.1.0 "Early Access" Circular Consensus Sequence Analysis (ccs v3.2.1)
  • SV calling: SMRT Link v6.1.0 "Early Access" Structural Variant Calling (powered by pbsv)
  • Alignment: pbmm2 --preset CCS
  • Variant calling: GATK v4.0.10.1 HaplotypeCaller
  • Variant phasing: WhatsHap v0.17
  • Reference: hs37d5 (GRCh37 with decoy)
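
The alignment, small-variant calling, and phasing steps listed above can be sketched as a sequence of command lines. This is a minimal sketch, not the exact invocation used for this release: only `--preset CCS` comes from the methods list, while the file names, the `--sort` option, and the helper name `build_pipeline_commands` are assumptions made for illustration. Index creation for the BAM and VCF files is omitted.

```python
from typing import List


def build_pipeline_commands(reference: str, ccs_bam: str, prefix: str) -> List[List[str]]:
    """Build argument lists for alignment, variant calling, and phasing.

    Hypothetical helper: all file names are placeholders, and the options
    used for the actual release (beyond --preset CCS) are not documented here.
    """
    aligned_bam = f"{prefix}.aligned.bam"
    vcf = f"{prefix}.vcf.gz"
    phased_vcf = f"{prefix}.phased.vcf"
    return [
        # Align CCS reads to the reference with the CCS preset (--sort is an assumption).
        ["pbmm2", "align", "--preset", "CCS", "--sort", reference, ccs_bam, aligned_bam],
        # Call small variants with GATK4 HaplotypeCaller.
        ["gatk", "HaplotypeCaller", "-R", reference, "-I", aligned_bam, "-O", vcf],
        # Phase the called variants with WhatsHap using the same alignments.
        ["whatshap", "phase", "--reference", reference, "-o", phased_vcf, vcf, aligned_bam],
    ]


if __name__ == "__main__":
    # Print the commands instead of running them, so the sketch has no side effects.
    for command in build_pipeline_commands("hs37d5.fa", "HG002.ccs.bam", "HG002"):
        print(" ".join(command))
```

Printing the commands rather than executing them keeps the sketch safe to run on a machine without the tools installed; each list could be passed to `subprocess.run` once the paths are adjusted.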

FOLDERS

  • subreads - Basecalled reads and metadata for three Sequel II SMRT Cells 8M loaded with 11 kb HG002 libraries
  • consensusreads - Circular Consensus reads and metadata for the runs above
  • consensusalignments - CCS reads above, aligned to hs37d5 with pbmm2
  • gatk4hc - Small variants called with GATK4 HaplotypeCaller and phased with WhatsHap
      • /GIAB_small_variant_v3.3.2_benchmark - Benchmarked against GIAB small variant v3.3.2 with hap.py
      • /GIAB_phasing_benchmark - Benchmarked against 10X/Trio phased variant set
  • pbsv - Structural variants called with SMRT Link Structural Variant Calling (powered by pbsv)
      • /truvari-giab-v0.6 - Benchmarked against GIAB structural variant v0.6 with Truvari
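
Benchmarks of this kind are usually summarized as precision, recall, and F1 computed from true-positive, false-positive, and false-negative counts. The sketch below shows that arithmetic only. The function name `benchmark_metrics` is hypothetical, the counts in the usage example are illustrative and not taken from this release, and hap.py and Truvari each apply their own matching rules when producing the counts.

```python
def benchmark_metrics(tp: int, fp: int, fn: int) -> dict:
    """Compute precision, recall, and F1 from benchmark counts.

    tp: calls that match the benchmark set
    fp: calls absent from the benchmark set
    fn: benchmark variants that were not called
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Illustrative counts only; these are not results from this dataset.
    print(benchmark_metrics(tp=90, fp=10, fn=30))
```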

DOWNLOAD

URLs and md5 checksums are listed in URLs.txt

Note: The subreads directory is 1.2 TB and contains the basecalled Sequel II data in unaligned BAM format.

Download Data: https://downloads.pacbcloud.com/public/dataset/HG002_SV_and_SNV_CCS/

Download for China: https://downloads-ap.pacbcloud.com/public/dataset/HG002_SV_and_SNV_CCS/
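
Because the subreads directory alone is 1.2 TB, it can be worth checking each downloaded file against the md5 checksum listed in URLs.txt. The following is a minimal sketch using Python's standard `hashlib`. The function names are hypothetical, and since the layout of URLs.txt is not described here, the expected checksum is passed in by hand rather than parsed from that file.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex md5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so that very large BAM files never sit fully in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path: str, expected_md5: str) -> bool:
    """Compare a file's md5 digest with an expected hex string (case-insensitive)."""
    return md5_of_file(path) == expected_md5.strip().lower()
```

A call such as `matches_checksum("downloaded.bam", "<md5 from URLs.txt>")` returns `True` only when the file arrived intact; the file name and checksum there are placeholders.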