SMRT Analysis Release Notes v2.2.0 - dyim42/SMRT-Analysis GitHub Wiki

  • Introduction
  • Installation
  • New Features in v2.2.0
  • Enhanced Protocols in v2.2.0
  • Fixed Issues in v2.2.0
  • Known Issues in v2.2.0

Introduction

The SMRT Analysis software suite performs assembly and variant detection analysis of sequencing data generated by the Pacific Biosciences instrument.

Installation

For installation instructions, see SMRT Analysis Software Installation.

New Features in v2.2.0

###SMRT Analysis###

  • The Iso-Seq™ module adds full-length transcript QC and clustering steps, in addition to mapping to a reference genome with the GMAP tool.

  • Long-Amplicon Analysis module now includes enhanced chimera filtering, providing greater confidence in genotyping results.

  • The Minor-Variant Analysis module now uses a more sophisticated model tuned to PacBio reads, the same model used by Quiver.

  • HGAP 3 (the PacBio genome assembly tool) now offers up to a 10-fold speed improvement in wall-clock time for microbial assembly, which can dramatically reduce the time required to assemble a complete microbial genome.

    • HGAP 2 is now our production assembly version and HGAP 3 is the beta version.
    • HGAP 1 is no longer supported. We encourage you to migrate to SMRT Analysis v2.2.0.
  • Improved SMRT Pipe module interface documentation and examples.

###SMRT Portal###

  • The Protocol Selector groups protocols by application, simplifying navigation for new users. Users can turn the feature off after first use.
  • N50 statistics are now included in many of the generated reports; resequencing reports now display "Concordance" instead of "Accuracy".
  • Tooltip descriptions now display when you edit protocol parameters.
  • Output FASTA/FASTQ files can now be created without control reads.
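For readers new to the N50 figure now shown in the reports: N50 is the largest length L such that sequences of length L or longer together contain at least half of all bases. The sketch below illustrates that standard definition only; it is not the SMRT Portal reporting code, and the function name `n50` is hypothetical.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the largest length L such that sequences of length >= L
    together contain at least half of all bases.
    """
    lengths = list(lengths)
    if not lengths:
        raise ValueError("lengths must be non-empty")
    total = sum(lengths)
    running = 0
    # Walk from longest to shortest, accumulating bases until
    # at least half of the total has been covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
```

For example, `n50([2, 3, 4, 5, 6, 7, 8, 9, 10])` returns 8: the 54 bases in total are at least half covered once the 10, 9 and 8 sequences (27 bases) have been counted.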

###Installation/Upgrade###

  • A single tarball is now supplied for all supported operating systems.
  • The MySQL® Server is now bundled with the tarball; no external MySQL server is needed.
  • Celera® Assembler 8.1 is now bundled with the tarball.
  • A prebuilt phmmer binary is now bundled with the tarball.

Enhanced Protocols in v2.2.0

  • RS_IsoSeq (BETA): Classifies PacBio reads into full-length (FL) or non-full-length (non-FL) transcript reads, with optional clustering and mapping steps. Replaces the RS_cDNA_Mapping protocol.

  • RS_HGAP_Assembly.3 (BETA): Optimized for speed, with a 10-fold improvement for small- and mid-size genome assembly and correspondingly shorter turnaround times.

  • RS_Minor_Variant (BETA): Calls minor variants in a heterogeneous dataset against a user-provided reference sequence, with frequencies down to 0.5%. Replaces the RS_Minor_and_Compound_Variants protocol.

  • RS_Long_Amplicon_Analysis: Includes enhanced chimera filtering.

  • RS_HGAP_Assembly.2, RS_HGAP_Assembly.3: Added support for filtering control reads out of the filtered FASTA/FASTQ files generated by the protocols.

###Obsolete Protocols###

  • RS_HGAP_Assembly.1: Use RS_HGAP_Assembly.2, which is now our production assembly software.
  • RS_cDNA_Mapping: Use RS_IsoSeq instead.
  • RS_Minor_and_Compound_Variants: Use RS_Minor_Variant instead.
  • RS_Resequencing_GATK_Barcode:
    • Use RS_Subreads if your desired output is FASTQ files containing reads split up by barcodes.
    • Use RS_Resequencing_Barcode if your desired output is cmp.h5 files containing reads split up by barcodes.

Note: GATK and associated executables are no longer included.

###Protocols Whose Names Changed###

  • RS_Filter_Only: Use RS_Subreads instead.
  • RS_Resequencing_ReadsOfInsert: Use RS_ReadsOfInsert_Mapping instead.
  • BridgeMapper_Beta: Use RS_BridgeMapper instead.
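If you maintain scripts that refer to protocols by name, the obsolete and renamed protocols listed above can be captured in a lookup table. This is a minimal sketch transcribed from those two lists; the names `REPLACEMENTS` and `replacement_for` are hypothetical and not part of SMRT Analysis.

```python
# Obsolete and renamed protocols in SMRT Analysis v2.2.0, mapped to the
# protocol(s) to use instead (transcribed from the release notes).
REPLACEMENTS = {
    "RS_HGAP_Assembly.1": ("RS_HGAP_Assembly.2",),
    "RS_cDNA_Mapping": ("RS_IsoSeq",),
    "RS_Minor_and_Compound_Variants": ("RS_Minor_Variant",),
    # Two options: RS_Subreads for FASTQ files split by barcode,
    # RS_Resequencing_Barcode for cmp.h5 files split by barcode.
    "RS_Resequencing_GATK_Barcode": ("RS_Subreads", "RS_Resequencing_Barcode"),
    "RS_Filter_Only": ("RS_Subreads",),
    "RS_Resequencing_ReadsOfInsert": ("RS_ReadsOfInsert_Mapping",),
    "BridgeMapper_Beta": ("RS_BridgeMapper",),
}


def replacement_for(protocol):
    """Return the v2.2.0 protocol name(s) to use instead of `protocol`.

    Protocols that were neither made obsolete nor renamed map to themselves.
    """
    return REPLACEMENTS.get(protocol, (protocol,))
```

A tuple is returned in every case so that callers can handle RS_Resequencing_GATK_Barcode, whose replacement depends on the desired output, the same way as the one-to-one renames.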

Fixed Issues in v2.2.0

###SMRT Portal###

  • Users logged in with the scientist role can now archive and restore their own jobs. (24442)
  • Consolidated and reorganized the protocols. (24622)
  • SMRT Portal now calls a script to back up the database. (24644)
  • Added the build number to the About dialog. (24303)
  • Updated the Protocols Details dialog to use the "Reads of Insert" label. (24675)

###SMRT Pipe###

  • Various improvements to reduce memory usage in the resequencing pipeline. (24455)
  • AHA algorithm now works with reference sequence headers that contain space characters. (24407)
  • The user environment is now cached before installation and restored after SMRT Analysis is installed. (24668, 24881)

###SMRT Pipe - Assembly###

  • Greater efficiency in the use of cluster resources. (24540)
  • Added a Contig Depth vs Quality Report. (24429)

###SMRT Pipe - Barcoding###

  • Fixed two issues in pbbarcode scoring that caused mislabeling in paired mode. (24426)

###SMRT Pipe - Consensus###

  • Improved handling of experiments containing different chemistries, and improved robustness and speed for the P5-C3 chemistry. (23819)
  • Quiver now gives high confidence to variants in extremely low coverage regions. (24541)
  • The RS_Resequencing_ReadsOfInsert protocol does not include variant calling. (24374)

###SMRT Pipe - Base Modifications###

  • Made many general scaling improvements; now works with medium-scale genomes such as Arabidopsis. (24059)

###SMRT Pipe - Long-Amplicon Analysis###

  • Improved chimera detection. (24565)

###SMRT Pipe - Mapping###

  • Replaced compareSequences.py with pbalign.py. (24093)

Known Issues in v2.2.0

###SMRT Analysis###

  • SMRT Analysis has not been tested with all job management systems and may not work correctly with GlusterFS. (22571, 23220)
  • SMRT Analysis is designed to work with the supported workflows, that is, the included RS_ protocols. Consider memory constraints when running unsupported command-line workflows. (24090)
  • ReferenceUploader fails if a FASTA header contains an illegal > character after the leading one. (24236)
  • Input bax.h5 and bas.h5 files are not yet robustly validated when a job is created. (23246)
  • More robust path validation is required in the Python code to avoid NFS issues. (24549)
  • SAM files output by BLASR do not conform to the SAM format. (23264)
  • pbalign.py incorrectly treats a multipart bas.h5 file as a CCS file and aborts when it cannot find the data it needs. (24173)
  • The Motif Finder software is not yet optimized for base modifications in high-GC genomes. (24315)
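Until the ReferenceUploader issue (24236) is resolved, it may help to scan a FASTA file for headers that contain a > character beyond the leading one before uploading it. The following is a hypothetical pre-check, not part of SMRT Analysis: the function name `find_bad_headers` is illustrative, it assumes > is the only header character of concern, and it only reports problems without modifying the file.

```python
def find_bad_headers(lines):
    """Return (line_number, header) pairs for FASTA header lines that
    contain a '>' character after the leading one.

    `lines` may be any iterable of strings, such as an open file handle.
    """
    bad = []
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        # Header lines start with '>'; any further '>' is the problem case.
        if line.startswith(">") and ">" in line[1:]:
            bad.append((number, line))
    return bad
```

To check a file, pass it an open handle, for example `with open("reference.fasta") as handle: print(find_bad_headers(handle))`, where `reference.fasta` stands in for your own reference file.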

For Research Use Only. Not for use in diagnostic procedures. © Copyright 2010 - 2014, Pacific Biosciences of California, Inc. All rights reserved. Information in this document is subject to change without notice. Pacific Biosciences assumes no responsibility for any errors or omissions in this document. Certain notices, terms, conditions and/or use restrictions may pertain to your use of Pacific Biosciences products and/or third party products. Please refer to the applicable Pacific Biosciences Terms and Conditions of Sale and to the applicable license terms at http://www.pacificbiosciences.com/licenses.html. P/N 100-321-300