SMRT Analysis Software Installation v1.4.0

Introduction

  • This document describes the basic requirements for installing SMRT Analysis® v1.4.0 on a customer system.
  • This document is for use by Field Service and Support personnel, as well as Customer IT.

System Requirements

Operating System

  • SMRT Analysis is only supported on:
    • English-language Ubuntu 8.04
    • English-language Ubuntu 10.04
    • English-language RedHat/CentOS 5.3
    • English-language RedHat/CentOS 5.6
  • SMRT Analysis cannot be installed on Mac OS or Windows.
  • Users with other versions of Ubuntu or CentOS will likely encounter library errors when running an initial analysis job. The errors in the smrtpipe.log file indicate which libraries are needed; install any missing libraries so that analysis jobs can complete successfully.

Running SMRT Analysis in the Cloud

Users who do not have access to a server with CentOS 5.6 or later or Ubuntu 10.04 or later can use the public Amazon Machine Image (AMI). For details, see the document Running SMRT Analysis on Amazon, available from the PacBio® Developer’s Network at http://www.pacbiodevnet.com.

Software Requirements

  • MySQL 5
  • bash
  • Perl (v5.8.8)

Ubuntu:

  • aptitude install mysql-server libxml-parser-perl liblapack3gf libssl0.9.8

CentOS 5:

  • yum install mysql-server perl-XML-Parser libgfortran libgfortran44 openssl redhat-lsb

CentOS 6:

  • yum install mysql-server perl-XML-Parser compat-libgfortran-41 openssl098e redhat-lsb
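If you script the prerequisite installation across several machines, the three package lists above can be kept in one table. The following is a minimal sketch, assuming Python 3 on the administration host; `PACKAGES` and `install_command` are hypothetical names, and the helper only builds the command string (run the result with sudo yourself).

```python
# Package lists copied from the Ubuntu, CentOS 5, and CentOS 6 sections above.
PACKAGES = {
    "ubuntu": ("aptitude", ["mysql-server", "libxml-parser-perl",
                            "liblapack3gf", "libssl0.9.8"]),
    "centos5": ("yum", ["mysql-server", "perl-XML-Parser", "libgfortran",
                        "libgfortran44", "openssl", "redhat-lsb"]),
    "centos6": ("yum", ["mysql-server", "perl-XML-Parser", "compat-libgfortran-41",
                        "openssl098e", "redhat-lsb"]),
}


def install_command(distro: str) -> str:
    """Return the package-install command line for a distribution key.

    Hypothetical helper: it only builds the string and never runs it.
    """
    try:
        tool, packages = PACKAGES[distro]
    except KeyError:
        raise ValueError(f"no package list for {distro!r}") from None
    return " ".join([tool, "install", *packages])


# Prints: yum install mysql-server perl-XML-Parser compat-libgfortran-41 openssl098e redhat-lsb
print(install_command("centos6"))
```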

Client web browser:

We recommend using the Firefox® 15 or Google Chrome® 21 web browser to run SMRT Portal for consistent functionality. We also support Apple’s Safari® and Internet Explorer® web browsers; however, some features may not be optimized on these browsers.

Client Java:

To run SMRT View, we recommend using Java 7 on Windows (64-bit Java 7 for users with a 64-bit OS) and Java 6 on Mac OS.

Minimum Hardware Requirements

1 head node:

  • Minimum 16 GB RAM. Larger references such as human may require 32 GB RAM.
  • Minimum 250 GB of disk space

3 compute nodes:

  • 8 cores per node, with 2 GB RAM per core
  • Minimum 250 GB of disk space per node
  • To perform de novo assembly of large genomes using the Celera® Assembler, one of the nodes will need to have considerably more memory. See the Celera Assembler home page for recommendations: http://wgs-assembler.sourceforge.net/.

Data storage:

  • 10 TB (Actual storage depends on usage.)
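The per-node figures above add up to a cluster-wide minimum: 3 compute nodes with 8 cores each give 24 cores, and at 2 GB per core that is 16 GB of RAM per node, or 48 GB across the compute nodes. The following is a minimal sketch of checking machines against these numbers, assuming Python 3.7+; `NodeSpec`, `head_node_ok`, and `compute_node_ok` are hypothetical names. The checks do not cover the extra memory needed for Celera Assembler runs on large genomes, or the data storage estimate.

```python
from dataclasses import dataclass

# Minimums from "Minimum Hardware Requirements" above.
HEAD_MIN_RAM_GB = 16          # larger references such as human may need 32 GB
HEAD_MIN_DISK_GB = 250
COMPUTE_NODE_COUNT = 3
COMPUTE_MIN_CORES = 8
COMPUTE_RAM_PER_CORE_GB = 2
COMPUTE_MIN_DISK_GB = 250


@dataclass
class NodeSpec:
    """Hypothetical description of one machine's resources."""
    cores: int
    ram_gb: int
    disk_gb: int


def head_node_ok(node: NodeSpec) -> bool:
    """Head node: at least 16 GB RAM and 250 GB of disk."""
    return node.ram_gb >= HEAD_MIN_RAM_GB and node.disk_gb >= HEAD_MIN_DISK_GB


def compute_node_ok(node: NodeSpec) -> bool:
    """Compute node: at least 8 cores, 2 GB RAM per core, and 250 GB of disk."""
    return (
        node.cores >= COMPUTE_MIN_CORES
        and node.ram_gb >= node.cores * COMPUTE_RAM_PER_CORE_GB
        and node.disk_gb >= COMPUTE_MIN_DISK_GB
    )


# Minimum cluster totals: 3 nodes x 8 cores = 24 cores; 24 cores x 2 GB = 48 GB.
MIN_TOTAL_CORES = COMPUTE_NODE_COUNT * COMPUTE_MIN_CORES
MIN_TOTAL_COMPUTE_RAM_GB = MIN_TOTAL_CORES * COMPUTE_RAM_PER_CORE_GB
```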

Network File System Requirements

  • NFS mounts to the input locations (metadata.xml, bas.h5 files, and so on).
  • NFS mounts to the output locations ($SEYMOUR_HOME/common/userdata).
  • $SEYMOUR_HOME should be accessible from all compute nodes.
  • Compute nodes must be able to write back to the job directory.
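The last two requirements can be checked from each compute node before the first distributed job is submitted. The following is a minimal sketch, assuming Python 3 is available on the nodes; `check_shared_dir` is a hypothetical helper that only confirms that a directory is visible and writable from the node where it runs, so run it on every node.

```python
import os
import tempfile


def check_shared_dir(path: str) -> bool:
    """Return True if `path` is a directory this node can see and write to."""
    if not os.path.isdir(path) or not os.access(path, os.W_OK):
        return False
    # Also create and delete a real probe file: a permission check alone
    # can be wrong on NFS mounts that map user IDs on the server.
    try:
        with tempfile.NamedTemporaryFile(dir=path, prefix=".smrt_probe_"):
            pass  # the probe file is deleted when the block exits
    except OSError:
        return False
    return True
```

For example, `check_shared_dir("/opt/smrtanalysis/common/userdata")` should return True on every compute node. The probe file is used in addition to `os.access()` because the access(2) man page warns that its result can be wrong on NFS when UID mapping is done on the server.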

Installation and Upgrade Summary

The following steps summarize how to install SMRT Analysis v1.4.0. For further details, see the corresponding Step sections later in this document.

  1. Select an installation directory to assign to the $SEYMOUR_HOME environment variable. In this summary, we use /opt/smrtanalysis.

  2. Decide on a sudo user who will perform the installation. In this summary, we use <thisuser>, who belongs to <thisgroup>.

  3. Extract the tarball and softlink the directories:

tar -C /opt -xvvzf <tarball_name>.tgz
rm /opt/smrtanalysis    # only if the link already exists
ln -s /opt/smrtanalysis-1.4.0 /opt/smrtanalysis
sudo chown -R <thisuser>:<thisgroup> /opt/smrtanalysis-1.4.0

  4. Edit the setup script /opt/smrtanalysis-1.4.0/etc/setup.sh to match your installation location:

SEYMOUR_HOME=/opt/smrtanalysis

  5. Run the appropriate script:
    • Option 1: If you are performing a new installation, run the installation script:
  /opt/smrtanalysis/etc/scripts/postinstall/configure_smrtanalysis.sh
    • Option 2: If you are upgrading and want to preserve SMRT Cells, jobs, and users from a previous installation, turn off the services and run the upgrade script (see Step 5, Option 2):
  /opt/smrtanalysis-<old-version-number>/etc/scripts/tomcatd stop
  /opt/smrtanalysis-<old-version-number>/etc/scripts/kodosd stop
  /opt/smrtanalysis/etc/scripts/postinstall/upgrade_and_configure_smrtanalysis.sh

  6. Set up distributed computing by choosing a job management system (JMS), then edit the following files:

/opt/smrtanalysis/analysis/etc/smrtpipe.rc
/opt/smrtanalysis/analysis/etc/cluster/<JMS>/start.tmpl
/opt/smrtanalysis/analysis/etc/cluster/<JMS>/interactive.tmpl
/opt/smrtanalysis/analysis/etc/cluster/<JMS>/kill.tmpl
/opt/smrtanalysis/redist/tomcat/webapps/smrtportal/WEB-INF/web.xml

Note: If you are not using SGE, you will need to deactivate the Celera Assembler protocols so that they do not display in SMRT Portal. To do so, rename the following files, located in common/protocols:

RS_CeleraAssembler.1.xml to RS_CeleraAssembler.1.bak
filtering/CeleraAssemblerSFilter.1.xml to CeleraAssemblerSFilter.1.bak
assembly/CeleraAssembler.1.xml to CeleraAssembler.1.bak
  7. New installations only: Set up user data folders that point to external storage.

  8. New installations only: Set up SMRT Portal.

  9. Start the SMRT Portal and Automatic Secondary Analysis Services.

  10. Verify the installation.

Bundled with SMRT® Analysis

The following components are bundled within the application and do not depend on what is already installed on the system.

  • Java® 1.6
  • Python® 2.5.2
  • Tomcat™ 7.0.23

Changes from SMRT® Analysis v1.3.3

See the SMRT Analysis Release Notes (v1.4.0) for changes and known issues. The latest version of that document resides on the Pacific Biosciences DevNet site and is linked from the main SMRT Analysis web page.

Step 3: Extract the Tarball

Extract the tarball to its final destination; this creates a smrtanalysis-1.4.0/ directory. Be sure to use the tarball appropriate to your system (Ubuntu or CentOS).

Note: You need to run these commands with sudo if you do not have permission to write to the install folder. If the extracted folder is not owned by the user performing the installation (/opt is typically owned by root), change the ownership of the folder and all its contents.

Example: To change ownership of the extracted folder within /opt:

sudo chown -R <thisuser>:<thisgroup> /opt/smrtanalysis-1.4.0

We recommend deploying to /opt:

tar -C /opt -xvvzf <tarball_name>.tgz

We also recommend creating a symbolic link, /opt/smrtanalysis, that points to /opt/smrtanalysis-1.4.0:

ln -s /opt/smrtanalysis-1.4.0 /opt/smrtanalysis

With this link in place, a subsequent upgrade only requires repointing /opt/smrtanalysis to the new version's directory.
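Replacing the link with rm followed by ln -s leaves a short window in which /opt/smrtanalysis does not exist. If you script upgrades, the following is a minimal sketch of an atomic alternative, assuming Python 3; `repoint_symlink` is a hypothetical helper, not part of SMRT Analysis. It creates the new link under a temporary name and renames it over the old one, which POSIX rename() performs atomically.

```python
import os


def repoint_symlink(link_path: str, new_target: str) -> None:
    """Point link_path at new_target, replacing any existing symlink atomically."""
    tmp_link = link_path + ".tmp"
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)  # leftover from an interrupted earlier run
    os.symlink(new_target, tmp_link)
    # os.replace() renames over the destination; rename() does not follow
    # an existing symlink at link_path, it replaces the link itself.
    os.replace(tmp_link, link_path)
```

For example, `repoint_symlink("/opt/smrtanalysis", "/opt/smrtanalysis-1.4.0")`. If /opt/smrtanalysis is a real directory rather than a link, the rename fails with an OSError instead of touching the directory.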

Step 5: Run the Installation Script

Run the installation script:

cd $SEYMOUR_HOME/etc/scripts/postinstall
./configure_smrtanalysis.sh

The installation script requires the following input:

  • The system name. (Default: hostname -a)
  • The port number that the services will run under. (Default: 8080)
  • The Tomcat shutdown port. (Default: 8005)
  • The user/group to run the services and set permissions for the files. (Default: smrtanalysis:smrtanalysis)
  • The MySQL user name and password used to install the database. (Default: root, with no password)

The installation script performs the following:

  • Creates the SMRT Portal database. Note: The MySQL user performing the installation must have permission to create or alter databases. Otherwise, the installer will reject the user and prompt for another.
  • Sets the host and port names for various configuration files.
  • Sets the Tomcat/kodos user. The services will run as the specified user.
  • Sets the user and group permissions and ownership of the application to the Tomcat user.
  • Adds links in /etc/init.d to the Tomcat and kodos services. (The defaults are: /etc/init.d/kodosd and /etc/init.d/tomcatd.) These are soft links to the actual service files within the application. If a file is already present (for example, tomcatd is already installed), the link can be created with a different name. The permissions of the underlying scripts are limited to the user running the services.
  • Installs the services. The services will automatically restart if the system restarts. (On CentOS, the installer will run chkconfig to install the services, rather than update-rc.d.)

Note: The installer will attempt to run without sudo access first. If this fails, the installer will prompt the user for a sudo password and retry.

Step 5, Option 2: Run the Upgrade Script

If you are upgrading from v1.3.3 to v1.4.0 and want to preserve SMRT Cells, jobs, and users from a previous installation:

Run upgrade_and_configure_smrtanalysis.sh to update the database schema and the reference repository entries:

cd $SEYMOUR_HOME/etc/scripts/postinstall
./upgrade_and_configure_smrtanalysis.sh

When prompted, skip setting up the services; they should already exist from the previous installation:

Now creating symbolic links in /etc/init.d. Continue? [Y/n] n

Step 6: Set up Distributed Computing

SMRT Analysis provides support for distributed computation using an existing job management system. Pacific Biosciences has explicitly validated Sun Grid Engine (SGE), LSF and PBS.

Note: Celera Assembler 7.0 will only work correctly with the SGE job management system. If you are not using SGE, you will need to deactivate the Celera Assembler protocols so that they do not display in SMRT Portal. To do so, rename the following files, located in common/protocols:

RS_CeleraAssembler.1.xml to RS_CeleraAssembler.1.bak
filtering/CeleraAssemblerSFilter.1.xml to CeleraAssemblerSFilter.1.bak
assembly/CeleraAssembler.1.xml to CeleraAssembler.1.bak

This section describes setup for SGE and gives guidance for extensions to other Job Management Systems.

Smrtpipe.rc Configuration

The $SEYMOUR_HOME/analysis/etc/smrtpipe.rc file contains the options that you can set to execute distributed SMRT Pipe runs. Among them, the CLUSTER_MANAGER variable selects the job management system (see "To support a new JMS" below).

Configuring Templates

The central components for setting up distributed computing in SMRT Analysis are the Job Management Templates (JMTs). JMTs provide a flexible format for specifying how SMRT Analysis communicates with the resident JMS. There are two templates that must be modified for your system:

  • start.tmpl is the legacy template used for assembly algorithms.
  • interactive.tmpl is the new template used for resequencing algorithms. The difference between the two is that interactive.tmpl additionally requires a sync option. (kill.tmpl is not used.)

Note: We are in the process of converting all protocols to use only interactive.tmpl.

To customize a JMS for a particular environment, edit or create start.tmpl and interactive.tmpl. For example, the installation includes the following sample start.tmpl and interactive.tmpl (respectively) for SGE:

qsub -pe smp ${NPROC} -S /bin/bash -V -q secondary -N ${JOB_ID} -o ${STDOUT_FILE} -e ${STDERR_FILE} ${EXTRAS} ${CMD}
qsub -S /bin/bash -sync y -V -q secondary -N ${JOB_ID} -o ${STDOUT_FILE} -e ${STDERR_FILE} -pe smp ${NPROC} ${CMD}

To support a new JMS:

  1. Create a new directory named NEW_NAME in etc/cluster/.
  2. In smrtpipe.rc, change the CLUSTER_MANAGER variable to NEW_NAME, as described in “Smrtpipe.rc Configuration”.
  3. Once you have a new JMS directory specified, edit the interactive.tmpl and start.tmpl files for your particular setup.

Sample SGE, LSF and PBS templates are included with the installation in $SEYMOUR_HOME/analysis/etc/cluster.
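The templates use ${NAME} placeholders that SMRT Pipe fills in for each job. To preview what a customized template would submit, you can substitute sample values yourself. The following is a minimal sketch using Python's `string.Template`, which happens to use the same ${NAME} syntax; it is not how SMRT Pipe performs the substitution internally, `render` is a hypothetical helper, and the job values are made up.

```python
from string import Template

# Sample SGE interactive.tmpl line shipped with the installation (see above).
INTERACTIVE_TMPL = (
    "qsub -S /bin/bash -sync y -V -q secondary -N ${JOB_ID} "
    "-o ${STDOUT_FILE} -e ${STDERR_FILE} -pe smp ${NPROC} ${CMD}"
)


def render(template_text: str, **values: str) -> str:
    """Fill ${NAME} placeholders; raises KeyError if any placeholder is left unfilled."""
    return Template(template_text).substitute(**values)


# Made-up values, for illustration only.
preview = render(INTERACTIVE_TMPL, JOB_ID="job_001", STDOUT_FILE="/tmp/job_001.out",
                 STDERR_FILE="/tmp/job_001.err", NPROC="8", CMD="/tmp/job_001.sh")
print(preview)
```

Because `substitute()` raises KeyError for any placeholder without a value, a misspelled variable name in an edited template shows up immediately in such a preview.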

Specifying the SGE Job Management System:

For this version (v1.4.0), you must still edit both interactive.tmpl and start.tmpl as follows:

  1. Change secondary to the queue name on your system. (This is the -q option.)
  2. Change smp to the parallel environment on your system. (This is the -pe option.)
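Both edits replace the single argument that follows -q and -pe. If you prefer to script them, the following is a minimal sketch in Python 3; `customize_sge` is a hypothetical helper, and my_queue and my_pe are placeholders for the queue and parallel environment that exist on your cluster (on SGE, `qconf -sql` lists queues and `qconf -spl` lists parallel environments).

```python
import re


def customize_sge(template_line: str, queue: str, parallel_env: str) -> str:
    """Swap the queue name after -q and the parallel environment after -pe.

    Only the token right after each option is replaced, so the ${NPROC}
    slot count that follows the PE name is kept.
    """
    # (?<!\S) ensures "-q" / "-pe" begin a whitespace-separated token.
    line = re.sub(r"(?<!\S)(-q\s+)\S+", lambda m: m.group(1) + queue, template_line)
    line = re.sub(r"(?<!\S)(-pe\s+)\S+", lambda m: m.group(1) + parallel_env, line)
    return line


# Sample SGE start.tmpl line shipped with the installation (see above).
START_TMPL = ("qsub -pe smp ${NPROC} -S /bin/bash -V -q secondary -N ${JOB_ID} "
              "-o ${STDOUT_FILE} -e ${STDERR_FILE} ${EXTRAS} ${CMD}")
print(customize_sge(START_TMPL, queue="my_queue", parallel_env="my_pe"))
```

Apply it to the command line in both start.tmpl and interactive.tmpl.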

Specifying the PBS Job Management System

PBS does not have a -sync option, so the interactive.tmpl file runs a script named qsw.py to simulate the functionality. You must edit both interactive.tmpl and start.tmpl.

  1. Change the queue name to one that exists on your system. (This is the -q option.)
  2. Change the parallel environment to one that exists on your system. (This is the -pe option.)
  3. Make sure that interactive.tmpl passes the -PBS option to qsw.py.

Specifying the LSF Job Management System

Create an interactive.tmpl file by copying the start.tmpl file and adding the -K option to the bsub call. Alternatively, you can edit the sample LSF templates.
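The bsub -K option makes bsub wait until the submitted job completes, which supplies the synchronous behavior that interactive.tmpl needs. The following is a minimal Python 3 sketch of the copy-and-add step; `make_lsf_interactive` is a hypothetical helper, and the bsub line it is applied to is made up for illustration rather than taken from the shipped LSF sample.

```python
import re


def make_lsf_interactive(start_line: str) -> str:
    """Derive an interactive.tmpl line from an LSF start.tmpl line.

    Inserts -K right after the bsub command so that bsub waits for the job
    to finish. A line that already has a standalone -K is returned unchanged.
    """
    if re.search(r"(?<!\S)-K(?!\S)", start_line):
        return start_line
    new_line, count = re.subn(r"(?<!\S)bsub(?!\S)", "bsub -K", start_line, count=1)
    if count == 0:
        raise ValueError("no bsub command found in the template line")
    return new_line


# Made-up start.tmpl line for illustration; use your installation's sample instead.
START_LINE = "bsub -q normal -J ${JOB_ID} -o ${STDOUT_FILE} -e ${STDERR_FILE} -n ${NPROC} ${CMD}"
print(make_lsf_interactive(START_LINE))
```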

Specifying other Job Management Systems

We have not tested the -sync functionality on other systems. Find the equivalent of the -sync option for your JMS and create an interactive.tmpl file. If no -sync equivalent is available, you may need to edit the qsw.py script in $SEYMOUR_HOME/analysis/lib/python2.7/pbpy-0.1-py2.7.egg/EGG-INFO/scripts/qsw.py to add options for wrapping jobs on your system.

The PBS- and SGE-specific code in qsw.py looks like the following:

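# Excerpt from a method in qsw.py: select PBS or SGE settings based on the -PBS flag.
# The *_DECODER and *_CODE constants are defined elsewhere in qsw.py.
if '-PBS' in args:
    # PBS: remove the wrapper-specific flag from the argument list
    args.remove('-PBS')
    self.jobIdDecoder   = PBS_JOB_ID_DECODER
    self.noJobFoundCode = PBS_NO_JOB_FOUND_CODE
    self.successCode    = PBS_SUCCESS_CODE
    self.qstatCmd       = "qstat"
else:
    # SGE is the default
    self.jobIdDecoder   = SGE_JOB_ID_DECODER
    self.noJobFoundCode = SGE_NO_JOB_FOUND_CODE
    self.successCode    = SGE_SUCCESS_CODE
    self.qstatCmd       = "qstat -j"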
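Adapting qsw.py to another JMS mostly means supplying equivalents of these settings: how to extract the job ID from the submit command's output, which code means success, and which command reports job status. The sketch below covers only the first of these, in Python 3, for the usual SGE and PBS/Torque qsub output. The patterns and the `decode_job_id` helper are assumptions for illustration, not the decoders actually defined in qsw.py.

```python
import re
from typing import Optional

# Assumed qsub output formats (check what your qsub actually prints):
#   SGE:        Your job 12345 ("myjob") has been submitted
#   PBS/Torque: 12345.headnode.example.com
SGE_JOB_ID_PATTERN = re.compile(r"Your job (\d+)\b")
PBS_JOB_ID_PATTERN = re.compile(r"^(\d+)(?:\.\S+)?$")


def decode_job_id(qsub_output: str, pbs: bool = False) -> Optional[str]:
    """Return the numeric job ID from qsub output, or None if none is found."""
    pattern = PBS_JOB_ID_PATTERN if pbs else SGE_JOB_ID_PATTERN
    match = pattern.search(qsub_output.strip())
    return match.group(1) if match else None
```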

Configuring SMRT Portal

Running jobs in distributed mode is disabled by default in SMRT Portal. To enable distributed processing, set the jobsAreDistributed value in $SEYMOUR_HOME/redist/tomcat/webapps/smrtportal/WEB-INF/web.xml to true:

<context-param>
<param-name>jobsAreDistributed</param-name>
<param-value>true</param-value>
</context-param>

You will need to restart Tomcat.
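Before restarting Tomcat, you may want to confirm that the edit took effect. The following is a minimal read-only check in Python 3 using the standard library's `xml.etree.ElementTree`; `context_param` is a hypothetical helper. It ignores XML namespaces, since web.xml files usually declare one, and because it only reads the file it cannot disturb comments or formatting.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Union


def _local_name(tag: str) -> str:
    """Drop a namespace prefix: '{http://...}param-name' -> 'param-name'."""
    return tag.rsplit("}", 1)[-1]


def context_param(xml_text: Union[str, bytes], name: str) -> Optional[str]:
    """Return the <param-value> of the <context-param> called `name`, or None."""
    root = ET.fromstring(xml_text)
    for element in root.iter():
        if _local_name(element.tag) != "context-param":
            continue
        fields = {_local_name(child.tag): (child.text or "").strip() for child in element}
        if fields.get("param-name") == name:
            return fields.get("param-value")
    return None
```

For example, read $SEYMOUR_HOME/redist/tomcat/webapps/smrtportal/WEB-INF/web.xml in binary mode and pass its contents with the name jobsAreDistributed; the result should be "true".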

The upgrade process carries over the configuration settings from the previous version.

Step 7: (New Installations Only) Set Up User Data Folders

SMRT Analysis saves references and results in its own hierarchy. Because analyses generate large amounts of data that can fill up local storage, we suggest that you soft link to an external directory with more storage.

All jobs and references, as well as drop boxes, are contained in $SEYMOUR_HOME/common/userdata. You can move this folder to another location, then soft link $SEYMOUR_HOME/common/userdata to the new location.

If you are performing a fresh installation, for example:

mv $SEYMOUR_HOME/common/userdata /my_offline_storage
ln -s /my_offline_storage/userdata $SEYMOUR_HOME/common/userdata

If upgrading, you need to point the new build to the external storage location. For example:

rm $SEYMOUR_HOME/common/userdata
ln -s /my_offline_storage/userdata $SEYMOUR_HOME/common/userdata
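For a fresh installation, the move-and-link step can also be scripted. The following is a minimal Python 3 sketch; `relocate_userdata` is a hypothetical helper that mirrors the mv and ln -s commands above, and /my_offline_storage stands for whatever external storage you use. It refuses to run if userdata is already a symbolic link or if the destination already exists, so an earlier relocation is not overwritten.

```python
import os
import shutil


def relocate_userdata(seymour_home: str, storage_dir: str) -> str:
    """Move common/userdata into storage_dir and leave a symlink in its place.

    Returns the new location of the userdata directory.
    """
    link_path = os.path.join(seymour_home, "common", "userdata")
    new_location = os.path.join(storage_dir, "userdata")
    if os.path.islink(link_path):
        raise RuntimeError(f"{link_path} is already a symbolic link")
    if os.path.exists(new_location):
        raise FileExistsError(new_location)
    shutil.move(link_path, new_location)   # same effect as: mv userdata storage_dir
    os.symlink(new_location, link_path)    # same effect as: ln -s ... userdata
    return new_location
```

For example, `relocate_userdata("/opt/smrtanalysis", "/my_offline_storage")`.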

Note: The default protocols and underlying support files within common/protocols and subfolders were updated significantly for v1.4.0. We strongly recommend that you recreate protocols for v1.4.0 rather than carry over protocols from previous versions.

Step 8: (New Installations Only) Set Up SMRT® Portal

  1. Use your web browser to start SMRT Portal: http://HOST:PORT/smrtportal
  2. Click Register at the top right.
  3. Create a user named administrator (all lowercase). This user is special, as it is the only user that does not require activation on creation.
  4. Enter the user name administrator.
  5. Enter an email address. All administrative emails, such as new user registrations, will be sent to this address.
  6. Enter the password and confirm the password.
  7. Select Click Here to access Change Settings.
  8. To set up the mail server, enter the SMTP server information and click Apply. For email authentication, enter a user name and password. You can also enable Transport Layer Security.
  9. To enable automated submission from a PacBio® RS instrument, click Add under the Instrument Web Services URI field. Then, enter the following into the dialog box and click OK:
http://INSTRUMENT_PAP01:8081

INSTRUMENT_PAP01 is the IP address or name (pap01) of the instrument. 8081 is the port for the instrument web service.

  10. Select the new URI, then click Test to check whether SMRT Portal can communicate with the instrument service.
  11. (Optional) You can delete the pre-existing instrument entry by clicking Remove.

Step 9: Start the SMRT® Portal and Automatic Secondary Analysis Services

  1. Start Tomcat: sudo $SEYMOUR_HOME/etc/scripts/tomcatd start
  2. Start kodos: sudo /etc/init.d/kodosd start

Step 10: Verify the installation

Create a test job in SMRT Portal using canned installation data:

  1. Open your web browser and clear the browser cache:
    • Google Chrome: Choose Tools > Clear browsing data. Choose the beginning of time from the drop-down list, then check Empty the cache and click Clear browsing data.
    • Internet Explorer: Choose Tools > Internet Options > General, then under Browsing history, click Delete. Check Temporary Internet files, then click Delete.
    • Firefox: Choose Tools > Options > Advanced, then click the Network tab. In the Cached Web Content section, click Clear Now.
  2. Refresh the current page by pressing F5.
  3. Log in to SMRT Portal by navigating to http://HOST:PORT/smrtportal.
  4. Click Design Job.
  5. Click Import and Manage.
  6. Click Import SMRT Cells.
  7. Click Add.
  8. Enter /opt/smrtanalysis/common/test/primary, then click OK.
  9. Select the new path and click Scan. You should get a dialog saying "One input was scanned." Note: If you are upgrading to v1.4.0, this cell will already have been imported into your system. In addition, the input was downsampled to speed the test and reduce the overall tarball size.
  10. Click Design Job.
  11. Click Create New.
  12. Enter a job name and comment.
  13. Select the protocol RS_Resequencing.1.
  14. Under SMRT Cells Available, select a lambda cell and click the right-arrow button.
  15. Click Save on the bottom right, then click Start. The job should complete successfully.
  16. Click the SMRT View button. SMRT View should open with tracks displayed, and the reads displayed in the Details panel.

For Research Use Only. Not for use in diagnostic procedures. © Copyright 2010 - 2013, Pacific Biosciences of California, Inc. All rights reserved. Information in this document is subject to change without notice. Pacific Biosciences assumes no responsibility for any errors or omissions in this document. Certain notices, terms, conditions and/or use restrictions may pertain to your use of Pacific Biosciences products and/or third party products. Please refer to the applicable Pacific Biosciences Terms and Conditions of Sale and the applicable license terms at http://www.pacificbiosciences.com/licenses.html.

Pacific Biosciences, the Pacific Biosciences logo, PacBio, SMRT and SMRTbell are trademarks of Pacific Biosciences in the United States and/or certain other countries. All other trademarks are the sole property of their respective owners.
