SMRT Analysis Software Installation v1.4.0
- This document describes the basic requirements for installing SMRT Analysis® v1.4.0 on a customer system.
- This document is for use by Field Service and Support personnel, as well as Customer IT.
- SMRT Analysis is only supported on:
- English-language Ubuntu 8.04
- English-language Ubuntu 10.04
- English-language RedHat/CentOS 5.3
- English-language RedHat/CentOS 5.6
- SMRT Analysis cannot be installed on Mac OS or Windows.
- Users with alternate versions of Ubuntu or CentOS will likely encounter library errors when running an initial analysis job. The errors in the `smrtpipe.log` file indicate which libraries are needed. Install any missing libraries on your system for an analysis job to complete successfully.
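As a rough aid for reading those errors, the sketch below pulls library names out of a log. It assumes the failures appear in the standard Linux dynamic-loader form ("error while loading shared libraries: ..."); whether `smrtpipe.log` records them in exactly that form is an assumption, and the function name is hypothetical. It is written for a current Python 3 interpreter, not the Python bundled with SMRT Analysis.

```python
import re

# Standard glibc loader message; its presence in smrtpipe.log in this exact
# form is an assumption, not something the installation guide states.
MISSING_LIB = re.compile(
    r"error while loading shared libraries: (\S+): cannot open shared object file"
)


def missing_libraries(log_text):
    """Return the sorted, de-duplicated library names named in loader errors."""
    return sorted(set(MISSING_LIB.findall(log_text)))
```

To use it, read the job's `smrtpipe.log` into a string and pass it to `missing_libraries`, then install the package that provides each listed library with your distribution's package manager.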
Users who do not have access to a server with CentOS 5.6 or later or Ubuntu 10.04 or later can use the public Amazon Machine Image (AMI). For details, see the document Running SMRT Analysis on Amazon, available from the PacBio® Developer’s Network at http://www.pacbiodevnet.com.
- MySQL 5
- bash
- Perl (v5.8.8)
aptitude install mysql-server libxml-parser-perl liblapack3gf libssl0.9.8
yum install mysql-server perl-XML-Parser libgfortran libgfortran44 openssl redhat-lsb
yum install mysql-server perl-XML-Parser compat-libgfortran-41 openssl098e redhat-lsb
We recommend using Firefox® 15 or Google Chrome® 21 web browsers to run SMRT Portal for consistent functionality. We also support Apple’s Safari® and Internet Explorer® web browsers; however, some features may not be optimized on these browsers.
To run SMRT View, we recommend using Java 7 for Windows (64-bit Java 7 for users with a 64-bit OS), and Java 6 for Mac OS.
- Minimum 16 GB RAM. Larger references such as human may require 32 GB RAM.
- Minimum 250 GB of disk space
- 8 cores per node, with 2 GB RAM per core
- Minimum 250 GB of disk space per node
- To perform de novo assembly of large genomes using the Celera® Assembler, one of the nodes will need to have considerably more memory. See the Celera Assembler home page for recommendations: http://wgs-assembler.sourceforge.net/.
- 10 TB (Actual storage depends on usage.)
- NFS mounts to the input locations (metadata.xml, bas.h5 files, and so on).
- NFS mounts to the output locations (`$SEYMOUR_HOME/common/userdata`).
- `$SEYMOUR_HOME` should be viewable by all compute nodes.
- Compute nodes must be able to write back to the job directory.
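The per-node figures above (8 cores, 2 GB RAM per core, 250 GB of disk) can be checked with a small script before installing. This is a minimal sketch under stated assumptions: the function names are hypothetical, the disk check looks at free space as an interpretation of the requirement, and `probe_local_node` relies on Linux-specific `os.sysconf` names.

```python
import os
import shutil

GIB = 1024 ** 3


def node_shortfalls(cores, ram_bytes, free_disk_bytes,
                    min_cores=8, ram_per_core_gib=2, min_disk_gib=250):
    """Return a list of human-readable problems; an empty list means a pass."""
    problems = []
    if cores < min_cores:
        problems.append("cores: %d < %d" % (cores, min_cores))
    if ram_bytes < cores * ram_per_core_gib * GIB:
        problems.append("RAM: less than %d GB per core" % ram_per_core_gib)
    if free_disk_bytes < min_disk_gib * GIB:
        problems.append("disk: less than %d GB free" % min_disk_gib)
    return problems


def probe_local_node(path="/"):
    """Measure this machine (Linux only: relies on os.sysconf names)."""
    cores = os.cpu_count() or 1
    ram = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    free = shutil.disk_usage(path).free
    return cores, ram, free
```

Calling `node_shortfalls(*probe_local_node("/opt"))` on a compute node lists anything below the thresholds. Reported RAM is usually slightly below the nominal size, so treat a near miss on memory as a prompt to check by hand.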
Following are the steps for installing SMRT Analysis v1.4.0. For further details, click the links.
- Select an installation directory to assign to the `$SEYMOUR_HOME` environment variable. In this summary, we use `/opt/smrtanalysis`.
- Decide on a sudo user who will perform the installation. In this summary, we use `<thisuser>`, who belongs to `<thisgroup>`.
- Extract the tarball and softlink the directories:
tar -C /opt -xvvzf <tarball_name>.tgz
rm /opt/smrtanalysis   # only if the link already exists
ln -s /opt/smrtanalysis-1.4.0 /opt/smrtanalysis
sudo chown -R <thisuser>:<thisgroup> smrtanalysis-1.4.0
- Edit the setup script `/opt/smrtanalysis-1.4.0/etc/setup.sh` to match your installation location:
SEYMOUR_HOME=/opt/smrtanalysis
- Run the appropriate script:
- Option 1: If you are performing a fresh installation, run the installation script:
/opt/smrtanalysis/etc/scripts/postinstall/configure_smrtanalysis.sh
- Option 2: If you are upgrading and want to preserve SMRT Cells, jobs, and users from a previous installation, turn off the services and run the [upgrade script](#Step5Upgrade):
/opt/smrtanalysis-<old-version-number>/etc/scripts/tomcatd stop
/opt/smrtanalysis-<old-version-number>/etc/scripts/kodosd stop
/opt/smrtanalysis/etc/scripts/postinstall/upgrade_and_configure_smrtanalysis.sh
- Set up distributed computing by deciding on a job management system (JMS), then edit the following files:
/opt/smrtanalysis/analysis/etc/smrtpipe.rc
/opt/smrtanalysis/analysis/etc/cluster/<JMS>/start.tmpl
/opt/smrtanalysis/analysis/etc/cluster/<JMS>/interactive.tmpl
/opt/smrtanalysis/analysis/etc/cluster/<JMS>/kill.tmpl
/opt/smrtanalysis/redist/tomcat/webapps/smrtportal/WEB-INF/web.xml
Note: If you are not using SGE, you will need to deactivate the Celera Assembler protocols so that they do not display in SMRT Portal. To do so, rename the following files, located in `common/protocols`:
RS_CeleraAssembler.1.xml to RS_CeleraAssembler.1.bak
filtering/CeleraAssemblerSFilter.1.xml to CeleraAssemblerSFilter.1.bak
assembly/CeleraAssembler.1.xml to CeleraAssembler.1.bak
- New Installations only: Set up user data folders that point to external storage.
- New Installations only: [Set up SMRT Portal](#Step8).
- Start the SMRT Portal and Automatic Secondary Analysis Services.
- [Verify](#Step10) the installation.
The following are bundled within the application and should not depend on what is already deployed on the system.
- Java® 1.6
- Python® 2.5.2
- Tomcat™ 7.0.23
See SMRT Analysis Release Notes (v1.4.0) for changes and known issues. The latest version of the document resides on the Pacific Biosciences DevNet site; you can link to it from the main SMRT Analysis web page.
Extract the tarball to its final destination; this creates a `smrtanalysis-1.4.0/` directory. Be sure to use the tarball appropriate to your system (Ubuntu or CentOS).
Note: You need to run these commands with sudo if you do not have permission to write to the install folder. If the extracted folder is not owned by the user performing the installation (`/opt` is typically owned by root), change the ownership of the folder and all its contents.
Example: To change ownership within `/opt`:
sudo chown -R <thisuser>:<thisgroup> smrtanalysis-1.4.0
We recommend deploying to `/opt`:
tar -C /opt -xvvzf <tarball_name>.tgz
We also recommend creating a symbolic link, `/opt/smrtanalysis`, that points to `/opt/smrtanalysis-1.4.0`:
ln -s /opt/smrtanalysis-1.4.0 /opt/smrtanalysis
This makes subsequent upgrades transparent: you only need to repoint the symbolic link to the directory extracted from the upgraded tarball.
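For upgrades, the link can be repointed without a moment in which `/opt/smrtanalysis` does not exist. The following is a minimal sketch, not part of the installer; the function name and the temporary `.tmp` suffix are hypothetical, and it assumes a POSIX filesystem and Python 3.

```python
import os


def repoint_symlink(link_path, new_target):
    """Make link_path point at new_target, replacing any existing link.

    The new link is created under a temporary name and renamed into place,
    so link_path never disappears while something is reading through it.
    """
    tmp_path = link_path + ".tmp"
    if os.path.lexists(tmp_path):
        os.remove(tmp_path)
    os.symlink(new_target, tmp_path)
    os.replace(tmp_path, link_path)  # atomic rename on POSIX filesystems
```

For example, `repoint_symlink("/opt/smrtanalysis", "/opt/smrtanalysis-1.4.0")` has the same end result as the `rm` and `ln -s` pair. If `/opt/smrtanalysis` is a real directory instead of a link, the rename fails and nothing is removed.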
Run the installation script:
cd $SEYMOUR_HOME/etc/scripts/postinstall
./configure_smrtanalysis.sh
The installation script requires the following input:
- The system name. (Default: `hostname -a`)
- The port number that the services will run under. (Default: `8080`)
- The Tomcat shutdown port. (Default: `8005`)
- The user/group to run the services and set permissions for the files. (Default: `smrtanalysis:smrtanalysis`)
- The mysql user name and password to install the database. (Default: `root:no password`)
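Before accepting the default ports, it can help to confirm that nothing else on the host is already listening on them. The sketch below is one way to check, assuming Python 3 is available; the function names are hypothetical and the installer itself may perform its own checks.

```python
import socket


def port_is_free(port, host="127.0.0.1"):
    """Return True if nothing accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        probe.settimeout(1.0)
        # connect_ex returns 0 when a listener accepted the connection.
        return probe.connect_ex((host, port)) != 0


def busy_ports(ports=(8080, 8005)):
    """Return the subset of the given ports that are already in use."""
    return [port for port in ports if not port_is_free(port)]
```

An empty list from `busy_ports()` suggests the defaults are available; otherwise, choose different values when the installation script prompts for them.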
The installation script performs the following:
- Creates the SMRT Portal database. Note: The mysql user performing the install must have permissions to alter or create databases. Otherwise, the installer will reject the user and prompt for another.
- Sets the host and port names for various configuration files.
- Sets the Tomcat/kodos user. The services will run as the specified user.
- Sets the user and group permissions and ownership of the application to the Tomcat user.
- Adds links in `/etc/init.d` to the Tomcat and kodos services. (The defaults are: `/etc/init.d/kodosd` and `/etc/init.d/tomcatd`.) These are soft links to the actual service files within the application. If a file is already present (for example, tomcatd is already installed), the link can be created with a different name. The permissions of the underlying scripts are limited to the user running the services.
- Installs the services. The services will automatically restart if the system restarts. (On CentOS, the installer will run `chkconfig` to install the services, rather than `update-rc.d`.)
Note: The installer will attempt to run without sudo access first. If this fails, the installer will prompt the user for a sudo password and retry.
If you are upgrading from v1.3.3 to v1.4.0 and want to preserve SMRT Cells, jobs, and users from a previous installation:
Run `upgrade_and_configure_smrtanalysis.sh` to update the database schema and the reference repository entries:
cd $SEYMOUR_HOME/etc/scripts/postinstall
./upgrade_and_configure_smrtanalysis.sh
Skip setting up the services: (These should already exist from the previous installation.)
Now creating symbolic links in /etc/init.d. Continue? [Y/n] n
SMRT Analysis provides support for distributed computation using an existing job management system. Pacific Biosciences has explicitly validated Sun Grid Engine (SGE), LSF and PBS.
Note: Celera Assembler 7.0 will only work correctly with the SGE job management system. If you are not using SGE, you will need to deactivate the Celera Assembler protocols so that they do not display in SMRT Portal. To do so, rename the following files, located in `common/protocols`:
RS_CeleraAssembler.1.xml to RS_CeleraAssembler.1.bak
filtering/CeleraAssemblerSFilter.1.xml to CeleraAssemblerSFilter.1.bak
assembly/CeleraAssembler.1.xml to CeleraAssembler.1.bak
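The three renames can be scripted so that they are applied consistently on each installation. This is a minimal sketch, assuming Python 3 and assuming that `common/protocols` sits under `$SEYMOUR_HOME`; the function name is hypothetical.

```python
import os

# Paths relative to common/protocols, as listed in the note above.
CELERA_PROTOCOLS = (
    "RS_CeleraAssembler.1.xml",
    "filtering/CeleraAssemblerSFilter.1.xml",
    "assembly/CeleraAssembler.1.xml",
)


def deactivate_celera(protocols_dir):
    """Rename each listed .xml file to .bak and return the new paths."""
    renamed = []
    for relative in CELERA_PROTOCOLS:
        source = os.path.join(protocols_dir, relative)
        if not os.path.exists(source):
            continue  # already renamed, or absent in this build
        target = source[:-len(".xml")] + ".bak"
        os.rename(source, target)
        renamed.append(target)
    return renamed
```

Running it a second time is harmless: files that were already renamed are skipped and the function returns an empty list.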
This section describes setup for SGE and gives guidance for extensions to other Job Management Systems.
Following are the options in the `$SEYMOUR_HOME/analysis/etc/smrtpipe.rc` file that you can set to execute distributed SMRT Pipe runs.
The central component for setting up distributed computing in SMRT Analysis is the set of Job Management Templates (JMTs). JMTs provide a flexible format for specifying how SMRT Analysis communicates with the resident JMS. Two templates must be modified for your system:
- `start.tmpl` is the legacy template used for assembly algorithms.
- `interactive.tmpl` is the new template used for resequencing algorithms.
The difference between the two is the additional requirement of a sync option in `interactive.tmpl`. (`kill.tmpl` is not used.)
Note: We are in the process of converting all protocols to use only interactive.tmpl.
To customize a JMS for a particular environment, edit or create `start.tmpl` and `interactive.tmpl`. For example, the installation includes the following sample `start.tmpl` and `interactive.tmpl` (respectively) for SGE:
qsub -pe smp ${NPROC} -S /bin/bash -V -q secondary -N ${JOB_ID} -o ${STDOUT_FILE} -e ${STDERR_FILE} ${EXTRAS} ${CMD}
qsub -S /bin/bash -sync y -V -q secondary -N ${JOB_ID} -o ${STDOUT_FILE} -e ${STDERR_FILE} -pe smp ${NPROC} ${CMD}
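To see what command line a template would produce, the `${NAME}` placeholders can be filled in by hand. The sketch below uses Python's `string.Template`, which happens to share the `${NAME}` syntax; how SMRT Pipe itself performs the substitution is not described here, so treat this only as a preview tool. The function name and the sample values are hypothetical.

```python
from string import Template

# The sample SGE interactive.tmpl shown above, as a single string.
SGE_INTERACTIVE = (
    "qsub -S /bin/bash -sync y -V -q secondary -N ${JOB_ID} "
    "-o ${STDOUT_FILE} -e ${STDERR_FILE} -pe smp ${NPROC} ${CMD}"
)


def preview(template_text, **values):
    """Fill ${NAME} placeholders; unknown placeholders are left untouched."""
    return Template(template_text).safe_substitute(values)
```

For example, `preview(SGE_INTERACTIVE, JOB_ID="test01", STDOUT_FILE="out.log", STDERR_FILE="err.log", NPROC=8, CMD="job.sh")` yields a complete `qsub` command that can be inspected, or tried manually on the cluster, before a real job depends on it.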
- Create a new directory under `etc/cluster/` named `NEW_NAME`.
- In `smrtpipe.rc`, change the `CLUSTER_MANAGER` variable to `NEW_NAME`, as described in “Smrtpipe.rc Configuration”.
- Once you have a new JMS directory specified, edit the `interactive.tmpl` and `start.tmpl` files for your particular setup.
Sample SGE, LSF and PBS templates are included with the installation in `$SEYMOUR_HOME/analysis/etc/cluster`.
For this version (v1.4.0), you must still edit both `interactive.tmpl` and `start.tmpl` as follows:
- Change `secondary` to the queue name on your system. (This is the `-q` option.)
- Change `smp` to the parallel environment on your system. (This is the `-pe` option.)
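Because the same two substitutions apply to both SGE templates, they can be made with a small helper in place of hand editing. This is a sketch under the assumption that each template contains a single `-q` and a single `-pe` option, as the samples do; the function name is hypothetical.

```python
import re


def customize_sge_template(text, queue, parallel_env):
    """Swap the sample queue and parallel-environment names in one template."""
    # Replace only the word that follows each option flag.
    text = re.sub(r"(?<=-q )\S+", lambda match: queue, text)
    text = re.sub(r"(?<=-pe )\S+", lambda match: parallel_env, text)
    return text
```

Apply it to the contents of `start.tmpl` and `interactive.tmpl` in turn and write the results back, keeping a copy of the originals for reference.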
PBS does not have a `-sync` option, so the `interactive.tmpl` file runs a script named `qsw.py` to simulate the functionality. You must edit both `interactive.tmpl` and `start.tmpl`.
- Change the queue name to one that exists on your system. (This is the `-q` option.)
- Change the parallel environment to one that exists on your system. (This is the `-pe` option.)
- Make sure that `interactive.tmpl` calls the `-PBS` option.
Create an `interactive.tmpl` file by copying the `start.tmpl` file and adding the `-K` functionality in the `bsub` call. Alternatively, edit the sample LSF templates.
We have not tested the `-sync` functionality on other systems. Find the equivalent of the `-sync` option for your JMS and create an `interactive.tmpl` file. If there is no `-sync` option available, you may need to edit the `qsw.py` script in `$SEYMOUR_HOME/analysis/lib/python2.7/pbpy-0.1-py2.7.egg/EGG-INFO/scripts/qsw.py` to add additional options for wrapping jobs on your system.
The code for PBS and SGE looks like the following:
if '-PBS' in args:
    args.remove('-PBS')
    self.jobIdDecoder = PBS_JOB_ID_DECODER
    self.noJobFoundCode = PBS_NO_JOB_FOUND_CODE
    self.successCode = PBS_SUCCESS_CODE
    self.qstatCmd = "qstat"
else:
    self.jobIdDecoder = SGE_JOB_ID_DECODER
    self.noJobFoundCode = SGE_NO_JOB_FOUND_CODE
    self.successCode = SGE_SUCCESS_CODE
    self.qstatCmd = "qstat -j"
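When adapting the wrapper to another JMS, the general idea of simulating a blocking submit is to poll the scheduler until it no longer reports the job. The sketch below illustrates only that idea and is not the code in `qsw.py`; `wait_for_job` and the `job_is_known` callable are hypothetical names, and how job status is obtained is left to the caller.

```python
import time


def wait_for_job(job_is_known, interval=5.0, timeout=None, sleep=time.sleep):
    """Block until job_is_known() returns False.

    job_is_known is a hypothetical callable. In practice it might run the
    scheduler's status command and compare the exit code with a
    "no job found" code, as the constants in the excerpt above suggest.
    Returns True when the job has left the queue, False on timeout.
    """
    waited = 0.0
    while job_is_known():
        if timeout is not None and waited >= timeout:
            return False
        sleep(interval)
        waited += interval
    return True
```

The `sleep` parameter exists so the loop can be exercised without real delays; in normal use, leave it at its default.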
Running jobs in distributed mode is disabled by default in SMRT Portal.
To enable distributed processing, set the `jobsAreDistributed` value in `$SEYMOUR_HOME/redist/tomcat/webapps/smrtportal/WEB-INF/web.xml` to `true`:
<context-param>
<param-name>jobsAreDistributed</param-name>
<param-value>true</param-value>
</context-param>
You will need to restart Tomcat.
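After editing, the setting can be read back to confirm that the change is in place. The following is a read-only sketch using Python 3's standard `xml.etree.ElementTree`; the function names are hypothetical, and the code ignores XML namespaces because whether this `web.xml` declares one is not stated here.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop any '{namespace}' prefix that ElementTree adds to a tag."""
    return tag.rsplit("}", 1)[-1]


def context_param(web_xml_text, name):
    """Return the <param-value> text for the named <context-param>, or None."""
    root = ET.fromstring(web_xml_text)
    for element in root.iter():
        if local_name(element.tag) != "context-param":
            continue
        fields = {local_name(child.tag): (child.text or "").strip()
                  for child in element}
        if fields.get("param-name") == name:
            return fields.get("param-value")
    return None
```

Reading the file into a string and calling `context_param(text, "jobsAreDistributed")` should return `"true"` once distributed processing is enabled. The sketch deliberately does not rewrite the file, since round-tripping through ElementTree would drop comments.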
The upgrade process will port over the configuration settings from the previous version.
SMRT Analysis saves references and results in its own hierarchy. Jobs generate large amounts of data and can fill the local disk, so we suggest that you soft link to an external directory with more storage.
All jobs and references, as well as drop boxes, are contained in `$SEYMOUR_HOME/common/userdata`. You can move this folder to another location, then soft link `$SEYMOUR_HOME/common/userdata` to the new location.
If performing a fresh installation, for example:
mv $SEYMOUR_HOME/common/userdata /my_offline_storage
ln -s /my_offline_storage/userdata $SEYMOUR_HOME/common/userdata
If upgrading, you need to point the new build to the external storage location. For example:
rm $SEYMOUR_HOME/common/userdata
ln -s /my_offline_storage/userdata $SEYMOUR_HOME/common/userdata
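The fresh-installation case can also be scripted with a couple of guards against overwriting existing data. This is a minimal sketch, assuming Python 3; the function name is hypothetical, and the storage directory is whatever external location you chose.

```python
import os
import shutil


def relocate_userdata(userdata_path, storage_dir):
    """Move userdata under storage_dir and leave a symlink in its place."""
    userdata_path = os.path.normpath(userdata_path)  # drop any trailing slash
    if os.path.islink(userdata_path):
        raise RuntimeError("%s is already a symlink" % userdata_path)
    new_home = os.path.join(storage_dir, os.path.basename(userdata_path))
    if os.path.exists(new_home):
        raise RuntimeError("%s already exists" % new_home)
    shutil.move(userdata_path, new_home)
    os.symlink(new_home, userdata_path)
    return new_home
```

The two checks make the function refuse to run when `userdata` has already been relocated or when the destination is occupied, which is the situation during an upgrade; in that case only the link needs to be recreated.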
Note: The default protocols and underlying support files within `common/protocols` and subfolders were updated significantly for v1.4.0. We strongly recommend that you recreate protocols for v1.4.0 rather than carry over protocols from previous versions.
- Use your web browser to start SMRT Portal:
http://HOST:PORT/smrtportal
- Click Register at the top right.
- Create a user named `administrator` (all lowercase). This user is special, as it is the only user that does not require activation on creation.
- Enter the user name `administrator`.
. - Enter an email address. All administrative emails, such as new user registrations, will be sent to this address.
- Enter the password and confirm the password.
- Select Click Here to access Change Settings.
- To set up the mail server, enter the SMTP server information and click Apply. For email authentication, enter a user name and password. You can also enable Transport Layer Security.
- To enable automated submission from a PacBio® RS instrument, click Add under the Instrument Web Services URI field. Then, enter the following into the dialog box and click OK:
http://INSTRUMENT_PAP01:8081
`INSTRUMENT_PAP01` is the IP address or name (pap01) of the instrument.
`8081` is the port for the instrument web service.
- Select the new URI, then click Test to check if SMRT Portal can communicate with the instrument service.
- (Optional) You can delete the pre-existing instrument entry by clicking Remove.
- Start Tomcat:
sudo $SEYMOUR_HOME/etc/scripts/tomcatd start
- Start kodos:
sudo /etc/init.d/kodosd start
Create a test job in SMRT Portal using canned installation data:
Open your web browser and clear the browser cache:
- Google Chrome: Choose Tools > Clear browsing data. Choose the beginning of time from the droplist, then check Empty the cache and click Clear browsing data.
- Internet Explorer: Choose Tools > Internet Options > General, then under Browsing history, click Delete. Check Temporary Internet files, then click Delete.
- Firefox: Choose Tools > Options > Advanced, then click the Network tab. In the Cached Web Content section, click Clear Now.
- Refresh the current page by pressing F5.
- Log into SMRT Portal by navigating to `http://HOST:PORT/smrtportal`.
- Click Design Job.
- Click Import and Manage.
- Click Import SMRT Cells.
- Click Add.
- Enter `/opt/smrtanalysis/common/test/primary`, then click OK.
- Select the new path and click Scan. You should get a dialog saying “One input was scanned.” Note: If you are upgrading to v1.4.0, this cell will already have been imported into your system. In addition, the input was downsampled to speed the test and reduce the overall tarball size.
- Click Design Job.
- Click Create New.
- Enter a job name and comment.
- Select the protocol `RS_Resequencing.1`.
- Under SMRT Cells Available, select a lambda cell and click the right-arrow button.
- Click Save on the bottom right, then click Start. The job should complete successfully.
- Click the SMRT View button. SMRT View should open with tracks displayed, and the reads displayed in the Details panel.
For Research Use Only. Not for use in diagnostic procedures. © Copyright 2010 - 2013, Pacific Biosciences of California, Inc. All rights reserved. Information in this document is subject to change without notice. Pacific Biosciences assumes no responsibility for any errors or omissions in this document. Certain notices, terms, conditions and/or use restrictions may pertain to your use of Pacific Biosciences products and/or third party products. Please refer to the applicable Pacific Biosciences Terms and Conditions of Sale and the applicable license terms at http://www.pacificbiosciences.com/licenses.html.
Pacific Biosciences, the Pacific Biosciences logo, PacBio, SMRT and SMRTbell are trademarks of Pacific Biosciences in the United States and/or certain other countries. All other trademarks are the sole property of their respective owners.