SMRT analysis software installation v2.1.1
- [Important Changes](#ImportantChanges)
- [System Requirements](#SysReq)
- [Operating System](#OS)
- [Running SMRT® Analysis in the Cloud](#Cloud)
- [Software Requirement](#SoftReq)
- [Minimum Hardware Requirements](#HardReq)
- [Installation and Upgrade Summary](#Summary)
- [Step 1: Decide on a user and an installation directory](#Bookmark_DecideInstallDir)
- [Step 2: Create and set the installation directory $SMRT_ROOT](#Bookmark_CreateInstallDir)
- [Installation and Upgrade Detail](#Details)
- [Step 3 Option 1: Run the install script](#Bookmark_InstallDetail)
- [Step 3 Option 2: Run the upgrade script](#Bookmark_UpgradeDetail)
- [Step 4: Set up distributed computing](#Bookmark_DistributedDetail)
- [Step 5: Set up SMRT Portal](#Bookmark_SMRTPortalDetail)
- [Step 6: Verify install or upgrade](#Bookmark_VerifyDetail)
- [Optional Configurations](#Optional)
- [Set up userdata directory](#Bookmark_UserdataDetail)
- [Bundled with SMRT® Analysis](#Bundled)
- [Changes from SMRT® Analysis v2.0.1](#Changes)
SMRT Analysis migrated to a completely new directory structure starting with v2.1. Instead of $SEYMOUR_HOME, we are now using $SMRT_ROOT, and you will not need to specify it explicitly. We still recommend that $SMRT_ROOT be set to /opt/smrtanalysis/, but the underlying folders will be as follows (arrows indicate softlinks):
/opt/smrtanalysis/
    admin/
        bin/
        log/
    current --> softlink to ../install/smrtanalysis-2.1.1
    install/
        smrtanalysis-<other versions>/
        smrtanalysis-2.1.1/
    userdata/ --> softlink to offline storage location
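The layout above can be spot-checked from the shell. The following is a minimal sketch, not part of the installer: `missing_layout_entries` is a hypothetical helper name, and the list of entries it checks is taken only from the tree shown above.

```shell
# Hypothetical helper: print each expected top-level entry that is absent
# under the given root directory.
missing_layout_entries() {
    local root="$1" name
    for name in admin current install userdata; do
        # -e follows softlinks, so a dangling softlink is reported as missing
        [ -e "$root/$name" ] || echo "$name"
    done
}
```

Calling `missing_layout_entries /opt/smrtanalysis` prints nothing when all four entries exist and resolve.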
- SMRT® Analysis is only supported on:
  - English-language Ubuntu 12.04, Ubuntu 10.04, Ubuntu 8.04
  - English-language RedHat/CentOS 6.3, RedHat/CentOS 5.6, RedHat/CentOS 5.3
- If you are using an alternate version of Ubuntu or CentOS (not recommended), download and install the SMRT Analysis executable built for the closest supported release that is older than the OS installed on your system. (For example, if you are running CentOS 6.4, run the CentOS 6.3 executable.) The software assumes a uniform operating system across all compute nodes. If you have different OS versions on your cluster (not recommended), choose an executable that matches the oldest OS on your compute nodes.
- Check for any library errors when running an initial RS_resequencing analysis job on lambda. Here are some common packages that need to be installed:
  - RedHat/CentOS 5.xxx: Enter
    sudo yum install mysql-server perl-XML-Parser openssl redhat-lsb
  - RedHat/CentOS 6.xxx: Enter
    sudo yum install mysql-server perl-XML-Parser openssl098e redhat-lsb
  - Ubuntu 10.xxx: Enter
    sudo aptitude install mysql-server libxml-parser-perl libssl0.9.8
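The rule above for choosing an executable on an alternate OS release (take the closest supported release that is not newer than the installed one) can be sketched as a small shell function. This is an illustration only: `pick_executable_version` is a hypothetical name, and it assumes GNU `sort -V` is available for version comparison.

```shell
# Hypothetical helper: pick the newest supported release that is not newer
# than the installed OS release. Relies on GNU sort -V (version sort).
# Usage: pick_executable_version INSTALLED SUPPORTED...
pick_executable_version() {
    local installed="$1" best="" v newest
    shift
    for v in "$@"; do
        # v is usable if it sorts at or below the installed release
        newest=$(printf '%s\n' "$v" "$installed" | sort -V | tail -n 1)
        if [ "$newest" = "$installed" ]; then
            # keep the highest usable release seen so far
            best=$(printf '%s\n' "$best" "$v" | sort -V | tail -n 1)
        fi
    done
    if [ -n "$best" ]; then
        echo "$best"
    else
        echo "no supported release at or below $installed" >&2
        return 1
    fi
}
```

For the CentOS example in the text, `pick_executable_version 6.4 6.3 5.6 5.3` selects 6.3. On a mixed cluster, pass the oldest release found on any compute node as the first argument.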
SMRT Analysis cannot be installed on the Mac OS or Windows.
Users who do not have access to a server with the supported OS can use the public Amazon Machine Image (AMI). For details, see the document [Running SMRT Analysis on Amazon] (https://s3.amazonaws.com/files.pacb.com/software/smrtanalysis/2.1/doc/Running SMRT Analysis on Amazon.pdf).
- MySQL 5 (yum install mysql-server; apt-get install mysql-server)
- bash
- Perl (v5.8.8)
  - Statistics::Descriptive Perl module: sudo cpan Statistics::Descriptive
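A quick way to confirm the software requirements above before running the installer is sketched below. The helper names `missing_commands` and `perl_module_available` are hypothetical, and checking for a `mysql` client command is an assumption about how MySQL is exposed on your system.

```shell
# Hypothetical helper: print the name of every listed command that is not
# found on PATH.
missing_commands() {
    local cmd
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || echo "$cmd"
    done
}

# Hypothetical helper: succeed only if the named Perl module can be loaded.
perl_module_available() {
    perl -M"$1" -e 1 >/dev/null 2>&1
}
```

For example, `missing_commands bash perl mysql` lists anything absent, and `perl_module_available Statistics::Descriptive || echo "module missing"` flags the Perl module.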
We recommend using the Google Chrome® 21 web browser to run SMRT Portal for consistent functionality. We also support Apple’s Safari® and Internet Explorer® web browsers; however, some features may not be optimized on these browsers.
To run SMRT View, we recommend using Java 7 for Windows (Java 7 64-bit for users with a 64-bit OS), and Java 6 for the Mac OS.
- Minimum 8 cores, with 2 GB RAM per core. We recommend 16 cores with 4 GB RAM per core for de novo assemblies and larger references such as human.
- Minimum 250 GB of disk space.
- Minimum 3 compute nodes. We recommend 5 nodes for high utilization focused on de novo assemblies.
- Minimum 8 cores per node, with 2 GB RAM per core. We recommend 16 cores per node with 4 GB RAM per core.
- Minimum 250 GB of disk space per node.
- To perform de novo assembly of large genomes using the Celera® Assembler, one of the nodes will need to have considerably more memory. See the Celera® Assembler home page for recommendations: http://wgs-assembler.sourceforge.net/.
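The per-node minimums above (8 cores, 2 GB RAM per core) can be checked with a short function. This is a sketch under stated assumptions: `check_node` is a hypothetical name, and the 5% allowance exists only because the kernel typically reports slightly less memory than is physically installed.

```shell
# Hypothetical helper: compare a node against the minimums listed above
# (8 cores, 2 GB RAM per core).
# Usage: check_node CORES MEM_KB
check_node() {
    local cores="$1" mem_kb="$2"
    local min_cores=8 min_gb_per_core=2
    # Assumption: accept 95% of the nominal figure, because the kernel
    # reports slightly less memory than is physically installed.
    local need_kb=$(( cores * min_gb_per_core * 1024 * 1024 * 95 / 100 ))
    if [ "$cores" -ge "$min_cores" ] && [ "$mem_kb" -ge "$need_kb" ]; then
        echo "OK"
    else
        echo "BELOW_MINIMUM"
    fi
}
```

On a Linux node, the inputs can be read with `check_node "$(nproc)" "$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)"`.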
Notes:
- It is possible, but not advisable, to install SMRT Analysis on a single-node machine (see the distributed computing section). You will likely be able to submit jobs one SMRT Cell at a time, but the time to completion may be long as the software may not have sufficient resources to complete the job.
- The RS_ReadsOfInsert protocol can be compute-intensive. If you plan to run it on every SMRT Cell, we recommend adding 3 additional 8-core compute nodes with at least 4 GB of RAM per core.
- 10 TB (Actual storage depends on usage.)
Please refer to the IT Site Prep guide provided with your instrument purchase for more details.
- The SMRT Analysis software directory (we recommend $SMRT_ROOT=/opt/smrtanalysis) must have the same path and be readable by the smrtanalysis user across all compute nodes via NFS.
- The SMRT Cell input directory (we recommend $SMRT_ROOT/pacbio_instrument_data/) must have the same path and be readable by the smrtanalysis user across all compute nodes via NFS. This directory contains data from the instrument and can either be a directory configured by RS Remote during instrument installation, or a directory you created when you received data from a core lab.
- The SMRT Analysis output directory (we recommend $SMRT_ROOT/userdata) must have the same path and be writable by the smrtanalysis user across all compute nodes via NFS. This directory is usually soft-linked to a large storage volume.
- The SMRT Analysis temporary directory is used for fast I/O operations during runtime. The software accesses this directory from $SMRT_ROOT/tmpdir, and you can softlink this directory manually or using the install script. This directory should be local (not NFS-mounted), writable by the smrtanalysis user, and exist as an independent directory on each compute node.
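The directory requirements above can be verified on each compute node while logged in as the smrtanalysis user. The sketch below is illustrative only: `check_dir_access` and `fs_type` are hypothetical helper names, and `fs_type` assumes GNU `stat`.

```shell
# Hypothetical helper: report whether DIR exists and allows the requested
# access for the current user. MODE is r (readable) or w (writable).
check_dir_access() {
    local dir="$1" mode="$2" flag
    case "$mode" in
        r) flag=-r ;;
        w) flag=-w ;;
        *) echo "unknown mode: $mode" >&2; return 2 ;;
    esac
    if [ ! -d "$dir" ]; then
        echo "missing: $dir"
        return 1
    fi
    if [ "$flag" "$dir" ] && [ -x "$dir" ]; then
        echo "ok: $dir"
    else
        echo "no $mode access: $dir"
        return 1
    fi
}

# Hypothetical helper: print the filesystem type that holds DIR, for
# example nfs for an NFS mount (GNU stat only).
fs_type() {
    stat -f -c %T "$1"
}
```

For example, run `check_dir_access "$SMRT_ROOT/userdata" w` on every node, and use `fs_type "$SMRT_ROOT/tmpdir/"` to confirm that the temporary directory is not reported as nfs.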
Please pay close attention as the upgrade procedure has changed.
The following instructions apply to fresh v2.1.1 installations and v2.0.1 to v2.1.1 upgrades only.
- If you are using an older version of SMRT Analysis, you can either perform a fresh installation and manually import old SMRT Cells and jobs, or download and upgrade any intermediate versions (v1.4, v2.0.0, v2.0.1).
The SMRT Analysis install directory, $SMRT_ROOT, can be any directory as long as the smrtanalysis user has read, write, and execute permissions in that directory. Historically we have referred to $SMRT_ROOT as /opt/smrtanalysis.
We recommend that a system administrator create a special user called smrtanalysis, who belongs to the smrtanalysis group. This user will own all SMRT Analysis files, daemon processes, and smrtpipe jobs.
If the parent directory of $SMRT_ROOT is not writable by the SMRT Analysis user, the $SMRT_ROOT directory must be pre-created with read/write/execute permissions for the SMRT Analysis user.
- Option 1: The SMRT Analysis user has sudo privileges.
  For example, if $SMRT_ROOT is /opt/smrtanalysis, /opt is only writable by root, and the SMRT Analysis user is smrtanalysis belonging to the group smrtanalysis:
  SMRT_ROOT=/opt/smrtanalysis
  sudo mkdir $SMRT_ROOT
  sudo chown smrtanalysis:smrtanalysis $SMRT_ROOT
- Option 2: The SMRT Analysis user does not have sudo privileges.
  If you do not have sudo privileges, you can install SMRT Analysis as yourself in your home directory; however, you must still have root login credentials for the mysql database.
  SMRT_ROOT=/home/<your_username>/smrtanalysis
  mkdir $SMRT_ROOT
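Whether a proposed install directory needs to be pre-created by an administrator depends on who can write to its parent. A minimal sketch of that check follows; `install_dir_option` is a hypothetical helper and its messages are illustrative, not output of the installer.

```shell
# Hypothetical helper: report how a proposed $SMRT_ROOT can be created,
# based on what the current user is allowed to write.
install_dir_option() {
    local root="$1" parent
    parent=$(dirname "$root")
    if [ -d "$root" ] && [ -w "$root" ]; then
        echo "exists and writable"
    elif [ -w "$parent" ]; then
        echo "parent writable: mkdir without sudo"
    else
        echo "parent not writable: pre-create with sudo or choose another directory"
    fi
}
```

Run it as the SMRT Analysis user, for example `install_dir_option /opt/smrtanalysis`.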
- Option 1: If you are performing a fresh installation, run the installation script and start tomcat and kodos. [See below for more details.](#Bookmark_InstallDetail)
bash smrtanalysis-2.1.1.Current_Ubuntu-8.04.run --rootdir $SMRT_ROOT
$SMRT_ROOT/admin/bin/tomcatd start
$SMRT_ROOT/admin/bin/kodosd start
If you need to rerun the script and have already extracted the file, you can rerun using the --no-extract option:
bash smrtanalysis-2.1.1.Current_Ubuntu-8.04.run --rootdir $SMRT_ROOT --no-extract
- Option 2: Please pay close attention as the upgrade procedure has changed. The new procedure requires running a script called smrtupdater from the old v2.0.1 smrtanalysis directory, which takes the path to the new v2.1.1 installer as an argument.
  IMPORTANT: If $SMRT_ROOT is a pre-existing symbolic link (e.g. /opt/smrtanalysis --> /opt/smrtanalysis-2.0.1), you must manually delete the softlink and create a new directory this time only. [See below for more details.](#Bookmark_UpgradeDetail)
/opt/smrtanalysis-2.0.1/etc/scripts/kodosd stop
/opt/smrtanalysis-2.0.1/etc/scripts/tomcatd stop
rm /opt/smrtanalysis
mkdir /opt/smrtanalysis
SMRT_PATH_ORIG="$PATH" SMRT_ROOTDIR="/opt/smrtanalysis" bash /opt/smrtanalysis-2.0.1/admin/bin/smrtupdater /opt/smrtanalysis-2.1.1.Current_Ubuntu-8.04.run
/opt/smrtanalysis/admin/bin/tomcatd start
/opt/smrtanalysis/admin/bin/kodosd start
Note: For future upgrades beyond v2.1.1, we expect the upgrade command to be $SMRT_ROOT/admin/bin/smrtupdater /path/to/smrtanalysis-2.1.1.Current_Ubuntu-8.04.run
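Before running the upgrade commands above, it may help to confirm whether $SMRT_ROOT is currently a softlink, since that case requires deleting the link and creating a real directory. The following is a sketch only; `smrt_root_kind` is a hypothetical helper name.

```shell
# Hypothetical helper: describe what currently exists at the given path.
smrt_root_kind() {
    if [ -L "$1" ]; then
        echo "softlink -> $(readlink "$1")"
    elif [ -d "$1" ]; then
        echo "directory"
    elif [ -e "$1" ]; then
        echo "other"
    else
        echo "missing"
    fi
}
```

If `smrt_root_kind /opt/smrtanalysis` reports a softlink, the rm and mkdir steps shown above apply; if it reports a directory, they have presumably been done already.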
Decide on a job management system (JMS). See below for more details.
Register the administrative user and set up the SMRT Portal GUI. See below for more details.
Run a sample SMRT Portal job to verify functionality. [See below for more details.](#Bookmark_VerifyDetail)
The installation script attempts to discover inputs when possible, and performs the following:
- Looks for valid hostnames (DNS) and IP Addresses. You must choose one from the list.
- Assumes that the user running the script is the designated smrtanalysis user.
- Installs the Tomcat web server. You will be prompted for:
  - The port number that the tomcat service will run under. (Default: 8080)
  - The port number that the tomcat service will use to shut down. (Default: 8005)
- Creates the smrtportal database in mysql. You will be prompted for:
  - The mysql administrative user name. (Default: root)
  - The mysql password. (Default: no password)
  - The mysql port number. (Default: 3306)
- Attempts to configure the Job Management System (SGE, LSF, PBS, or NONE):
  - The $SGE_ROOT directory
  - The $SGE_CELL directory name
  - The $SGE_BINDIR directory that contains all the q-commands
  - The queue name
  - The parallel environment
- Creates and configures special directories:
  - The $TMP directory
  - The $USERDATA directory
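Because the install script tries to discover the SGE settings listed above, it can be useful to check beforehand which of them are visible in the installing user's environment. The sketch below is an assumption-laden illustration: `unset_variables` is a hypothetical helper, and it only reports names, it does not validate the values.

```shell
# Hypothetical helper: print the name of each listed environment variable
# that is unset or empty. Pass only trusted variable names, because the
# names are expanded with eval.
unset_variables() {
    local name value
    for name in "$@"; do
        eval "value=\${$name:-}"
        [ -n "$value" ] || echo "$name"
    done
}
```

For an SGE cluster, `unset_variables SGE_ROOT SGE_CELL` should print nothing, and `command -v qsub` should print the location of the q-commands.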
The upgrade script performs the following:
- Checks that the same user is running the upgrade script
- Checks for running services
- Checks that the OS and hardware requirements are still met
- Transfers computing configurations from a previous installation
- Upgrades any references as necessary
- Preserves SMRT Cells, jobs, and users from a previous installation by updating smrtportal database schema changes as necessary
- Preserves special directory settings:
  - Updates the $SMRT_ROOT/tmpdir softlink
  - Updates the $SMRT_ROOT/userdata softlink
- The upgrade script does not port over protocols that were defined in previous versions of SMRT Analysis. This is because protocol files can vary a great deal between versions due to rapid code development and change. Please recreate any custom protocols you may have.
Pacific Biosciences has explicitly validated Sun Grid Engine (SGE), and provides job submission templates for LSF and PBS. You only need to configure the software once during initial install.
The central components for setting up distributed computing in SMRT Analysis are the Job Management Templates, which provide a flexible format for specifying how SMRT Analysis communicates with the resident Job Management System (JMS). If you are using a non-SGE job management system, you must create or edit the following files:
/opt/smrtanalysis/analysis/etc/cluster/<JMS>/start.tmpl
/opt/smrtanalysis/analysis/etc/cluster/<JMS>/interactive.tmpl
/opt/smrtanalysis/analysis/etc/cluster/<JMS>/kill.tmpl
PBS does not have a -sync option, so the interactive.tmpl file runs a script named qsw.py to simulate the functionality. You must edit both interactive.tmpl and start.tmpl.
- Change the queue name to one that exists on your system. (This is the -q option.)
- Change the parallel environment to one that exists on your system. (This is the -pe option.)
- Make sure that interactive.tmpl calls the -PBS option.
The LSF equivalent of the SGE -sync option is -K, which should be provided with the bsub command in the interactive.tmpl file.
- Change the queue name to one that exists on your system. (This is the -q option.)
- Change the parallel environment to one that exists on your system. (This is the -pe option.)
- Make sure that interactive.tmpl calls the -K option.
To configure a job management system other than SGE, PBS, or LSF:
- Create a new directory smrtanalysis/current/analysis/etc/cluster/NEW_JMS.
- Edit smrtanalysis/current/analysis/etc/smrtpipe.rc, and change the CLUSTER_MANAGER variable to NEW_JMS.
- Once you have a new JMS directory specified, create and edit the interactive.tmpl, start.tmpl, and kill.tmpl files for your particular setup.
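After creating a JMS directory as described above, a quick check that all three template files are present can catch an incomplete setup. This is a minimal sketch; `missing_templates` is a hypothetical helper, and it checks only that the files exist, not that their contents are valid for your scheduler.

```shell
# Hypothetical helper: print each required template file that is absent
# from the given JMS directory.
missing_templates() {
    local dir="$1" name
    for name in start.tmpl interactive.tmpl kill.tmpl; do
        [ -f "$dir/$name" ] || echo "$name"
    done
}
```

For example, `missing_templates smrtanalysis/current/analysis/etc/cluster/NEW_JMS` prints nothing once all three files have been created.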
- Use your web browser to start SMRT Portal: http://hostname:port/smrtportal
- Click Register at the top right.
- Create a user named administrator (all lowercase). This user is special, as it is the only user that does not require activation on creation.
- Enter the user name administrator.
- Enter an email address. All administrative emails, such as new user registrations, will be sent to this address.
- Enter the password and confirm the password.
- Select Click Here to access Change Settings.
- To set up the mail server, enter the SMTP server information and click Apply. For email authentication, enter a user name and password. You can also enable Transport Layer Security.
- To enable automated submission from a PacBio® RS instrument, click Add under the Instrument Web Services URI field. Then, enter the following into the dialog box and click OK:
http://INSTRUMENT_PAP01:8081
INSTRUMENT_PAP01 is the IP address or name (pap01) of the instrument.
8081 is the port for the instrument web service.
- Select the new URI, then click Test to check if SMRT Portal can communicate with the instrument service.
- (Optional) You can delete the pre-existing instrument entry by clicking Remove.
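If the SMRT Portal page does not load in the browser, probing the URL from the command line can help separate a browser problem from a service problem. The sketch below is illustrative: `portal_url` and `portal_http_status` are hypothetical helpers, the default port of 8080 is the tomcat default mentioned in the install prompts, and `curl` is assumed to be installed.

```shell
# Hypothetical helper: build the SMRT Portal URL from a host name and an
# optional port (defaults to 8080, the tomcat default prompted at install).
portal_url() {
    echo "http://$1:${2:-8080}/smrtportal"
}

# Hypothetical helper: print the HTTP status code returned by that URL.
portal_http_status() {
    curl -s -o /dev/null -w '%{http_code}' "$(portal_url "$@")"
}
```

For example, `portal_http_status localhost 8080` prints the status code returned by the tomcat service, or 000 if no connection could be made.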
Create a test job in SMRT Portal using the provided lambda sequence data. This is data from a single SMRT cell that has been down-sampled to reduce overall tarball size. If you are upgrading, this cell will already have been imported into your system, and you can skip to step 10 below.
Open your web browser and clear the browser cache:
- Google Chrome: Choose Tools > Clear browsing data. Choose the beginning of time from the droplist, then check Empty the cache and click Clear browsing data.
- Internet Explorer: Choose Tools > Internet Options > General, then under Browsing history, click Delete. Check Temporary Internet files, then click Delete.
- Firefox: Choose Tools > Options > Advanced, then click the Network tab. In the Cached Web Content section, click Clear Now.
- Refresh the current page by pressing F5.
- Log into SMRT Portal by navigating to http://HOST:PORT/smrtportal.
- Click Design Job.
- Click Import and Manage.
- Click Import SMRT Cells.
- Click Add.
- Enter /opt/smrtanalysis/common/test/primary, then click OK.
- Select the new path and click Scan. You should get a dialog saying "One input was scanned."
- Click Design Job.
- Click Create New.
- Enter a job name and comment.
- Select the protocol RS_Resequencing.1.
- Under SMRT Cells Available, select a lambda cell and click the right-arrow button.
- Click Save on the bottom right, then click Start. The job should complete successfully.
- Click the SMRT View button. SMRT View should open with tracks displayed, and the reads displayed in the Details panel.
The userdata folder, $SMRT_ROOT/userdata, expands rapidly because it contains all jobs, references, and drop boxes. We recommend softlinking this folder to an external directory with more storage:
mv /opt/smrtanalysis/userdata /path/to/NFS/mounted/offline_storage
ln -s /path/to/NFS/mounted/offline_storage /opt/smrtanalysis/userdata
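The move-then-link step above can be wrapped in a function that refuses to run in the two situations where it would do the wrong thing. This is a sketch under stated assumptions: `relocate_userdata` is a hypothetical helper, and it presumes the SMRT Analysis services are stopped while the directory is moved.

```shell
# Hypothetical helper: move SRC to DEST and leave a softlink at SRC.
# Refuses to run if SRC is already a softlink or DEST already exists,
# because mv would then nest SRC inside DEST.
relocate_userdata() {
    local src="$1" dest="$2"
    if [ -L "$src" ] || [ ! -d "$src" ]; then
        echo "not a real directory: $src" >&2
        return 1
    fi
    if [ -e "$dest" ]; then
        echo "destination already exists: $dest" >&2
        return 1
    fi
    mv "$src" "$dest" && ln -s "$dest" "$src"
}
```

For example, `relocate_userdata /opt/smrtanalysis/userdata /path/to/NFS/mounted/offline_storage`, where the destination path does not exist yet.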
The following are bundled within the application and should not depend on what is already deployed on the system.
- Java® 1.7
- Python® 2.7
- Tomcat™ 7.0.23
See SMRT Analysis Release Notes v2.1.1 for changes and known issues. The latest version of this document resides on the Pacific Biosciences DevNet site; you can also link to it from the main SMRT Analysis web page.
For Research Use Only. Not for use in diagnostic procedures. © Copyright 2010 - 2013, Pacific Biosciences of California, Inc. All rights reserved. Information in this document is subject to change without notice. Pacific Biosciences assumes no responsibility for any errors or omissions in this document. Certain notices, terms, conditions and/or use restrictions may pertain to your use of Pacific Biosciences products and/or third party products. Please refer to the applicable Pacific Biosciences Terms and Conditions of Sale and the applicable license terms at http://www.pacificbiosciences.com/licenses.html. P/N 100-299-000