Install VRPipe Dependencies - VertebrateResequencing/vr-pipe GitHub Wiki

VRPipe is designed so that multiple teams can each install their own copy and use it independently of other teams that share the same cluster.

However, VRPipe has a large set of dependencies and useful optional 3rd party software, the totality of which may take many hours to install. It is suggested that all this additional software be installed centrally (by your system administrator - someone with root access) so it is available to everyone on all nodes of your cluster. This will make VRPipe's subsequent installation by each team (to their own private areas) quick and painless.

This guide shows you how to install all the dependencies and recommended optional software, assuming you have root access and are using a Linux OS that uses yum as its package manager. If either of these is not true you will have to figure out the necessary adjustments to the suggested command lines yourself.

Install gcc and make so that you can install other software.
sudo yum -y install gcc make
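As a quick sanity check after this step, a helper along the following lines can report which build tools are still missing from the PATH. This is only a sketch: the function name missing_tools is made up for illustration and is not part of VRPipe.

```shell
# Print the name of every command given as an argument that is NOT on the PATH.
# Prints nothing when all of them are found.
# (missing_tools is a hypothetical helper name, not a VRPipe command.)
missing_tools() {
    for tool in "$@"; do
        # `command -v` is the POSIX way to ask whether a command can be resolved.
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
        fi
    done
    return 0
}

# Example usage: missing_tools gcc make
```

If the example call prints nothing, both gcc and make were found.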
Install the MySQL client and Perl interface if you plan on using that (strongly recommended, as opposed to using SQLite, which may not handle the load from clusters with 100+ nodes).
sudo yum install -y mysql perl-DBD-MySQL
Install Neo4J by following http://yum.neo4j.org, preferably having separate production and testing Neo4J servers running on some dedicated machine, configured to allow non-localhost access and with unique ports.
Install some libraries that are needed by the 3rd party software that VRPipe is commonly used with. Also install git to allow downloading, updating and possible development of some software, including VRPipe itself. Finally, install Java 1.7 and ant for the 3rd party software that needs them.
sudo yum install -y zlib-devel libxslt-devel ncurses ncurses-devel git java-1.7.0-openjdk java-1.7.0-openjdk-devel ant
sudo /usr/sbin/alternatives --config java (and enter the number corresponding to 1.7)
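To confirm which Java version ended up as the default, the version banner can be parsed rather than read by eye. A minimal sketch, assuming the banner has the classic form `java version "1.7.0_NN"`; java_version_of is a hypothetical helper name.

```shell
# Extract "major.minor" (e.g. 1.7) from one line of `java -version` output.
# $1: a version banner such as: java version "1.7.0_25"
# Prints nothing if the line does not look like a version banner.
# (java_version_of is a hypothetical helper name.)
java_version_of() {
    printf '%s\n' "$1" | sed -n 's/.*version "\([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

# Example usage on a machine with Java installed
# (java prints its banner to stderr, hence the 2>&1):
#   java_version_of "$(java -version 2>&1 | head -n 1)"
```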
Install cpan so that VRPipe can later get its own CPAN dependencies. Also use cpan to update itself and install widely-used essential CPAN modules. Note that some modules ask a question during their installation process, so you will have to monitor the procedure and occasionally hit return to accept the default answer.
sudo yum install -y perl-CPAN
sudo cpan
Reply yes to the question about auto-configuration, then:
cpan> o conf init urllist
(and answer 3, 32, 8 1 10)
cpan> o conf prerequisites_policy follow
cpan> o conf build_requires_install_policy yes
cpan> o conf commit
cpan> install Bundle::CPAN
cpan> q
If it reported that some of the modules failed to install, just try again (they most likely failed because they were dependent on another of the modules in Bundle::CPAN that hadn't been installed yet):
sudo cpan
cpan> install Bundle::CPAN
cpan> q
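The "just try again" advice can be wrapped in a small retry loop if you prefer not to repeat commands by hand. The sketch below is generic shell; retry is a hypothetical helper name, and it only helps if the wrapped command reports failure through its exit status, which you should verify for your version of cpan.

```shell
# Run a command up to $1 times, stopping at the first success.
# Returns 0 on success, 1 if every attempt failed.
# (retry is a hypothetical helper name.)
retry() {
    max_attempts=$1
    shift
    attempt=1
    while [ "$attempt" -le "$max_attempts" ]; do
        if "$@"; then
            return 0
        fi
        echo "attempt $attempt of $max_attempts failed: $*" >&2
        attempt=$((attempt + 1))
    done
    return 1
}

# Example usage (non-interactive install through the cpan script):
#   retry 3 sudo cpan -i Bundle::CPAN
```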
Install the more problematic CPAN modules that VRPipe requires using your OS package management system.
sudo yum install -y perl-XML-LibXSLT perl-Devel-Cover perl-XML-Parser
Create an area where you can download and install software that isn't in your OS's package management system. Add this location to environment variables for ease of use. If you're creating an Amazon AMI you can arrange the shell login script such that the software directory can be moved to a shared filesystem where another shell login script will be sourced, allowing changes to software and environment variables without having to alter the AMI. (If you're not making an Amazon AMI, alter the suggested shell login script changes to taste; however, the software area must be on a shared filesystem that is visible to all nodes in your cluster.)
cd; mkdir software; cd software
echo "if [ -f /shared/software/.profile ]; then . /shared/software/.profile; fi" >> ~/.bash_profile
perl -MCwd -e 'print q[if [ -z "$SOFTWAREDIR" ]; then export SOFTWAREDIR=], cwd, "; fi\n";' >> ~/.bash_profile
mkdir bin
echo export LOCALBIN=\$SOFTWAREDIR/bin >> ~/.bash_profile
echo export PATH=\$LOCALBIN:\$PATH >> ~/.bash_profile
mkdir src; echo export SRCDIR=\$SOFTWAREDIR/src >> ~/.bash_profile
mkdir -p lib/perl5; echo export PERL5LIB=\$SOFTWAREDIR/lib/perl5 >> ~/.bash_profile
source ~/.bash_profile
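The echo ... >> commands above append a new line every time they are run, so repeating this step leaves duplicate entries in ~/.bash_profile. One way to make the step repeatable is to append a line only when it is not already there. This is a sketch, not part of the original procedure; append_once is a hypothetical helper name.

```shell
# Append a line to a file only if that exact line is not already present.
# $1: file to modify, $2: line to add
# (append_once is a hypothetical helper name.)
append_once() {
    file=$1
    line=$2
    touch "$file"
    # -F: fixed string, -x: match the whole line, -q: quiet
    if ! grep -Fxq -- "$line" "$file"; then
        printf '%s\n' "$line" >> "$file"
    fi
}

# Example usage (single quotes keep the variables unexpanded in the profile):
#   append_once ~/.bash_profile 'export LOCALBIN=$SOFTWAREDIR/bin'
#   append_once ~/.bash_profile 'export PATH=$LOCALBIN:$PATH'
```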
Install Redis, a requirement of VRPipe. For VRPipe's purposes, Redis does not need to be configured in any way or made to run at boot time; it only needs to be placed in the PATH.
cd $SRCDIR; wget http://redis.googlecode.com/files/redis-2.6.14.tar.gz
tar -xzf redis-2.6.14.tar.gz
cd redis-2.6.14
make
cp src/redis-server src/redis-cli $LOCALBIN/
The number of connections you can make to the redis server is determined by the number of file descriptors allowed, so increase this in the OS:
sudo perl -e 'open($fh, ">>", "/etc/security/limits.conf"); print $fh qq[* hard nofile 131072\n* soft nofile 131072\n]'
(to confirm this worked, log out and back in again and run ulimit -n, which should print 131072)
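If you want to script that confirmation, the comparison can be done in shell. A minimal sketch, assuming the required value is the 131072 set above; nofile_ok is a hypothetical helper name.

```shell
# Succeed when the current file descriptor limit meets the requirement.
# $1: current limit (as printed by `ulimit -n`), $2: required minimum
# (nofile_ok is a hypothetical helper name.)
nofile_ok() {
    current=$1
    required=$2
    # `ulimit -n` can print "unlimited", which satisfies any requirement.
    if [ "$current" = "unlimited" ]; then
        return 0
    fi
    [ "$current" -ge "$required" ]
}

# Example usage:
#   nofile_ok "$(ulimit -n)" 131072 || echo "file descriptor limit too low" >&2
```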
Use git to download the latest version of VRPipe, and make sure it is the stable version. Then install its CPAN dependencies (as with the previous CPAN module installations, you will have to monitor it for questions to answer, and on failure first just try installing again). Don't actually configure or install VRPipe though.
cd $SRCDIR; git clone https://github.com/VertebrateResequencing/vr-pipe.git; cd vr-pipe
git checkout master
Repeat perl Build.PL and sudo ./Build installdeps until the former no longer suggests you run the latter and instead asks a configuration question; you can ctrl-c to quit Build.PL at that point without answering any questions. Some of the CPAN modules may not install even on repeated retries, in which case follow the advice in the README file on how to install the problematic ones. For example:
sudo cpan
cpan> force install Inline::Filters
cpan> install JJNAPIORK/MooseX-Types-Parameterizable-0.07.tar.gz
cpan> q
When all the dependencies have successfully installed, clean up the directory to be ready to configure later on:
./Build realclean
Carry out some additional recommended setup for VRPipe:
mkdir ~/.Inline; echo export PERL_INLINE_DIRECTORY=\$HOME/.Inline >> ~/.bash_profile
For use with the ec2 or sge_ec2 schedulers, VRPipe has an unadvertised dependency of VM::EC2, so install that manually:
sudo cpan VM::EC2
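Because cpan output is long and failures are easy to miss, it can help to check afterwards that a module really loads. A minimal sketch, assuming perl is on the PATH; have_perl_module is a hypothetical helper name.

```shell
# Succeed if the named Perl module can be loaded by the perl on the PATH.
# $1: module name, e.g. VM::EC2
# (have_perl_module is a hypothetical helper name.)
have_perl_module() {
    # -M loads the module; -e 1 is a trivial program that does nothing else.
    perl -M"$1" -e 1 >/dev/null 2>&1
}

# Example usage:
#   have_perl_module VM::EC2 || echo "VM::EC2 is not installed" >&2
```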
Now that all the basic CPAN dependencies have been installed as root, set up local::lib to make it easy to subsequently install CPAN modules to $SOFTWAREDIR without root privileges. Also install cpanminus to make life even easier.
sudo cpan local::lib
echo export PERL_LOCAL_LIB_ROOT=\$SOFTWAREDIR >> ~/.bash_profile
echo export PERL_MB_OPT=\"--install_base \$SOFTWAREDIR\" >> ~/.bash_profile
echo export PERL_MM_OPT=\"INSTALL_BASE=\$SOFTWAREDIR\" >> ~/.bash_profile
source ~/.bash_profile
curl -L http://cpanmin.us | perl - App::cpanminus
Optionally, install gitflow, which may come in handy should you decide to do any development.
cd $SRCDIR
wget --no-check-certificate -q -O - https://raw.github.com/nvie/gitflow/develop/contrib/gitflow-installer.sh | sudo bash
sudo chown -R ec2-user:ec2-user gitflow
Optionally, install whatever 3rd-party software you're most likely to use in your VRPipe pipelines, e.g. some of the bioinformatics software we used for the 1000 genomes project and similar sequencing projects. When the version of software used in a pipeline could matter, it is installed with the version number in the executable name, with a symlink pointing to the latest version for convenience outside of VRPipe.
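The versioned-name-plus-symlink convention can be captured in one small function so that each tool is installed the same way. This is a sketch of the convention rather than something VRPipe provides; install_versioned is a hypothetical helper name, and it assumes your ln supports the common -n option.

```shell
# Copy an executable into a bin directory under a versioned name and point an
# unversioned symlink at it.
# $1: path to the built executable, $2: version string, $3: destination directory
# (install_versioned is a hypothetical helper name.)
install_versioned() {
    src=$1
    version=$2
    bindir=$3
    name=$(basename "$src")
    cp "$src" "$bindir/$name-$version"
    # -f replaces a symlink left over from an earlier version; -n stops ln from
    # following an existing symlink that points at a directory.
    ln -sfn "$bindir/$name-$version" "$bindir/$name"
}

# Example usage, run from a directory containing a freshly built bwa:
#   install_versioned bwa 0.7.5a-r405 "$LOCALBIN"
```

Running it again with a newer version leaves the older versioned executable in place and only moves the symlink.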
fastqcheck:
cd $SRCDIR
git clone https://github.com/VertebrateResequencing/fastqcheck.git; cd fastqcheck
gcc -std=c99 readseq.c fastqcheck.c -o fastqcheck -lm
cp fastqcheck $LOCALBIN/
samtools:
cd $SRCDIR
git clone https://github.com/samtools/samtools.git; cd samtools
nano Makefile (edit the CFLAGS line by appending -fPIC -m64, then use ctrl-x, y, return to save)
make git-stamp
cp samtools $LOCALBIN/samtools-0.1.19-44-g133e01b; ln -s $LOCALBIN/samtools-0.1.19-44-g133e01b $LOCALBIN/samtools
cp bcftools/bcftools $LOCALBIN/bcftools-0.1.19-44-g133e01b; ln -s $LOCALBIN/bcftools-0.1.19-44-g133e01b $LOCALBIN/bcftools
cp misc/bamcheck misc/plot-bamcheck $LOCALBIN/
echo export SAMTOOLS=\$SRCDIR/samtools >> ~/.bash_profile
bwa:
cd $SRCDIR
git clone https://github.com/lh3/bwa.git; cd bwa
git checkout 0.5.9 (0.5.9-r16 was used for the 1000 genomes project)
make
cp bwa $LOCALBIN/bwa-0.5.9-r16
make clean
git checkout master
make
cp bwa $LOCALBIN/bwa-0.7.5a-r405; ln -s $LOCALBIN/bwa-0.7.5a-r405 $LOCALBIN/bwa
gatk:
cd $SRCDIR
git clone https://github.com/broadgsa/gatk-protected.git; cd gatk-protected
git checkout 1.2 (the version used for the 1000 genomes project)
ant
cp -r dist $LOCALBIN/gatk-1.2; cp LICENSE $LOCALBIN/gatk-1.2
ln -s $LOCALBIN/gatk-1.2 $LOCALBIN/gatk
echo export GATK=\$LOCALBIN/gatk >> ~/.bash_profile
ant clean
If you are an academic non-profit (and if you are creating an AMI it will remain private), you can also install the latest version of gatk (otherwise do cd ..; rm -fr gatk-protected):
git checkout master; ant
java -jar dist/GenomeAnalysisTK.jar --help (to confirm it worked and find out the version)
cp -r dist $LOCALBIN/gatk-2.6-4-g3e5ff60
ln -s $LOCALBIN/gatk-2.6-4-g3e5ff60 $LOCALBIN/gatk2
echo export GATK2=\$LOCALBIN/gatk2 >> ~/.bash_profile
ant clean
picard:
cd $SRCDIR
(if you want it, first get the version used for the 1000 genomes project)
wget http://downloads.sourceforge.net/project/picard/picard-tools/1.53/picard-tools-1.53.zip
unzip picard-tools-1.53.zip
mv picard-tools-1.53 $LOCALBIN/
(now get the latest version)
wget http://downloads.sourceforge.net/project/picard/picard-tools/1.93/picard-tools-1.93.zip
unzip picard-tools-1.93.zip
rm snappy-java-1.0.3-rc3.jar
mv picard-tools-1.93 $LOCALBIN/; ln -s $LOCALBIN/picard-tools-1.93 $LOCALBIN/picard-tools
echo export PICARD=\$LOCALBIN/picard-tools >> ~/.bash_profile
R:
cd $SRCDIR
wget http://cran.r-project.org/src/base/R-3/R-3.0.1.tar.gz
tar -xzf R-3.0.1.tar.gz; cd R-3.0.1
sudo yum install -y gcc-gfortran gcc-c++ readline-devel
./configure --prefix=$SOFTWAREDIR --enable-R-shlib --with-x=no
make && make install