Install VRPipe Dependencies - VertebrateResequencing/vr-pipe GitHub Wiki

VRPipe is designed so that multiple teams sharing the same cluster can each install their own copy of VRPipe and use it independently of the other teams.

However, VRPipe has a large set of dependencies and useful optional 3rd-party software, which together may take many hours to install. We suggest that all of this additional software be installed centrally by your system administrator (someone with root access), so that it is available to everyone on all nodes of your cluster. This makes each team's subsequent installation of VRPipe (to their own private areas) quick and painless.

This guide shows you how to install all the dependencies and recommended optional software, assuming you have root access and are using a Linux OS that uses yum as its package manager. If either of these is not true, you will have to work out the necessary adjustments to the suggested command lines yourself.

  1. Install gcc and make so that you can install other software.
    sudo yum -y install gcc make
  2. Install the MySQL client and its Perl interface if you plan to use MySQL. This is strongly recommended over SQLite, which may not handle the load from clusters with 100+ nodes.
    sudo yum install -y mysql perl-DBD-MySQL
  3. Install Neo4j by following http://yum.neo4j.org. Preferably run separate production and testing Neo4j servers on a dedicated machine, each configured to allow non-localhost access and using its own unique ports.
  4. Install some libraries that are needed by one or more of the 3rd-party programs that VRPipe is commonly used with. Also install git for downloading, updating and possibly developing some software, including VRPipe itself. Finally, install Java 1.7 and ant for the 3rd-party software that needs them.
    sudo yum install -y zlib-devel libxslt-devel ncurses ncurses-devel git java-1.7.0-openjdk java-1.7.0-openjdk-devel ant
    sudo /usr/sbin/alternatives --config java (and enter the number corresponding to 1.7)
  5. Install cpan so that VRPipe can later get its own CPAN dependencies. Also use cpan to update itself and to install widely used, essential CPAN modules. Note that some modules ask a question during installation, so you will have to monitor the procedure and occasionally hit return to accept the default answer.
    sudo yum install -y perl-CPAN
    sudo cpan
    Reply yes to the question about auto-configuration, then:
    cpan> o conf init urllist
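    (and answer 3, 32, 8 1 10 to choose a continent, a country and a space-separated list of mirrors; pick the numbers appropriate to your own location)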
    cpan> o conf prerequisites_policy follow
    cpan> o conf build_requires_install_policy yes
    cpan> o conf commit
    cpan> install Bundle::CPAN
    cpan> q
    If it reported that some of the modules failed to install, just try again (they most likely failed because they were dependent on another of the modules in Bundle::CPAN that hadn't been installed yet):
    sudo cpan
    cpan> install Bundle::CPAN
    cpan> q
  6. Install the more problematic CPAN modules that VRPipe requires using your OS package management system.
    sudo yum install -y perl-XML-LibXSLT perl-Devel-Cover perl-XML-Parser
  7. Create an area where you can download and install software that isn't in your OS's package management system, and record its location in environment variables for ease of use. If you're creating an Amazon AMI, you can arrange the shell login script so that the software directory can be moved to a shared filesystem where another shell login script will be sourced; this lets you change software and environment variables without having to alter the AMI. (If you're not making an Amazon AMI, alter the suggested shell login script changes to taste; however, the software area must be on a shared filesystem that is visible to all nodes in your cluster.)
    cd; mkdir software; cd software
    echo "if [ -f /shared/software/.profile ]; then . /shared/software/.profile; fi" >> ~/.bash_profile
    perl -MCwd -e 'print q[if [ -z "$SOFTWAREDIR" ]; then export SOFTWAREDIR=], cwd, "; fi\n";' >> ~/.bash_profile
    mkdir bin
    echo export LOCALBIN=\$SOFTWAREDIR/bin >> ~/.bash_profile
    echo export PATH=\$LOCALBIN:\$PATH >> ~/.bash_profile
    mkdir src; echo export SRCDIR=\$SOFTWAREDIR/src >> ~/.bash_profile
    mkdir -p lib/perl5; echo export PERL5LIB=\$SOFTWAREDIR/lib/perl5 >> ~/.bash_profile
    source ~/.bash_profile
  8. Install Redis, a requirement of VRPipe. For VRPipe's purposes, Redis does not need to be configured in any way or made to run at boot time; it only needs to be placed in the PATH.
    cd $SRCDIR; wget http://redis.googlecode.com/files/redis-2.6.14.tar.gz
    tar -xzf redis-2.6.14.tar.gz
    cd redis-2.6.14
    make
    cp src/redis-server src/redis-cli $LOCALBIN/
    The number of connections you can make to the redis server is determined by the number of file descriptors allowed, so increase this in the OS:
    sudo perl -e 'open($fh, ">>", "/etc/security/limits.conf"); print $fh qq[* hard nofile 131072\n* soft nofile 131072\n]'
    (to confirm this worked, log out and back in again and run ulimit -n, which should print 131072)
  9. Use git to download the latest version of VRPipe, and make sure you are on the stable (master) branch. Then install its CPAN dependencies (as with the previous CPAN module installations, you will have to monitor it for questions to answer, and on failure first just try installing again). Don't configure or install VRPipe itself yet.
    cd $SRCDIR; git clone git://github.com/VertebrateResequencing/vr-pipe.git; cd vr-pipe;
    git checkout master
    Repeat perl Build.PL and sudo ./Build installdeps until the former no longer suggests you run the latter and instead asks a configuration question; at that point you can press ctrl-c to quit Build.PL without answering any questions. Some of the CPAN modules may not install even on repeated retries, in which case follow the advice in the README file on how to install the problematic ones. For example:
    sudo cpan
    cpan> force install Inline::Filters
    cpan> install JJNAPIORK/MooseX-Types-Parameterizable-0.07.tar.gz
    cpan> q
    When all the dependencies have successfully installed, clean up the directory to be ready to configure later on:
    ./Build realclean
    Carry out some additional recommended setup for VRPipe:
    mkdir ~/.Inline
    echo export PERL_INLINE_DIRECTORY=\$HOME/.Inline >> ~/.bash_profile
    For use with the ec2 or sge_ec2 schedulers, VRPipe has an unadvertised dependency on VM::EC2, so install that manually:
    sudo cpan VM::EC2
  10. Now that all the basic CPAN dependencies have been installed as root, set up local::lib to make it easy to install further CPAN modules to $SOFTWAREDIR without root privileges. Also install cpanminus to make life even easier.
    sudo cpan local::lib
    echo export PERL_LOCAL_LIB_ROOT=\$SOFTWAREDIR >> ~/.bash_profile
    echo export PERL_MB_OPT=\"--install_base \$SOFTWAREDIR\" >> ~/.bash_profile
    echo export PERL_MM_OPT=\"INSTALL_BASE=\$SOFTWAREDIR\" >> ~/.bash_profile
    source ~/.bash_profile
    curl -L http://cpanmin.us | perl - App::cpanminus
  11. Optionally, install gitflow, which may come in handy should you decide to do any development.
    cd $SRCDIR
    wget --no-check-certificate -q -O - https://raw.github.com/nvie/gitflow/develop/contrib/gitflow-installer.sh | sudo bash
    sudo chown -R ec2-user:ec2-user gitflow (replace ec2-user:ec2-user with your own user and group if you are not on an Amazon EC2 instance)
  12. Optionally, install whatever 3rd-party software you're most likely to use in your VRPipe pipelines, e.g. some of the bioinformatics software we used for the 1000 Genomes Project and similar sequencing projects. When the version of a program used in a pipeline could matter, we install it with the version number in the executable name, plus a symlink pointing to the latest version for convenience outside of VRPipe.
    fastqcheck:
    cd $SRCDIR
    git clone git://github.com/VertebrateResequencing/fastqcheck.git; cd fastqcheck
    gcc -std=c99 readseq.c fastqcheck.c -o fastqcheck -lm
    cp fastqcheck $LOCALBIN/
    samtools:
    cd $SRCDIR
    git clone git://github.com/samtools/samtools.git; cd samtools
    Edit the Makefile (e.g. nano Makefile) and append -fPIC -m64 to the CFLAGS line, then save (in nano: ctrl-x, y, return)
    make git-stamp
    cp samtools $LOCALBIN/samtools-0.1.19-44-g133e01b; ln -s $LOCALBIN/samtools-0.1.19-44-g133e01b $LOCALBIN/samtools
    cp bcftools/bcftools $LOCALBIN/bcftools-0.1.19-44-g133e01b; ln -s $LOCALBIN/bcftools-0.1.19-44-g133e01b $LOCALBIN/bcftools
    cp misc/bamcheck misc/plot-bamcheck $LOCALBIN/
    echo export SAMTOOLS=\$SRCDIR/samtools >> ~/.bash_profile
    bwa:
    cd $SRCDIR
    git clone https://github.com/lh3/bwa.git; cd bwa
    git checkout 0.5.9 (0.5.9-r16 was used for the 1000 genomes project)
    make
    cp bwa $LOCALBIN/bwa-0.5.9-r16
    make clean
    git checkout master
    make
    cp bwa $LOCALBIN/bwa-0.7.5a-r405; ln -s $LOCALBIN/bwa-0.7.5a-r405 $LOCALBIN/bwa
    gatk:
    cd $SRCDIR
    git clone https://github.com/broadgsa/gatk-protected.git; cd gatk-protected
    git checkout 1.2 (the version used for the 1000 genomes project)
    ant
    cp -r dist $LOCALBIN/gatk-1.2; cp LICENSE $LOCALBIN/gatk-1.2
    ln -s $LOCALBIN/gatk-1.2 $LOCALBIN/gatk
    echo export GATK=\$LOCALBIN/gatk >> ~/.bash_profile
    ant clean
    If you qualify as an academic non-profit user (and, if you are creating an AMI, it will remain private), you can also install the latest version of gatk (otherwise run cd ..; rm -fr gatk-protected):
    git checkout master; ant
    java -jar dist/GenomeAnalysisTK.jar --help (to confirm it worked and find out the version)
    cp -r dist $LOCALBIN/gatk-2.6-4-g3e5ff60
    ln -s $LOCALBIN/gatk-2.6-4-g3e5ff60 $LOCALBIN/gatk2
    echo export GATK2=\$LOCALBIN/gatk2 >> ~/.bash_profile
    ant clean
    picard:
    cd $SRCDIR
    (if you want it, first get the version used for the 1000 genomes project)
    wget http://downloads.sourceforge.net/project/picard/picard-tools/1.53/picard-tools-1.53.zip
    unzip picard-tools-1.53.zip
    mv picard-tools-1.53 $LOCALBIN/
    (now get the latest version)
    wget http://downloads.sourceforge.net/project/picard/picard-tools/1.93/picard-tools-1.93.zip
    unzip picard-tools-1.93.zip
    rm picard-tools-1.93/snappy-java-1.0.3-rc3.jar
    mv picard-tools-1.93 $LOCALBIN/; ln -s $LOCALBIN/picard-tools-1.93 $LOCALBIN/picard-tools
    echo export PICARD=\$LOCALBIN/picard-tools >> ~/.bash_profile
    R:
    cd $SRCDIR
    wget http://cran.r-project.org/src/base/R-3/R-3.0.1.tar.gz
    tar -xzf R-3.0.1.tar.gz; cd R-3.0.1
    sudo yum install -y gcc-gfortran gcc-c++ readline-devel
    ./configure --prefix=$SOFTWAREDIR --enable-R-shlib --with-x=no
    make && make install
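Steps 5 and 9 both say to simply re-run a failed CPAN installation. If you script the installation, you can wrap that advice in a small retry loop. The sketch below is a hypothetical POSIX shell helper: the name retry is our own and is not part of VRPipe or cpan. It assumes the command reports failure through a non-zero exit status, which may not hold for every cpan version, and it does nothing about the interactive questions some modules ask.

```shell
# retry MAX COMMAND [ARGS...]
# Hypothetical helper (not part of VRPipe or cpan): run COMMAND up to MAX
# times and stop at the first attempt that exits with status 0.
retry() {
    max=$1
    shift
    attempt=1
    while [ "$attempt" -le "$max" ]; do
        if "$@"; then
            return 0
        fi
        # Report each failed attempt on stderr, then try again.
        echo "attempt $attempt of $max failed: $*" >&2
        attempt=$((attempt + 1))
    done
    return 1
}
```

For example, retry 3 sudo cpan Bundle::CPAN repeats the step 5 bundle installation at most three times.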
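Before finishing steps 6 and 9, you may want to confirm that the Perl modules named in this guide actually load. The sketch below is a hypothetical helper (missing_perl_modules is our own name). It asks perl to require each module and prints the ones that fail. It uses require rather than -M so that each module's import method is not called.

```shell
# missing_perl_modules MODULE...
# Hypothetical helper: print each MODULE that perl cannot load, one per line,
# and return non-zero if any of them were missing.
missing_perl_modules() {
    status=0
    for module in "$@"; do
        # require loads the module without calling its import method.
        if ! perl -e "require $module" >/dev/null 2>&1; then
            printf '%s\n' "$module"
            status=1
        fi
    done
    return "$status"
}
```

Run it as the user who will run VRPipe, after source ~/.bash_profile, so that PERL5LIB is set. For example: missing_perl_modules DBD::mysql XML::LibXSLT XML::Parser Inline::Filters MooseX::Types::Parameterizable VM::EC2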
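Step 7 builds up the software area and its environment variables one command at a time. To see what it produces before you touch your real ~/.bash_profile, you can rehearse it against a scratch directory and a scratch profile file. The sketch below is a hypothetical helper (setup_software_area is our own name). It takes the directory as an argument instead of using the current directory, and, like the original commands, it assumes the path contains no spaces.

```shell
# setup_software_area DIR PROFILE
# Hypothetical helper mirroring step 7: create bin, src and lib/perl5 under
# DIR and append the matching exports to PROFILE. Assumes DIR has no spaces.
setup_software_area() {
    dir=$1
    profile=$2
    mkdir -p "$dir/bin" "$dir/src" "$dir/lib/perl5" || return 1
    {
        # Only set SOFTWAREDIR if an earlier (e.g. shared) profile has not.
        printf 'if [ -z "$SOFTWAREDIR" ]; then export SOFTWAREDIR=%s; fi\n' "$dir"
        # Single quotes keep $SOFTWAREDIR etc. literal, like \$ in step 7.
        echo 'export LOCALBIN=$SOFTWAREDIR/bin'
        echo 'export PATH=$LOCALBIN:$PATH'
        echo 'export SRCDIR=$SOFTWAREDIR/src'
        echo 'export PERL5LIB=$SOFTWAREDIR/lib/perl5'
    } >> "$profile"
}
```

Sourcing the resulting profile should put $SOFTWAREDIR/bin at the front of PATH, which is what lets the later steps copy programs into $LOCALBIN and run them by name.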
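The echo ... >> ~/.bash_profile commands in steps 7, 9 and 10, and the perl one-liner that edits /etc/security/limits.conf in step 8, append unconditionally, so running a step twice leaves duplicate lines. The sketch below is a hypothetical alternative (append_once is our own name) that only appends a line if the file does not already contain exactly that line.

```shell
# append_once FILE LINE
# Hypothetical helper: append LINE to FILE only if FILE does not already
# contain that exact line, so re-running a setup step adds no duplicates.
append_once() {
    file=$1
    line=$2
    touch "$file" || return 1
    # -x: whole-line match, -F: LINE is a fixed string, not a regex.
    if ! grep -qxF -- "$line" "$file"; then
        printf '%s\n' "$line" >> "$file"
    fi
}
```

For example, append_once ~/.bash_profile 'export LOCALBIN=$SOFTWAREDIR/bin' (the single quotes keep $SOFTWAREDIR literal, just as \$ does in step 7). For /etc/security/limits.conf, define and call it from a root shell (e.g. after sudo -i), because sudo cannot run a shell function directly.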
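Step 8 asks you to confirm by eye that ulimit -n prints 131072 after logging back in. If you want a script to make that check (for example, before starting a Redis server), the sketch below is a hypothetical helper (nofile_at_least is our own name). It compares the current soft limit on open file descriptors with a minimum and treats "unlimited" as sufficient.

```shell
# nofile_at_least MIN
# Hypothetical helper: succeed if the current soft limit on open file
# descriptors (as reported by ulimit -n) is "unlimited" or at least MIN.
nofile_at_least() {
    min=$1
    current=$(ulimit -n)
    [ "$current" = unlimited ] && return 0
    [ "$current" -ge "$min" ]
}
```

For example: nofile_at_least 131072 || echo "open file limit is only $(ulimit -n); log out and back in after editing limits.conf"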
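The samtools step edits the CFLAGS line of the Makefile interactively in nano. If you are scripting the installation (for example, while building an AMI), you could make the same edit non-interactively. The sketch below is a hypothetical helper (append_make_flags is our own name). It uses awk to append flags to any line that assigns the variable with = or :=, and writes the result back through the existing file so its permissions are kept.

```shell
# append_make_flags MAKEFILE VAR FLAGS
# Hypothetical helper: append FLAGS to each line of MAKEFILE that assigns VAR
# with "VAR=", "VAR =" or "VAR:=", as a non-interactive alternative to nano.
append_make_flags() {
    makefile=$1
    var=$2
    flags=$3
    tmp=$(mktemp) || return 1
    awk -v var="$var" -v flags="$flags" '
        $0 ~ ("^" var "[ \t]*:?=") { $0 = $0 " " flags }
        { print }
    ' "$makefile" > "$tmp" || { rm -f "$tmp"; return 1; }
    # Copy back through the existing file so its permissions are unchanged.
    cat "$tmp" > "$makefile"
    rm -f "$tmp"
}
```

In the samtools directory this would be append_make_flags Makefile CFLAGS '-fPIC -m64', run once before make git-stamp; running it twice would append the flags twice.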
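Step 12 installs each program under a versioned name and points an unversioned symlink at it. A plain ln -s fails if that symlink already exists, which happens when you later upgrade to a newer version. The sketch below is a hypothetical helper (install_versioned is our own name) that copies a file or directory to NAME-VERSION and then repoints the NAME link with ln -sfn. It refuses to overwrite a version that is already installed.

```shell
# install_versioned SOURCE NAME VERSION DESTDIR
# Hypothetical helper: copy SOURCE (a file or directory) to
# DESTDIR/NAME-VERSION and point the unversioned DESTDIR/NAME symlink at it.
install_versioned() {
    source=$1
    name=$2
    version=$3
    dest=$4
    target="$dest/$name-$version"
    if [ -e "$target" ]; then
        echo "$target already exists; not overwriting" >&2
        return 1
    fi
    cp -R -- "$source" "$target" || return 1
    # -f replaces an existing NAME link; -n replaces a link to a directory
    # (as with gatk or picard-tools) instead of creating a link inside it.
    # The relative target keeps the link valid if DESTDIR is later moved.
    ( cd "$dest" && ln -sfn -- "$name-$version" "$name" )
}
```

For example, from the samtools directory: install_versioned samtools samtools 0.1.19-44-g133e01b "$LOCALBIN". We chose a relative link target because step 7 anticipates moving the software directory to a shared filesystem, and absolute links into the old location would break after such a move.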