Home - TheJacksonLaboratory/CloudNeo GitHub Wiki
This repository contains the CWL implementation of CloudNeo, a cloud pipeline for identifying patient-specific tumor neoantigens.
The workflow was developed on Seven Bridges Genomics' CGC platform using the CWL draft-2 specification. As of March 2017, the CGC still uses CWL draft-2. There are differences between draft-2 and the current CWL version, v1.0.
- Required Software
- Setting up the Environment
- Building the Docker Images
- Download reference files
- Running/Testing the CWL with the Rabix executor
- Docker
- Java JDK
- Rabix executor
- Docker images / software versions
- bwa=v0.7.13
- hlaminer=v1.3
  - Here is a note about the license agreement for HLAminer
- Polysolver=v1.0
- netMHC=v4.0a
  - The following form must be filled in to get the software.
- netMHCpan=v3.0a
  - The following form must be filled in to get the software.
- samtools=1.3
Note: The Dockerfiles download each tool directly from its repository or download link, except for netMHC and netMHCpan, which have no direct download link.
To set up the environment, install Docker, Java, and Rabix (mainly for testing the CWL code). These instructions have been tested on Ubuntu, but they should work on any Linux/Unix-like system.
- sudo apt-get update
- sudo apt-get install apt-transport-https ca-certificates
- sudo apt-key adv --keyserver hkp://p80.pool.sks-keyservers.net:80 --recv-keys 58118E89F3A912897C070ADBF76221572C52609D
- Open the /etc/apt/sources.list.d/backports.list file in your favorite editor.
- vim /etc/apt/sources.list.d/backports.list
- Add "deb https://apt.dockerproject.org/repo debian-jessie main" to the file
- sudo apt-get update
- sudo apt-cache policy docker-engine
- sudo apt-get install docker-engine
- sudo service docker start
- sudo gpasswd -a ${USER} docker
- sudo service docker restart
- sudo chown "$USER":"$USER" /home/"$USER"/.docker -R
- sudo chmod g+rwx "/home/$USER/.docker" -R
Full instructions for installing Docker are available on the Docker Install Page.
- Download the Java JDK from this url: http://www.oracle.com/technetwork/java/javase/downloads/index-jsp-138363.html
export JAVA_HOME=<YOUR INSTALLATION DIRECTORY>
export PATH="$JAVA_HOME/bin:$PATH"
- Download Rabix from here: https://github.com/rabix/bunny
- To download and extract Rabix executor on Ubuntu:
wget https://github.com/rabix/bunny/releases/download/v1.0.0-rc3/rabix-1.0.0-rc3.tar.gz && tar -xvf rabix-1.0.0-rc3.tar.gz
To download the CloudNeo repository:
git clone https://github.com/TheJacksonLaboratory/CloudNeo.git
- If git is not installed, install it with
sudo apt-get install git-core
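Before moving on, it can help to confirm that the tools installed above are actually reachable from the shell. The following is a minimal sketch, not part of the CloudNeo repository: the helper name `missing_tools` is hypothetical, and it only checks that each command is on the PATH, not that its version is suitable.

```shell
#!/usr/bin/env bash
# Print the name of every command in "$@" that is not found on PATH.
# (missing_tools is an illustrative helper, not part of CloudNeo.)
missing_tools() {
    local tool
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
        fi
    done
}

# Check the tools this guide installs. The Rabix launcher is run from its
# extracted directory rather than from PATH, so it is not checked here.
missing=$(missing_tools docker java git)
if [ -n "$missing" ]; then
    echo "Missing tools:" $missing
else
    echo "docker, java and git were found on PATH"
fi
```

If Docker is reported as found but commands still fail with a permission error, logging out and back in after the `gpasswd` step is usually needed for the group change to take effect.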
The Dockerfiles used to develop the workflow are provided in the folder 'dockerfiles'. All the Dockerfiles use the ubuntu:14.0 version. To build the Docker images, run the commands shown below. It is assumed that you are in the CloudNeo directory.
Important Note: Because of licensing requirements, we do not provide the Docker images themselves. We provide the Dockerfiles required to build the images. Please follow the commands shown below to build them.
Important Note: There is no direct link to download the netMHC and netMHCpan software. The software is emailed by the original authors after the following form is filled in. You need to copy the software (Linux version) into the netMHC or netMHCpan Dockerfile directory to build the image.
Note (netMHCpan): Before you build the image, download the netMHCpan software and place it in the dockerfiles/netMHCpan.v3.0a directory. Make sure you download version 3.0a, and check that the .tar.gz file name in the Dockerfile matches. The Docker image is built from the Linux version of the software.
Note (netMHC): Before you build the image, download the netMHC software and place it in the dockerfiles/netMHC.v4.0a directory. Make sure you download version 4.0a, and check that the .tar.gz file name in the Dockerfile matches. The Docker image is built from the Linux version of the software.
# build bwa=0.7.13 image
docker build -t bwa:cloudneo -f dockerfiles/bwa.v0.7.13/Dockerfile .
# build hlaminer=1.3 image
docker build -t hlaminer:cloudneo -f dockerfiles/hlaminer/Dockerfile .
# build netmhcpan=3.0a image
## Important: before you build the image, download the netMHCpan software (V.3.0a) and place it in the netMHCpan.v3.0a directory. Check that the .tar.gz file name in the Dockerfile matches.
docker build -t netmhcpan:cloudneo -f dockerfiles/netMHCpan.v3.0a/Dockerfile .
# build netmhc=4.0a image
## Important: before you build the image, download the netMHC software (V.4.0a) and place it in the netMHC.v4.0a directory. Check that the .tar.gz file name in the Dockerfile matches.
docker build -t netmhc:cloudneo -f dockerfiles/netMHC.v4.0a/Dockerfile .
# build polysolver image
docker build -t polysolver:cloudneo -f dockerfiles/polysolver/Dockerfile .
# build protein-translator image
docker build -t protein-translator:cloudneo -f dockerfiles/protein-translator/Dockerfile .
# build samtools=1.3 image
docker build -t samtools:cloudneo -f dockerfiles/samtools.v1.3/Dockerfile .
# build variant-effect-predictor=83 image
docker build -t variant-effect-predictor:cloudneo -f dockerfiles/variant-effect-predictor/Dockerfile .
# build vcf-parser image
docker build -t vcf-parser:cloudneo -f dockerfiles/vcf-parser/Dockerfile .
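The build commands all follow one pattern, so they can be generated from the directory names. The sketch below prints the commands without running them, so they can be reviewed first. It follows the form of the bwa command (`-f <Dockerfile> .`) and assumes that every image name is the directory name with any `.v<version>` suffix removed and lowercased, which holds for the directories named in this guide. The helper names `image_name` and `print_build_commands` are hypothetical.

```shell
#!/usr/bin/env bash
# Derive an image name from a Dockerfile directory name:
# drop any ".v<version>" suffix and lowercase the rest.
# Example: "netMHCpan.v3.0a" becomes "netmhcpan".
image_name() {
    local dir_name=$1
    printf '%s\n' "${dir_name%%.v*}" | tr '[:upper:]' '[:lower:]'
}

# Print (do not run) one build command per subdirectory of the given
# root that contains a Dockerfile. Defaults to ./dockerfiles.
print_build_commands() {
    local root=${1:-dockerfiles} dockerfile dir_name
    for dockerfile in "$root"/*/Dockerfile; do
        [ -f "$dockerfile" ] || continue   # glob matched nothing
        dir_name=$(basename "$(dirname "$dockerfile")")
        printf 'docker build -t %s:cloudneo -f %s .\n' \
            "$(image_name "$dir_name")" "$dockerfile"
    done
}

# Run from the CloudNeo directory to list the commands.
print_build_commands dockerfiles
```

Once the printed commands look right, piping the output to `sh` would execute them. The netMHC and netMHCpan builds still require the manually downloaded software to be in place first.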
The cloudneo.cwl file already uses the Docker image names above (<name>:cloudneo), so you do not need to edit it. If you built an image under a different name (the -t parameter value), edit the CWL code to use that name. To find the Docker image references in the CWL, search for the pattern ":cloudneo".
The input specification is described in detail in the manual.
Please refer to the CGC manual for a detailed explanation of the inputs.
To run this example, you need the following files:
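The search for ":cloudneo" can be done with `grep`. This is a minimal sketch under the assumption that image references appear in the CWL file as plain `<name>:cloudneo` strings; the helper names `list_cloudneo_images` and `retag_cloudneo_images` are hypothetical.

```shell
#!/usr/bin/env bash
# List the distinct "<name>:cloudneo" image references found in a file.
list_cloudneo_images() {
    grep -o '[A-Za-z0-9._-]*:cloudneo' "$1" | sort -u
}

# Print the file to stdout with every ":cloudneo" tag replaced by another
# tag. The input file is not modified; redirect the output to save it.
retag_cloudneo_images() {
    local cwl_file=$1 new_tag=$2
    sed "s/:cloudneo/:${new_tag}/g" "$cwl_file"
}

# Run from the CloudNeo directory to see which images the workflow expects.
if [ -f cloudneo.cwl ]; then
    list_cloudneo_images cloudneo.cwl
fi
```

Comparing this list with the output of `docker images` shows whether every image the workflow refers to has been built locally.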
- netmhcpan-3.0.imgt.fasta
- sample.vcf (see VCF file format)
- homo_sapiens_vep_83_GRCh37.tar.gz
- Homo_sapiens.GRCh37.75.gtf
- HumanProteins.GRCh37.75.csv
- HLA-I_II_CDS.fasta
- sample.bam (not provided in the example folder)
A sample input specification file has been included in the example repository. Make sure the paths in the file point to the correct directory, that is, the directory where you downloaded the reference files and the input BAM and VCF files (see VCF file format). The BAM and VCF files are not provided. Please refer to the Sample VCF file format guide.
To test the CWL with Rabix, we have included some example files in the GitHub repo. Please see the directory test in the repository. The inputs.json file has paths corresponding to the test directory.
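A quick way to catch a wrong path before running the workflow is to check that each required file is present in the download directory. The sketch below uses the file names listed above; the helper name `missing_files` and the `REF_DIR` variable (with its default `./reference`) are hypothetical, and `sample.vcf` and `sample.bam` stand in for your own input files, whose names may differ.

```shell
#!/usr/bin/env bash
# Print each file name (arguments after the first) that is missing from
# the directory given as the first argument.
missing_files() {
    local dir=$1 name
    shift
    for name in "$@"; do
        [ -f "$dir/$name" ] || printf '%s\n' "$name"
    done
}

# REF_DIR is an assumed location; point it at your own download directory.
REF_DIR=${REF_DIR:-./reference}
missing_files "$REF_DIR" \
    netmhcpan-3.0.imgt.fasta \
    sample.vcf \
    homo_sapiens_vep_83_GRCh37.tar.gz \
    Homo_sapiens.GRCh37.75.gtf \
    HumanProteins.GRCh37.75.csv \
    HLA-I_II_CDS.fasta \
    sample.bam
```

No output means every listed file was found. This only checks the directory; the paths inside the input specification file still have to point to the same location.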
./rabix-backend-local-1.0.0-rc3/rabix cloudneo.cwl inputs.json
See the example log produced by running Rabix on the CloudNeo CWL.
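To keep a log of your own run for comparison, the Rabix command can be wrapped so that its output is saved to a file. This is a minimal sketch: the function name `run_cloudneo` and the default log name `cloudneo.log` are hypothetical, and the wrapper passes only the two arguments shown in the command above.

```shell
#!/usr/bin/env bash
# Run the Rabix launcher on a CWL file and an inputs file, writing both
# stdout and stderr to a log file (default: cloudneo.log).
# Returns 1 without running anything if an argument is not usable.
run_cloudneo() {
    local rabix=$1 cwl=$2 inputs=$3 log=${4:-cloudneo.log}
    [ -x "$rabix" ] || { echo "Rabix launcher not found or not executable: $rabix" >&2; return 1; }
    [ -f "$cwl" ] || { echo "CWL file not found: $cwl" >&2; return 1; }
    [ -f "$inputs" ] || { echo "Inputs file not found: $inputs" >&2; return 1; }
    "$rabix" "$cwl" "$inputs" > "$log" 2>&1
}

# Example call, using the paths from this guide:
# run_cloudneo ./rabix-backend-local-1.0.0-rc3/rabix cloudneo.cwl inputs.json
```

The function returns the exit status of the Rabix launcher, so it can be followed by `&& echo done` or checked in a script.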