Install and setup - mbosio85/ediva GitHub Wiki

This page lists the requirements for installing eDiVA locally and explains how to configure eDiVA to work with the Nextflow pipeline management tool. If you prefer to use eDiVA through the provided Docker implementation, skip to the end of the page or read the README.md in the Docker folder of the repository.

Local installation requirements

List of external tools

  • Nextflow
  • Samtools v1.3
  • Python 2.7
  • Bedtools 2.25
  • Fastqc 0.11.5
  • Picard 1.119
  • GATK 3.3+
  • Bwa 0.7.10
  • Tabix
  • Bcftools
  • R
    • Caret package
    • RandomForest package
  • dbSNP database split into SNPs and indels with:
    • bcftools view -v snps dbsnp.vcf.gz | bcftools norm  -m - > snps.vcf
    • bcftools view -v indels dbsnp.vcf.gz | bcftools norm  -m - > indels.vcf
  • mysql-client
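
As a quick sanity check of the tool list above, the following sketch reports which executables cannot be found on your PATH. The executable names are assumptions based on how each tool is usually packaged, Picard and GATK are skipped because they ship as .jar files, and versions are not checked. It needs Python 3.3 or later for shutil.which.

```python
import shutil

# Executable names are assumptions based on common packaging of each tool.
# Picard and GATK are distributed as .jar files, so they are not looked up here.
REQUIRED_TOOLS = [
    "nextflow", "samtools", "bedtools", "fastqc", "bwa",
    "tabix", "bcftools", "Rscript", "mysql", "python2.7",
]


def find_missing_tools(tools, which=shutil.which):
    """Return the entries of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = find_missing_tools(REQUIRED_TOOLS)
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("All listed executables were found on PATH.")
```

A tool reported as missing may still be usable if you reference it by its full path in nextflow.config.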

Python packages required:

  • numpy
  • scipy
  • pysam
  • drmaa
  • xlrd
  • xlsxwriter
  • MySQL-python
  • mysql-connector==2.1.4
  • biopython
  • python-tk [with apt-get]
  • networkx==1.11 [important]
  • pandas
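
To verify the Python side, a minimal sketch like the one below tries to import each package and reports the ones that fail. The import names are assumptions based on how these packages are usually imported (for example MySQLdb for MySQL-python and Bio for biopython), and installed versions are not compared against the pinned ones. Run it with the same interpreter you will use for eDiVA.

```python
import importlib

# Package name -> candidate import names. The import names are assumptions;
# they differ from the pip/apt package names for several entries.
PACKAGE_IMPORTS = {
    "numpy": ["numpy"],
    "scipy": ["scipy"],
    "pysam": ["pysam"],
    "drmaa": ["drmaa"],
    "xlrd": ["xlrd"],
    "xlsxwriter": ["xlsxwriter"],
    "MySQL-python": ["MySQLdb"],
    "mysql-connector": ["mysql.connector"],
    "biopython": ["Bio"],
    "python-tk": ["Tkinter", "tkinter"],  # Python 2 name first, then Python 3
    "networkx": ["networkx"],
    "pandas": ["pandas"],
}


def check_packages(package_imports):
    """Return {package: error text} for packages where no candidate imports."""
    problems = {}
    for package, candidates in sorted(package_imports.items()):
        errors = []
        for module_name in candidates:
            try:
                importlib.import_module(module_name)
                errors = []  # one working candidate is enough
                break
            except Exception as exc:  # a missing native library may not raise ImportError
                errors.append("%s: %s" % (module_name, exc))
        if errors:
            problems[package] = "; ".join(errors)
    return problems


if __name__ == "__main__":
    for package, error in sorted(check_packages(PACKAGE_IMPORTS).items()):
        print("%s could not be imported (%s)" % (package, error))
```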

Nextflow Configuration

Once Nextflow is installed, you can run eDiVA with simple one-liners, provided Nextflow is configured properly. To do so, edit the provided nextflow.config file and add your own parameters.

The most basic setup runs eDiVA locally. For this, you only need to adapt the paths so that they point to the tools installed on your machine.

Here is an example of the default nextflow.config file for local execution:

process {
    executor = 'local'
}

env {
    REF='/users/GD/resource/human/hg19/bwa7/hg19.fasta'
    DBINDEL='/users/GD/resource/human/hg19/databases/dbSNP/dbsnp_138.hg19.indels.vcf'
    DBSNP='/users/GD/resource/human/hg19/databases/dbSNP/dbsnp_138.hg19.snps.vcf'
    BWA='/users/GD/tools/bwa/bwa-0.7.10/bwa'
    EDIVA='/users/tools/ediva/edivatools-code/'
    GATK='/users/GD/tools/GATK/GenomeAnalysisTK-3.3/GenomeAnalysisTK.jar'
    PICARD='/users/GD/tools/picard/picard-tools-1.119/'
    SAMTOOLS='/users/GD/tools/samtools/samtools-1.3.1/samtools'
    BEDTOOLS='/users/GD/tools/bedtools/bedtools-latest/bin/'
    EXOME='exome_kit.bed'
    FASTQC='/users/GD/tools/FastQC/FastQC-0.11.5/fastqc'
    PYTHON='/software/Python2.7/bin/python'
}
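
Because most problems with a local setup come from paths that do not exist on your machine, it can help to check the env block before launching the pipeline. The sketch below is only an illustration: it assumes simple KEY='value' lines like the ones in the example, does not understand the full Nextflow configuration syntax, and uses hypothetical function names. It reports keys defined more than once and values that do not exist on disk.

```python
import os
import re

# Matches simple KEY='value' or KEY="value" assignments on one line.
ASSIGNMENT = re.compile(r"""^\s*([A-Za-z_]\w*)\s*=\s*(['"])(.*?)\2\s*$""")


def parse_env_block(config_text):
    """Return (settings, duplicates) for the assignments inside env { ... }."""
    settings, duplicates = {}, []
    inside_env = False
    for line in config_text.splitlines():
        stripped = line.strip()
        if not inside_env:
            # Skip everything until the line that opens the env block.
            inside_env = re.match(r"env\s*\{", stripped) is not None
            continue
        if stripped.startswith("}"):
            break
        match = ASSIGNMENT.match(line)
        if match:
            key, value = match.group(1), match.group(3)
            if key in settings:
                duplicates.append(key)
            settings[key] = value  # the last definition wins in this sketch
    return settings, duplicates


def missing_paths(settings, exists=os.path.exists):
    """Return the sorted keys whose value is not an existing file or folder."""
    return sorted(key for key, path in settings.items() if not exists(path))


if __name__ == "__main__":
    config_path = "nextflow.config"  # assumed to be in the current directory
    if os.path.exists(config_path):
        with open(config_path) as handle:
            settings, duplicates = parse_env_block(handle.read())
        print("Defined more than once: %s" % (", ".join(duplicates) or "none"))
        print("Paths not found: %s" % (", ".join(missing_paths(settings)) or "none"))
```

Relative values such as the EXOME entry are resolved against the directory you run the script from.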

The EXOME parameter is the BED file with the regions of the genome covered by your capture kit. We normally extend the manufacturer's target intervals by 150 bp on each side, then sort and merge the intervals.
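
The extend, sort and merge step can be sketched in a few lines. This is an illustration of the logic on in-memory intervals, not the procedure used to build the eDiVA files; on real BED files the slop, sort and merge subcommands of bedtools cover the same steps. The function name and the optional chrom_sizes argument are hypothetical.

```python
def extend_sort_merge(intervals, flank=150, chrom_sizes=None):
    """Extend BED intervals by `flank` bp per side, then sort and merge them.

    intervals: iterable of (chrom, start, end) in 0-based, half-open BED
        coordinates.
    chrom_sizes: optional {chrom: length} used to clip extended ends.
    """
    extended = []
    for chrom, start, end in intervals:
        new_start = max(0, start - flank)  # never go below position 0
        new_end = end + flank
        if chrom_sizes is not None and chrom in chrom_sizes:
            new_end = min(new_end, chrom_sizes[chrom])
        extended.append((chrom, new_start, new_end))

    merged = []
    for chrom, start, end in sorted(extended):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Overlaps or touches the previous interval: grow that interval.
            merged[-1] = (chrom, merged[-1][1], max(merged[-1][2], end))
        else:
            merged.append((chrom, start, end))
    return merged


if __name__ == "__main__":
    kit = [("chr1", 1000, 1100), ("chr1", 1300, 1400), ("chr2", 100, 200)]
    for chrom, start, end in extend_sort_merge(kit):
        print("%s\t%d\t%d" % (chrom, start, end))
```

In this example the two chr1 targets are 200 bp apart, so after a 150 bp extension on each side they overlap and end up as a single interval.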

**HPC environment execution configuration**

If you plan to run eDiVA in an HPC environment, please read the Nextflow documentation on how to set this up.

A simple change for an SGE environment is to replace the contents of the process block above with the following:

    memory = '2 GB'
    queue = 'long'
    cpus = 8
    executor = 'sge'

If you plan to run each job with a different configuration, note that you can override the executor parameter when calling the Nextflow scripts.


Docker Configuration

eDiVA is also available as a Docker implementation, which makes it easy to port and deploy. The Docker folder contains 4 items:

  • A nextflow.config file:
    • This is a base to launch eDiVA containers with nextflow
    • Parts are commented and need to be edited depending on the tool of the pipeline you need to use
  • README.md
    • Instructions to build and run containers for eDiVA
  • eDiVA-code:
    • Folder with the Dockerfile needed to build the ediva:code image
    • This includes all external tools and eDiVA code to run pieces of the pipeline
  • eDiVA-DB:
    • Dockerfile to build the database image of eDiVA.
    • To build it properly, you need to contact us to generate the .sql.gz files composing the database.

Step 1 eDiVA-Predict