Install Deep Variant - tuiaswath/karyosoft GitHub Wiki

DeepVariant Installation

Machine Configuration

AWS Tier : C5.4XLarge
Platform : Ubuntu 20.04 (Linux)
Root Partition : 60GB
SSD : 400GB
RAM : 32GB

1. Check the space on the disk and root partitions

sudo lsblk
  • Drive:/mnt/disks/
  • It should have 400GB or more
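Reading the `lsblk` output by eye is easy to get wrong, so the check can also be scripted. The following is a minimal sketch, not part of the original procedure: `has_free_space` is a hypothetical helper name, and it assumes the data disk is already mounted at /mnt/disks/.

```shell
# Hypothetical helper: succeed if the filesystem holding PATH has at least
# MIN_GB gigabytes available.
has_free_space() {
    path=$1
    min_gb=$2
    # df -Pk prints POSIX-format output in 1024-byte blocks; column 4 is "Available".
    avail_kb=$(df -Pk "$path" | awk 'NR == 2 { print $4 }')
    [ "$avail_kb" -ge $((min_gb * 1024 * 1024)) ]
}

# Example (the threshold is an assumption; a formatted 400GB volume reports
# slightly less than 400GB available):
#   has_free_space /mnt/disks 350 && echo "enough space"
```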

2. Update the local package index so that the packages installed in later steps are the latest available versions

sudo apt-get update 

3. Check if Python 3 is installed

whereis python3 

3.1. If it's installed, create a symlink to /usr/bin/python

sudo ln -s /usr/bin/python3 /usr/bin/python

3.2. If it's not installed, install it using APT

sudo apt-get install python3 

4. Install Docker using APT

sudo apt install docker.io 

5. Start the docker service and enable auto-start on reboot

sudo systemctl start docker 
sudo systemctl enable docker 

5.1. Check Docker version

docker --version

6. Install Docker-Compose from its GitHub release

6.1. Install curl if it is not yet installed

sudo apt-get update 
sudo apt-get install curl 

6.2. Download docker-compose into the system Path location

sudo curl -L "https://github.com/docker/compose/releases/download/1.25.5/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose 
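The download URL above is assembled from the release version and the output of `uname`. A small sketch of how it expands, using a hypothetical helper name `compose_url`; on an x86_64 Linux host, `uname -s` is `Linux` and `uname -m` is `x86_64`.

```shell
# Hypothetical helper: print the release URL for a given docker-compose version.
compose_url() {
    version=$1
    # uname -s gives the kernel name, uname -m the machine architecture.
    printf 'https://github.com/docker/compose/releases/download/%s/docker-compose-%s-%s\n' \
        "$version" "$(uname -s)" "$(uname -m)"
}

# Print the URL that step 6.2 downloads from.
compose_url 1.25.5
```

Printing the URL first makes it easy to confirm that a binary for the current platform exists before running the `curl` command with `sudo`.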

6.3. Add execute permission to the docker-compose executable, add the local user to the docker group, and make that user the owner of the Docker socket

sudo chmod +x /usr/local/bin/docker-compose 
sudo usermod -aG docker ubuntu 
sudo chown ubuntu:ubuntu /var/run/docker.sock 

6.4. Check the docker-compose version

docker-compose --version

Changing the location of Docker's Data directory

7. Stop Docker Service

sudo systemctl stop docker

8. Verify that the above command succeeded. The following command produces no output only if the Docker service is stopped:

sudo ps aux | grep -i docker | grep -v grep

9. Create the file /etc/docker/daemon.json with the following content (recent Docker releases use the "data-root" key; the older "graph" key is deprecated):

{ 
  "data-root": "/mnt/disks/docker/" 
}
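If /etc/docker/daemon.json already exists, writing the new content replaces whatever settings it held. A minimal sketch of keeping a copy first; `backup_if_present` is a hypothetical helper name, and for files under /etc/docker it would need to run in a root shell.

```shell
# Hypothetical helper: copy FILE to FILE.bak (preserving mode and timestamps)
# if FILE already exists; do nothing otherwise.
backup_if_present() {
    file=$1
    if [ -f "$file" ]; then
        cp -p "$file" "$file.bak"
    fi
}

# Example (run as root):
#   backup_if_present /etc/docker/daemon.json
```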

10. Copy Existing Docker Data

sudo rsync -aP /var/lib/docker/ /mnt/disks/docker/

11. Verification

11.1. Rename the original data directory

sudo mv /var/lib/docker /var/lib/docker.orig

11.2. Start our docker service

sudo systemctl start docker

11.3. Verify with docker info

docker info | grep -i root
  • You should see something like this

    Docker Root Dir: /mnt/disks/docker/
    
  • Once you have verified the data directory has moved and it reflects the same in the docker info, you may proceed to remove the original directory.
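Because the next step deletes the original data, the comparison can be made explicit rather than checked by eye. This is a sketch only; `docker_root_dir` and `root_dir_matches` are hypothetical helper names that parse the `Docker Root Dir:` line of `docker info`.

```shell
# Hypothetical helper: read `docker info` output on stdin and print the root dir.
docker_root_dir() {
    sed -n 's/^[[:space:]]*Docker Root Dir:[[:space:]]*//p'
}

# Hypothetical helper: succeed only if the reported root dir equals EXPECTED
# (a trailing slash on either side is ignored).
root_dir_matches() {
    expected=${1%/}
    actual=$(docker_root_dir)
    [ "${actual%/}" = "$expected" ]
}

# Example:
#   docker info | root_dir_matches /mnt/disks/docker && echo "data directory moved"
```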

11.4. Remove original docker data directory

sudo rm -fr /var/lib/docker.orig

12. Pull docker Image (Optional)

docker pull google/deepvariant:1.1.0
  • This step is optional: when the DeepVariant command in the "Deep Variant Command for Variant Calling" step is run, Docker pulls the google/deepvariant image automatically if it is not already present.

Steps to run Deep Variant

13. Creating a Conda environment

  conda create -n genomics 
  conda activate genomics 

14. Installing necessary packages

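  conda install -c bioconda bedtools 
  conda install -c bioconda picard 
  conda install -c bioconda bwa 
  conda install -c bioconda gatk4 
  conda install -c bioconda snpeff 
  # samtools, trim-galore and sra-tools provide the samtools, trim_galore and fasterq-dump commands used in later steps
  conda install -c bioconda samtools 
  conda install -c bioconda trim-galore 
  conda install -c bioconda sra-tools 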
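After the installs, it can be useful to confirm that the commands are actually on the PATH of the active environment. A minimal sketch, with `missing_tools` as a hypothetical helper name; the command names in the example are the executables these bioconda packages normally provide.

```shell
# Hypothetical helper: print each command from the argument list that is
# not found on PATH. Prints nothing when every command is available.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Example:
#   missing_tools bedtools picard bwa gatk snpEff
```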

15. Downloading Human Reference Sequence

  wget ftp://ftp.ncbi.nlm.nih.gov/refseq/H_sapiens/annotation/GRCh38_latest/refseq_identifiers/GRCh38_latest_genomic.fna.gz 

16. Unzip GRCh38_latest_genomic.fna.gz

  gzip -d GRCh38_latest_genomic.fna.gz 
  cp GRCh38_latest_genomic.fna GRCh38.fa 

17. Create Sequence Dictionary for reference file

  samtools faidx GRCh38.fa
  picard CreateSequenceDictionary R=GRCh38.fa O=GRCh38.dict

18. Human Reference Index Creation

  bwa index -p grch38bwaidx -a bwtsw GRCh38_latest_genomic.fna
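Steps 17 and 18 leave several companion files next to the reference: `GRCh38.fa.fai` from `samtools faidx`, `GRCh38.dict` from Picard, and the five `bwa index` files that share the `grch38bwaidx` prefix. The sketch below lists any that are missing; `missing_reference_files` is a hypothetical helper name, and it assumes all files were written to one directory.

```shell
# Hypothetical helper: print the reference and index files that are absent
# from DIR (defaults to the current directory).
missing_reference_files() {
    dir=${1:-.}
    for f in GRCh38.fa GRCh38.fa.fai GRCh38.dict \
             grch38bwaidx.amb grch38bwaidx.ann grch38bwaidx.bwt \
             grch38bwaidx.pac grch38bwaidx.sa; do
        [ -e "$dir/$f" ] || printf '%s\n' "$f"
    done
}

# Example:
#   missing_reference_files .
```

An empty result means the alignment step has everything it needs.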

19. Downloading Sample Fastq from SRA Database

  fasterq-dump --split-files SRR098401 

20. Trimming Adapter and low-quality Sequence of Samples

  mkdir -p variant_calling/trimmed_seq
  trim_galore --paired SRR098401_1.fastq SRR098401_2.fastq --quality 30 --fastqc --length 30 --output_dir variant_calling/trimmed_seq/

21. Read group adding and Alignment to Reference

 bwa mem -t 16 -M -R '@RG\tID:sample_1\tLB:sample_1\tPL:ILLUMINA\tPM:HISEQ\tSM:sample_1' grch38bwaidx \
 variant_calling/trimmed_seq/SRR098401_1_val_1.fq variant_calling/trimmed_seq/SRR098401_2_val_2.fq > SRR098401-aligned.sam 
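When more than one sample is processed, the read group string passed to `-R` has to change per sample. A minimal sketch that builds the same string as above from a sample name; `make_read_group` is a hypothetical helper name, and reusing the sample name for ID, LB and SM simply mirrors the command above.

```shell
# Hypothetical helper: print a read group line for `bwa mem -R`.
# The \t sequences are emitted literally; bwa converts them to tabs itself.
make_read_group() {
    sample=$1
    printf '@RG\\tID:%s\\tLB:%s\\tPL:ILLUMINA\\tPM:HISEQ\\tSM:%s\n' \
        "$sample" "$sample" "$sample"
}

# Example:
#   bwa mem -t 16 -M -R "$(make_read_group sample_1)" grch38bwaidx reads_1.fq reads_2.fq
```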

22. Converting SAM alignment to BAM and Sorting

 samtools view -bS SRR098401-aligned.sam | samtools sort -o SRR098401-sorted.bam - 

23. Mark Duplicates + Sort

 picard MarkDuplicates \
 I=SRR098401-sorted.bam \
 O=SRR098401-de_duplicates.bam \
 M=SRR098401_dup_metrics.txt 

24. Index BAM file

 samtools index SRR098401-de_duplicates.bam 

25. Deep Variant Command for Variant Calling

# /mnt/disks/snpvariant/deepvariant/input must contain the reference FASTA with its
# .fai index and the de-duplicated BAM with its .bai index from the steps above.
BIN_VERSION="1.1.0" 
sudo docker run \
  -v "/mnt/disks/snpvariant/deepvariant/input":"/input" \
  -v "/mnt/disks/snpvariant/deepvariant/output":"/output" \
  google/deepvariant:"${BIN_VERSION}" /opt/deepvariant/bin/run_deepvariant \
  --model_type=WGS \
  --ref=/input/GRCh38.fa \
  --reads=/input/SRR098401-de_duplicates.bam \
  --output_vcf=/output/output.vcf.gz \
  --output_gvcf=/output/output.g.vcf.gz \
  --intermediate_results_dir /output/intermediate_results_dir \
  --num_shards=12 
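DeepVariant reads the reference and the BAM through their index files, so both indexes need to sit next to the inputs in the mounted input directory. The sketch below checks for them before the container starts. `expected_index` and `has_index` are hypothetical helper names, and the naming follows what `samtools faidx` and `samtools index` write by default.

```shell
# Hypothetical helper: print the index file name expected next to an input file.
expected_index() {
    case $1 in
        *.bam) printf '%s.bai\n' "$1" ;;
        *.fa|*.fasta|*.fna) printf '%s.fai\n' "$1" ;;
        *) return 1 ;;
    esac
}

# Hypothetical helper: succeed if the expected index file exists.
has_index() {
    idx=$(expected_index "$1") && [ -e "$idx" ]
}

# Example:
#   for f in /mnt/disks/snpvariant/deepvariant/input/*.bam; do
#       has_index "$f" || echo "missing index for $f"
#   done
```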

26. Build snpEff database

26.1. Download Gff3 file

  wget ftp://ftp.ncbi.nlm.nih.gov/refseq/H_sapiens/annotation/GRCh38_latest/refseq_identifiers/GRCh38_latest_genomic.gff.gz

26.2. Unzip the file and copy it as genes.gff

  gzip -d GRCh38_latest_genomic.gff.gz
  cp GRCh38_latest_genomic.gff genes.gff

26.3. Copy the reference FASTA file as sequences.fa

  cp GRCh38.fa sequences.fa

26.4. Build DB for Annotation

  • Note : Get snpEff.config from the conda environment or from GitHub

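  • Note : The data location (data.dir) can be changed in snpEff.config. The config file also needs an entry for the new genome, of the form "mygenome.genome : <description>"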

    scp -r <Local Path to snpEff.config file>\snpEff.config <user>@<server address>:/mnt/disks/snpvariant
    
    mkdir -p deepvariant/mygenome
    cp genes.gff deepvariant/mygenome
    cp sequences.fa deepvariant/mygenome
    cp snpEff.config deepvariant
    
    snpEff build -Xmx32G -c snpEff.config -gff3 -v mygenome
    

27. Annotate and Predict Effects

    snpEff -Xmx32G -v mygenome output.vcf > final_annotated_output.vcf
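A quick sanity check on the result is to count how many variant records actually received an annotation. This is a sketch that assumes snpEff's default output, where annotations are written to an `ANN=` entry in the INFO column; `count_annotated` is a hypothetical helper name.

```shell
# Hypothetical helper: count non-header VCF records that carry an ANN= field.
count_annotated() {
    # The first grep drops header lines; the second counts matching records.
    grep -v '^#' "$1" | grep -c 'ANN='
}

# Example:
#   count_annotated final_annotated_output.vcf
```

A count of zero on a non-empty VCF usually points at a database or configuration problem rather than at the variants themselves.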