DeepVariant Installation
Machine Configuration
AWS Instance Type : c5.4xlarge
Platform : Ubuntu 20.04 (Linux)
Root Partition : 60GB
SSD : 400GB
RAM : 32GB
1. Check the space on the disk and root partitions
sudo lsblk
- Drive: /mnt/disks/
- It should have 400GB or more
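- If the 400GB data volume is not yet formatted and mounted at /mnt/disks/, a minimal sketch (the device name /dev/nvme1n1 is an assumption; check lsblk for the actual name):
# format the data volume and mount it at /mnt/disks (device name is an assumption)
sudo mkfs.ext4 /dev/nvme1n1
sudo mkdir -p /mnt/disks
sudo mount /dev/nvme1n1 /mnt/disks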
2. Update the local package index so the latest versions of the packages needed later can be installed
sudo apt-get update
3. Check if Python 3 is installed
whereis python3
3.1. If it is installed, create a symlink to it at /usr/bin/python (a package-based alternative is shown after step 3.2)
sudo ln -s /usr/bin/python3 /usr/bin/python
3.2. If it is not installed, install it using APT
sudo apt-get install python3
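- On Ubuntu 20.04, an alternative to the manual symlink in step 3.1 is the python-is-python3 package, which provides /usr/bin/python pointing at python3:
sudo apt-get install python-is-python3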
4. Install Docker using APT
sudo apt install docker.io
5. Start the docker service and enable auto-start on reboot
sudo systemctl start docker
sudo systemctl enable docker
5.1. Check Docker version
docker --version
6. Install Docker Compose.
6.1. Install curl if it is not yet installed
sudo apt-get update
sudo apt-get install curl
6.2. Download docker-compose to a location on the system PATH
sudo curl -L "https://github.com/docker/compose/releases/download/1.25.5/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose
6.3. Add execution permission to the docker-compose executable, then add the local user to the docker group and give it ownership of the Docker socket
sudo chmod +x /usr/local/bin/docker-compose
sudo usermod -aG docker ubuntu
sudo chown ubuntu:ubuntu /var/run/docker.sock
6.4. Check the docker-compose version
docker-compose --version
Changing the location of Docker's Data directory
7. Stop Docker Service
sudo systemctl stop docker
8. Verify that the above command succeeded. The following command will yield no output only if the Docker service is stopped:
sudo ps aux | grep -i docker | grep -v grep
9. Create the file /etc/docker/daemon.json with the following contents (one way to create it is sketched below):
{
"graph": "/mnt/disks/docker/"
}
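- A minimal sketch for creating the file (assumes /etc/docker may not exist yet; note that newer Docker releases name this setting "data-root"):
# write /etc/docker/daemon.json pointing Docker's data directory at the SSD
sudo mkdir -p /etc/docker
sudo tee /etc/docker/daemon.json <<'EOF'
{
"graph": "/mnt/disks/docker/"
}
EOF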
10. Copy Existing Docker Data
sudo rsync -aP /var/lib/docker/ /mnt/disks/docker/
11. Verification
11.1. Rename the original data directory
sudo mv /var/lib/docker /var/lib/docker.orig
11.2. Start the Docker service
sudo systemctl start docker
11.3. Verify with docker info
docker info | grep -i root
- You should see something like this:
Docker Root Dir: /mnt/disks/docker/
- Once you have verified that docker info reports the new data directory, you may proceed to remove the original directory.
11.4. Remove original docker data directory
sudo rm -fr /var/lib/docker.orig
12. Pull the Docker image (Optional)
docker pull google/deepvariant:1.1.0
- This step is optional because the google/deepvariant image will be pulled automatically when the command in the "Deep Variant Command for Variant Calling" step is run.
Steps to run Deep Variant
13. Creating a Conda environment
conda create -n genomics
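- Activate the environment before installing packages into it in step 14:
conda activate genomics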
14. Installing necessary packages
conda install -c bioconda bedtools
conda install -c bioconda picard
conda install -c bioconda bwa
conda install -c bioconda gatk4
conda install -c bioconda snpeff
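- Later steps also call samtools (steps 17, 22, 24), fasterq-dump from sra-tools (step 19), and Trim Galore (step 20); if they are not already on the machine, a sketch for installing them from bioconda as well:
conda install -c bioconda samtools
conda install -c bioconda sra-tools
conda install -c bioconda trim-galore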
15. Downloading Human Reference Sequence
wget ftp://ftp.ncbi.nlm.nih.gov/refseq/H_sapiens/annotation/GRCh38_latest/refseq_identifiers/GRCh38_latest_genomic.fna.gz
16. Unzip GRCh38_latest_genomic.fna.gz
gzip -d GRCh38_latest_genomic.fna.gz
cp GRCh38_latest_genomic.fna GRCh38.fa
17. Create an index and sequence dictionary for the reference file
samtools faidx GRCh38.fa
picard CreateSequenceDictionary R=GRCh38.fa O=GRCh38.dict
18. Human Reference Index Creation
bwa index -p grch38bwaidx -a bwtsw GRCh38_latest_genomic.fna
19. Downloading Sample Fastq from SRA Database
fasterq-dump --split-files SRR098401
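- Optionally, the run can be prefetched first for a more reliable download (assuming sra-tools is installed; prefetch caches the .sra file locally before fasterq-dump converts it):
prefetch SRR098401
fasterq-dump --split-files SRR098401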
20. Trimming Adapter and low-quality Sequence of Samples
mkdir -p variant_calling/trimmed_seq
trim_galore --paired SRR098401_1.fastq SRR098401_2.fastq --quality 30 --fastqc --length 30 --output_dir variant_calling/trimmed_seq/
21. Add read group and align to the reference
bwa mem -t 16 -M -R '@RG\tID:sample_1\tLB:sample_1\tPL:ILLUMINA\tPM:HISEQ\tSM:sample_1' grch38bwaidx \
variant_calling/trimmed_seq/SRR098401_1_val_1.fq variant_calling/trimmed_seq/SRR098401_2_val_2.fq > SRR098401-aligned.sam
22. Convert SAM alignment to BAM and sort
samtools view -bS SRR098401-aligned.sam | samtools sort -o SRR098401-sorted.bam -
23. Mark Duplicates
picard MarkDuplicates \
I=SRR098401-sorted.bam \
O=SRR098401-de_duplicates.bam \
M=SRR098401_dup_metrics.txt
24. Index Bam file
samtools index SRR098401-de_duplicates.bam
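- DeepVariant (step 25) reads its inputs from the mounted /input directory, so copy the reference, its .fai index, the de-duplicated BAM, and its .bai index there first (a sketch, assuming the input/output paths used in step 25):
mkdir -p /mnt/disks/snpvariant/deepvariant/input /mnt/disks/snpvariant/deepvariant/output
cp GRCh38.fa GRCh38.fa.fai /mnt/disks/snpvariant/deepvariant/input/
cp SRR098401-de_duplicates.bam SRR098401-de_duplicates.bam.bai /mnt/disks/snpvariant/deepvariant/input/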
25. Deep Variant Command for Variant Calling
BIN_VERSION="1.1.0"
sudo docker run \
-v "/mnt/disks/snpvariant/deepvariant/input":"/input" \
-v "/mnt/disks/snpvariant/deepvariant/output":"/output" \
google/deepvariant:"${BIN_VERSION}" /opt/deepvariant/bin/run_deepvariant \
--model_type=WGS \
--ref=/input/GRCh38.fa \
--reads=/input/SRR098401-de_duplicates.bam \
--output_vcf=/output/output.vcf.gz \
--output_gvcf=/output/output.g.vcf.gz \
--intermediate_results_dir /output/intermediate_results_dir \
--num_shards=12
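- A quick sanity check once the container finishes (path taken from the --output_vcf flag above):
zcat /mnt/disks/snpvariant/deepvariant/output/output.vcf.gz | grep -v "^##" | head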
26. Build snpEff database
26.1. Download Gff3 file
wget ftp://ftp.ncbi.nlm.nih.gov/refseq/H_sapiens/annotation/GRCh38_latest/refseq_identifiers/GRCh38_latest_genomic.gff.gz
26.2. Unzip the file and copy it as genes.gff
gzip -d GRCh38_latest_genomic.gff.gz
cp GRCh38_latest_genomic.gff genes.gff
26.3. Copy the reference .fa file as sequences.fa
cp GRCh38.fa sequences.fa
26.4. Build DB for Annotation
- Note: Get snpEff.config from the conda environment or from GitHub
- Note: The data location can be changed in snpEff.config
scp -r <Local Path to snpEff.config file>\snpEff.config [email protected]:/mnt/disks/snpvariant
mkdir -p deepvariant/mygenome
cp genes.gff deepvariant/mygenome
cp sequences.fa deepvariant/mygenome
cp snpEff.config deepvariant
cd deepvariant
snpEff build -Xmx32G -c snpEff.config -gff3 -v mygenome
27. Annotate and Predict Effects
snpEff -Xmx32G -v mygenome output.vcf.gz > final_annotated_output.vcf