Calling the DeepVariant model on the Human Genome - tuiaswath/karyosoft GitHub Wiki
Steps to run DeepVariant
1. Creating a Conda environment
conda create -n genomics
conda activate genomics
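2. Installing necessary packages
conda install -c bioconda bedtools
conda install -c bioconda picard
conda install -c bioconda bwa
conda install -c bioconda gatk4
conda install -c bioconda snpeff
conda install -c bioconda samtools
conda install -c bioconda trim-galore
conda install -c bioconda sra-tools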
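Before moving on, it can help to confirm that the executables used on this page are actually on PATH. The following is a minimal sketch; `missing_tools` is a hypothetical helper name, and the executable names in the usage comment (for example `gatk` for the gatk4 package and `snpEff` for snpeff) are assumptions about how the packages expose their commands.

```shell
# Print the name of every tool from the argument list that is not on PATH.
# Prints nothing when all tools are found.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Usage (executable names are assumptions, adjust to your installation):
#   missing_tools bedtools picard bwa gatk snpEff samtools trim_galore fasterq-dump docker
```

Any name printed by the helper points at a package that still needs to be installed or an environment that has not been activated.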
3. Downloading Human Reference Sequence
wget ftp://ftp.ncbi.nlm.nih.gov/refseq/H_sapiens/annotation/GRCh38_latest/refseq_identifiers/GRCh38_latest_genomic.fna.gz
3.1. Unzip GRCh38_latest_genomic.fna.gz
gzip -d GRCh38_latest_genomic.fna.gz
cp GRCh38_latest_genomic.fna GRCh38.fa
3.2. Create Index and Sequence Dictionary for the reference file
samtools faidx GRCh38.fa
picard CreateSequenceDictionary R=GRCh38.fa O=GRCh38.dict
4. Human Reference Index Creation
bwa index -p grch38bwaidx -a bwtsw GRCh38_latest_genomic.fna
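Indexing the human reference takes a long time, so it may be worth checking that every expected output exists before starting the alignment. The sketch below assumes the file names used above and the usual bwa index suffixes (.amb, .ann, .bwt, .pac, .sa); `missing_reference_files` is a hypothetical helper, not part of any of the tools.

```shell
# Print every expected reference file that is missing or empty.
# $1 = FASTA path (e.g. GRCh38.fa), $2 = bwa index prefix (e.g. grch38bwaidx)
missing_reference_files() {
    fasta=$1
    prefix=$2
    dict="${fasta%.*}.dict"   # GRCh38.fa -> GRCh38.dict
    for f in "$fasta" "$fasta.fai" "$dict" \
             "$prefix.amb" "$prefix.ann" "$prefix.bwt" "$prefix.pac" "$prefix.sa"; do
        [ -s "$f" ] || printf '%s\n' "$f"
    done
}

# Usage:
#   missing_reference_files GRCh38.fa grch38bwaidx
```

An empty result suggests that steps 3.2 and 4 completed; any printed path identifies the step to rerun.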
5. Downloading Sample FASTQ from the SRA Database
fasterq-dump --split-files SRR098401
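A truncated download usually shows up as a different number of reads in the two mate files. The following sketch counts reads under the assumption that the FASTQ files are uncompressed and use exactly four lines per record; `fastq_read_count` and `mates_match` are hypothetical helper names.

```shell
# Count reads in an uncompressed FASTQ file (assumes 4 lines per record).
fastq_read_count() {
    lines=$(wc -l < "$1")
    echo $((lines / 4))
}

# Succeed only if both mate files hold the same number of reads.
mates_match() {
    [ "$(fastq_read_count "$1")" -eq "$(fastq_read_count "$2")" ]
}

# Usage:
#   mates_match SRR098401_1.fastq SRR098401_2.fastq || echo "mate files differ in read count"
```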
6. Trimming Adapters and Low-Quality Sequence from the Samples
mkdir -p variant_calling/trimmed_seq
trim_galore --paired SRR098401_1.fastq SRR098401_2.fastq --quality 30 --fastqc --length 30 --output_dir variant_calling/trimmed_seq/
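In paired mode, Trim Galore writes the validated reads as `<name>_1_val_1.fq` and `<name>_2_val_2.fq`, which are the files the alignment step reads. When processing more than one accession, the paths can be derived rather than typed out. This is a small sketch; `trimmed_pair` is a hypothetical helper and assumes the input files are named `<accession>_1.fastq` and `<accession>_2.fastq` as above.

```shell
# Print the two Trim Galore output paths for a paired-end accession, one per line.
# $1 = accession (e.g. SRR098401), $2 = output directory passed to --output_dir
trimmed_pair() {
    dir=${2%/}   # drop a trailing slash, if any
    printf '%s/%s_1_val_1.fq\n%s/%s_2_val_2.fq\n' "$dir" "$1" "$dir" "$1"
}

# Usage:
#   trimmed_pair SRR098401 variant_calling/trimmed_seq/
```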
7. Adding the Read Group and Aligning to the Reference
bwa mem -t 16 -M -R '@RG\tID:sample_1\tLB:sample_1\tPL:ILLUMINA\tPM:HISEQ\tSM:sample_1' grch38bwaidx \
variant_calling/trimmed_seq/SRR098401_1_val_1.fq variant_calling/trimmed_seq/SRR098401_2_val_2.fq > SRR098401-aligned.sam
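The read group string is easy to mistype, and the sample name appears in it three times. A possible way to generate it from a single sample name is sketched below; `read_group` is a hypothetical helper, and the fixed PL and PM values are simply the ones used in the command above.

```shell
# Build the @RG header string passed to `bwa mem -R` for one sample.
# The \t sequences are emitted literally; bwa converts them to tabs itself.
# $1 = sample name, reused for ID, LB and SM
read_group() {
    printf '@RG\\tID:%s\\tLB:%s\\tPL:ILLUMINA\\tPM:HISEQ\\tSM:%s' "$1" "$1" "$1"
}

# Usage:
#   bwa mem -t 16 -M -R "$(read_group sample_1)" grch38bwaidx reads_1.fq reads_2.fq > aligned.sam
```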
8. Converting SAM Alignment to BAM and Sorting
samtools view -b SRR098401-aligned.sam | samtools sort -o SRR098401-sorted.bam -
9. Mark Duplicates
picard MarkDuplicates \
I=SRR098401-sorted.bam \
O=SRR098401-de_duplicates.bam \
M=SRR098401_dup_metrics.txt
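The metrics file written by MarkDuplicates contains a tab-separated table with a PERCENT_DUPLICATION column. A minimal sketch for pulling that value out is shown here; `percent_duplication` is a hypothetical helper, and it assumes the table ends at the first blank line after the header row.

```shell
# Print the PERCENT_DUPLICATION value(s) from a Picard MarkDuplicates metrics file.
# The column is located by name in the tab-separated header row.
percent_duplication() {
    awk -F '\t' '
        col == 0 { for (i = 1; i <= NF; i++) if ($i == "PERCENT_DUPLICATION") col = i; next }
        NF == 0  { exit }   # a blank line ends the metrics table
        { print $col }
    ' "$1"
}

# Usage:
#   percent_duplication SRR098401_dup_metrics.txt
```

One value is printed per library row in the table.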
10. Index BAM File
samtools index SRR098401-de_duplicates.bam
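DeepVariant needs both the BAM and its index, so a quick check before launching the container can save a failed run. The sketch below assumes the index is written next to the BAM as `<file>.bam.bai`; `bam_ready` is a hypothetical helper, and the `samtools quickcheck` call is skipped when samtools is not on PATH.

```shell
# Return 0 if the BAM and its .bai index both exist and are non-empty and,
# when samtools is available, the BAM also passes `samtools quickcheck`.
bam_ready() {
    [ -s "$1" ] && [ -s "$1.bai" ] || return 1
    if command -v samtools >/dev/null 2>&1; then
        samtools quickcheck "$1" || return 1
    fi
    return 0
}

# Usage:
#   bam_ready SRR098401-de_duplicates.bam || echo "BAM or index missing or truncated"
```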
11. DeepVariant command
# The mounted input directory must contain the reference with its .fai index
# and the duplicate-marked BAM with its .bai index.
BIN_VERSION="1.1.0"
sudo docker run \
-v "/mnt/disks/snpvariant/deepvariant/input":"/input" \
-v "/mnt/disks/snpvariant/deepvariant/output":"/output" \
google/deepvariant:"${BIN_VERSION}" /opt/deepvariant/bin/run_deepvariant \
--model_type=WGS \
--ref=/input/GRCh38.fa \
--reads=/input/SRR098401-de_duplicates.bam \
--output_vcf=/output/output.vcf.gz \
--output_gvcf=/output/output.g.vcf.gz \
--intermediate_results_dir=/output/intermediate_results_dir \
--num_shards=12
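Because the docker invocation is long, it can be useful to print it for review before running it. The following is a dry-run sketch only: `deepvariant_cmd` is a hypothetical helper, `NUM_SHARDS` is an assumed variable name, and the flags are the ones listed above.

```shell
# Print the DeepVariant docker command for the given settings instead of running it.
# $1 = host input dir, $2 = host output dir, $3 = reference file name, $4 = BAM file name
# BIN_VERSION and NUM_SHARDS may be set beforehand; defaults follow the command above.
deepvariant_cmd() {
    version=${BIN_VERSION:-1.1.0}
    shards=${NUM_SHARDS:-12}
    cat <<EOF
sudo docker run \\
  -v "$1":"/input" \\
  -v "$2":"/output" \\
  google/deepvariant:"$version" /opt/deepvariant/bin/run_deepvariant \\
  --model_type=WGS \\
  --ref=/input/$3 \\
  --reads=/input/$4 \\
  --output_vcf=/output/output.vcf.gz \\
  --output_gvcf=/output/output.g.vcf.gz \\
  --intermediate_results_dir=/output/intermediate_results_dir \\
  --num_shards=$shards
EOF
}

# Usage:
#   deepvariant_cmd /mnt/disks/snpvariant/deepvariant/input \
#       /mnt/disks/snpvariant/deepvariant/output GRCh38.fa SRR098401-de_duplicates.bam
```

The printed text can be inspected and then pasted into the shell, which keeps the review step separate from the long-running job.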