Calling Deep Variant model on Human Genome - tuiaswath/karyosoft GitHub Wiki

Steps to run Deep Variant

1.Creating a Conda environment

  conda create -n genomics
  conda activate genomics

2.Installing necessary packages

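  conda install -c bioconda bedtools
  conda install -c bioconda picard
  conda install -c bioconda bwa
  conda install -c bioconda samtools
  conda install -c bioconda gatk4
  conda install -c bioconda snpeff
  conda install -c bioconda sra-tools
  conda install -c bioconda trim-galore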
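
The later steps also call samtools, fasterq-dump, and trim_galore, so it can help to confirm that every command is on the PATH before starting. The following is a minimal sketch, assuming a POSIX shell; `check_tools` is a hypothetical helper name and is not part of any of these packages.

```shell
# check_tools: report which of the given commands are missing from PATH.
# Returns 0 when all are found, 1 otherwise. (Hypothetical helper name.)
check_tools() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage (run inside the conda environment):
#   check_tools bwa samtools picard gatk bedtools snpEff fasterq-dump trim_galore
```

Any name printed as missing points to a package that still needs to be installed.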

3.Downloading Human Reference Sequence

  wget ftp://ftp.ncbi.nlm.nih.gov/refseq/H_sapiens/annotation/GRCh38_latest/refseq_identifiers/GRCh38_latest_genomic.fna.gz 

3.1.Unzip GRCh38_latest_genomic.fna.gz

  gzip -d GRCh38_latest_genomic.fna.gz 
  cp GRCh38_latest_genomic.fna GRCh38.fa

3.2.Create Sequence Dictionary for reference file

  samtools faidx GRCh38.fa
  picard CreateSequenceDictionary R=GRCh38.fa O=GRCh38.dict
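
The two commands above should leave GRCh38.fa.fai and GRCh38.dict next to the reference. A small check along these lines can catch a missing or empty index early; it is only a sketch, and `check_ref_files` is a hypothetical helper name.

```shell
# check_ref_files: confirm that the FASTA, its .fai index, and its .dict
# sequence dictionary all exist and are non-empty. (Hypothetical helper.)
check_ref_files() {
  ref=$1
  dict="${ref%.*}.dict"   # GRCh38.fa -> GRCh38.dict
  status=0
  for f in "$ref" "$ref.fai" "$dict"; do
    if [ ! -s "$f" ]; then
      echo "missing or empty: $f" >&2
      status=1
    fi
  done
  return "$status"
}

# Usage: check_ref_files GRCh38.fa
```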

4.Human Reference Index Creation

  bwa index -p grch38bwaidx -a bwtsw GRCh38_latest_genomic.fna

5.Downloading Sample Fastq from SRA Database

  fasterq-dump --split-files SRR098401 

6.Trimming Adapter and low-quality Sequence of Samples

  mkdir -p variant_calling/trimmed_seq
  trim_galore --paired SRR098401_1.fastq SRR098401_2.fastq --quality 30 --fastqc --length 30 --output_dir variant_calling/trimmed_seq/
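
In paired mode the two trimmed files should contain the same number of reads. The sketch below checks that, assuming uncompressed FASTQ with exactly four lines per record; `count_reads` and `check_pairs` are hypothetical helper names.

```shell
# count_reads: number of records in an uncompressed FASTQ file,
# assuming the usual 4 lines per record (no wrapped sequence lines).
count_reads() {
  lines=$(wc -l < "$1")
  echo $(( lines / 4 ))
}

# check_pairs: succeed only if both mate files hold the same number of reads.
check_pairs() {
  [ "$(count_reads "$1")" -eq "$(count_reads "$2")" ]
}

# Usage:
#   check_pairs variant_calling/trimmed_seq/SRR098401_1_val_1.fq \
#               variant_calling/trimmed_seq/SRR098401_2_val_2.fq
```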

7.Read group adding and Alignment to Reference

 bwa mem -t 16 -M -R '@RG\tID:sample_1\tLB:sample_1\tPL:ILLUMINA\tPM:HISEQ\tSM:sample_1' grch38bwaidx \
 variant_calling/trimmed_seq/SRR098401_1_val_1.fq variant_calling/trimmed_seq/SRR098401_2_val_2.fq > SRR098401-aligned.sam
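
The -R argument is a read group header line whose fields are separated by a literal `\t`, which bwa mem converts to tabs. When several samples are processed, the string can be built from variables. This is a sketch only: `make_read_group` is a hypothetical helper name, and reusing the sample name for ID, LB, and SM simply mirrors the command above.

```shell
# make_read_group: print an @RG header line with literal "\t" separators.
# The sample name is used for ID, LB, and SM; platform and model default
# to the values used in the command above. (Hypothetical helper name.)
make_read_group() {
  sample=$1
  platform=${2:-ILLUMINA}
  model=${3:-HISEQ}
  printf '@RG\\tID:%s\\tLB:%s\\tPL:%s\\tPM:%s\\tSM:%s\n' \
    "$sample" "$sample" "$platform" "$model" "$sample"
}

# Usage:
#   bwa mem -t 16 -M -R "$(make_read_group sample_1)" grch38bwaidx \
#     reads_1.fq reads_2.fq > aligned.sam
```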

8.Converting Sam alignment to Bam and Sorting

 samtools view -bS SRR098401-aligned.sam | samtools sort -o SRR098401-sorted.bam -

9.Mark Duplicates + Sort

 picard MarkDuplicates \
 I=SRR098401-sorted.bam \
 O=SRR098401-de_duplicates.bam \
 M=SRR098401_dup_metrics.txt

10.Index Bam file

 samtools index SRR098401-de_duplicates.bam 

11.Deep variant command

 # The mounted input directory must already hold the reference (with its .fai index)
 # and the BAM (with its .bai index) under the names passed to --ref and --reads.
 BIN_VERSION="1.1.0"
 sudo docker run \
 -v "/mnt/disks/snpvariant/deepvariant/input":"/input" \
 -v "/mnt/disks/snpvariant/deepvariant/output":"/output" \
 google/deepvariant:"${BIN_VERSION}" /opt/deepvariant/bin/run_deepvariant \
 --model_type=WGS \
 --ref=/input/GRCh38.fasta \
 --reads=/input/SRR098401.bam \
 --output_vcf=/output/output.vcf.gz \
 --output_gvcf=/output/output.g.vcf.gz \
 --intermediate_results_dir /output/intermediate_results_dir \
 --num_shards=12
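
Once the run finishes, a quick look at the output VCF shows whether any calls were produced. The sketch below counts all records and those whose FILTER field is PASS, using only gzip and awk; `count_variants` and `count_pass` are hypothetical helper names.

```shell
# count_variants: number of non-header records in a gzip-compressed VCF.
count_variants() {
  gzip -cd "$1" | awk '!/^#/ { n++ } END { print n + 0 }'
}

# count_pass: number of records whose FILTER column (7th field) is PASS.
count_pass() {
  gzip -cd "$1" | awk -F '\t' '!/^#/ && $7 == "PASS" { n++ } END { print n + 0 }'
}

# Usage:
#   count_variants /mnt/disks/snpvariant/deepvariant/output/output.vcf.gz
#   count_pass     /mnt/disks/snpvariant/deepvariant/output/output.vcf.gz
```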