GATK - k821209/pipelines GitHub Wiki
Required
- http://broadinstitute.github.io/picard/
- GATK # https://www.amazon.com/clouddrive/folder/TMkWwhkPSi6fsuk2wI8l4Q
- samtools
- sambamba (for fast sort)
apt-get install littler # for Rscript
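Before running the pipeline, it can help to confirm that the tools listed above are on your PATH. This is a minimal POSIX sh sketch. The function name `check_tools` is hypothetical. The Picard and GATK jars are not checked because they are run with `java -jar` rather than looked up on PATH.

```shell
# check_tools (hypothetical helper): report every command not found on PATH.
# Prints one "missing: NAME" line per absent tool and returns the number missing.
check_tools() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool"
      missing=$((missing + 1))
    fi
  done
  return "$missing"
}
```

For example: `check_tools java samtools sambamba bwa Rscript || echo "install the missing tools first"`.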
Add read group
java -jar /program/picard-tools-2.1.1/picard.jar AddOrReplaceReadGroups INPUT=Pepper_1.55_R_reference_genome_R.taeahn.sorted.bam OUTPUT=Pepper_1.55_R_reference_genome_R.taeahn.sorted.bam.rg.bam LB=taeahn PL=ILLUMINA PU=NA SM=taeahn VALIDATION_STRINGENCY=LENIENT
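To confirm that the read group was written, inspect the BAM header with `samtools view -H` and look for `@RG` lines. The sketch below uses a hypothetical helper, `rg_samples`, which reads SAM header text on stdin and prints the ID and SM of each `@RG` line. It assumes tab-separated header fields, as in the SAM specification.

```shell
# rg_samples (hypothetical helper): read SAM header text on stdin and print
# "ID<TAB>SM" for every @RG line. SAM header fields are tab-separated.
rg_samples() {
  awk -F'\t' '
    $1 == "@RG" {
      id = ""; sm = ""
      for (i = 2; i <= NF; i++) {
        if (substr($i, 1, 3) == "ID:") id = substr($i, 4)
        if (substr($i, 1, 3) == "SM:") sm = substr($i, 4)
      }
      print id "\t" sm
    }'
}
```

For the command above, `samtools view -H Pepper_1.55_R_reference_genome_R.taeahn.sorted.bam.rg.bam | rg_samples` should show `taeahn` in the SM column.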
Full pipeline
Based on https://www.biostars.org/p/8237/
#Create sequence dictionary
java -jar ~/bin/picard-tools-1.8.5/CreateSequenceDictionary.jar REFERENCE=reference.fasta OUTPUT=reference.dict
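Besides the sequence dictionary, `bwa mem` needs a BWA index of the reference (`bwa index reference.fasta`), and GATK 3 needs a FASTA index (`samtools faidx reference.fasta`). GATK also expects the dictionary to be named after the FASTA with its extension replaced by `.dict`, which is why OUTPUT is `reference.dict` rather than `reference.fasta.dict`. A sketch of a hypothetical `check_ref_files` helper that lists which of these files are still missing, assuming the default BWA index suffixes and a FASTA file name that has an extension:

```shell
# check_ref_files (hypothetical helper): given a FASTA path, print every
# expected companion file that does not exist yet; return the number missing.
#   reference.fasta.fai                   <- samtools faidx reference.fasta
#   reference.dict                        <- Picard CreateSequenceDictionary
#   reference.fasta.{amb,ann,bwt,pac,sa}  <- bwa index reference.fasta
check_ref_files() {
  fasta=$1
  # GATK looks for the dictionary with the FASTA extension replaced by .dict
  dict="${fasta%.*}.dict"
  missing=0
  for f in "$fasta.fai" "$dict" "$fasta.amb" "$fasta.ann" \
           "$fasta.bwt" "$fasta.pac" "$fasta.sa"; do
    if [ ! -e "$f" ]; then
      echo "missing: $f"
      missing=$((missing + 1))
    fi
  done
  return "$missing"
}
```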
#Align reads and assign read group
bwa mem -R '@RG\tID:FLOWCELL1.LANE1\tPL:ILLUMINA\tLB:test\tSM:PA01' reference.fasta R1.fastq.gz R2.fastq.gz > aln.sam
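The `-R` string is easy to mistype, so it can be built from variables. In this sketch, `make_rg` is a hypothetical helper. It prints the read-group line with literal `\t` sequences (backslash followed by t), which `bwa mem -R` expands into tabs itself. The FLOWCELL.LANE form of the ID follows the example above.

```shell
# make_rg (hypothetical helper): build a read-group string for bwa mem -R.
# Arguments: flowcell, lane, platform, library, sample.
# The output contains literal backslash-t separators, not real tab characters.
make_rg() {
  printf '@RG\\tID:%s.%s\\tPL:%s\\tLB:%s\\tSM:%s\n' "$1" "$2" "$3" "$4" "$5"
}
```

For example: `bwa mem -R "$(make_rg FLOWCELL1 LANE1 ILLUMINA test PA01)" reference.fasta R1.fastq.gz R2.fastq.gz > aln.sam`.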
#Sort sam file
java -jar ~/bin/picard-tools-1.8.5/SortSam.jar I=aln.sam O=sorted.bam SORT_ORDER=coordinate
#Mark duplicates
java -jar ~/bin/picard-tools-version/MarkDuplicates.jar I=sorted.bam O=dedup.bam METRICS_FILE=metrics.txt
#Index bam file
java -jar ~/bin/picard-tools-version/BuildBamIndex.jar INPUT=dedup.bam
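MarkDuplicates writes its summary to metrics.txt. A rough sketch for pulling out the PERCENT_DUPLICATION column, assuming the usual Picard metrics layout: lines starting with `#`, then a tab-separated header row, one row per library, and a blank line before the histogram section. The helper name `dup_rate` is hypothetical. Picard reports this value as a fraction between 0 and 1 despite the column name.

```shell
# dup_rate (hypothetical helper): print "LIBRARY<TAB>PERCENT_DUPLICATION" for
# each library row of a Picard MarkDuplicates metrics file.
dup_rate() {
  awk -F'\t' '
    /^#/ { next }                 # skip comment and "## METRICS CLASS" lines
    col == 0 {                    # until found, look for the header row
      for (i = 1; i <= NF; i++) if ($i == "PERCENT_DUPLICATION") col = i
      next
    }
    NF == 0 { exit }              # blank line ends the table; histogram follows
    { print $1 "\t" $col }
  ' "$1"
}
```

For example: `dup_rate metrics.txt`.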
#Create realignment targets
java -jar ~/bin/GATK3.3/GenomeAnalysisTK.jar -T RealignerTargetCreator -R reference.fasta -I dedup.bam -o targetintervals.list
#Indel realignment
java -jar ~/bin/GATK3.3/GenomeAnalysisTK.jar -T IndelRealigner -R reference.fasta -I dedup.bam -targetIntervals targetintervals.list -o realigned.bam
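The reference path and jar location are repeated in every GATK step and are easy to get out of sync, so one option is to set them once. A minimal sketch: `GATK_JAR`, `REF` and the `gatk3` function are hypothetical names. Setting `DRY_RUN=1` only prints the command instead of running it, which is useful for checking the pipeline before a long run.

```shell
# Hypothetical settings; export different values before sourcing to override.
GATK_JAR=${GATK_JAR:-$HOME/bin/GATK3.3/GenomeAnalysisTK.jar}
REF=${REF:-reference.fasta}

# gatk3 (hypothetical wrapper): run GATK 3 with the given arguments,
# or only print the command line when DRY_RUN=1.
gatk3() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo java -jar "$GATK_JAR" "$@"
  else
    java -jar "$GATK_JAR" "$@"
  fi
}
```

For example: `gatk3 -T IndelRealigner -R "$REF" -I dedup.bam -targetIntervals targetintervals.list -o realigned.bam`.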
#Call variants (HaplotypeCaller; -ploidy 1 assumes a haploid organism, use -ploidy 2 for diploid samples)
java -jar ~/bin/GATK3.3/GenomeAnalysisTK.jar -T HaplotypeCaller -R reference.fasta -I realigned.bam -ploidy 1 -stand_call_conf 30 -stand_emit_conf 10 -o raw.vcf
The resulting raw.vcf file contains the variant calls.
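A quick sanity check on raw.vcf is to count its data lines, meaning every line that is not a `#` header line. This is a sketch with a hypothetical `count_records` helper. It counts VCF records, so a multi-allelic site counts once.

```shell
# count_records (hypothetical helper): number of non-header lines in a VCF.
count_records() {
  awk '!/^#/ { n++ } END { print n + 0 }' "$1"
}
```

For example: `count_records raw.vcf`.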
Then you can optionally filter the variants:
#Filter variants
~/bin/vcflib/bin/vcffilter -f "DP > 9" -f "QUAL > 10" raw.vcf > filtered.vcf
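If vcflib is not installed, roughly the same filter can be written in awk. This is a sketch, not a reimplementation of vcffilter: it passes header lines through, reads QUAL from column 6 and DP from the INFO column (column 8), and drops records where either value is missing (QUAL is `.` or there is no `DP=` key). The function name `filter_dp_qual` is hypothetical.

```shell
# filter_dp_qual (hypothetical helper): keep VCF records with INFO DP > min_dp
# and QUAL > min_qual; header lines are passed through unchanged.
# Arguments: input VCF, min_dp, min_qual.
filter_dp_qual() {
  awk -F'\t' -v min_dp="$2" -v min_qual="$3" '
    /^#/ { print; next }
    {
      dp = ""
      n = split($8, info, ";")
      for (i = 1; i <= n; i++)
        if (substr(info[i], 1, 3) == "DP=") dp = substr(info[i], 4)
      if ($6 != "." && dp != "" && $6 + 0 > min_qual + 0 && dp + 0 > min_dp + 0)
        print
    }' "$1"
}
```

For example: `filter_dp_qual raw.vcf 9 10 > filtered.vcf`.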
Or first split the raw.vcf file into SNPs and indels:
#Extract SNPs
java -jar ~/bin/GATK3.3/GenomeAnalysisTK.jar -T SelectVariants -R reference.fasta -V raw.vcf -selectType SNP -o snps.vcf
#Extract Indels
java -jar ~/bin/GATK3.3/GenomeAnalysisTK.jar -T SelectVariants -R reference.fasta -V raw.vcf -selectType INDEL -o indels.vcf
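For a quick look without GATK, a similar split can be approximated from the REF and ALT columns. A record is treated as a SNP when REF and every ALT allele are single bases, and as an indel when every ALT allele differs in length from REF. This is a sketch with a hypothetical `split_vcf` helper. It does not reproduce SelectVariants exactly: mixed sites, MNPs and symbolic or `*` alleles go to neither output file.

```shell
# split_vcf (hypothetical helper): write PREFIX.snps.vcf and PREFIX.indels.vcf
# from an input VCF, copying the header lines into both files.
# Arguments: input VCF, output prefix.
split_vcf() {
  awk -F'\t' -v snps="$2.snps.vcf" -v indels="$2.indels.vcf" '
    /^#/ { print > snps; print > indels; next }
    {
      n = split($5, alt, ",")
      is_snp = (length($4) == 1); is_indel = 1
      for (i = 1; i <= n; i++) {
        # symbolic alleles (<DEL>) and "*" are neither SNP nor indel here
        if (alt[i] !~ /^[ACGTNacgtn]+$/) { is_snp = 0; is_indel = 0 }
        if (length(alt[i]) != 1) is_snp = 0
        if (length(alt[i]) == length($4)) is_indel = 0
      }
      if (is_snp) print > snps
      else if (is_indel) print > indels
    }' "$1"
}
```

For example: `split_vcf raw.vcf raw` writes raw.snps.vcf and raw.indels.vcf.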