HapScanner - PlantGeneticsLab/TIGER GitHub Wiki
Overview
HapScanner is a superfast genotyper for whole-genome shotgun (WGS) sequencing data. It genotypes samples against an existing genetic variation library and works for species with disomic inheritance, e.g., diploid or allopolyploid species, including both inbreds and outcrossers.
Prerequisites
Java 8
http://www.oracle.com/technetwork/java/javase/overview/java8-2100321.html
Samtools
http://samtools.sourceforge.net/
Usage
From the command line:
- java -Xmx100g -jar TIGER.jar -a HapScanner -p ./parameter_hapscanner.txt > log.txt &
The content of the parameter file is as follows.
@App: HapScanner
@Author: Fei Lu
@Email: [email protected]; [email protected]
@Homepage: https://plantgeneticslab.weebly.com/
#HapScanner is used to perform genotyping of diploid species from whole-genome sequencing data, based on an existing genetic variation library.
#To run the pipeline, the machine should have both Java 8 and samtools installed. The lib directory should stay in the same folder as TIGER.jar.
#Command line example: java -Xmx100g -jar TIGER.jar -a HapScanner -p parameter_hapscanner.txt > log.txt &
#To specify options, please edit the parameters below. Also, please keep the order of parameters.
#Parameter 1: The taxaRefBam file, which lists each taxon with its corresponding reference genome and bam file. Each bam file should have its .bai file in the same folder.
#If one taxon has n bam files, please list them in n rows.
/Users/feilu/Documents/analysisL/softwareTest/pgl/hapScanner/inputfile/taxaRefBAM_hapscanner.txt
#Parameter 2: The posAllele file (with header). The format is Chr\tPos\tRef\tAlt (as in VCF format). The positions come from the genetic variation library.
#A maximum of 2 alternative alleles are supported, separated by ",", e.g. A,C.
#Deletions and insertions are supported, denoted as "D" and "I".
/Users/feilu/Documents/analysisL/softwareTest/pgl/hapScanner/inputfile/posAllele_hapscanner.txt
#Parameter 3: The pos file (without header). The format is Chr\tPos. The positions come from the haplotype library and are used in mpileup.
/Users/feilu/Documents/analysisL/softwareTest/pgl/hapScanner/inputfile/pos_hapscanner.txt
#Parameter 4: The chromosome which will be scanned.
1
#Parameter 5: Combined error rate of sequencing and misalignment. Heterozygous sites are more likely to be genotyped as homozygous when the combined error rate is high.
0.05
#Parameter 6: The path of samtools
/usr/local/bin/samtools
#Parameter 7: Number of threads
16
#Parameter 8: The directory of output
/Users/feilu/Documents/analysisL/softwareTest/pgl/hapScanner/out
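The note on Parameter 5 can be illustrated with a generic binomial genotype model. This is a minimal sketch for intuition only: it is not taken from the HapScanner source, and the function names `genotype_likelihoods` and `call_genotype` are hypothetical. It assumes that a read at a homozygous reference site shows the alternative allele with probability equal to the error rate, and that a read at a heterozygous site shows it with probability 0.5.

```python
from math import comb


def genotype_likelihoods(ref_reads, alt_reads, error_rate):
    """Binomial likelihood of the observed read counts under three genotypes.

    Assumption (not from HapScanner): a simple textbook model in which
    error_rate is the chance that a read reports the wrong allele.
    """
    coeff = comb(ref_reads + alt_reads, alt_reads)
    # Probability that one read shows the alternative allele, per genotype.
    p_alt = {"0/0": error_rate, "0/1": 0.5, "1/1": 1.0 - error_rate}
    return {
        genotype: coeff * p ** alt_reads * (1.0 - p) ** ref_reads
        for genotype, p in p_alt.items()
    }


def call_genotype(ref_reads, alt_reads, error_rate):
    """Return the genotype with the highest likelihood."""
    likelihoods = genotype_likelihoods(ref_reads, alt_reads, error_rate)
    return max(likelihoods, key=likelihoods.get)


# A site covered by 8 reference reads and 2 alternative reads.
print(call_genotype(8, 2, 0.01))  # 0/1
print(call_genotype(8, 2, 0.05))  # 0/0
```

Under this model, the same read counts are called heterozygous at an error rate of 0.01 but homozygous at 0.05, because a higher error rate makes two stray alternative reads easier to explain as errors.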
The parameter file is available from here.
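Because the parameters are positional, a missing or misplaced line shifts every later value. The sketch below checks a parameter file before a long run. It assumes that lines starting with `#` or `@` are comments or metadata and that each remaining non-empty line is one parameter value, which matches the layout shown above but is not confirmed against the HapScanner parser. The function `read_parameters` and the key names are hypothetical.

```python
# Hypothetical key names, in the order the parameter file lists them.
PARAMETER_NAMES = [
    "taxa_ref_bam", "pos_allele", "pos", "chromosome",
    "error_rate", "samtools_path", "threads", "output_dir",
]


def read_parameters(text):
    """Map the positional values of a HapScanner parameter file to names.

    Assumption: lines starting with '#' or '@' carry no parameter values.
    """
    values = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "@")):
            continue
        values.append(line)
    if len(values) != len(PARAMETER_NAMES):
        raise ValueError(
            f"expected {len(PARAMETER_NAMES)} values, found {len(values)}"
        )
    params = dict(zip(PARAMETER_NAMES, values))
    # Convert the two numeric parameters so that typos fail early.
    params["error_rate"] = float(params["error_rate"])
    params["threads"] = int(params["threads"])
    return params
```

For example, `read_parameters(open("parameter_hapscanner.txt").read())` returns a dictionary of the eight values, or raises `ValueError` if a line is missing or extra.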
Three additional files are referenced in the parameter file:
- taxaRefBAM_hapscanner.txt
This file contains taxa names, their corresponding reference genomes, and bam files. The format is as follows.
Taxa Reference BamPath
TW0060 /Users/feilu/Documents/analysisL/softwareTest/pgl/fastCall/ref/chr001_1Mb.fa /Users/feilu/Documents/analysisL/softwareTest/pgl/fastCall/bams/TW0060.sub.bam
TW0061 /Users/feilu/Documents/analysisL/softwareTest/pgl/fastCall/ref/chr001_1Mb.fa /Users/feilu/Documents/analysisL/softwareTest/pgl/fastCall/bams/TW0061.sub.bam
TW0062 /Users/feilu/Documents/analysisL/softwareTest/pgl/fastCall/ref/chr001_1Mb.fa /Users/feilu/Documents/analysisL/softwareTest/pgl/fastCall/bams/TW0062.sub.bam
TW0063 /Users/feilu/Documents/analysisL/softwareTest/pgl/fastCall/ref/chr001_1Mb.fa /Users/feilu/Documents/analysisL/softwareTest/pgl/fastCall/bams/TW0063.sub.bam
TW0064 /Users/feilu/Documents/analysisL/softwareTest/pgl/fastCall/ref/chr001_1Mb.fa /Users/feilu/Documents/analysisL/softwareTest/pgl/fastCall/bams/TW0064.sub.bam
TW0065 /Users/feilu/Documents/analysisL/softwareTest/pgl/fastCall/ref/chr001_1Mb.fa /Users/feilu/Documents/analysisL/softwareTest/pgl/fastCall/bams/TW0065.sub.bam
TW0066 /Users/feilu/Documents/analysisL/softwareTest/pgl/fastCall/ref/chr001_1Mb.fa /Users/feilu/Documents/analysisL/softwareTest/pgl/fastCall/bams/TW0066.sub.bam
TW0067 /Users/feilu/Documents/analysisL/softwareTest/pgl/fastCall/ref/chr001_1Mb.fa /Users/feilu/Documents/analysisL/softwareTest/pgl/fastCall/bams/TW0067.sub.bam
TW0068 /Users/feilu/Documents/analysisL/softwareTest/pgl/fastCall/ref/chr001_1Mb.fa /Users/feilu/Documents/analysisL/softwareTest/pgl/fastCall/bams/TW0068.sub.bam
TW0069 /Users/feilu/Documents/analysisL/softwareTest/pgl/fastCall/ref/chr001_1Mb.fa /Users/feilu/Documents/analysisL/softwareTest/pgl/fastCall/bams/TW0069.sub.bam
This file is available from here.
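Since every bam file needs a .bai index in the same folder, it can help to check the taxaRefBam file before starting. The sketch below assumes the file is tab-delimited with one header line, which the page does not state explicitly for this file. The function names `parse_taxa_ref_bam` and `has_index` are hypothetical.

```python
import os
from collections import defaultdict


def parse_taxa_ref_bam(text):
    """Group (reference, bam) pairs by taxon; the first line is the header.

    Assumption: columns are separated by tabs.
    """
    bams_by_taxon = defaultdict(list)
    for line in text.splitlines()[1:]:
        if not line.strip():
            continue
        taxon, reference, bam_path = line.split("\t")
        bams_by_taxon[taxon].append((reference, bam_path))
    return dict(bams_by_taxon)


def has_index(bam_path):
    """True if sample.bam.bai or sample.bai sits next to the bam file."""
    stem, _ = os.path.splitext(bam_path)
    return os.path.exists(bam_path + ".bai") or os.path.exists(stem + ".bai")
```

A taxon with n bam files appears as n rows, so `parse_taxa_ref_bam` collects them into one list per taxon. Calling `has_index` on each bam path reports files whose index is missing.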
- posAllele_hapscanner.txt
This file describes the genetic variation library. The format is as follows.
Chr Pos Ref Alt (maximum of 2 alternative alleles, separated by ",", e.g. A,C)
1 7 C G
1 12 A D
1 33 A D
1 37 C T
1 38 T A
1 48 C T
1 56 A T
1 364 G A
1 492 T C
1 661 G T
This file is available from here.
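The rules for this file (header line, four tab-separated columns, at most two alternative alleles, "D" and "I" for deletions and insertions) can be checked with a short script. This is a sketch under assumptions: the function `check_pos_allele` is hypothetical, and the set of accepted allele symbols is inferred from the example rather than from the HapScanner source.

```python
# Assumption: alleles are single bases, or D/I for deletion/insertion.
VALID_ALT = {"A", "C", "G", "T", "D", "I"}


def check_pos_allele(text):
    """Return a list of problems found in posAllele text (empty if none)."""
    problems = []
    lines = text.splitlines()
    for number, line in enumerate(lines[1:], start=2):  # line 1 is the header
        if not line.strip():
            continue
        fields = line.split("\t")
        if len(fields) != 4:
            problems.append(f"line {number}: expected 4 tab-separated columns")
            continue
        _chrom, pos, _ref, alt = fields
        if not pos.isdigit():
            problems.append(f"line {number}: position '{pos}' is not an integer")
        alts = alt.split(",")
        if len(alts) > 2:
            problems.append(f"line {number}: more than 2 alternative alleles")
        unknown = [a for a in alts if a not in VALID_ALT]
        if unknown:
            problems.append(f"line {number}: unexpected allele(s) {unknown}")
    return problems
```

An empty list means the file passed these checks. Each entry otherwise names the offending line.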
- pos_hapscanner.txt
This file lists the positions passed to samtools mpileup. It has no header. The format is as follows.
1 7
1 12
1 33
1 37
1 38
1 48
1 56
1 364
1 492
1 661
This file is available from here.
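In the examples on this page, the pos file lists the same positions as the posAllele file, minus the header and the allele columns. Assuming that holds for a given library (the page does not say the two must always match), the pos file can be derived rather than maintained by hand. The function name `pos_from_pos_allele` is hypothetical.

```python
def pos_from_pos_allele(pos_allele_text):
    """Build pos-file text (Chr\\tPos, no header) from posAllele text.

    Assumption: the pos file should cover exactly the posAllele positions.
    """
    rows = []
    for line in pos_allele_text.splitlines()[1:]:  # skip the header line
        if line.strip():
            chrom, pos = line.split("\t")[:2]
            rows.append(f"{chrom}\t{pos}")
    return "\n".join(rows) + "\n"
```

Writing the returned text to `pos_hapscanner.txt` keeps the two files consistent when the variation library changes.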
Author
Fei Lu
[email protected]; [email protected]
http://plantgeneticslab.weebly.com/