FastCall 2 - PlantGeneticsLab/TIGER GitHub Wiki
Whole-genome shotgun sequencing (WGS) data are being generated rapidly for all species. High-coverage sequencing data for thousands to hundreds of thousands of individuals are flooding into the scientific community, making it a real challenge to process and use such large data sets efficiently. FastCall 2 is a superfast variant calling and genotyping program for WGS data, designed to replace the combination of FastCall and HapScanner. FastCall 2 is optimized for easy data management and high speed. It works for species with disomic inheritance, e.g. diploid or allopolyploid species, including both inbreds and outcrossers.
FastCall 2 has five modules:
- disc. This module is used to discover genetic variations for each individual.
- blib. This module builds a genetic variation library based on the variations from all individuals.
- vlib. This module is used to view the genetic variation library by converting the library file from binary format to human-readable format. Module vlib is optional in FastCall 2.
- clib. This module is used to customize the genetic variation library by selecting a subset of variations from an existing library file. Module clib is optional in FastCall 2.
- scan. This module is used to genotype all individuals based on a provided genetic variation library.
Java 8
https://www.oracle.com/java/technologies/javase/javase8-archive-downloads.html
Samtools
Note: The current version of FastCall 2 was tested with samtools-1.18. It is recommended to install samtools-1.18. The results of FastCall 2 are not guaranteed if other versions of samtools are used.
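A quick way to confirm the installed samtools matches the tested version is sketched below. It relies on the first line of `samtools --version` having the form `samtools 1.18`; the function names are hypothetical helpers, not part of FastCall 2.

```shell
# Version FastCall 2 was tested with, taken from the note above.
EXPECTED_SAMTOOLS="1.18"

# Extract the version from a line such as "samtools 1.18".
samtools_version_from_line() {
    printf '%s\n' "$1" | awk '{print $2}'
}

# Return 0 if the given version line matches the tested version.
samtools_version_ok() {
    [ "$(samtools_version_from_line "$1")" = "$EXPECTED_SAMTOOLS" ]
}

# Only query samtools if it is actually on the PATH.
if command -v samtools >/dev/null 2>&1; then
    line="$(samtools --version | head -n 1)"
    if samtools_version_ok "$line"; then
        echo "OK: $line"
    else
        echo "WARNING: found '$line'; FastCall 2 was tested with samtools-$EXPECTED_SAMTOOLS" >&2
    fi
fi
```

If samtools is installed outside the PATH, call the same helper with the output of the full path that will later be passed to FastCall 2.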
FastCall 2 has 5 modules, which are described in the overview.
For disc, the options are listed below.
-app App name.
-mod Module name of FastCall 2.
-a Reference genome file with an index file (.fai). The reference should be in Fasta format. Chromosomes are labelled as numbers (1,2,3,4,5...). It is recommended to use a single-chromosome reference when performing variation discovery on one chromosome, because loading the reference genome is much faster.
-b The taxaBamMap file, which contains each taxon and its corresponding bam files. Each bam file should have its .bai file in the same folder. The template of the taxaBamMap file is available from here.
-c The switch of base alignment quality (BAQ) computation: 0 is disabled and 1 is enabled. It is 0 by default.
-d Minimum mapping quality (MQ) for an alignment to be used for variation calling. It is 30 by default.
-e Minimum base quality (BQ) for a base to be used for variation calling. It is 20 by default.
-f Minimum read depth count (MDC) for variation calling, meaning that sites with depth lower than the minimum will not be taken into account for variation discovery. It is 2 by default.
-g Minimum read depth ratio (MiDR) for variation calling, meaning that sites with depth lower than the MiDR of the individual sequencing coverage will not be considered for variation discovery. It is 0.2 by default.
-h Maximum read depth ratio (MaDR) for variation calling, meaning that sites with depth higher than the MaDR of the individual sequencing coverage will not be considered for variation discovery. It is 3 by default.
-i Homozygous ratio (HoR) for variation calling, meaning that sites where the depth ratio of the alternative allele is greater than HoR are considered homozygous. It is 0.8 by default.
-j Heterozygous ratio (HeR) for variation calling, meaning that sites where the depth ratio of the alternative allele is greater than HeR and less than (1-HeR) are considered heterozygous. It is 0.35 by default.
-k Third allele depth ratio (TDR) for variation calling. If the depth of the third allele is greater than TDR by the individual coverage, the site will be ignored. Otherwise, the third allele will be considered as sequencing error. It is 0.2 by default.
-l Chromosome or region on which genotyping will be performed (e.g. chromosome 1 is designated as 1; the region from 1 bp to 100000 bp on chromosome 1 is 1:1,100000).
-m Number of threads (taxa number to be processed at the same time). It is 32 by default.
-n Individual genotype output directory.
-o The path of samtools.
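To make the depth and ratio thresholds concrete, here is a minimal sketch of how the -f, -g, -h, -i and -j cut-offs could be applied to a single site. It is one reading of the option descriptions, not FastCall 2's actual code. It assumes that "depth of alternative allele" means the alternative-allele depth divided by the total depth at the site, and the label `other` covers ratios the descriptions do not classify. Both function names are hypothetical.

```shell
# Return 0 if a site's read depth passes the -f, -g and -h filters.
# Usage: depth_passes DEPTH COVERAGE [MDC] [MiDR] [MaDR]
depth_passes() {
    awk -v d="$1" -v cov="$2" -v mdc="${3:-2}" -v lo="${4:-0.2}" -v hi="${5:-3}" '
        BEGIN {
            d += 0; cov += 0; mdc += 0; lo += 0; hi += 0
            # Keep sites that are not below MDC, not below MiDR * coverage,
            # and not above MaDR * coverage.
            exit !(d >= mdc && d >= lo * cov && d <= hi * cov)
        }'
}

# Classify a site from the alternative-allele depth ratio.
# Usage: classify_site ALT_DEPTH TOTAL_DEPTH [HoR] [HeR]
classify_site() {
    awk -v alt="$1" -v tot="$2" -v hor="${3:-0.8}" -v her="${4:-0.35}" '
        BEGIN {
            alt += 0; tot += 0; hor += 0; her += 0
            if (tot <= 0) { print "no_data"; exit }
            r = alt / tot
            if (r > hor) print "hom_alt"                      # above HoR
            else if (r > her && r < 1 - her) print "het"      # between HeR and 1-HeR
            else print "other"                                # not covered by the descriptions
        }'
}
```

With the default values, `classify_site 9 10` prints `hom_alt`, `classify_site 5 10` prints `het`, and a site with depth 40 in an individual sequenced at 10x coverage fails `depth_passes 40 10` because it exceeds 3 times the coverage.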
Here is an example of running module disc of FastCall 2 from the command line:
- java -Xmx100g -jar TIGER.jar -app FastCall2 -mod disc -a chr001.fa -b taxaBamMap.txt -c 0 -d 30 -e 20 -f 2 -g 0.2 -h 3 -i 0.8 -j 0.35 -k 0.2 -l 1 -m 32 -n /ing -o /usr/local/bin/samtools > log.txt &
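Because a per-chromosome reference is recommended, the disc command is typically repeated once per chromosome. The sketch below only prints the commands so they can be reviewed first. It assumes per-chromosome references named chr001.fa, chr002.fa, and so on (extrapolated from the example file name), a hypothetical three-chromosome genome, and hypothetical log file names. Whether one output directory can be shared across chromosomes is not stated here, so check that before running.

```shell
# Print (do not run) the disc command for one chromosome number.
# All option values are copied from the example above.
disc_cmd() (
    name="$(printf 'chr%03d' "$1")"   # assumed naming: 1 -> chr001
    printf 'java -Xmx100g -jar TIGER.jar -app FastCall2 -mod disc -a %s.fa -b taxaBamMap.txt -c 0 -d 30 -e 20 -f 2 -g 0.2 -h 3 -i 0.8 -j 0.35 -k 0.2 -l %s -m 32 -n /ing -o /usr/local/bin/samtools > disc_%s.log\n' \
        "$name" "$1" "$name"
)

# Hypothetical genome with three chromosomes.
for chr in 1 2 3; do
    disc_cmd "$chr"
done
```

Piping the printed lines to `sh` runs them one after another, which avoids several 32-thread jobs competing for the same cores.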
For blib, the options are listed below.
-app App name.
-mod Module name of FastCall 2.
-a Reference genome file with an index file (.fai). The reference should be in Fasta format. Chromosomes are labelled as numbers (1,2,3,4,5...).
-b Chromosome or region on which genotyping will be performed (e.g. chromosome 1 is designated as 1; the region from 1 bp to 100000 bp on chromosome 1 is 1:1,100000).
-c Minor allele occurrence threshold, representing the minimum number of taxa in which the minor allele exists. It is 2 by default.
-d Number of threads (taxa number to be processed at the same time). It is 32 by default.
-e Individual genotype directory.
-f Variation library directory.
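The occurrence threshold (-c) can be pictured as counting, for each candidate allele, the number of distinct taxa that carry it. The sketch below is an illustration only. It assumes a hypothetical whitespace-separated input of `position allele taxon` records on standard input, which is not the format FastCall 2 uses internally, and the function name is made up for this example.

```shell
# Keep alleles seen in at least MIN_TAXA distinct taxa.
# Usage: filter_by_occurrence [MIN_TAXA] < records
# Output: "position allele taxa_count", sorted by position.
filter_by_occurrence() (
    min_taxa="${1:-2}"
    awk -v min="$min_taxa" '
        # Count each (position, allele, taxon) combination only once.
        !seen[$1, $2, $3]++ { count[$1 " " $2]++ }
        END {
            for (key in count)
                if (count[key] >= min + 0) print key, count[key]
        }' | sort -k1,1n -k2,2
)
```

With the default threshold of 2, an allele observed in only one taxon is dropped, which removes many variants that are more likely to be sequencing errors than real polymorphisms.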
Here is an example of running module blib of FastCall 2 from the command line:
- java -Xmx100g -jar TIGER.jar -app FastCall2 -mod blib -a chr001.fa -b 1 -c 2 -d 32 -e /ing -f /vLib > log.txt &
For vlib, the options are listed below.
-app App name.
-mod Module name of FastCall 2.
-a The input genetic variation library file in binary format.
-b The output genetic variation library file in text format.
Here is an example of running module vlib of FastCall 2 from the command line:
- java -Xmx100g -jar TIGER.jar -app FastCall2 -mod vlib -a input.lib.gz -b output.lib.txt > log.txt &
For clib, the options are listed below.
-app App name.
-mod Module name of FastCall 2.
-a The input genetic variation library file in binary format.
-b The user-provided file with custom positions. The template of the custom position file is available from here.
-c The output custom genetic variation library file.
Here is an example of running module clib of FastCall 2 from the command line:
- java -Xmx100g -jar TIGER.jar -app FastCall2 -mod clib -a input.lib.gz -b custom_positions.txt -c output_subset.lib.gz > log.txt &
For scan, the options are listed below.
-app App name.
-mod Module name of FastCall 2.
-a Reference genome file with an index file (.fai). The reference should be in Fasta format. Chromosomes are labelled as numbers (1,2,3,4,5...). It is recommended to use a single-chromosome reference when performing genotyping on one chromosome, because loading the reference genome is much faster.
-b The taxaBamMap file, which contains each taxon and its corresponding bam files. Each bam file should have its .bai file in the same folder. The template of the taxaBamMap file is available from here.
-c The genetic variation library file, which is produced in step 2 (module blib).
-d Chromosome or region on which genotyping will be performed (e.g. chromosome 1 is designated as 1; the region from 1 bp to 100000 bp on chromosome 1 is 1:1,100000).
-e The switch of base alignment quality (BAQ) computation: 0 is disabled and 1 is enabled. It is 0 by default.
-f Minimum mapping quality (MQ) for an alignment to be used for genotyping. It is 30 by default.
-g Minimum base quality (BQ) for a base to be used for genotyping. It is 20 by default.
-h Combined error rate of sequencing and misalignment. Heterozygous sites are more likely to be genotyped as homozygous when the combined error rate is high.
-i The path of samtools.
-j Number of threads. It is 32 by default.
-k The directory of VCF output.
Here is an example of running module scan of FastCall 2 from the command line:
- java -Xmx100g -jar TIGER.jar -app FastCall2 -mod scan -a chr001.fa -b taxaBamMap.txt -c /vLib/1.lib.gz -d 1 -e 0 -f 30 -g 20 -h 0.05 -i /usr/local/bin/samtools -j 32 -k /gen > log.txt &
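The three required modules feed into one another: disc writes individual genotypes to a directory, blib reads that directory and writes the library, and scan reads the library. A minimal sketch of chaining them for chromosome 1 follows, using the paths and option values from the examples on this page. The library file name `<chromosome>.lib.gz` is inferred from the scan example, and `run_step`, `fastcall2`, `pipeline` and `DRY_RUN` are hypothetical names introduced for this sketch. By default it only prints the commands.

```shell
# Inputs and directories; the values mirror the examples on this page.
REF="chr001.fa"
MAP="taxaBamMap.txt"
CHR="1"
ING_DIR="/ing"     # individual genotypes written by disc, read by blib
LIB_DIR="/vLib"    # variation library written by blib, read by scan
VCF_DIR="/gen"     # VCF output written by scan
SAMTOOLS="/usr/local/bin/samtools"

# DRY_RUN=1 (default) prints each command; DRY_RUN=0 executes it.
run_step() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

fastcall2() {
    run_step java -Xmx100g -jar TIGER.jar -app FastCall2 "$@"
}

# Each step starts only if the previous one succeeded.
pipeline() {
    fastcall2 -mod disc -a "$REF" -b "$MAP" -c 0 -d 30 -e 20 -f 2 -g 0.2 -h 3 \
        -i 0.8 -j 0.35 -k 0.2 -l "$CHR" -m 32 -n "$ING_DIR" -o "$SAMTOOLS" &&
    fastcall2 -mod blib -a "$REF" -b "$CHR" -c 2 -d 32 -e "$ING_DIR" -f "$LIB_DIR" &&
    fastcall2 -mod scan -a "$REF" -b "$MAP" -c "$LIB_DIR/$CHR.lib.gz" -d "$CHR" \
        -e 0 -f 30 -g 20 -h 0.05 -i "$SAMTOOLS" -j 32 -k "$VCF_DIR"
}

pipeline
```

Unlike the single-module examples, the steps here run in the foreground without a trailing `&`, since blib must not start before disc has finished and scan must not start before the library exists.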
Fei Lu
[email protected]; [email protected]
https://plantgeneticslab.github.io/home/
Coming soon...