Haplotype_Caller - MorrellLAB/sequence_handling GitHub Wiki
Basic Usage
The Haplotype_Caller handler uses the Genome Analysis Toolkit (GATK) to create a genomic variant call format (GVCF) file for each sample. It requires two inputs: a list of BAM files and the nucleotide diversity per base pair (Watterson's theta). Because the handler needs a large amount of memory, we recommend submitting the task array to the "ram256g" queue on MSI.
To run Haplotype_Caller, all common variables and handler-specific variables must be defined within the configuration file. Once the variables have been defined, Haplotype_Caller can be submitted to a job scheduler with the following command (assuming that you are in the directory containing sequence_handling):
./sequence_handling Haplotype_Caller Config
Where Config is the full file path to the configuration file.
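Because the handler reads a list of full file paths to BAM files, it can help to check that list before submitting. The following is a minimal sketch, not part of sequence_handling: the function name `check_bam_list` is hypothetical, and it assumes the list holds one path per line.

```shell
# check_bam_list LIST
# Hypothetical helper (not shipped with sequence_handling).
# Reports entries that are not absolute paths or do not exist,
# and returns non-zero if any entry is bad.
check_bam_list() {
    list="$1"
    bad=0
    while IFS= read -r bam || [ -n "$bam" ]; do
        [ -z "$bam" ] && continue                 # skip blank lines
        case "$bam" in
            /*) ;;                                # absolute path, as required
            *) echo "not a full path: $bam" >&2; bad=1 ;;
        esac
        [ -f "$bam" ] || { echo "missing: $bam" >&2; bad=1; }
    done < "$list"
    return "$bad"
}
```

A possible use, assuming the list lives at a path of your choosing, is `check_bam_list /path/to/finished_bam_list.txt && ./sequence_handling Haplotype_Caller Config`.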
Handler-Specific Variables
The following is a list of variables that need to be defined within Config. In addition to the handler-specific variables, all common variables must be defined.
Variable | Function |
---|---|
HC_QSUB | QSub settings for batch submission. Recommended settings are "mem=250gb,nodes=1:ppn=24,walltime=24:00:00". |
HC_QUEUE | The specific queue where the job will be submitted. Attempting to run sequence_handling on a different server than the one specified will produce an error message. Choose from "lab", "mesabi", "ram256g", or another available queue. Recommended queue is "ram256g". |
FINISHED_BAM_LIST | A list of full file paths to the finished BAM files. This can be generated with sample_list_generator.sh. |
THETA | The nucleotide diversity per base pair (Watterson's theta). This varies by species. For barley: 0.008. For soybean: 0.001. |
DO_NOT_TRIM_ACTIVE_REGIONS | If true, GATK will not trim down the active region from the full region (active + extension) to just the active interval for genotyping. Recommended value: false. |
FORCE_ACTIVE | If true, all bases will be considered active regions. Recommended value: false. |
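As an illustration, the handler-specific part of Config might look like the sketch below. It assumes Config uses shell-style `NAME=value` assignments and that the sample is barley. The BAM list path is a placeholder, and the exact quoting or formatting expected by your version of Config may differ.

```shell
# Sketch of the Haplotype_Caller section of Config (assumed shell-style syntax).
# Values follow the recommendations in the table above.
HC_QSUB="mem=250gb,nodes=1:ppn=24,walltime=24:00:00"
HC_QUEUE="ram256g"

# Placeholder path: replace with the full path to your own list of BAM files.
FINISHED_BAM_LIST="/path/to/finished_bam_list.txt"

# Watterson's theta for barley; use 0.001 for soybean.
THETA=0.008

DO_NOT_TRIM_ACTIVE_REGIONS=false
FORCE_ACTIVE=false
```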
Output
Haplotype_Caller generates a GVCF file for each BAM file specified. The GVCF files can be found at
${OUT_DIR}/Haplotype_Caller
Haplotype_Caller does not generate a list of its output files. However, you can generate one using sample_list_generator.sh.
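If you prefer not to use the helper script, a list can also be built with standard tools. The sketch below is an alternative, not the behavior of sample_list_generator.sh. The function name `list_gvcfs` is hypothetical, and the `.g.vcf` suffix is an assumption, so check the actual file names in your output directory first.

```shell
# list_gvcfs DIR OUT
# Hypothetical helper: writes a sorted list of the *.g.vcf files found
# directly in DIR to the file OUT, one path per line.
# The ".g.vcf" suffix is an assumption; adjust it to match your files.
list_gvcfs() {
    find "$1" -maxdepth 1 -type f -name "*.g.vcf" | sort > "$2"
}

# Example call (uncomment once OUT_DIR is set to an absolute path):
# list_gvcfs "${OUT_DIR}/Haplotype_Caller" "${OUT_DIR}/Haplotype_Caller/gvcf_list.txt"
```

The listed paths are absolute only if the directory argument is absolute, because `find` prints each path relative to the starting point it was given.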
Dependencies
Haplotype_Caller depends on GATK for generating the GVCFs. If the reference dictionary needs to be generated, Haplotype_Caller also depends on Picard. In addition, PBS is required for basic operation. Please check the dependencies page to ensure that you are using the required version of each dependency.