Indel_Realigner - MorrellLAB/sequence_handling GitHub Wiki
Basic Usage
The Indel_Realigner handler uses the Genome Analysis Toolkit (GATK) v3.8 or earlier to realign reads around insertions and deletions (indels). Realignment is not necessary for variant calling with the sequence_handling pipeline, but can be useful for exporting data to other pipelines such as ANGSD-wrapper. Indel_Realigner requires the .intervals files generated by Realigner_Target_Creator. Important note: GATK 4 no longer has indel realignment functionality, so use GATK v3.8 or earlier for this step.
To run Indel_Realigner, all common variables and handler-specific variables must be defined within the configuration file. Once the variables have been defined, Indel_Realigner can be submitted to a job scheduler with the following command (assuming that you are in the directory containing sequence_handling):
./sequence_handling Indel_Realigner Config_Indel_Realign
where Config_Indel_Realign is the full file path to the configuration file.
Handler-Specific Variables
The following variables need to be defined within Config_Indel_Realign. In addition to the handler-specific variables, all common variables must be defined.
Variable | Function |
---|---|
IR_QSUB | QSub settings for batch submission. Recommended settings are "mem=22gb,nodes=1:ppn=16,walltime=24:00:00". |
IR_BAM_LIST | A list of full file paths to the processed BAM files. This can be generated with sample_list_generator.sh. |
IR_TARGETS | The full file path to the list of .intervals files from Realigner_Target_Creator. This can be generated with sample_list_generator.sh. |
LOD_THRESHOLD | The LOD threshold above which the cleaner will clean. GATK default: 5.0, Barley: 3.0 |
ENTROPY_THRESHOLD | The percentage of mismatches at a locus to be considered having high entropy (0.0 < entropy <= 1.0). GATK default: 0.15, Barley: 0.10 |
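As an illustration, the handler-specific part of a configuration file might look like the sketch below. The two file paths are hypothetical placeholders, and the threshold values are the barley settings from the table; substitute values appropriate for your own project.

```shell
# Hypothetical excerpt of Config_Indel_Realign (handler-specific variables only).
# Both paths are placeholders, not real locations.

# QSub settings for batch submission (recommended values from the table)
IR_QSUB="mem=22gb,nodes=1:ppn=16,walltime=24:00:00"

# List of full file paths to the processed BAM files (placeholder path)
IR_BAM_LIST="/path/to/project/bam_list.txt"

# List of full file paths to the .intervals files (placeholder path)
IR_TARGETS="/path/to/project/intervals_list.txt"

# LOD threshold above which the cleaner will clean (barley value)
LOD_THRESHOLD=3.0

# Fraction of mismatches at a locus to be considered high entropy (barley value)
ENTROPY_THRESHOLD=0.10
```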
Note: The list of BAM files and the list of .intervals files must be in the same order to ensure proper realignment. If both lists are generated using sample_list_generator.sh then they will be in the same order.
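If the lists were assembled by hand, a quick sanity check before submitting can catch an ordering mistake. The function below is a minimal sketch and not part of sequence_handling: `check_list_order` is a hypothetical helper, and it assumes that a sample name is the file's base name up to the first `.` (for example `sample1.bam` and `sample1.intervals`). Adjust the name extraction if your files are named differently.

```shell
#!/bin/bash

# Hypothetical helper: succeed only if both lists name the same samples
# in the same order.
# Assumption: sample name = base name of each path up to the first '.'.
check_list_order() {
    local bam_list="$1" targets_list="$2"
    local bams targets

    # Strip the directory part, then everything from the first '.' onward
    bams=$(sed -e 's|.*/||' -e 's|\..*$||' "$bam_list")
    targets=$(sed -e 's|.*/||' -e 's|\..*$||' "$targets_list")

    if [ "$bams" = "$targets" ]; then
        echo "OK: sample order matches"
    else
        echo "ERROR: BAM list and .intervals list differ in samples or order" >&2
        return 1
    fi
}
```

With the configuration file sourced, it could be called as `check_list_order "${IR_BAM_LIST}" "${IR_TARGETS}"`.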
Output
Indel_Realigner generates a realigned BAM file for each sample. The finished BAM files can be found at
${OUT_DIR}/Indel_Realigner
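To make the role of the two thresholds and of list ordering more concrete, the sketch below prints (without running) a GATK 3 IndelRealigner command for each BAM/.intervals pair. It is a dry-run illustration under stated assumptions, not the handler's actual script: `print_realign_cmds` is a hypothetical function, the `GATK_JAR` and `REF_GEN` values are placeholder paths, and the `_realigned.bam` output suffix is invented for the example. The `-T`, `-R`, `-I`, `-targetIntervals`, `-LOD`, `-entropy`, and `-o` options are GATK 3 IndelRealigner options.

```shell
#!/bin/bash

# Placeholder values, assumed for illustration only
GATK_JAR="/path/to/GenomeAnalysisTK.jar"
REF_GEN="/path/to/reference.fasta"
LOD_THRESHOLD=3.0
ENTROPY_THRESHOLD=0.10

# Hypothetical dry run: print one IndelRealigner command per sample.
print_realign_cmds() {
    local bam_list="$1" targets_list="$2" out_dir="$3"
    local bam intervals name

    # paste joins line N of each list, so pairing depends on matching order
    paste "$bam_list" "$targets_list" | while IFS=$'\t' read -r bam intervals; do
        name=$(basename "$bam" .bam)
        echo "java -jar ${GATK_JAR} -T IndelRealigner -R ${REF_GEN}" \
             "-I ${bam} -targetIntervals ${intervals}" \
             "-LOD ${LOD_THRESHOLD} -entropy ${ENTROPY_THRESHOLD}" \
             "-o ${out_dir}/${name}_realigned.bam"
    done
}
```

Because the pairing is purely positional, a BAM list and an .intervals list in different orders would silently realign each sample against another sample's targets.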
Dependencies
Indel_Realigner depends on the GATK v3.8 or earlier and Java. In addition, PBS is required for basic operation. Please check the dependencies page to ensure that you are using the required version of each dependency.