Indel_Realigner - MorrellLAB/sequence_handling GitHub Wiki

Basic Usage

The Indel_Realigner handler uses the Genome Analysis Toolkit (GATK) v3.8 or earlier to realign reads in regions that contain insertions or deletions (indels). Realignment is not required for variant calling with the sequence_handling pipeline, but it can be useful when exporting data to other pipelines such as ANGSD-wrapper. Indel_Realigner requires the .intervals files generated by Realigner_Target_Creator. Important note: GATK 4 no longer includes indel realignment, so please use GATK v3.8 or earlier for this step.
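For orientation, the per-sample GATK 3 call that this step corresponds to looks roughly like the command printed below. This is a sketch, not the handler's actual code: the function name build_ir_command, the jar name GenomeAnalysisTK.jar, and all file names are hypothetical, and the function only prints the command rather than running it. The -T IndelRealigner, -R, -I, -targetIntervals, -LOD, --entropyThreshold and -o arguments are GATK 3.x options.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print (do not run) a GATK 3.x IndelRealigner command for one sample.
# Arguments: reference FASTA, input BAM, .intervals file, output BAM, LOD threshold, entropy threshold.
build_ir_command() {
    local ref="$1" bam="$2" intervals="$3" out="$4" lod="$5" entropy="$6"
    printf '%s\n' "java -jar GenomeAnalysisTK.jar -T IndelRealigner -R ${ref} -I ${bam} -targetIntervals ${intervals} -LOD ${lod} --entropyThreshold ${entropy} -o ${out}"
}

# Example: barley-style thresholds for a single hypothetical sample.
build_ir_command ref.fa sampleA.bam sampleA.intervals sampleA_realigned.bam 3.0 0.10
```

Printing instead of executing makes it easy to inspect the exact arguments (in particular which .intervals file is paired with which BAM) before spending cluster time on the real run.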

To run Indel_Realigner, all common variables and handler-specific variables must be defined within the configuration file. Once the variables have been defined, Indel_Realigner can be submitted to a job scheduler with the following command (assuming that you are in the directory containing sequence_handling):

./sequence_handling Indel_Realigner Config_Indel_Realign

where Config_Indel_Realign is the full file path to the configuration file.

Handler-Specific Variables

The following is a list of variables that must be defined within Config_Indel_Realign. In addition to these handler-specific variables, all common variables must be defined.

Variable | Function
--- | ---
IR_QSUB | qsub settings for batch submission. Recommended settings: "mem=22gb,nodes=1:ppn=16,walltime=24:00:00".
IR_BAM_LIST | A list of full file paths to the processed BAM files. This can be generated with sample_list_generator.sh.
IR_TARGETS | The full file path to the list of .intervals files produced by Realigner_Target_Creator. This can be generated with sample_list_generator.sh.
LOD_THRESHOLD | The log-odds (LOD) threshold above which the cleaner will realign reads. GATK default: 5.0; suggested for barley: 3.0.
ENTROPY_THRESHOLD | The fraction of mismatches at a locus required for the locus to be considered high entropy (0.0 < entropy <= 1.0). GATK default: 0.15; suggested for barley: 0.10.
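A minimal sketch of the handler-specific part of Config_Indel_Realign, assuming the configuration file uses shell-style VARIABLE=value assignments. The file paths are placeholders, the common variables are omitted, and the threshold values are the barley settings listed above.

```bash
# Hypothetical excerpt of Config_Indel_Realign: handler-specific variables only.
# The common variables required by every handler are not shown here.
IR_QSUB="mem=22gb,nodes=1:ppn=16,walltime=24:00:00"
IR_BAM_LIST="/path/to/bam_list.txt"           # placeholder path
IR_TARGETS="/path/to/intervals_list.txt"      # placeholder path
LOD_THRESHOLD="3.0"                           # GATK default is 5.0
ENTROPY_THRESHOLD="0.10"                      # GATK default is 0.15
```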

Note: The list of BAM files and the list of .intervals files must be in the same order to ensure proper realignment. If both lists are generated using sample_list_generator.sh, they will be in the same order.
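If the lists were built by hand, a quick pairing check before submission can catch ordering mistakes. The sketch below assumes a hypothetical naming convention in which a sample's name is its file basename minus the final extension (for example sampleA.bam and sampleA.intervals); adjust the name extraction to match your own file names. The function name check_list_order is also hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: verify that line i of the BAM list and line i of the .intervals list
# belong to the same sample.
# ASSUMPTION (hypothetical naming): a sample's name is its file basename minus the
# final extension, e.g. /bams/sampleA.bam <-> /targets/sampleA.intervals.
check_list_order() {
    local bam_list="$1" targets_list="$2"
    local n_bam n_tgt bam tgt b t status=0

    # Both lists must contain the same number of lines.
    n_bam=$(wc -l < "$bam_list")
    n_tgt=$(wc -l < "$targets_list")
    if [ "$n_bam" -ne "$n_tgt" ]; then
        echo "Length mismatch: ${n_bam} BAMs vs ${n_tgt} intervals files" >&2
        return 1
    fi

    # Walk both lists in parallel; paste joins line i of each file with a tab.
    while IFS=$'\t' read -r bam tgt; do
        b=$(basename "$bam");  b="${b%.*}"
        t=$(basename "$tgt");  t="${t%.*}"
        if [ "$b" != "$t" ]; then
            echo "Order mismatch: ${bam} vs ${tgt}" >&2
            status=1
        fi
    done < <(paste "$bam_list" "$targets_list")

    return "$status"
}
```

Typical use with the configuration variables: `check_list_order "$IR_BAM_LIST" "$IR_TARGETS" && echo "Lists are in the same order"`. The function reports every mismatched line rather than stopping at the first, so one run shows the full extent of any misordering.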

Output

Indel_Realigner generates a realigned BAM file for each sample. The finished BAM files can be found at

${OUT_DIR}/Indel_Realigner
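After the job finishes, a simple sanity check is to compare the number of samples in the BAM list with the number of BAM files in the output directory. This sketch assumes the directory holds exactly one realigned .bam per sample and no other .bam files; the function name count_realigned is hypothetical, and it does not inspect file names or contents.

```bash
#!/usr/bin/env bash
# Sketch: compare the number of samples in the BAM list with the number of
# .bam files in the Indel_Realigner output directory.
# ASSUMPTION: the output directory holds exactly one realigned .bam per sample
# and no other .bam files (index files such as .bai or .bam.bai are not counted).
count_realigned() {
    local bam_list="$1" out_dir="$2" expected found
    expected=$(grep -c . "$bam_list")                                   # non-empty lines
    found=$(( $(find "$out_dir" -maxdepth 1 -name '*.bam' | wc -l) ))   # arithmetic strips wc padding
    echo "expected=${expected} found=${found}"
    [ "$expected" -eq "$found" ]
}
```

Typical use: `count_realigned "$IR_BAM_LIST" "${OUT_DIR}/Indel_Realigner"`. A nonzero exit status means some samples are missing (or extra .bam files are present) and the job logs are worth checking.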

Dependencies

Indel_Realigner depends on GATK v3.8 or earlier and on Java. In addition, PBS (used for job submission) is required for basic operation. Please check the dependencies page to ensure that you are using the required version of each dependency.
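Because GATK 4 removed indel realignment, a small guard can confirm that a GATK version string belongs to a 3.x (or earlier) release before a job is submitted. This sketch only parses a version string you pass in; the function name gatk_has_indel_realigner is hypothetical, and how you obtain the version string depends on your installation. It accepts any major version below 4, which matches the "v3.8 or earlier" requirement because 3.8 was the final 3.x release line.

```bash
#!/usr/bin/env bash
# Sketch: decide whether a GATK version string can run IndelRealigner.
# IndelRealigner exists in GATK 3.x and earlier; it was removed in GATK 4.
# Returns 0 for major version < 4, 1 for >= 4, and 2 if the string is unparseable.
gatk_has_indel_realigner() {
    local version="${1#v}"          # tolerate a leading "v", e.g. v3.8
    local major="${version%%.*}"    # text before the first dot
    case "$major" in
        ''|*[!0-9]*) return 2 ;;    # not a recognisable version string
    esac
    [ "$major" -lt 4 ]
}
```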

To learn more about using your finished BAM files to compute population genetic descriptive statistics without calling SNPs, visit the ANGSD-wrapper GitHub repository.