Creating a flow file - mjsull/HapFlow GitHub Wiki
Required Input
To create a flow file, you must select an indexed BAM file, a VCF file, and a location where the flow file will be written.
If multiple references/chromosomes are present in the BAM file, you will be prompted to select a reference. This choice can be changed by clicking the button (...) to the right of "Select reference:".
If you are using the command line, a flow file will be created for each reference unless otherwise specified.
Optional Arguments
Filter variants before:
Do not create flows at variant sites before this position. This is useful for quick analysis of a small section of the genome.
Filter variants after:
Do not create flows at variant sites after this position.
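The two position filters together define a window of variant sites. The following is a minimal sketch of that idea, not HapFlow's own code: the function name `filter_by_position` and its parameters are hypothetical, and treating both bounds as inclusive is an assumption based on the wording above.

```python
from typing import Iterable, List, Optional


def filter_by_position(
    positions: Iterable[int],
    before: Optional[int] = None,
    after: Optional[int] = None,
) -> List[int]:
    """Keep variant positions inside the window [before, after].

    `before` mirrors "Filter variants before:" and `after` mirrors
    "Filter variants after:". Passing None disables that bound.
    Inclusive bounds are an assumption of this sketch.
    """
    kept = []
    for pos in positions:
        if before is not None and pos < before:
            continue  # site lies before the window
        if after is not None and pos > after:
            continue  # site lies after the window
        kept.append(pos)
    return kept


# Restrict analysis to sites between positions 200 and 300.
window = filter_by_position([100, 200, 300, 400], before=200, after=300)
print(window)  # [200, 300]
```

Setting both values to a narrow range is what makes a quick look at a small section of the genome possible.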
Max. Distance:
This is the maximum distance between two variant sites at which HapFlow will check whether they are part of the same flow. The limit keeps the memory footprint of HapFlow-generator small. If you are using long reads (e.g. PacBio), change this value to the maximum read length.
In the authors' experience, chimerism is too high in mate-pair reads for useful information to be extracted. However, if you wish to try, set this value to the maximum insert size of your reads.
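To illustrate why a distance limit bounds the amount of work, here is a small sketch that lists which pairs of variant sites would be compared under a given maximum distance. It is an illustration under stated assumptions, not HapFlow's implementation: `pairs_within_distance` is a hypothetical helper, and the real generator works on reads rather than on bare positions.

```python
from typing import Iterable, List, Tuple


def pairs_within_distance(
    positions: Iterable[int], max_distance: int
) -> List[Tuple[int, int]]:
    """Return all pairs of sites whose separation is <= max_distance.

    Positions are sorted first, so the inner loop can stop as soon as
    a site is too far away; later sites are only farther still.
    """
    sites = sorted(positions)
    pairs = []
    for i, left in enumerate(sites):
        for right in sites[i + 1:]:
            if right - left > max_distance:
                break  # every remaining site is beyond the limit
            pairs.append((left, right))
    return pairs


sites = [100, 150, 1000, 1040]
print(pairs_within_distance(sites, 100))
# [(100, 150), (1000, 1040)]
```

With a larger limit (for example, a long-read length) more pairs qualify, which is why the value trades memory for the ability to link distant sites.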
Min. variant quality:
Filter out all variants with a quality score lower than this value. In our experience, variants with a quality < 10 are most likely due to sequencing error and add unnecessary "noise" to the flow diagram.
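A quality cutoff of this kind can be sketched against raw VCF lines, where QUAL is the sixth tab-separated column and `.` marks a missing value. The function name `filter_by_quality` is hypothetical, and keeping records with a missing QUAL is an assumption of this sketch; HapFlow may handle them differently.

```python
from typing import Iterable, List


def filter_by_quality(vcf_lines: Iterable[str], min_quality: float) -> List[str]:
    """Drop VCF records whose QUAL is lower than min_quality.

    Header lines (starting with '#') are passed through unchanged.
    Records with a missing QUAL ('.') are kept; that choice is an
    assumption made for this example.
    """
    kept = []
    for line in vcf_lines:
        if line.startswith("#"):
            kept.append(line)
            continue
        fields = line.rstrip("\n").split("\t")
        qual = fields[5]  # QUAL is the 6th column of a VCF record
        if qual != "." and float(qual) < min_quality:
            continue  # below the cutoff, likely sequencing error
        kept.append(line)
    return kept


records = [
    "##fileformat=VCFv4.2",
    "chr1\t10\t.\tA\tG\t5.0\tPASS\t.",
    "chr1\t20\t.\tC\tT\t50\tPASS\t.",
]
for line in filter_by_quality(records, min_quality=10):
    print(line)
```

Here the record at position 10 (QUAL 5.0) is dropped and the record at position 20 (QUAL 50) is kept.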
Filter high coverage variants:
Filter out all variants with a coverage greater than this value multiplied by the median coverage of all variants. High-coverage variants are likely due to repetitive regions or contamination. Because rows in HapFlow are spaced according to the allele with the highest coverage for the entire genome, removing alleles with extremely high coverage removes a lot of unnecessary whitespace.
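The threshold described here is a multiple of the median coverage. The sketch below shows the arithmetic on a small, made-up set of coverage values; `filter_high_coverage` and the `factor` parameter are hypothetical names, and how HapFlow itself measures per-variant coverage is not specified here.

```python
from statistics import median
from typing import Dict


def filter_high_coverage(coverage: Dict[int, int], factor: float) -> Dict[int, int]:
    """Drop variants whose coverage exceeds factor * median coverage.

    `coverage` maps a variant position to its coverage. The median is
    taken over all variants before any are removed.
    """
    if not coverage:
        return {}
    threshold = factor * median(coverage.values())
    return {pos: cov for pos, cov in coverage.items() if cov <= threshold}


# Illustrative values only: median of [28, 30, 32, 900] is 31,
# so with factor 3 the threshold is 93 and position 400 is removed.
coverage = {100: 30, 200: 28, 300: 32, 400: 900}
print(filter_high_coverage(coverage, factor=3))
# {100: 30, 200: 28, 300: 32}
```

Using the median rather than the mean keeps the threshold stable even when a few sites have extreme coverage, which is exactly the situation this filter targets.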