Verticall matrix - rrwick/Verticall GitHub Wiki
The `verticall matrix` command is part of the distance tree workflow (see that page for example commands). It takes the TSV file made by `verticall pairwise` and produces a PHYLIP distance matrix.
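For readers unfamiliar with the output format, a square PHYLIP distance matrix is a line giving the number of samples, followed by one row per sample: its name, then its distance to every sample in the same order. The sketch below builds that layout from a dictionary of pairwise distances. The function name `to_phylip`, the tab delimiter and the eight decimal places are assumptions made for illustration; Verticall's exact spacing and precision may differ.

```python
def to_phylip(names, dist):
    """Format a square distance matrix as PHYLIP text (illustrative sketch).

    names: list of sample names, in the row/column order to use.
    dist:  dict mapping (name_a, name_b) -> float; must hold every ordered pair,
           including the (name, name) diagonal.
    The tab delimiter and 8-decimal formatting are assumptions of this sketch.
    """
    lines = [str(len(names))]  # first line: number of samples
    for a in names:
        row = [f"{dist[(a, b)]:.8f}" for b in names]
        lines.append("\t".join([a] + row))  # sample name, then its distances
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    names = ["A", "B", "C"]
    pairs = {("A", "B"): 0.01, ("A", "C"): 0.02, ("B", "C"): 0.015}  # hypothetical values
    dist = {(n, n): 0.0 for n in names}
    for (a, b), d in pairs.items():
        dist[(a, b)] = d
        dist[(b, a)] = d  # square matrix: fill both directions
    print(to_phylip(names, dist), end="")
```

Tools that read "relaxed" PHYLIP accept whitespace-separated names like these; strict PHYLIP readers expect names padded to ten characters.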
Key options
One of its most important options is `--distance_type`, which specifies which distance in the TSV file will be used in the output matrix. See the Columns in pairwise TSV file page for descriptions of each distance, but here are the ones you're most likely to want:
- `median_vertical_window`: this is the median value of the vertically-painted part of the sliding-window distance distribution (see Pairwise assembly comparison for details). This is the default because it helps ignore recombination in two ways. First, it ignores the horizontally-painted part of the distance distribution. Second, the median is a robust statistic, so even if Verticall failed to identify some horizontally transmitted part of the genome, the median distance shouldn't change very much.
- `mean_vertical`: this is computed from the vertically-painted parts of the alignments by taking one minus the number of matching bases over the alignment length (i.e. one minus identity). Since it's taken from the alignments (not the sliding-window distance distribution), it's a more literal measure of genomic distance, but since it's a mean (not a median), it's less robust than the default.
- `mean`: this is computed from all alignments by taking one minus the number of matching bases over the alignment length (i.e. one minus identity). This distance does not filter out horizontally transmitted parts of the genome, so it provides similar information to other genomic distance tools such as FastANI and Mash.
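To make the trade-off between these distances concrete, the sketch below computes a "one minus identity" distance and compares the median and mean of some made-up sliding-window distances, two of which come from a recombinant region that was not painted horizontal. The window values and the function name `identity_distance` are hypothetical, and pooling match counts across alignments before dividing is an assumption of this sketch, not a description of Verticall's internals.

```python
from statistics import mean, median


def identity_distance(alignments):
    """One minus identity, pooled over alignments (illustrative sketch).

    alignments: iterable of (matching_bases, alignment_length) tuples.
    Summing counts across alignments before dividing is an assumption.
    """
    alignments = list(alignments)  # allow two passes over the input
    matches = sum(m for m, _ in alignments)
    length = sum(n for _, n in alignments)
    return 1.0 - matches / length


# Hypothetical sliding-window distances: mostly vertical signal near 0.001,
# plus two windows from a recombinant region that was not painted horizontal.
windows = [0.001, 0.0012, 0.0009, 0.0011, 0.001, 0.05, 0.048]

print("median:", median(windows))  # barely moved by the two outliers
print("mean:  ", mean(windows))    # pulled well upward by the two outliers
print("identity distance:", identity_distance([(990, 1000), (495, 500)]))
```

Here the median stays at the value typical of the vertical windows, while the mean is more than ten times larger, which is why a median-based distance is the more robust default.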
If your dataset has quite a lot of recombination, then the `--multi` option might also be very important. See the Primary vs secondary results page for more information.
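The help output lists four `--multi` choices (`first`, `exclude`, `low`, `high`) for when a sample pair has more than one result. A minimal sketch of how a single distance might be picked for one pair, assuming `first` keeps the first result in the TSV, `low` keeps the smallest distance and `high` the largest, is shown below. These semantics are inferred from the option names, not confirmed here, and `exclude` is deliberately left out because this page does not say whether it drops the pair or the samples involved. The function name `resolve_multi` is hypothetical.

```python
def resolve_multi(distances, mode="first"):
    """Pick one distance from several results for the same sample pair.

    distances: list of floats in the order they appear in the TSV.
    Only 'first', 'low' and 'high' are modelled, with assumed semantics;
    'exclude' is not modelled in this sketch.
    """
    if not distances:
        raise ValueError("no results for this pair")
    if mode == "first":
        return distances[0]        # keep the first listed result
    if mode == "low":
        return min(distances)      # keep the smallest distance
    if mode == "high":
        return max(distances)      # keep the largest distance
    raise ValueError(f"mode not modelled in this sketch: {mode}")


results = [0.0042, 0.0011]  # hypothetical: two results for one pair
for mode in ("first", "low", "high"):
    print(mode, resolve_multi(results, mode))
```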
Full help output
```
usage: verticall matrix -i IN_FILE -o OUT_FILE
                        [--distance_type {mean,mean_window,median_window,peak_window,
                        mean_vertical_window,median_vertical_window,mean_vertical}]
                        [--asymmetrical] [--no_jukes_cantor] [--multi {first,exclude,low,high}]
                        [--include_names INCLUDE_NAMES] [--exclude_names EXCLUDE_NAMES] [-h]
                        [--version]

produce a PHYLIP distance matrix

Required arguments:
  -i IN_FILE, --in_file IN_FILE  Filename of TSV created by vertical pairwise
  -o OUT_FILE, --out_file OUT_FILE
                                 Filename of PHYLIP matrix output

Settings:
  --distance_type {mean,mean_window,median_window,peak_window,
                   mean_vertical_window,median_vertical_window,mean_vertical}
                                 Which distance to use in matrix (default: median_vertical_window)
  --asymmetrical                 Do not average pairs to make symmetrical matrices (default: make
                                 matrices symmetrical)
  --no_jukes_cantor              Do not apply Jukes-Cantor correction (default: apply Jukes-Cantor
                                 correction)
  --multi {first,exclude,low,high}
                                 Behaviour when there are multiple results for a sample pair
                                 (default: first)
  --include_names INCLUDE_NAMES  Samples names to include in matrix (comma-delimited, default:
                                 include all samples)
  --exclude_names EXCLUDE_NAMES  Samples names to exclude from matrix (comma-delimited, default: do
                                 not exclude any samples)

Other:
  -h, --help                     Show this help message and exit
  --version                      Show program's version number and exit
```
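Two defaults in the help output change the numbers in the matrix: pairs are averaged to make the matrix symmetrical (disabled by `--asymmetrical`), and the Jukes-Cantor correction d = -3/4 ln(1 - 4p/3) is applied to each distance p (disabled by `--no_jukes_cantor`). The sketch below shows both steps on a dictionary of directional distances. The function names are hypothetical, the order in which Verticall applies the two steps is not stated on this page, and returning infinity for p >= 0.75 (where the correction is undefined) is an assumption of this sketch.

```python
import math


def jukes_cantor(p):
    """Jukes-Cantor correction: d = -3/4 * ln(1 - 4/3 * p).

    Undefined for p >= 0.75; this sketch returns infinity there
    (an assumption, not necessarily what Verticall does).
    """
    if p >= 0.75:
        return math.inf
    return -0.75 * math.log(1.0 - (4.0 / 3.0) * p)


def symmetrize(dist):
    """Average d(a, b) and d(b, a) for every pair present in both directions."""
    out = dict(dist)
    for (a, b), d in dist.items():
        if (b, a) in dist:
            out[(a, b)] = (d + dist[(b, a)]) / 2.0
    return out


# Hypothetical directional distances for one pair of samples.
raw = {("A", "B"): 0.010, ("B", "A"): 0.012}
sym = symmetrize(raw)
corrected = {pair: jukes_cantor(d) for pair, d in sym.items()}
print(corrected)
```

The correction always makes a distance slightly larger than its uncorrected value, and the gap grows as p increases, reflecting multiple substitutions at the same site that simple identity cannot see.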