Subcommand: krd
Calculate the pairwise Kantorovich-Rubinstein (KR) distance matrix between samples.
Usage: gappa analyze krd [options]
| Input | |
|---|---|
| `--jplace-path` | **Required.** `TEXT:PATH(existing)=[] ...` List of jplace files or directories to process. For directories, only files with the extension `.jplace[.gz]` are processed. |

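The directory rule for `--jplace-path` can be sketched in Python as follows. This is a minimal illustration, not gappa's implementation: the helper names `is_jplace_name` and `collect_jplace_files` are hypothetical, and scanning directories non-recursively is an assumption of the sketch.

```python
from pathlib import Path


def is_jplace_name(name):
    """True for file names ending in .jplace or .jplace.gz."""
    return name.endswith(".jplace") or name.endswith(".jplace.gz")


def collect_jplace_files(paths):
    """Expand files and directories into the list of jplace files to read.

    Files given directly are kept as-is; directories contribute only their
    entries with a jplace extension (non-recursive scan, by assumption).
    """
    result = []
    for path in map(Path, paths):
        if path.is_dir():
            found = [f for f in path.iterdir() if f.is_file() and is_jplace_name(f.name)]
            result.extend(sorted(found))  # sorted for a stable sample order
        elif path.is_file():
            result.append(path)
        else:
            # The option is typed PATH(existing), so missing paths are an error.
            raise FileNotFoundError(f"no such file or directory: {path}")
    return result
```

In this sketch, a file passed explicitly is used regardless of its extension; the extension filter only applies to directory contents.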
| Settings | |
|---|---|
| `--exponent` | `FLOAT=1` Exponent for KR integration. |
| `--normalize` | `FLAG` Divide the KR distance by the tree length to get normalized values. |
| `--point-mass` | `FLAG` Treat every pquery as a point mass concentrated on the highest-weight placement. In other words, ignore all but the most likely placement location (the one with the highest LWR), and set its LWR to 1.0. |
| `--ignore-multiplicities` | `FLAG` Set the multiplicity of each pquery to 1.0. For phylogenetic placement, the multiplicity is the equivalent of read abundances. This flag hence ignores the read abundances, treating each pquery as a singleton. |

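To make `--point-mass` and `--ignore-multiplicities` concrete, here is a minimal Python sketch of how they transform pqueries before any distance is computed. The `Placement` and `Pquery` classes are hypothetical stand-ins for parsed jplace data, and the mass model (each placement contributes multiplicity × LWR) is an assumption of this sketch rather than a statement about gappa's internals.

```python
from dataclasses import dataclass, replace
from typing import List


@dataclass(frozen=True)
class Placement:
    edge: int    # hypothetical edge index (stands in for the jplace edge number)
    lwr: float   # likelihood weight ratio
    pos: float   # position along the edge (convention left to the caller)


@dataclass(frozen=True)
class Pquery:
    placements: List[Placement]
    multiplicity: float = 1.0


def apply_point_mass(pq: Pquery) -> Pquery:
    """--point-mass: keep only the highest-LWR placement and set its LWR to 1.0."""
    best = max(pq.placements, key=lambda p: p.lwr)
    return replace(pq, placements=[replace(best, lwr=1.0)])


def apply_ignore_multiplicities(pq: Pquery) -> Pquery:
    """--ignore-multiplicities: treat every pquery as a singleton."""
    return replace(pq, multiplicity=1.0)


def placement_masses(pqueries):
    """Assumed mass model: (edge, pos, multiplicity * LWR) per placement."""
    return [(p.edge, p.pos, pq.multiplicity * p.lwr)
            for pq in pqueries for p in pq.placements]
```

Both flags only reshape the input masses; the distance computation itself is unchanged.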
| Matrix Output | |
|---|---|
| `--out-dir` | `TEXT=.` Directory to write output files to. |
| `--file-prefix` | `TEXT` File prefix for output files. Most gappa commands use the command name as the base name for file output. This option amends the base name to distinguish runs with different data. |
| `--file-suffix` | `TEXT` File suffix for output files. Most gappa commands use the command name as the base name for file output. This option amends the base name to distinguish runs with different data. |
| `--compress` | `FLAG` If set, compress the output files using gzip. Output file extensions are automatically extended by `.gz`. |
| `--matrix-format` | `TEXT:{list,matrix,triangular}=matrix` Format of the output matrix file. |
| `--omit-matrix-labels` | `FLAG` If set, the output matrix is written without column and row labels. |

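The file naming options can be illustrated with a small Python sketch. It assumes the prefix is prepended and the suffix appended to the base name, before the file extension; the helper names are hypothetical, and the base name `krd_matrix` and extension `.csv` used below are placeholders, not necessarily what gappa writes.

```python
import gzip
from pathlib import Path


def output_path(out_dir, base_name, extension, prefix="", suffix="", compress=False):
    """Build prefix + base name + suffix + extension, plus .gz when compressing.

    Hypothetical helper; the placement of prefix and suffix is an assumption.
    """
    name = f"{prefix}{base_name}{suffix}{extension}"
    if compress:
        name += ".gz"  # cf. --compress: the extension is extended by .gz
    return Path(out_dir) / name


def write_text(path, text):
    """Write text, gzip-compressed if the file name ends in .gz."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "wt") as f:
        f.write(text)
```

For example, `output_path("results", "krd_matrix", ".csv", prefix="run1_", compress=True)` yields `results/run1_krd_matrix.csv.gz`.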
| Global Options | |
|---|---|
| `--allow-file-overwriting` | `FLAG` Allow overwriting existing output files instead of aborting the command. |
| `--verbose` | `FLAG` Produce more verbose output. |
| `--threads` | `UINT` Number of threads to use for calculations. |
| `--log-file` | `TEXT` Write all output to a log file, in addition to printing it to the terminal (standard output). |

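A minimal Python sketch of two of these global behaviors: refusing to overwrite an existing output file unless explicitly allowed, and mirroring messages to a log file in addition to the terminal. The function names are hypothetical, and mapping `--verbose` to the `DEBUG` log level is an assumption made for illustration.

```python
import logging
import sys
from pathlib import Path


def check_output_target(path, allow_overwrite=False):
    """Refuse to clobber an existing file unless overwriting is allowed
    (the behavior described for --allow-file-overwriting)."""
    path = Path(path)
    if path.exists() and not allow_overwrite:
        raise FileExistsError(f"Output file already exists: {path}")
    return path


def make_logger(log_file=None, verbose=False):
    """Log to stdout and, optionally, also to a file (cf. --log-file).

    Assumption for illustration: --verbose lowers the level to DEBUG.
    """
    logger = logging.getLogger("krd_sketch")
    logger.setLevel(logging.DEBUG if verbose else logging.INFO)
    logger.propagate = False
    for handler in list(logger.handlers):  # make repeated calls idempotent
        logger.removeHandler(handler)
        handler.close()
    logger.addHandler(logging.StreamHandler(sys.stdout))
    if log_file is not None:
        logger.addHandler(logging.FileHandler(log_file))
    return logger
```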
Calculates the Kantorovich-Rubinstein distance between a collection of jplace samples. The command is a re-implementation of `guppy kr`; see its documentation for more details.
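For intuition, the KR distance with exponent p integrates, over every point y of the tree, the p-th power of the difference between the two samples' masses lying below y, and then takes the p-th root; with p = 1 this is the earth mover's distance. The Python sketch below computes this for point masses on a rooted tree. It is not gappa's or guppy's code: the parent-array tree, the `(node, pos, weight)` mass tuples with `pos` measured from the child end of an edge, and applying `normalize` after the root are all assumptions of the sketch. Because both samples are rescaled to total mass 1, the choice of root does not change the result.

```python
from collections import defaultdict


def kr_distance(parent, length, masses_p, masses_q, p=1.0, normalize=False):
    """Phylogenetic KR distance between two mass distributions on a tree.

    parent[v]: parent of node v (-1 for the root).
    length[v]: length of the edge from v up to parent[v] (ignored for the root).
    masses_*:  lists of (v, pos, weight): a point mass on the edge above node v,
               at distance pos from the child end v (an assumed convention).
    Each distribution is rescaled to total mass 1 before integrating.
    """
    n = len(parent)

    def per_edge(masses, sign):
        total = sum(w for _, _, w in masses)
        edges = defaultdict(list)
        for v, pos, w in masses:
            edges[v].append((pos, sign * w / total))
        return edges

    mass_p, mass_q = per_edge(masses_p, 1.0), per_edge(masses_q, -1.0)

    # Children lists, then a pre-order walk from the root; reversing it
    # visits every node after all of its descendants.
    children = defaultdict(list)
    root = None
    for v, par in enumerate(parent):
        if par < 0:
            root = v
        else:
            children[par].append(v)
    order, stack = [], [root]
    while stack:
        v = stack.pop()
        order.append(v)
        stack.extend(children[v])
    order.reverse()

    below = [0.0] * n  # (P - Q) mass in the subtree of v, excluding edge v
    integral = 0.0
    for v in order:
        if parent[v] < 0:
            continue
        # Walk up edge v from its child end; the difference is piecewise
        # constant and only changes where a point mass sits.
        events = sorted(mass_p[v] + mass_q[v])
        diff, last = below[v], 0.0
        for pos, w in events:
            integral += abs(diff) ** p * (pos - last)
            diff += w
            last = pos
        integral += abs(diff) ** p * (length[v] - last)
        below[parent[v]] += diff
    dist = integral ** (1.0 / p)
    if normalize:
        dist /= sum(length[v] for v in range(n) if parent[v] >= 0)
    return dist
```

On a cherry with edges of length 1 and 2, one sample concentrated on each tip gives a distance of 3 (the tip-to-tip path length), or 1 with `normalize=True`.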
The command reads in the jplace samples and calculates their pairwise KR distances. By default, the result is written as a symmetric matrix, but it can also be written as a list or as an upper triangular matrix (see `--matrix-format`).
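The relationship between the three output layouts can be sketched in Python. The exact delimiters, header lines, and list columns that gappa writes are not specified on this page, so the separator, the i < j pair order of the list layout, and the empty lower cells of the triangular layout are illustrative assumptions; the `labels` parameter mimics `--omit-matrix-labels`. `dist` stands for any symmetric distance function, such as a KR implementation.

```python
from itertools import combinations


def pairwise_matrix(samples, dist):
    """Symmetric distance matrix; each unordered pair is computed only once."""
    n = len(samples)
    m = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        m[i][j] = m[j][i] = dist(samples[i], samples[j])
    return m


def format_matrix(names, m, layout="matrix", labels=True, sep="\t"):
    """Render a symmetric matrix in a 'matrix', 'triangular', or 'list' layout."""
    def cell(x):
        return f"{x:g}"  # compact display (6 significant digits)

    n = len(names)
    lines = []
    if layout == "list":
        # One line per unordered pair i < j (an illustrative choice).
        for i, j in combinations(range(n), 2):
            row = [names[i], names[j]] if labels else []
            lines.append(sep.join(row + [cell(m[i][j])]))
        return "\n".join(lines)
    if labels:
        lines.append(sep.join([""] + list(names)))  # column labels
    for i in range(n):
        if layout == "matrix":
            values = [cell(m[i][j]) for j in range(n)]
        elif layout == "triangular":
            # Upper triangle including the diagonal; lower cells left empty.
            values = ["" if j < i else cell(m[i][j]) for j in range(n)]
        else:
            raise ValueError(f"unknown layout: {layout}")
        lines.append(sep.join(([names[i]] if labels else []) + values))
    return "\n".join(lines)
```

Because the KR distance is symmetric and zero on the diagonal, only n(n-1)/2 distances need to be computed for n samples; the three layouts differ only in how those values are printed.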
When using this method, please do not forget to cite:
- Lucas Czech, Pierre Barbera, Alexandros Stamatakis. Genesis and Gappa: Processing, Analyzing and Visualizing Phylogenetic (Placement) Data. *Bioinformatics*, 2020. doi:10.1093/bioinformatics/btaa070
- Steven Evans, Frederick Matsen. The phylogenetic Kantorovich-Rubinstein metric for environmental sequence samples. *Journal of the Royal Statistical Society: Series B*, 2012. doi:10.1111/j.1467-9868.2011.01018.x