Home - lczech/gappa GitHub Wiki
gappa is a collection of commands for working with phylogenetic data. A typical use case is the evolutionary placement of short environmental (metagenomic) sequences on a reference phylogenetic tree.
Many commands in gappa are implementations of our novel methods. gappa also offers some commands that are implemented in the excellent guppy tool as well. However, being written in C++, gappa is much faster and needs less memory for most of these tasks.
We recommend our review article as an introduction to the topic of phylogenetic placement:
Metagenomic Analysis Using Phylogenetic Placement — A Review of the First Decade.
Lucas Czech, Alexandros Stamatakis, Micah Dunthorn, and Pierre Barbera.
Frontiers in Bioinformatics, 2022. https://doi.org/10.3389/fbinf.2022.871393
For a full stack example of conducting phylogenetic placement with EPA-ng, see here.
gappa is used via its command line interface, with subcommands for each task. The commands have the general structure:
`gappa <module> <subcommand> <options>`
The modules are simply a way of organizing the commands.
- Module `analyze`: Analyze and compare different `jplace` files, that is, find differences and patterns between different samples.
- Module `edit`: Edit, manipulate, and transform files in different formats.
- Module `examine`: Examine, visualize, and tabulate information in files.
- Module `prepare`: Prepare and generate data and files needed to run typical pipelines and analyses.
- Module `simulate`: Simple random generation of files for testing.
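As an illustration of the `gappa <module> <subcommand> <options>` structure, the following is a minimal Python sketch that assembles such a command line and checks the module and subcommand names against the tables on this page. The helper `build_gappa_command` and the dictionary `GAPPA_COMMANDS` are hypothetical names used only for this example and are not part of gappa. The option `--jplace-path` and the file name `sample.jplace` are assumptions for illustration; the help output of the respective command is the authoritative list of options. The auxiliary commands are left out, because this page does not list a module for them.

```python
import shlex

# Module -> subcommands, copied from the tables on this page.
# Hypothetical lookup table for this sketch; not part of gappa itself.
GAPPA_COMMANDS = {
    "analyze": [
        "correlation", "dispersion", "edgepca", "imbalance-kmeans", "krd",
        "phylogenetic-kmeans", "placement-factorization", "squash",
    ],
    "edit": ["accumulate", "extract", "filter", "merge", "multiplicity", "split"],
    "examine": [
        "assign", "edpl", "graft", "heat-tree", "info",
        "lwr-distribution", "lwr-histogram", "lwr-list",
    ],
    "prepare": ["chunkify", "clean-tree", "phat", "taxonomy-tree", "unchunkify"],
    "simulate": ["random-alignment", "random-placements", "random-tree"],
}


def build_gappa_command(module, subcommand, options=()):
    """Return the argument list for `gappa <module> <subcommand> <options>`.

    Raises ValueError if the module or subcommand is not listed on this page.
    The options are passed through unchanged and are not validated.
    """
    if module not in GAPPA_COMMANDS:
        raise ValueError(f"unknown module: {module!r}")
    if subcommand not in GAPPA_COMMANDS[module]:
        raise ValueError(f"module {module!r} has no subcommand {subcommand!r}")
    return ["gappa", module, subcommand, *options]


if __name__ == "__main__":
    # Assumed option and hypothetical input file, for illustration only.
    argv = build_gappa_command("examine", "info", ["--jplace-path", "sample.jplace"])
    print(shlex.join(argv))  # gappa examine info --jplace-path sample.jplace
```

The returned list can be handed to `subprocess.run` to execute the command if gappa is installed. Keeping the arguments as a list rather than a single string avoids shell quoting problems with file paths.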
Commands for analyzing and comparing placement data, that is, finding differences and patterns.
Subcommand | Description |
---|---|
correlation | Calculate the Edge Correlation of samples and metadata features. |
dispersion | Calculate the Edge Dispersion between samples. |
edgepca | Perform Edge PCA (Principal Component Analysis) for a set of samples. |
imbalance-kmeans | Run Imbalance k-means clustering on a set of samples. |
krd | Calculate the pairwise Kantorovich-Rubinstein (KR) distance matrix between samples. |
phylogenetic-kmeans | Run Phylogenetic k-means clustering on a set of samples. |
placement-factorization | Perform Placement-Factorization on a set of samples. |
squash | Perform Squash Clustering for a set of samples. |
Commands for editing and manipulating files like jplace, fasta or newick.
Subcommand | Description |
---|---|
accumulate | Accumulate the masses of each query in jplace files into basal branches so that they exceed a given mass threshold. |
extract | Extract placements from clades of the tree and write per-clade jplace files. |
filter | Filter jplace files according to some criteria, that is, remove all queries and/or placement locations that do not pass the provided filter(s). |
merge | Merge jplace files by combining their pqueries into one file. |
multiplicity | Edit the multiplicities of queries in jplace files. |
split | Split the queries in jplace files into multiple files, for example, according to an OTU table. |
Commands for examining, visualizing, and tabulating information in placement data.
Subcommand | Description |
---|---|
assign | Taxonomically assign placed query sequences and output a tabulated summary. |
edpl | Calculate the Expected Distance between Placement Locations (EDPL) for all pqueries. |
graft | Make a tree with each of the query sequences represented as a pendant edge. |
heat-tree | Make a tree with edges colored according to the placement mass of the samples. |
info | Print basic information about placement files. |
lwr-distribution | Print a summary table that represents the distribution of the likelihood weight ratios (LWRs) of all pqueries. |
lwr-histogram | Print a table with histograms of the likelihood weight ratios (LWRs) of all pqueries. |
lwr-list | Print a list of all pqueries with their likelihood weight ratios (LWRs). |
Commands for preparing and preprocessing phylogenetic and placement data.
Subcommand | Description |
---|---|
chunkify | Chunkify a set of fasta files and create abundance maps. |
clean-tree | Clean a tree in Newick format by removing parts that other parsers have difficulties with. |
phat | Generate consensus sequences from a sequence database according to the PhAT method. |
taxonomy-tree | Turn a taxonomy into a tree that can be used as a constraint for tree inference. |
unchunkify | Unchunkify a set of jplace files using abundance map files and create per-sample jplace files. |
Commands for random generation of phylogenetic and placement data.
Subcommand | Description |
---|---|
random-alignment | Create a random alignment with a given number of sequences of a given length. |
random-placements | Create a set of random phylogenetic placements on a given reference tree. |
random-tree | Create a random tree with a given number of leaf nodes. |
Auxiliary commands of gappa.
Subcommand | Description |
---|---|
citation | Print references to be cited when using gappa. |
license | Show the license of gappa. |
version | Extended version information about gappa. |