Home - lczech/gappa GitHub Wiki

gappa is a collection of commands for working with phylogenetic data. A typical use case is the evolutionary placement of short environmental (metagenomic) sequences on a reference phylogenetic tree.

Many commands in gappa are implementations of our novel methods. It also offers some commands that are implemented in the excellent guppy tool as well. Being written in C++, however, gappa is much faster and needs less memory for most of these tasks.

Phylogenetic Placement

We recommend our review article as an introduction to the topic of phylogenetic placement:

Metagenomic Analysis Using Phylogenetic Placement — A Review of the First Decade.
Lucas Czech, Alexandros Stamatakis, Micah Dunthorn, and Pierre Barbera.
Frontiers in Bioinformatics, 2022. https://doi.org/10.3389/fbinf.2022.871393

For a full stack example of conducting phylogenetic placement with EPA-ng, see here.

Command Line Interface

gappa is used via its command line interface, with subcommands for each task. The commands have the general structure:

gappa <module> <subcommand> <options>

The modules are simply a way of organizing the commands.
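The command structure above can be sketched as a small Python helper that assembles the argument list for a call. This is only an illustration, not part of gappa: the function name `build_gappa_command` and the `GAPPA_COMMANDS` table are hypothetical, and the table is copied from the module listing on this page rather than queried from the program. No options are shown because this page does not list them.

```python
import shlex

# Modules and subcommands as listed on this wiki page.
# Assumption: this table is hand-copied for illustration, not read from gappa.
GAPPA_COMMANDS = {
    "analyze": {"correlation", "dispersion", "edgepca", "imbalance-kmeans", "krd",
                "phylogenetic-kmeans", "placement-factorization", "squash"},
    "edit": {"accumulate", "extract", "filter", "merge", "multiplicity", "split"},
    "examine": {"assign", "edpl", "graft", "heat-tree", "info",
                "lwr-distribution", "lwr-histogram", "lwr-list"},
    "prepare": {"chunkify", "clean-tree", "phat", "taxonomy-tree", "unchunkify"},
    "simulate": {"random-alignment", "random-placements", "random-tree"},
    "tools": {"citation", "license", "version"},
}


def build_gappa_command(module, subcommand, options=()):
    """Return the argument list for `gappa <module> <subcommand> <options>`."""
    if module not in GAPPA_COMMANDS:
        raise ValueError(f"unknown module: {module}")
    if subcommand not in GAPPA_COMMANDS[module]:
        raise ValueError(f"unknown subcommand for module {module}: {subcommand}")
    # The module only groups commands; the subcommand selects the actual task.
    return ["gappa", module, subcommand, *options]


if __name__ == "__main__":
    # Print the command lines instead of running them,
    # so the sketch works without gappa installed.
    print(shlex.join(build_gappa_command("tools", "version")))
    print(shlex.join(build_gappa_command("tools", "citation")))
```

To actually run a command, the returned list could be passed to `subprocess.run`, assuming the `gappa` binary is on the `PATH`.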

Modules

Module analyze: Analyze and compare different jplace files, that is, find differences and patterns between different samples.
Module edit: Edit, manipulate, and transform files in different formats.
Module examine: Examine, visualize, and tabulate information in files.
Module prepare: Prepare and generate data and files needed to run typical pipelines and analyses.
Module simulate: Simple random generation of files for testing.
Module tools: Auxiliary commands of gappa.

Module `analyze`

Commands for analyzing and comparing placement data, that is, finding differences and patterns.

Subcommand	Description
correlation	Calculate the Edge Correlation of samples and metadata features.
dispersion	Calculate the Edge Dispersion between samples.
edgepca	Perform Edge PCA (Principal Component Analysis) for a set of samples.
imbalance-kmeans	Run Imbalance k-means clustering on a set of samples.
krd	Calculate the pairwise Kantorovich-Rubinstein (KR) distance matrix between samples.
phylogenetic-kmeans	Run Phylogenetic k-means clustering on a set of samples.
placement-factorization	Perform Placement-Factorization on a set of samples.
squash	Perform Squash Clustering for a set of samples.

Module `edit`

Commands for editing and manipulating files like jplace, fasta or newick.

Subcommand	Description
accumulate	Accumulate the masses of each query in jplace files into basal branches so that they exceed a given mass threshold.
extract	Extract placements from clades of the tree and write per-clade jplace files.
filter	Filter jplace files according to some criteria, that is, remove all queries and/or placement locations that do not pass the provided filter(s).
merge	Merge jplace files by combining their pqueries into one file.
multiplicity	Edit the multiplicities of queries in jplace files.
split	Split the queries in jplace files into multiple files, for example, according to an OTU table.

Module `examine`

Commands for examining, visualizing, and tabulating information in placement data.

Subcommand	Description
assign	Taxonomically assign placed query sequences and output a tabulated summary.
edpl	Calculate the Expected Distance between Placement Locations (EDPL) for all pqueries.
graft	Make a tree with each of the query sequences represented as a pendant edge.
heat-tree	Make a tree with edges colored according to the placement mass of the samples.
info	Print basic information about placement files.
lwr-distribution	Print a summary table that represents the distribution of the likelihood weight ratios (LWRs) of all pqueries.
lwr-histogram	Print a table with histograms of the likelihood weight ratios (LWRs) of all pqueries.
lwr-list	Print a list of all pqueries with their likelihood weight ratios (LWRs).

Module `prepare`

Commands for preparing and preprocessing phylogenetic and placement data.

Subcommand	Description
chunkify	Chunkify a set of fasta files and create abundance maps.
clean-tree	Clean a tree in Newick format by removing parts that other parsers have difficulties with.
phat	Generate consensus sequences from a sequence database according to the PhAT method.
taxonomy-tree	Turn a taxonomy into a tree that can be used as a constraint for tree inference.
unchunkify	Unchunkify a set of jplace files using abundance map files and create per-sample jplace files.

Module `simulate`

Commands for random generation of phylogenetic and placement data.

Subcommand	Description
random-alignment	Create a random alignment with a given number of sequences of a given length.
random-placements	Create a set of random phylogenetic placements on a given reference tree.
random-tree	Create a random tree with a given number of leaf nodes.

Module `tools`

Auxiliary commands of gappa.

Subcommand	Description
citation	Print references to be cited when using gappa.
license	Show the license of gappa.
version	Extended version information about gappa.
