Overview - Ecogenomics/CheckM GitHub Wiki

CheckM is run from the command line and provides a set of commands that support a range of analyses and workflows. These commands are organized into the related groups listed below. The two most common workflows assess genomes using either lineage-specific or taxonomic-specific marker sets.

Lineage-specific marker set

  • tree: place bins in the reference genome tree
  • tree_qa: assess phylogenetic markers found in each bin
  • lineage_set: infer lineage-specific marker sets for each bin

Taxonomic-specific marker set

  • taxon_list: list available taxonomic-specific marker sets
  • taxon_set: infer taxonomic-specific marker set

Apply marker set to genome bins

  • analyze: identify marker genes in bins
  • qa: assess bins for contamination and completeness
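To illustrate what `qa` reports, the sketch below estimates completeness and contamination from marker-gene copy numbers. It follows the collocated-marker-set idea used by CheckM: each set counts as one unit, missing markers lower completeness, and extra copies raise contamination. It is a simplified sketch, not CheckM's code. The function name `estimate_quality`, its input format, and the marker IDs in the usage example are hypothetical placeholders.

```python
from typing import Dict, List, Set, Tuple


def estimate_quality(
    marker_sets: List[Set[str]], counts: Dict[str, int]
) -> Tuple[float, float]:
    """Estimate completeness and contamination (in percent) for one bin.

    marker_sets: collocated marker sets; each set is weighted equally.
    counts: copies of each marker found in the bin (hypothetical input format).
    """
    completeness = 0.0
    contamination = 0.0
    for marker_set in marker_sets:
        size = len(marker_set)
        # Fraction of this set's markers found at least once.
        present = sum(1 for m in marker_set if counts.get(m, 0) > 0)
        completeness += present / size
        # Every copy beyond the first counts towards contamination.
        extra = sum(max(counts.get(m, 0) - 1, 0) for m in marker_set)
        contamination += extra / size
    n_sets = len(marker_sets)
    return 100.0 * completeness / n_sets, 100.0 * contamination / n_sets


if __name__ == "__main__":
    # Placeholder marker IDs; real marker sets come from lineage_set or taxon_set.
    sets = [{"PF00001", "PF00002"}, {"PF00003", "PF00004"}]
    copies = {"PF00001": 1, "PF00002": 2, "PF00003": 1}
    comp, cont = estimate_quality(sets, copies)
    print(f"completeness={comp:.1f}%  contamination={cont:.1f}%")
```

In this toy case one marker is missing and one is duplicated, so both estimates move away from the ideal of 100% completeness and 0% contamination.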

Common workflows (combine the above commands)

  • lineage_wf: runs tree, lineage_set, analyze, qa
  • taxonomy_wf: runs taxon_set, analyze, qa
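The sketch below makes the link between each workflow and its individual commands concrete. It builds the command lines that `lineage_wf` and `taxonomy_wf` chain together but does not run them. The positional-argument order and the `-x` (bin file extension) option are assumptions here and should be confirmed with `checkm COMMAND -h` before use. The directory names, the marker-file names, and the helper functions are hypothetical.

```python
import shlex
from typing import List


def lineage_wf_steps(bin_dir: str, out_dir: str, ext: str = "fna") -> List[List[str]]:
    """Assumed step-by-step equivalent of: checkm lineage_wf -x EXT BIN_DIR OUT_DIR."""
    marker_file = f"{out_dir}/lineage.ms"  # lineage-specific marker sets from lineage_set
    return [
        ["checkm", "tree", "-x", ext, bin_dir, out_dir],
        ["checkm", "lineage_set", out_dir, marker_file],
        ["checkm", "analyze", "-x", ext, marker_file, bin_dir, out_dir],
        ["checkm", "qa", marker_file, out_dir],
    ]


def taxonomy_wf_steps(rank: str, taxon: str, bin_dir: str, out_dir: str,
                      ext: str = "fna") -> List[List[str]]:
    """Assumed step-by-step equivalent of: checkm taxonomy_wf RANK TAXON -x EXT BIN_DIR OUT_DIR."""
    marker_file = f"{taxon}.ms"  # hypothetical name for the taxonomic-specific marker set
    return [
        ["checkm", "taxon_set", rank, taxon, marker_file],
        ["checkm", "analyze", "-x", ext, marker_file, bin_dir, out_dir],
        ["checkm", "qa", marker_file, out_dir],
    ]


if __name__ == "__main__":
    # Print the commands instead of executing them; paths are placeholders.
    for cmd in lineage_wf_steps("bins", "checkm_out", ext="fa"):
        print(shlex.join(cmd))
    for cmd in taxonomy_wf_steps("domain", "Bacteria", "bins", "checkm_tax_out", ext="fa"):
        print(shlex.join(cmd))
```

Running the steps individually lets you inspect or reuse intermediate results, such as the marker-set file, which the combined workflows produce internally.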

Bin QA plots

  • bin_qa_plot: bar plot of bin completeness, contamination, and strain heterogeneity

Reference distribution plots

  • gc_plot: create GC histogram and delta-GC plot
  • coding_plot: create coding density (CD) histogram and delta-CD plot
  • tetra_plot: create tetranucleotide distance (TD) histogram and delta-TD plot
  • dist_plot: create image with GC, CD, and TD distribution plots together
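The GC and delta-GC plots compare each sequence in a bin against the bin as a whole. The sketch below computes the per-sequence values such a plot would show, assuming that delta-GC is the difference between a sequence's GC content and the pooled GC content of all sequences in the bin. It does not reproduce CheckM's reference distributions or plotting, and the function names are hypothetical.

```python
from typing import Dict, Tuple


def gc_fraction(seq: str) -> float:
    """Fraction of G+C among unambiguous bases (A, C, G, T); N and other codes are ignored."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    return gc / total if total else 0.0


def delta_gc(bin_seqs: Dict[str, str]) -> Dict[str, Tuple[int, float]]:
    """Map each sequence ID to (length, sequence GC minus pooled bin GC)."""
    pooled = gc_fraction("".join(bin_seqs.values()))  # assumed definition of bin GC
    return {
        seq_id: (len(seq), gc_fraction(seq) - pooled)
        for seq_id, seq in bin_seqs.items()
    }


if __name__ == "__main__":
    bin_seqs = {"contig_1": "GGCCGGCCAT", "contig_2": "ATATATGCAT"}  # toy sequences
    for seq_id, (length, dgc) in delta_gc(bin_seqs).items():
        print(seq_id, length, round(dgc, 3))
```

Plotting delta-GC against sequence length highlights sequences whose composition differs strongly from the rest of the bin. Short sequences are naturally noisier.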

General plots

  • nx_plot: create Nx-plots
  • len_plot: cumulative sequence length plot
  • len_hist: sequence length histogram
  • marker_plot: plot position of marker genes on sequences
  • par_plot: parallel coordinate plot of GC and coverage
  • gc_bias_plot: plot bin coverage as a function of GC
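An Nx-plot shows the Nx statistic across a range of x values. The sketch below uses the standard definition of Nx: the length L such that sequences of length L or longer together contain at least x% of all bases. This illustrates the statistic only and is not CheckM's `nx_plot` code. The function name is hypothetical.

```python
from typing import List


def nx(lengths: List[int], x: float) -> int:
    """Return Nx: the length L such that sequences of length >= L hold at least x% of all bases."""
    if not 0 < x <= 100:
        raise ValueError("x must be in (0, 100]")
    threshold = sum(lengths) * x / 100.0
    running = 0
    # Accumulate from the longest sequence down until the threshold is reached.
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= threshold:
            return length
    return 0  # only reached for an empty input


if __name__ == "__main__":
    lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]  # toy sequence lengths
    # Points for an Nx-plot: x on the horizontal axis, Nx on the vertical axis.
    print([(x, nx(lengths, x)) for x in range(10, 101, 10)])
```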

Sequence subspace plots

  • cov_pca: PCA plot of coverage profiles
  • tetra_pca: PCA plot of tetranucleotide signatures
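`tetra_pca` works on tetranucleotide signatures. The sketch below computes one such signature under two assumptions. First, each 4-mer is merged with its reverse complement, which leaves 136 features. Second, the counts are normalized to frequencies. This is a common convention, but CheckM's exact normalization may differ. The resulting vectors could be passed to any PCA routine. The names are hypothetical.

```python
from itertools import product
from typing import Dict, List

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(kmer: str) -> str:
    """Reverse complement of an A/C/G/T string."""
    return kmer.translate(COMPLEMENT)[::-1]


# Canonical tetranucleotides: each 4-mer is merged with its reverse complement,
# giving 136 features (120 complementary pairs plus 16 palindromes).
CANONICAL = sorted({min(k, revcomp(k)) for k in ("".join(p) for p in product("ACGT", repeat=4))})


def tetra_signature(seq: str) -> List[float]:
    """Frequencies of canonical tetranucleotides in seq, in CANONICAL order."""
    seq = seq.upper()
    counts: Dict[str, int] = dict.fromkeys(CANONICAL, 0)
    for i in range(len(seq) - 3):
        kmer = seq[i:i + 4]
        if set(kmer) <= set("ACGT"):  # skip windows containing N or other ambiguity codes
            counts[min(kmer, revcomp(kmer))] += 1
    total = sum(counts.values())
    return [counts[k] / total if total else 0.0 for k in CANONICAL]


if __name__ == "__main__":
    sig = tetra_signature("ACGTACGTTTGCAAGC")  # toy sequence
    print(len(CANONICAL), len(sig), round(sum(sig), 6))
```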

Bin exploration and modification

  • unique: ensure no sequences are assigned to multiple bins
  • merge: identify bins with complementary sets of marker genes
  • outliers: [Experimental] identify outliers in bins relative to reference distributions
  • modify: [Experimental] modify sequences in a bin
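The condition that `unique` checks can be expressed as a simple membership count. The sketch below assumes each bin has already been reduced to a collection of sequence IDs, for example parsed from FASTA headers. The function name and output format are hypothetical and do not reflect how CheckM reports its results.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def multiply_binned(bins: Dict[str, Iterable[str]]) -> Dict[str, List[str]]:
    """Return sequence IDs assigned to more than one bin, with the bins that contain them."""
    membership: Dict[str, List[str]] = defaultdict(list)
    for bin_id, seq_ids in bins.items():
        for seq_id in set(seq_ids):  # count each sequence once per bin
            membership[seq_id].append(bin_id)
    return {seq_id: sorted(b) for seq_id, b in membership.items() if len(b) > 1}


if __name__ == "__main__":
    bins = {"bin1": ["c1", "c2"], "bin2": ["c2", "c3"], "bin3": ["c4"]}  # toy assignments
    print(multiply_binned(bins))
```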

Utility functions

  • unbinned: identify unbinned sequences
  • coverage: calculate coverage of sequences
  • tetra: calculate tetranucleotide signature of sequences
  • profile: calculate percentage of reads mapped to each bin
  • join_tables: join tab-separated value tables containing bin information
  • ssu_finder: identify SSU (16S/18S) rRNAs in sequences
  • bin_compare: compare two sets of bins (e.g., from alternative binning methods)
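The arithmetic behind a per-bin read profile is simple. The sketch below assumes the denominator is the total number of reads in the sequencing library and that read counts per bin have already been obtained from the read mappings. CheckM's `profile` output includes additional columns that this sketch does not reproduce, and the function name is hypothetical.

```python
from typing import Dict


def bin_profile(mapped_reads: Dict[str, int], total_reads: int) -> Dict[str, float]:
    """Percentage of all reads (assumed denominator) that map to each bin."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return {bin_id: 100.0 * n / total_reads for bin_id, n in mapped_reads.items()}


if __name__ == "__main__":
    # Toy counts: reads mapped to each bin out of 2,000 reads in total.
    print(bin_profile({"bin1": 250, "bin2": 750}, 2000))
```

Reads that map to unbinned sequences or do not map at all account for the remaining percentage.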

For more information on any of these commands, type: checkm COMMAND -h