Running MetaCompass

MetaCompass is run with the script 'go_metacompass.py', found in the base installation directory.

Usage information is as follows:

usage: go_metacompass.py [-h] [-c [CONFIG]] [-S [SAMPLES]] [-P [PAIRED]]
                         [-U [UNPAIRED]] [-d [DB]] [-i [ITERATIONS]]
                         [-r [REF]] [-p [PICKREF]] [-m [MINCOV]]
                         [-g [MINCTGLEN]] [-l [READLEN]] [-f FILTER] [-b] -o
                         [OUTDIR] [-e [SAMPLEID]] [-v] [-k] [-t [THREADS]]
                         [-q [QSUB]] [-F] [-u]

snakemake and metacompass params

optional arguments:
  -h, --help            show this help message and exit

required:
  -c [CONFIG], --config [CONFIG]
                        config (json) file, set read length etc
  -S [SAMPLES], --Samples [SAMPLES]
                        Provide file with fq reads (1 file per line)
  -P [PAIRED], --paired [PAIRED]
                        Provide comma separated list of paired reads
                        (r1.1.fq,r1.2.fq)
  -U [UNPAIRED], --unpaired [UNPAIRED]
                        Provide comma separated list of unpaired reads
                        (r1.fq,r2.fq,r3.fq)

metacompass:
  -d [DB], --db [DB]    marker gene database directory
  -i [ITERATIONS], --iterations [ITERATIONS]
                        num iterations
  -r [REF], --ref [REF]
                        reference genomes
  -p [PICKREF], --pickref [PICKREF]
                        depth or breadth
  -m [MINCOV], --mincov [MINCOV]
                        min coverage to assemble
  -g [MINCTGLEN], --minctglen [MINCTGLEN]
                        min contig length
  -l [READLEN], --readlen [READLEN]
                        max read length
  -f FILTER, --filter FILTER
                        filter recruited genomes with mash (experimental)

output:
  -b, --clobber         clobber output directory (if exists?)
  -o [OUTDIR], --outdir [OUTDIR]
                        output directory? (cwd default)
  -e [SAMPLEID], --sampleid [SAMPLEID]
                        sample id (fq prefix is default)
  -v, --verbose         verbose
  -k, --keepoutput      keep all output generated (default is to delete all
                        but final fasta files)

performance:
  -t [THREADS], --threads [THREADS]
                        num threads
  -q [QSUB], --qsub [QSUB]

snakemake:
  -F, --Force           force snakemake to rerun
  -u, --unlock          unlock snakemake locks

Typical MetaCompass Command Line

A typical MetaCompass command for assembling a metagenomic sample "Sample" containing paired-end and singleton reads would be:

python3 go_metacompass.py -P Sample.1.fastq,Sample.2.fastq -U Sample.singleton.fastq -o Sample_output

where:

Sample.1.fastq and Sample.2.fastq contain forward and reverse paired-end reads, respectively
Sample.singleton.fastq contains unpaired reads
Sample_output is the output directory
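
For scripted runs, the same command can be assembled programmatically. The following is a minimal sketch, not part of MetaCompass: `build_metacompass_command` is a hypothetical helper, and it assumes that `go_metacompass.py` is reachable at the given path. It uses only the `-P`, `-U` and `-o` flags listed in the usage text above.

```python
import shlex
from typing import List, Sequence


def build_metacompass_command(
    paired: Sequence[str],
    unpaired: Sequence[str],
    outdir: str,
    script: str = "go_metacompass.py",  # assumption: path to the script
) -> List[str]:
    """Build the argument list for a MetaCompass run (hypothetical helper)."""
    cmd = ["python3", script]
    if paired:
        # -P takes a comma-separated list of paired read files
        cmd += ["-P", ",".join(paired)]
    if unpaired:
        # -U takes a comma-separated list of unpaired read files
        cmd += ["-U", ",".join(unpaired)]
    cmd += ["-o", outdir]
    return cmd


if __name__ == "__main__":
    cmd = build_metacompass_command(
        paired=["Sample.1.fastq", "Sample.2.fastq"],
        unpaired=["Sample.singleton.fastq"],
        outdir="Sample_output",
    )
    # Print the command; to launch it, pass `cmd` to subprocess.run(cmd, check=True)
    print(shlex.join(cmd))
```

Building the command as a list rather than a single string avoids shell quoting problems when file names contain spaces.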

Parameters and Options to Consider when Running MetaCompass

Reference selection parameters

Reference genomes fasta file [-r [REF]]

By default, MetaCompass uses a marker gene approach to select references that are present in the metagenome. However, if the set of genomes in the sample is known, or you are trying to assemble a particular known genome from the sample, we recommend using this option to supply the reference genomes directly.
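
Before passing a reference file to `-r`, it can help to confirm that the file actually contains sequences. The sketch below is an illustration under stated assumptions: `count_fasta_records` and `build_reference_command` are hypothetical helpers, the file names are placeholders, and the check only counts FASTA header lines (it does not validate sequence content).

```python
import os
import tempfile
from typing import List


def count_fasta_records(path: str) -> int:
    """Count sequences in a FASTA file by counting lines that start with '>'."""
    with open(path, "r") as handle:
        return sum(1 for line in handle if line.startswith(">"))


def build_reference_command(
    paired_1: str,
    paired_2: str,
    ref_fasta: str,
    outdir: str,
    script: str = "go_metacompass.py",  # assumption: path to the script
) -> List[str]:
    """Build a MetaCompass command that uses user-supplied references (-r)."""
    if count_fasta_records(ref_fasta) == 0:
        raise ValueError(f"no FASTA records found in {ref_fasta}")
    return [
        "python3", script,
        "-P", f"{paired_1},{paired_2}",
        "-r", ref_fasta,
        "-o", outdir,
    ]


if __name__ == "__main__":
    # Demo with a tiny temporary FASTA file standing in for real references
    with tempfile.NamedTemporaryFile("w", suffix=".fasta", delete=False) as fh:
        fh.write(">genome_1\nACGTACGT\n>genome_2\nGGCCGGCC\n")
        ref_path = fh.name
    print(count_fasta_records(ref_path))
    print(build_reference_command("Sample.1.fastq", "Sample.2.fastq",
                                  ref_path, "Sample_output"))
    os.remove(ref_path)
```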

Read mapping parameters

Filter recruited genomes with mash (experimental) [-f FILTER, --filter FILTER]

After the reference selection process, Bowtie2 is used to map reads to the selected reference genomes. By default, MetaCompass aligns the reads to all selected genomes, but if too many genomes are selected (e.g. >300 genomes), Bowtie2 can become a bottleneck in MetaCompass. To speed up the process, we added the option of filtering the selected reference genomes using Mash.

Reference-guided assembly parameters

Choice of coverage to assign reads to genomes [-p [PICKREF]]

A key step in the assembly process is dealing with reads aligning to multiple genomes.

By default, MetaCompass assigns each multi-mapped read to the genome with the highest "breadth" of coverage. This parameter choice works best for low-abundance genomes.
There is also the option of using "depth" of coverage to assign multi-mapped reads.
The value "all" uses all read mappings to assemble the genomes. This option generates redundancy in the final contigs.
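
The difference between the three choices can be illustrated with a toy example. This is a conceptual sketch only, not MetaCompass's implementation: it assumes the usual definitions (breadth is the fraction of reference positions covered by at least one read, depth is the mean number of reads per position), and `assign_read` and the per-base coverage lists are hypothetical.

```python
from typing import Dict, List


def breadth(depths: List[int]) -> float:
    """Fraction of reference positions covered by at least one read."""
    return sum(1 for d in depths if d > 0) / len(depths) if depths else 0.0


def mean_depth(depths: List[int]) -> float:
    """Average number of reads covering each reference position."""
    return sum(depths) / len(depths) if depths else 0.0


def assign_read(
    candidates: List[str],
    coverage: Dict[str, List[int]],
    pickref: str = "breadth",
) -> List[str]:
    """Return the genome(s) a multi-mapped read is assigned to."""
    if pickref == "all":
        # Keep every mapping; the same read contributes to several genomes
        return list(candidates)
    if pickref == "breadth":
        score = breadth
    elif pickref == "depth":
        score = mean_depth
    else:
        raise ValueError(f"unknown pickref value: {pickref!r}")
    return [max(candidates, key=lambda genome: score(coverage[genome]))]


if __name__ == "__main__":
    # Toy per-base depths: genome_a is thinly but widely covered,
    # genome_b is deeply covered over a short region.
    coverage = {
        "genome_a": [1, 1, 1, 1, 1, 1, 0, 0],
        "genome_b": [9, 9, 9, 0, 0, 0, 0, 0],
    }
    for mode in ("breadth", "depth", "all"):
        print(mode, assign_read(["genome_a", "genome_b"], coverage, mode))
```

In this toy data the thinly but widely covered genome wins under "breadth", while the genome with a short, deeply covered region wins under "depth", which is consistent with the note above that "breadth" suits low-abundance genomes.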

Minimum depth of coverage to assemble each genome [-m [MINCOV]]

An advantage of MetaCompass reference-guided assembly over de novo assembly is its capacity to assemble low-abundance bacterial genomes. After each read is assigned to a particular genome, the contigs are built using a minimum depth of coverage threshold. For best results on low-abundance bacterial genomes, we recommend using a MINCOV between 1 and 3.
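
To see how the threshold affects the result, consider the per-base depth along one reference genome. The sketch below is an illustration, not MetaCompass's contig-building code: it assumes that a contig can only span consecutive positions whose depth is at least MINCOV, and `covered_intervals` and the depth values are hypothetical.

```python
from typing import List, Tuple


def covered_intervals(depths: List[int], mincov: int) -> List[Tuple[int, int]]:
    """Return half-open (start, end) intervals where depth >= mincov."""
    intervals = []
    start = None
    for pos, depth in enumerate(depths):
        if depth >= mincov:
            if start is None:
                start = pos  # a new covered region begins here
        elif start is not None:
            intervals.append((start, pos))  # region ended at previous position
            start = None
    if start is not None:
        intervals.append((start, len(depths)))  # region runs to the end
    return intervals


if __name__ == "__main__":
    depths = [0, 1, 1, 3, 3, 0, 2, 2]  # toy per-base depth along a reference
    for mincov in (1, 2, 3):
        print(mincov, covered_intervals(depths, mincov))
```

Raising the threshold shrinks or removes the regions that qualify, which is why a low MINCOV retains more of a low-abundance genome.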