Command line options - mooreryan/ZetaHunter GitHub Wiki

Here are all the options as of version 1.0.7.

  Options:
  -i, --inaln=<s+>                             Input alignment(s)
  -o, --outdir=<s>                             Directory for output
  -t, --threads=<i>                            Number of processors to use (default: 2)
  -d, --db-otu-info=<s>                        Database OTU info file name (default: /Users/moorer/projects/ZetaHunter/assets/db_otu_info.txt)
  -m, --mask=<s>                               Fasta file with the mask (default: /Users/moorer/projects/ZetaHunter/assets/mask.fa.gz)
  -b, --db-seqs=<s>                            Fasta file with aligned DB seqs (default: /Users/moorer/projects/ZetaHunter/assets/db_seqs.fa.gz)
  -r, --mothur=<s>                             The mothur executable (default: /Users/moorer/projects/ZetaHunter/bin/mac/mothur)
  -s, --sortmerna=<s>                          The SortMeRNA executable (default: /Users/moorer/projects/ZetaHunter/bin/mac/sortmerna)
  -n, --indexdb-rna=<s>                        The SortMeRNA indexdb_rna executable (default: /Users/moorer/projects/ZetaHunter/bin/mac/indexdb_rna)
  -c, --cluster-method=<s>                     Either furthest, average, or nearest (default: average)
  -u, --otu-percent=<i>                        OTU similarity percentage (default: 97)
  -k, --check-chimeras, --no-check-chimeras    Flag to check chimeras (default: true)
  -a, --base=<s>                               Base name for output files (default: ZH_2018_07_15_13_29)
  -e, --debug                                  Debug mode, don't delete tmp files or clean up the working dir (the out dir will be empty)
  -v, --version                                Print version and exit
  -h, --help                                   Show this message

Most important options

To keep things simple, for most ZetaHunter runs, you will only need to worry about these three options:

--inaln
--outdir
--threads
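
As a rough illustration of how these three options fit together, the sketch below assembles a command line in Python. The executable name `zeta_hunter` and the file names are placeholders, not taken from this page; substitute whatever launches ZetaHunter on your system. Only the flags come from the option list above.

```python
import shlex

def build_command(inaln, outdir, threads=2, executable="zeta_hunter"):
    """Assemble an argument list using only the three most common options.

    `executable` is a placeholder name (an assumption, not from the docs).
    """
    if not inaln:
        raise ValueError("at least one input alignment is required")
    # --inaln takes one or more files (<s+> in the help text).
    return [executable, "--inaln", *inaln,
            "--outdir", outdir,
            "--threads", str(threads)]

# Hypothetical input files and output directory.
cmd = build_command(["sample_1.fa", "sample_2.fa"], "zh_output", threads=4)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd)` as-is, which avoids shell quoting problems with unusual file names.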

Detailed info

--inaln

One or more fasta files output from the SINA aligner. See this wiki page for more information.

--outdir

All ZetaHunter output will be in the folder specified here.

--threads

Some parts of ZetaHunter can take advantage of multiple CPUs. For those steps, specify the number of processors to use with this option.

--db-otu-info

This file has info about the sequences in ZH's gold database. If you want to use ZH for something other than Zetaproteobacteria (e.g., OP3; see here), you need to make this file yourself and specify it with this argument.

--mask

The mask used by ZH. It is a fasta file with an alignment, but instead of nucleotides, it has a * character indicating columns of the alignment to be included in the mask. We provide one of these for Zetaproteobacteria classification.
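
To make the idea of a column mask concrete, here is a minimal Python sketch that keeps only the alignment columns marked with `*`. It assumes that any other character in the mask (shown here as `-`) means "exclude this column"; the sequences are made up, and this is not ZetaHunter's own code.

```python
def apply_mask(aligned_seq, mask):
    """Keep only the alignment columns where the mask has a '*'."""
    if len(aligned_seq) != len(mask):
        raise ValueError("sequence and mask must have the same aligned length")
    return "".join(base for base, flag in zip(aligned_seq, mask) if flag == "*")

# Toy example: columns 1, 2, 5, 7 and 8 are included by the mask.
mask = "**--*-**"
seq = "ACGU-ACG"
print(apply_mask(seq, mask))  # prints AC-CG
```

Because every sequence in an alignment shares the same columns, applying one mask to all of them yields sequences that are still aligned to each other.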

--db-seqs

These are the actual sequences used to classify user queries. By default it is the Zetaproteobacteria Gold Database.

--mothur, --sortmerna, --indexdb-rna

These arguments are used to specify the path to each of their respective executable binaries. ZetaHunter comes bundled with the binaries it uses so you don't have to worry about these. But if for some reason you want to use a different version, you can specify the path to the binaries here. Note that we don't recommend this as testing is done with the versions included in ZetaHunter.

--cluster-method

The type of clustering you want to do: furthest neighbor (complete linkage), average neighbor, or nearest neighbor (single linkage).

From the mothur website:

Nearest neighbor: Each of the sequences within an OTU are at most X% distant from the most similar sequence in the OTU.
Furthest neighbor: All of the sequences within an OTU are at most X% distant from all of the other sequences within the OTU.
Average neighbor: This method is a middle ground between the other two algorithms.
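
The difference between the three methods comes down to how the distance between two groups of sequences is measured. The Python sketch below illustrates the standard single, complete, and average linkage criteria on made-up distances; it is a toy illustration, not mothur's implementation, and the sequence names and values are hypothetical.

```python
from itertools import product
from statistics import mean

def linkage_distance(cluster_a, cluster_b, dist, method="average"):
    """Distance between two clusters under a given linkage criterion."""
    pairwise = [dist[frozenset((a, b))] for a, b in product(cluster_a, cluster_b)]
    if method == "nearest":    # single linkage: closest pair
        return min(pairwise)
    if method == "furthest":   # complete linkage: most distant pair
        return max(pairwise)
    if method == "average":    # average linkage: mean over all pairs
        return mean(pairwise)
    raise ValueError(f"unknown cluster method: {method}")

# Hypothetical pairwise distances between four sequences.
dist = {
    frozenset(("s1", "s3")): 0.02,
    frozenset(("s1", "s4")): 0.05,
    frozenset(("s2", "s3")): 0.04,
    frozenset(("s2", "s4")): 0.01,
}
for method in ("nearest", "average", "furthest"):
    print(method, linkage_distance(["s1", "s2"], ["s3", "s4"], dist, method))
```

With a cutoff between the nearest and furthest values, nearest neighbor would merge these two groups while furthest neighbor would keep them apart, which is why the choice of method changes the OTUs you get.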

--otu-percent

By default, the OTU cutoff is 97% similarity, but you can change it to whatever you want with this option. A lower value such as 90% groups together organisms that are more distantly related, whereas a higher value such as 99% splits groups more finely than the default.
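
Clustering tools such as mothur usually express this threshold as a distance rather than a similarity, so 97% similarity corresponds to a distance cutoff of 0.03. The helper below is a small sketch of that conversion; the function name is made up, and whether ZetaHunter converts the value in exactly this way internally is an assumption.

```python
def distance_cutoff(otu_percent):
    """Convert an OTU similarity percentage into a distance cutoff."""
    if not 0 < otu_percent <= 100:
        raise ValueError("OTU percent must be in the range (0, 100]")
    return (100 - otu_percent) / 100

for percent in (90, 97, 99):
    print(percent, distance_cutoff(percent))
```

A larger cutoff (lower percentage) lets more distant sequences fall into the same OTU, which matches the lumping and splitting behaviour described above.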

--check-chimeras, --no-check-chimeras

These flags control chimera checking. We use uchime as implemented in mothur for this step. This step is a bit slow, so you can disable it to save some time if you're feeling good about your sequences.

--base

The base name used in the output files. By default it is a timestamped name such as ZH_2018_07_15_13_38, for a ZetaHunter run started on 2018-07-15 at 1:38 pm.
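
If you want your own scripts to produce names in the same pattern, the format can be reproduced with a timestamp template. This is a sketch inferred from the example name above, not ZetaHunter's actual code, and the function name is hypothetical.

```python
from datetime import datetime

def default_base_name(when):
    """Build a base name like ZH_2018_07_15_13_38 from a datetime."""
    # Year, month, day, hour (24-hour clock), minute, joined by underscores.
    return when.strftime("ZH_%Y_%m_%d_%H_%M")

print(default_base_name(datetime(2018, 7, 15, 13, 38)))  # prints ZH_2018_07_15_13_38
```

Note that the name only resolves to the minute, so two runs started within the same minute would share a default base name; passing --base explicitly avoids that.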

--debug

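An option you can try when things go wrong. In debug mode, ZetaHunter does not delete its tmp files or clean up the working dir (the out dir will be empty), so you can inspect the intermediate files. This was mainly used in the early stages of developing ZetaHunter.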

--version

Print out version and contact info. If you find a bug or use ZetaHunter in your research, it is good to note the version of the program that you're using!