Command line options - percolator/percolator GitHub Wiki

Here is the full list of flags used by Percolator. If you want to get started quickly, check out this page instead.

We try to keep this list as up-to-date as possible, but run percolator -h for the most accurate list.

General options

-h; --help

Display the help message

-v <level>; --verbose <level>

Set the verbosity of the output: 0 = no processing info, 5 = all. Default = 2.

-U; --only-psms

Do not remove redundant peptides: keep all PSMs and exclude peptide-level probabilities.
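Peptide-level results are commonly built by collapsing PSMs that share a peptide and keeping only the best-scoring one; -U skips this step. A minimal sketch of that collapse, assuming each PSM is a hypothetical (peptide string, score) pair and that a higher score is better (Percolator's own notion of peptide identity may include more than the sequence string):

```python
def best_psm_per_peptide(psms):
    """Collapse (peptide, score) pairs to one entry per peptide, keeping the top score.

    Hypothetical helper for illustration; not Percolator's API.
    """
    best = {}
    for peptide, score in psms:
        # Keep the first PSM seen for a peptide, replace it only by a strictly better one
        if peptide not in best or score > best[peptide]:
            best[peptide] = score
    return best
```

For example, `best_psm_per_peptide([("PEPK", 1.0), ("PEPK", 2.0), ("AAR", 0.5)])` keeps 2.0 for "PEPK" and 0.5 for "AAR".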

-y; --post-processing-mix-max

Use the mix-max method to assign q-values and PEPs. Note that this option only has an effect if the input PSMs are from separate target and decoy searches. This is the default setting.

-Y; --post-processing-tdc

Replace the mix-max method with target-decoy competition for assigning q-values and PEPs. If the input PSMs are from separate target and decoy searches, Percolator's SVM scores will be used to eliminate the lower-scoring target or decoy PSM(s) of each scan+expMass combination. If the input PSMs are detected to come from a concatenated search, this option is turned on automatically, because concatenated searches are incompatible with the mix-max method. If this detection fails, turn the option on explicitly.
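The competition step and a q-value estimate can be sketched as follows. This is a minimal illustration, not Percolator's implementation: the `PSM` type and function names are hypothetical, a higher score is assumed to be better, score ties are not treated specially, and q-values use the simple decoys/targets ratio (some tools add 1 to the decoy count, and Percolator's exact estimator may differ).

```python
from typing import NamedTuple


class PSM(NamedTuple):
    # Hypothetical PSM record for illustration
    scan: int
    exp_mass: float
    score: float  # e.g. the SVM score
    is_decoy: bool


def target_decoy_competition(psms):
    """Keep only the best-scoring PSM per (scan, exp_mass) combination."""
    best = {}
    for psm in psms:
        key = (psm.scan, psm.exp_mass)
        if key not in best or psm.score > best[key].score:
            best[key] = psm
    return list(best.values())


def tdc_q_values(psms):
    """Return (psm, q_value) pairs sorted from best to worst score."""
    ranked = sorted(psms, key=lambda p: p.score, reverse=True)
    fdrs = []
    targets = decoys = 0
    for psm in ranked:
        if psm.is_decoy:
            decoys += 1
        else:
            targets += 1
        # Estimated FDR when thresholding at this PSM's score
        fdrs.append(decoys / max(targets, 1))
    # q-value = smallest FDR at this or any lower score threshold
    q_values = [0.0] * len(fdrs)
    running_min = float("inf")
    for i in range(len(fdrs) - 1, -1, -1):
        running_min = min(running_min, fdrs[i])
        q_values[i] = running_min
    return list(zip(ranked, q_values))
```

Competition runs first so that each spectrum contributes a single winner, target or decoy, to the ranking from which the q-values are estimated.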

-I <value>; --search-input <value>

Specify the type of target-decoy search: "auto" (Percolator attempts to detect the search type automatically), "concatenated" (single search on concatenated target-decoy protein db) or "separate" (two searches, one against target and one against decoy protein db). Default = "auto".

File input options

N.B.: For the tab-delimited input file (pin-tab) no flag needs to be specified, e.g. percolator pin.tab.
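A pin-tab file can be produced from any search engine output with a few lines of code. The sketch below assumes the commonly used column layout (SpecId, Label with 1 for targets and -1 for decoys, ScanNr, the feature columns, Peptide, then one or more protein columns); the feature names, row dictionary keys and the `write_pin_tab` function are hypothetical, so check the layout against your Percolator version before relying on it.

```python
import csv
import io


def write_pin_tab(rows, feature_names, handle):
    """Write PSM rows in an assumed pin-tab layout to an open text handle."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["SpecId", "Label", "ScanNr", *feature_names, "Peptide", "Proteins"])
    for row in rows:
        writer.writerow([
            row["spec_id"],
            -1 if row["is_decoy"] else 1,  # assumed label convention: 1 target, -1 decoy
            row["scan"],
            *(row["features"][name] for name in feature_names),
            row["peptide"],
            *row["proteins"],  # additional proteins become extra trailing columns
        ])


buf = io.StringIO()
write_pin_tab(
    [
        {"spec_id": "s1", "is_decoy": False, "scan": 1,
         "features": {"score": 2.5, "deltaScore": 0.3},
         "peptide": "K.PEPTIDER.A", "proteins": ["protA"]},
        {"spec_id": "s2", "is_decoy": True, "scan": 2,
         "features": {"score": 0.4, "deltaScore": 0.1},
         "peptide": "R.EDITPEPK.L", "proteins": ["random_protA", "random_protB"]},
    ],
    ["score", "deltaScore"],
    buf,
)
print(buf.getvalue())
```

The resulting file would then be passed directly as the positional argument, as in percolator pin.tab.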

-; --stdinput

Read percolator tab-input format (pin-tab) from standard input.

-e; --stdinput-xml

Read percolator xml-input format (pin-xml) from standard input.

-k <filename>; --xml-in <filename>

Input file in the deprecated pin-xml format, e.g. as generated by sqt2pin with the -k option.

-s; --no-schema-validation

Skip validation of pin-xml input file against xml schema.

File output options

-r <filename>; --results-peptides <filename>

Write tab-delimited peptide results to a file instead of stdout (ignored if used with the -U option).

-B <filename>; --decoy-results-peptides <filename>

Write tab-delimited results for decoy peptides to a file (ignored if used with the -U option).

-m <filename>; --results-psms <filename>

Write tab-delimited PSM results to a file instead of stdout.

-M <filename>; --decoy-results-psms <filename>

Write tab-delimited results for decoy PSMs to a file.

-l <filename>; --results-proteins <filename>

Write tab-delimited protein results to a file instead of stdout (only valid if option -A or -f is active).

-L <filename>; --decoy-results-proteins <filename>

Write tab-delimited results for decoy proteins to a file (only valid if option -A or -f is active).

-J <filename>; --tab-out <filename>

Output computed features to given file in pin-tab format. Can be used to convert pin-xml to pin-tab.

-X <filename>; --xmloutput <filename>

Path to xml-output (pout) file.

-Z; --decoy-xml-output

Include decoys (PSMs, peptides and/or proteins) in the xml-output. Only available if -X is set.

SVM training options

-N <value>; --subset-max-train <value>

Only train an SVM on a subset of PSMs, and use the resulting score vector to evaluate the other PSMs. Recommended when analyzing huge numbers (>1 million) of PSMs. When set to 0, all PSMs are used for training as normal. Default = 0.
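The idea of training on a subset and scoring the rest can be illustrated with a seeded random split. This is a hypothetical sketch, not how Percolator selects its subset internally; the function name is invented, and the default seed of 1 merely mirrors the documented default of -S.

```python
import random


def split_for_subset_training(psms, subset_max_train, seed=1):
    """Return (training_subset, remaining_psms); 0 means train on everything."""
    if subset_max_train == 0 or subset_max_train >= len(psms):
        return list(psms), []
    rng = random.Random(seed)  # seeded so the split is reproducible
    chosen = set(rng.sample(range(len(psms)), subset_max_train))
    train = [p for i, p in enumerate(psms) if i in chosen]
    rest = [p for i, p in enumerate(psms) if i not in chosen]
    return train, rest
```

The score vector learned on `train` would then be applied unchanged to every PSM in `rest`, which is what makes this cheaper for very large inputs.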

-p <value>; --Cpos <value>

Cpos, penalty for mistakes made on positive examples. Set by cross validation if not specified.

-n <value>; --Cneg <value>

Cneg, penalty for mistakes made on negative examples. Set by cross validation if not specified or if -p is not specified.
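To make the roles of Cpos and Cneg concrete, a generic class-weighted soft-margin SVM objective is given below. This is a textbook formulation written for illustration, with our own notation: P is the set of positive (target) training examples, N the set of negative (decoy) examples, y_k is +1 or -1, and the slack terms ξ measure margin violations. The loss Percolator optimizes internally may differ in detail (for instance a squared hinge).

```latex
\min_{w,\,b}\ \frac{1}{2}\lVert w \rVert^2
  + C_{pos} \sum_{i \in \mathcal{P}} \xi_i
  + C_{neg} \sum_{j \in \mathcal{N}} \xi_j
\quad \text{s.t.} \quad
y_k \left( w^\top x_k + b \right) \ge 1 - \xi_k, \qquad \xi_k \ge 0
```

A larger Cpos relative to Cneg makes misclassified positives more costly than misclassified negatives.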

-t <value>; --testFDR <value>

False discovery rate threshold for evaluating best cross validation result and reported end result. Default = 0.01.

-F <value>; --trainFDR <value>

False discovery rate threshold to define positive examples in training. Set to testFDR if 0. Default = 0.01.
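The usual semi-supervised setup treats targets that pass the training FDR threshold as positives and all decoys as negatives. The sketch below assumes (score, is_decoy) tuples, a higher-is-better score and a simple decoys/targets q-value estimate; the function name is hypothetical and Percolator's internal selection may differ in detail.

```python
def select_training_examples(scored_psms, train_fdr=0.01):
    """Split (score, is_decoy) tuples into positive and negative training sets."""
    ranked = sorted(scored_psms, key=lambda p: p[0], reverse=True)
    targets = decoys = 0
    fdrs = []
    for _, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))
    # Convert FDRs to q-values (minimum FDR at this or any lower threshold)
    q_values = [0.0] * len(fdrs)
    running_min = float("inf")
    for i in range(len(fdrs) - 1, -1, -1):
        running_min = min(running_min, fdrs[i])
        q_values[i] = running_min
    positives = [p for p, q in zip(ranked, q_values) if not p[1] and q <= train_fdr]
    negatives = [p for p in ranked if p[1]]
    return positives, negatives
```

The same kind of q-value computation, applied with the testFDR threshold, is what decides which cross-validation result counts as best.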

-i <number>; --maxiter <number>

Maximum number of iterations.

-x; --quick-validation

Quicker execution by reduced internal cross-validation.

-R; --test-each-iteration

Report performance on test set each iteration.

-S <value>; --seed <value>

Set the seed of the random number generator. Default = 1.

SVM feature input options

-w <filename>; --weights <filename>

Output final SVM weights to given file.

-W <filename>; --init-weights <filename>

Read initial SVM weights from the given file (one per line).

-V <[-]?featureName>; --default-direction <[-]?featureName>

Use the given feature as the initial search direction. Prefix the feature name with "-" to indicate that a lower value is better.
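One way to picture an initial search direction is as a score vector that is zero everywhere except at the chosen feature, negated when the name carries a leading "-". This is an illustrative sketch; the function name and the feature names in the example are hypothetical.

```python
def initial_score_vector(spec, feature_names):
    """Turn a '[-]featureName' spec into a unit score vector over the features."""
    negate = spec.startswith("-")
    name = spec[1:] if negate else spec
    if name not in feature_names:
        raise ValueError(f"unknown feature: {name}")
    weight = -1.0 if negate else 1.0
    return [weight if f == name else 0.0 for f in feature_names]
```

For example, `initial_score_vector("-deltaMass", ["score", "deltaMass"])` gives `[0.0, -1.0]`, so PSMs with a smaller deltaMass rank higher before any training.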

-u; --unitnorm

Use unit normalization [0-1] on features instead of standard deviation normalization.
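The two normalization schemes can be compared on a single feature column. This sketch uses min-max scaling for the unit normalization and z-scores for the standard-deviation normalization; whether Percolator uses the population or the sample standard deviation is an assumption here (the code uses the population version).

```python
import statistics


def unit_normalize(values):
    """Min-max scale a feature column to [0, 1]."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [0.0 if span == 0 else (v - lo) / span for v in values]


def std_normalize(values):
    """Center a feature column and divide by its (population) standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [0.0 if sd == 0 else (v - mean) / sd for v in values]
```

For the column [2, 4, 6], `unit_normalize` gives [0.0, 0.5, 1.0], while `std_normalize` centers the values on 0 with unit spread.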

-O; --override

Override error check and do not fall back on default score vector in case of suspect score vector from SVM.

-D; --doc

Include the description-of-correct (DOC) features, i.e. features describing the difference between the observed and predicted isoelectric point, retention time and precursor mass. See this page for a more detailed description.

-K; --klammer

Retention time features are calculated as in Klammer et al. instead of with Elude. Only available if -D is set.

Protein inference options

-f <fasta_file>; --picked-protein <fasta_file>

Use the picked protein-level FDR to infer protein probabilities. Provide the fasta file as the argument to this flag, which will be used for protein grouping based on an in-silico digest. If no fasta file is available or protein grouping is not desired, set this flag to "auto" to skip protein grouping.

-P <value>; --protein-decoy-pattern <value>

Define the text pattern to identify decoy proteins in the database. Default = "random_".
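The "picked" step pairs each target protein with its decoy counterpart and keeps only the higher-scoring member of each pair before FDR estimation. The sketch assumes that a decoy is named by prefixing the target accession with the decoy pattern (default "random_"), that each protein already has a single score with higher being better, and that on a tie the first-seen entry is kept; the function name is hypothetical, and Percolator may match the pattern differently.

```python
def picked_proteins(protein_scores, decoy_pattern="random_"):
    """Return (protein, score, is_decoy) winners of each target/decoy pair, best first."""
    winners = {}
    for protein, score in protein_scores.items():
        is_decoy = protein.startswith(decoy_pattern)  # assumed prefix convention
        base = protein[len(decoy_pattern):] if is_decoy else protein
        current = winners.get(base)
        if current is None or score > current[1]:
            winners[base] = (protein, score, is_decoy)
    return sorted(winners.values(), key=lambda w: w[1], reverse=True)
```

The resulting list contains at most one entry per target/decoy pair and could then be passed to a target-decoy q-value estimate.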

-z <enzyme>; --protein-enzyme <enzyme>

Type of enzyme: "no_enzyme", "elastase", "pepsin", "proteinasek", "thermolysin", "trypsinp", "chymotrypsin", "lys-n", "lys-c", "arg-c", "asp-n", "glu-c" or "trypsin". Default = "trypsin".
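For the default enzyme, the in-silico digest follows the textbook trypsin rule: cleave after K or R unless the next residue is P. The sketch below also shows a variant that ignores the proline rule, which is assumed here to correspond to "trypsinp". It produces fully cleaved peptides only; missed cleavages and peptide length limits are left out, and the function name is hypothetical.

```python
def digest(sequence, ignore_proline_rule=False):
    """Fully digest a protein sequence with a trypsin-like rule."""
    peptides = []
    start = 0
    for i, residue in enumerate(sequence):
        at_end = i == len(sequence) - 1
        if residue in "KR" and not at_end:
            # Trypsin does not cleave before proline unless the rule is ignored
            if ignore_proline_rule or sequence[i + 1] != "P":
                peptides.append(sequence[start:i + 1])
                start = i + 1
    peptides.append(sequence[start:])
    return [p for p in peptides if p]
```

For example, `digest("MAKPGRSTKLR")` yields ["MAKPGR", "STK", "LR"], while `digest("MAKPGRSTKLR", ignore_proline_rule=True)` also cuts at the K-P bond and yields ["MAK", "PGR", "STK", "LR"].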

-c; --protein-report-fragments

By default, if the peptides associated with protein A are a proper subset of the peptides associated with protein B, then protein A is eliminated and all the peptides are considered as evidence for protein B. Note that this filtering is done based on the complete set of peptides in the database, not based on the identified peptides in the search results. Alternatively, if this option is set and if all of the identified peptides associated with protein B are also associated with protein A, then Percolator will report a comma-separated list of protein IDs, where the full-length protein B is first in the list and the fragment protein A is listed second. Not available for Fido.
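The default subset elimination can be expressed with set comparisons over the peptides each protein yields in the complete database digest. This is an illustrative sketch with a hypothetical function name; it only drops the subset proteins and does not show the reassignment of their peptides as evidence for the superset protein.

```python
def eliminate_subset_proteins(protein_peptides):
    """Drop proteins whose database peptides are a proper subset of another protein's."""
    kept = {}
    for a, peps_a in protein_peptides.items():
        # For Python sets, '<' tests for a proper subset
        is_subset = any(peps_a < peps_b for b, peps_b in protein_peptides.items() if b != a)
        if not is_subset:
            kept[a] = peps_a
    return kept
```

With {"A": {"P1", "P2"}, "B": {"P1", "P2", "P3"}, "C": {"P4"}}, protein A is removed and B and C remain.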

-g; --protein-report-duplicates

If multiple database proteins contain exactly the same set of peptides, then Percolator will randomly discard all but one of the proteins. If this option is set, then the IDs of these duplicated proteins will be reported as a comma-separated list. Not available for Fido.
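Reporting duplicates amounts to grouping proteins by their exact peptide set and joining the IDs of each group into a comma-separated label. The sketch sorts the IDs within a group so the output is deterministic, whereas the default behaviour described above keeps one protein at random; the function name is hypothetical.

```python
from collections import defaultdict


def group_duplicate_proteins(protein_peptides):
    """Map 'id1,id2,...' labels to the peptide set shared by those proteins."""
    groups = defaultdict(list)
    for protein, peptides in protein_peptides.items():
        groups[frozenset(peptides)].append(protein)
    return {",".join(sorted(members)): set(peps) for peps, members in groups.items()}
```

With {"A": {"P1", "P2"}, "B": {"P2", "P1"}, "C": {"P3"}}, the result has the two entries "A,B" and "C".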

Fido (protein inference) options

-A; --fido-protein

Use the Fido algorithm to infer protein probabilities.

-a <value>; --fido-alpha <value>

Set Fido's probability with which a present protein emits an associated peptide. Set by grid search if not specified.

-b <value>; --fido-beta <value>

Set Fido's probability that a peptide is created from noise. Set by grid search if not specified.

-G <value>; --fido-gamma <value>

Set Fido's prior probability that a protein is present in the sample. Set by grid search if not specified.
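To relate alpha, beta and gamma, here is the Fido model as it is commonly stated, in our own notation: R is the set of proteins present in the sample, A(e) the set of proteins that contain peptide e, and N the total number of candidate proteins. A peptide is observed unless both the noise source and every present parent protein fail to emit it, and each protein is independently present with prior probability gamma.

```latex
P(e \text{ observed} \mid R) = 1 - (1 - \beta)\,(1 - \alpha)^{\lvert R \cap A(e) \rvert},
\qquad
P(R) = \gamma^{\lvert R \rvert}\,(1 - \gamma)^{N - \lvert R \rvert}
```

When no parent protein of e is present, the first expression reduces to beta, the pure noise probability.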

-q; --fido-empirical-protein-q

Estimate empirical p-values and q-values using target-decoy analysis.

-H <value>; --fido-gridsearch-mse-threshold <value>

Q-value threshold used to compute the MSE and ROC AUC score in the grid search. Recommended values are 0.05 for normal-sized datasets and 0.1 for large datasets. Default = 0.1.

Fido speed-up options

-d <value>; --fido-gridsearch-depth <value>

Controls how much computational time is spent estimating Fido's alpha, beta and gamma parameters: 0 (fastest), 1 or 2 (slowest). Default = 0.

-T <value>; --fido-fast-gridsearch <value>

Apply the specified threshold to PSM, peptide and protein probabilities to obtain a faster estimate of the alpha, beta and gamma parameters. Default = 0; a value of 0.2 is recommended.

-C; --fido-no-split-large-components

Do not split large graph components into subgraphs to approximate the posterior distribution. Without this option, components are split by duplicating peptides with low probabilities, and splitting continues until the number of possible configurations of each subgraph is below 2^18.

-E <value>; --fido-protein-truncation-threshold <value>

To speed up inference, proteins for which none of the associated peptides has a probability exceeding the specified threshold will be assigned probability = 0. Default = 0.01.
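The truncation rule is a simple filter applied before inference. In this sketch, proteins map to lists of peptides, peptide probabilities come from a dictionary (missing peptides count as probability 0), and the function name is hypothetical; proteins in the second list would receive probability 0 without entering the Fido computation.

```python
def truncate_proteins(protein_peptides, peptide_probs, threshold=0.01):
    """Split proteins into (kept, zeroed) by the best probability among their peptides."""
    kept, zeroed = [], []
    for protein, peptides in protein_peptides.items():
        if any(peptide_probs.get(p, 0.0) > threshold for p in peptides):
            kept.append(protein)
        else:
            zeroed.append(protein)  # would be assigned probability 0
    return kept, zeroed
```

Because the comparison is strict, a protein whose best peptide probability equals the threshold exactly is also truncated.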