Description
Parameters
Output
- General Output
- GSA Output (if GSA had been checked)

Description

PECA Core implements the core functionality for analyzing a two-level time series data set (e.g. paired protein and mRNA concentration data). It identifies genes that are significantly regulated at each time point and assigns each one a probability score for a significant change point.

Note that PECA Core does not deconvolute the contributions of changes in synthesis or degradation.

Parameters

Working Directory

Specifies the directory where the input files, output matrices, and plots produced by PECA are saved. Type the path manually or choose the folder with the "Select" button.

Friendly reminder: DO NOT select the Desktop, as the tool produces MANY files.

About Data

Number of Replicates

Specifies the number of replicate experiments in the data sets for Expression Series 1 and 2 (default: 1).

Smoothing

If checked, Gaussian Process (GP) smoothing will be applied to the datasets (default: unchecked).

Gaussian Process Variance Parameter

Determines how far the function values vary from the mean (default 2.0). A small value results in a function that stays close to the mean value.

Gaussian Process Scale Parameter

Scaling factor that determines the smoothness of the curve (default 1.0). A small value results in function values that change quickly.
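For intuition about what the two parameters control, the sketch below smooths a single series with a Gaussian process using the standard squared-exponential kernel k(t, t') = variance * exp(-(t - t')^2 / (2 * scale^2)). It assumes the Variance Parameter plays the role of the kernel's signal variance and the Scale Parameter that of its length-scale; the kernel choice, the extra `noise` term, and all function names are illustrative assumptions, not PECA's actual implementation.

```python
import math


def se_kernel(t1, t2, variance, scale):
    """Squared-exponential covariance; `scale` acts as the length-scale (assumption)."""
    return variance * math.exp(-((t1 - t2) ** 2) / (2.0 * scale ** 2))


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (small systems only)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def gp_smooth(times, values, variance=2.0, scale=1.0, noise=0.1):
    """Posterior mean of a GP fitted around the series mean (hypothetical `noise` term)."""
    mean = sum(values) / len(values)
    centred = [v - mean for v in values]
    n = len(times)
    k = [[se_kernel(times[i], times[j], variance, scale) for j in range(n)] for i in range(n)]
    k_noisy = [[k[i][j] + (noise if i == j else 0.0) for j in range(n)] for i in range(n)]
    alpha = solve(k_noisy, centred)  # (K + noise * I)^-1 (y - mean)
    return [mean + sum(k[i][j] * alpha[j] for j in range(n)) for i in range(n)]


smoothed = gp_smooth([0, 1, 2, 3, 4], [1.0, 3.0, 2.0, 4.0, 3.0])
```

In this sketch, lowering `variance` pulls the smoothed curve toward the series mean, while raising `scale` makes neighbouring time points more strongly correlated and the curve smoother.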

Gene Set Analysis (GSA)

If checked, a time-dependent functional enrichment analysis is performed on the PECA output matrix, specifically on the change point scores based on the regulation rate ratio (for PECA-core and N). The results are displayed as an additional output matrix that reports the biological functions whose members are up- or down-regulated at specific time points (default: unchecked).

Biological Function Annotation Files

Specifies the file path of the function annotation file that should be used for the time-dependent functional enrichment analysis.

File Format:

First column, named ‘Pathwayid’: the pathway IDs, e.g. from Gene Ontology or Consensus Pathway DataBase
Second column, named identically to the column selected as Gene Name Column: the genes in each pathway
Third column, named 'pathway': the pathway names

Sample file with annotations for human genes
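A minimal sketch of reading such a file, assuming it is tab-delimited with one gene-pathway pair per row and that the gene column is called `Gene names` in this example. The delimiter, the row layout, the toy IDs, and the `read_annotations` helper are all assumptions for illustration.

```python
import csv
import io


def read_annotations(handle, gene_column):
    """Group a long-format annotation table into {Pathwayid: (pathway, set of genes)}."""
    pathways = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        _, genes = pathways.setdefault(row["Pathwayid"], (row["pathway"], set()))
        genes.add(row[gene_column])
    return pathways


# Toy input with placeholder IDs and gene names (not real annotations).
example = io.StringIO(
    "Pathwayid\tGene names\tpathway\n"
    "PW:0001\tGENE_A\tfirst pathway\n"
    "PW:0001\tGENE_B\tfirst pathway\n"
    "PW:0002\tGENE_A\tsecond pathway\n"
)
annotations = read_annotations(example, "Gene names")
```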

Enrichment Analysis FDR Cutoff

Defines the FDR cutoff that the enrichment analysis uses to decide which genes count as significant when analyzing biological functions at specific time points (default 0.05, i.e. 5%). The value must lie between 0 and 1 (e.g. 0.05, 0.1, 0.2).
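As an illustration of how the cutoff could be applied, this sketch counts the genes at time point X whose FDRX is at or below the cutoff, split by the sign of signedCPSX. The dictionary-per-gene input, the inclusive `<=` comparison, the made-up values, and the `count_significant` name are assumptions, not PECA's code.

```python
def count_significant(rows, x, fdr_cutoff=0.05):
    """Return (n_up, n_down) significant genes at time point index x."""
    if not 0.0 <= fdr_cutoff <= 1.0:
        raise ValueError("the FDR cutoff must lie between 0 and 1")
    n_up = n_down = 0
    for row in rows:
        if row[f"FDR{x}"] <= fdr_cutoff:
            score = row[f"signedCPS{x}"]
            if score > 0:
                n_up += 1
            elif score < 0:
                n_down += 1
    return n_up, n_down


# Made-up rows for three genes at time point index 1.
genes = [
    {"signedCPS1": 0.97, "FDR1": 0.01},
    {"signedCPS1": -0.95, "FDR1": 0.03},
    {"signedCPS1": 0.60, "FDR1": 0.40},
]
print(count_significant(genes, 1))  # (1, 1)
```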

Minimum % of Genes to Consider a Pathway

Specifies the minimum percentage of genes needed in the experimental data for a pathway to be analyzed (default 0). For instance, if 20% is specified, then at least 20 genes need to be present in the experimental data for a pathway of 100 genes. The value of this parameter should lie between 0 and 100.
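The worked example above (20% of a 100-gene pathway means at least 20 genes) corresponds to the check below; comparing in integer arithmetic avoids rounding issues. `passes_min_percent` is a hypothetical helper, and treating the threshold as inclusive is an assumption.

```python
def passes_min_percent(pathway_size, genes_in_data, min_percent=0):
    """True if at least min_percent % of the pathway's genes occur in the data."""
    if not 0 <= min_percent <= 100:
        raise ValueError("min_percent must lie between 0 and 100")
    # genes_in_data / pathway_size >= min_percent / 100, rearranged to avoid division
    return genes_in_data * 100 >= pathway_size * min_percent


print(passes_min_percent(100, 20, 20))  # True
print(passes_min_percent(100, 19, 20))  # False
```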

Minimum Number of Genes For Hypothesis Testing

Specifies the minimum number of significant genes (within the FDR cutoff) from the experimental data that a pathway must contain to be reported (default 1). Pathways with fewer significant genes are assigned a p-value of 1. The value of this parameter should be a positive integer.
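This page does not name the statistical test. A common choice for this kind of over-representation analysis is a one-sided hypergeometric test, sketched below under that assumption, together with the rule that pathways below the minimum number of significant genes get a p-value of 1. The function name and the choice of background (all genes in the experimental data) are assumptions.

```python
from math import comb


def enrichment_p_value(n_background, n_significant, pathway_size, k, min_genes=1):
    """P(X >= k) for X ~ Hypergeometric(n_background, pathway_size, n_significant).

    n_background  : genes in the experimental data (assumed background)
    n_significant : significant genes at this time point (within the FDR cutoff)
    pathway_size  : pathway genes present in the data
    k             : significant genes that belong to the pathway
    """
    if k < min_genes:
        return 1.0  # too few significant genes: not tested
    total = comb(n_background, n_significant)
    tail = sum(
        comb(pathway_size, i) * comb(n_background - pathway_size, n_significant - i)
        for i in range(k, min(pathway_size, n_significant) + 1)
    )
    return tail / total


# All 3 significant genes fall into a 3-gene pathway, out of 10 genes: p = 1/120
print(enrichment_p_value(10, 3, 3, 3))
```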

Select Data

Gene Name Column

The selected text column will be used as the gene identifiers in the PECA analysis (default: first text column).

Expression Series 1

The expression/numerical columns to be used as Expression Series 1, typically mRNA concentration data (default: first half of the expression/numerical columns). Expression Series 1 comes before Expression Series 2 (typically protein concentration data); Expression Series 1 represents degradation and Expression Series 2 represents synthesis.

RNA ‘expression’ data can be normalized read counts from an RNA-seq experiment (e.g. FPKM, RPKM, or TPM), signal intensities from a single-channel microarray experiment, or actual concentrations (if known).

The columns should be ordered by time point within each replicate, i.e. all time points of replicate 1 first, then all time points of replicate 2, and so on.

Order:

time point 1 replicate 1
…
time point N replicate 1
time point 1 replicate 2
…
time point N replicate 2
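With N time points, this layout puts time point t of replicate r (both counted from 1) at the 0-based column position (r - 1) * N + (t - 1). The hypothetical helpers below compute that position and split one gene's row into per-replicate series; they only illustrate the ordering rule and are not part of the plugin.

```python
def column_index(time_point, replicate, n_time_points):
    """0-based column position of (time_point, replicate), both 1-based."""
    if not 1 <= time_point <= n_time_points or replicate < 1:
        raise ValueError("time_point and replicate are 1-based")
    return (replicate - 1) * n_time_points + (time_point - 1)


def split_replicates(row, n_time_points, n_replicates):
    """Split one gene's values into one list per replicate, ordered by time point."""
    if len(row) != n_time_points * n_replicates:
        raise ValueError("expected n_time_points * n_replicates columns")
    return [row[r * n_time_points:(r + 1) * n_time_points] for r in range(n_replicates)]


print(column_index(1, 2, 5))  # 5: first time point of replicate 2
print(split_replicates([1, 2, 3, 4, 5, 6], 3, 2))  # [[1, 2, 3], [4, 5, 6]]
```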

Data Input Form 1

Specification for the data input form of Expression Series 1, i.e. what data transformation has been applied already (default: Raw).

Raw: unprocessed, untransformed data.
ln: log_e transformed data.
log_2: log₂ transformed data.
log_10: log₁₀ transformed data.
log_custom: log_X transformed data, where X is a user-specified positive real base other than 1
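Since the output reports log_e transformed values, a hypothetical conversion of a single value from each input form to the natural log is sketched below, using ln(x) = log_b(x) * ln(b). The form labels mirror the options above; the function itself is an illustration, not PECA's code.

```python
import math


def to_natural_log(value, form, base=None):
    """Convert one value given in `form` to ln(original value)."""
    if form == "Raw":
        if value <= 0:
            raise ValueError("raw values must be positive to take the log")
        return math.log(value)
    if form == "ln":
        return value
    if form == "log_2":
        return value * math.log(2)
    if form == "log_10":
        return value * math.log(10)
    if form == "log_custom":
        if base is None or base <= 0 or base == 1:
            raise ValueError("log_custom needs a positive base other than 1")
        return value * math.log(base)
    raise ValueError(f"unknown input form: {form}")
```

Raw zeros cannot be log-transformed, so this sketch rejects them rather than guessing a pseudocount.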

Expression Series 2

The expression/numerical columns to be used as Expression Series 2, typically protein concentration data, which comes after Expression Series 1 (default: second half of the expression/numerical columns).

Protein ‘expression’ values can be protein intensities from mass spectrometry experiments (e.g. label-free quantification, LFQ), spectral counts, or actual concentrations (if known).

The columns must follow the same order as Expression Series 1, and their number must match that of Expression Series 1.

Data Input Form 2

Specification for the data input form of Expression Series 2 (default: Raw).

The options are the same as for Data Input Form 1.

mRNA Level Inference

If checked, mRNA-level inference is performed: Expression Series 1 is taken to be DNA concentrations, all set to 1, and Expression Series 2 is the selected mRNA Expression Series (default: unchecked). If unchecked, Expression Series 1 and 2 are both set by the user as described above.

mRNA Expression Series

If mRNA Level Inference is checked, these are the expression/numerical columns to be used as mRNA expression data.

Data Input Form

Specification for the data input form of mRNA Expression Series (default: Raw).

The options are the same as for Data Input Form 1.

MCMC Parameters

PECA model parameters are estimated using a sampling-based algorithm called MCMC (Markov chain Monte Carlo), which requires the parameters below. All values should be positive integers.

MCMC Burn-In

Defines the number of iterations discarded at the beginning of the MCMC run, i.e. the burn-in period (default: 1000).

MCMC Thinning

Defines the interval at which MCMC iterations are recorded, e.g. a value of 10 records every 10th iteration (default: 10).

MCMC Samples

Defines the total number of post-burn-in samples to be recorded from MCMC (default: 1000).
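Assuming the usual convention that thinning keeps every k-th post-burn-in iteration and that MCMC Samples counts the kept draws, a run lasts Burn-In + Samples * Thinning iterations; with the defaults that is 1000 + 1000 * 10 = 11,000. The hypothetical helper below lists which iterations would be kept under that convention.

```python
def recorded_iterations(burn_in=1000, thinning=10, samples=1000):
    """1-based iteration numbers kept after burn-in and thinning."""
    for name, value in (("burn_in", burn_in), ("thinning", thinning), ("samples", samples)):
        if not isinstance(value, int) or value <= 0:
            raise ValueError(f"{name} must be a positive integer")
    return [burn_in + thinning * (i + 1) for i in range(samples)]


kept = recorded_iterations()
print(len(kept), kept[0], kept[-1])  # 1000 1010 11000
```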

Output

General Output

The text column is the gene name column selected under Gene Name Column.

The main/expression columns are the log_e transformed Expression Series 1 and Expression Series 2 data sets.

The other numeric columns contain RY, signedCPSX, and FDRX, where X indexes the time point starting from 0 (i.e. X = 1 refers to the second time point) and Y indexes the time interval starting from 0 (i.e. Y = 0 refers to the interval between the first and second time points).

RY is the rate ratio for time interval Y, i.e. the interval between time point indices Y and Y + 1 (e.g. if Y = 1, the interval between time point indices 1 and 2)
signedCPSX is the change point score at time point X; its sign indicates the direction of regulation: positive for up-regulation, negative for down-regulation
FDRX is the false discovery rate of the change point at time point X

NOTE: the rate ratio itself does NOT indicate the significance or direction of a change. The DIFFERENCE between consecutive rate ratios (adjacent time intervals) gives the direction of the change, and the FDR gives its significance.
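The sketch below spells out these conventions for a series with N time points: ratios R0 to R(N-2) cover the N-1 intervals, and the change at time point X compares R(X-1) with R(X). It assumes change points are scored only at interior time points (X = 1 to N-2); that assumption, the helper names, and the 0.05 cutoff in `call_change` are illustrative, not taken from PECA.

```python
def output_column_names(n_time_points):
    """Names of the ratio, score and FDR columns (order in the real matrix may differ)."""
    ratios = [f"R{y}" for y in range(n_time_points - 1)]
    scores = [f"signedCPS{x}" for x in range(1, n_time_points - 1)]
    fdrs = [f"FDR{x}" for x in range(1, n_time_points - 1)]
    return ratios, scores, fdrs


def ratio_direction(ratios, x):
    """Sign of R(x) - R(x-1) at time point x: +1 up, -1 down, 0 unchanged."""
    diff = ratios[x] - ratios[x - 1]
    return (diff > 0) - (diff < 0)


def call_change(signed_cps, fdr, fdr_cutoff=0.05):
    """Label a change point using the FDR for significance and the sign for direction."""
    if fdr > fdr_cutoff or signed_cps == 0:
        return "not significant"
    return "up" if signed_cps > 0 else "down"
```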

GSA Output (if GSA had been checked)

The text columns contain the gene name column selected under Gene Name Column, GO_name, and GO_id.

GO_name is the name of the Gene Ontology term or pathway
GO_id is the ID of the Gene Ontology term or pathway

The numeric columns contain MaxSig(Up), MaxSig(Down), Max(Both), GO_size, GO_size_background, Up(X), Down(X), and Sig(X), where X indexes the time points as in signedCPSX.

MaxSig(Up) is the maximum of -log₁₀(Up(X)) over all X
MaxSig(Down) is the maximum of -log₁₀(Down(X)) over all X
Max(Both) is the maximum of -log₁₀(Sig(X)) over all X
GO_size is the number of genes in the pathway
GO_size_background is the number of genes in the pathway that appear in the experimental data
Up(X) is the p-value calculated from the number of up-regulated genes at time point X
Down(X) is the p-value calculated from the number of down-regulated genes at time point X
Sig(X) is the p-value calculated from the number of up- and down-regulated genes at time point X
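The three summary columns follow directly from the per-time-point p-values. A minimal sketch of that computation, with a hypothetical `floor` on p to avoid taking log₁₀ of 0; the helper names are illustrative only.

```python
import math


def max_significance(p_values, floor=1e-300):
    """Maximum over X of -log10(p(X)); p is clamped to `floor` to avoid log10(0)."""
    return max(-math.log10(max(p, floor)) for p in p_values)


def gsa_summary(up, down, sig):
    """MaxSig(Up), MaxSig(Down) and Max(Both) from the Up(X), Down(X), Sig(X) lists."""
    return {
        "MaxSig(Up)": max_significance(up),
        "MaxSig(Down)": max_significance(down),
        "Max(Both)": max_significance(sig),
    }


summary = gsa_summary([0.5, 0.01], [1.0, 0.2], [0.5, 0.01])
```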

PECA Core - PECAplus/Perseus-PluginPECA GitHub Wiki

