PECA N - PECAplus/Perseus-PluginPECA GitHub Wiki
- Description
- Parameters
- Output
The PECA-N (Network) module is a modification of PECA that incorporates user-provided biological network data into the inference of rate parameter changes in co-regulated genes. It assumes that functionally related genes tend to be co-regulated along the time course.
Specifies the directory where input files, output matrices and plots produced by PECA will be saved. Either type the path manually or choose the folder with the "Select" button.
Friendly Reminder: DO NOT SELECT DESKTOP as the tool produces MANY files
Specifies the number of replicate experiments in the datasets for expression series 1 and 2 (default: 1).
If checked, Gaussian Process (GP) smoothing will be applied to the datasets (default: unchecked).
Determines the variation of values from the mean (default 2.0). A small value will result in a function that stays close to the mean value.
Scaling factor that determines the smoothness of the curve (default 1.0). A small value will result in the function values changing quickly.
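As a rough illustration of how these two GP parameters act, the sketch below uses a squared-exponential covariance function. This kernel form, and the names `sq_exp_kernel`, `signal_variance` and `length_scale`, are assumptions made for illustration only; the page does not state which kernel PECA uses internally.

```python
import math

def sq_exp_kernel(t1, t2, signal_variance=2.0, length_scale=1.0):
    """Squared-exponential covariance between function values at times t1 and t2.

    Assumed kernel form (not confirmed by the plugin documentation):
    k(t1, t2) = signal_variance * exp(-(t1 - t2)^2 / (2 * length_scale^2))
    """
    return signal_variance * math.exp(-((t1 - t2) ** 2) / (2.0 * length_scale ** 2))

def correlation(t1, t2, length_scale):
    """Correlation implied by the kernel; it does not depend on signal_variance."""
    return math.exp(-((t1 - t2) ** 2) / (2.0 * length_scale ** 2))

# The prior variance at a single time point equals the signal variance,
# so a smaller value keeps the curve closer to the mean.
prior_var_default = sq_exp_kernel(0.0, 0.0)
prior_var_small = sq_exp_kernel(0.0, 0.0, signal_variance=0.1)

# A shorter length scale lowers the correlation between neighbouring
# time points, so the curve is allowed to change more quickly.
corr_default = correlation(0.0, 1.0, length_scale=1.0)
corr_short = correlation(0.0, 1.0, length_scale=0.25)
```

Under this assumed kernel, the first parameter only rescales the amplitude of the smoothed curve, while the second controls how far apart two time points must be before their values become nearly independent.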
Specifies info about the user-provided biological network data. Similar to PECA with GSEA option checked, time-dependent functional enrichment analysis will be performed on the output matrix, in this case on the regulation rate ratio, and a similar additional resulting matrix will be produced.
Specifies the file path of the edge file as part of biological network data that should be used for the inference of rate parameters.
File Format:
- Each line consists of 2 gene names, matching gene identifiers from Gene Name Column and representing an undirected edge with these 2 genes as vertices.
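Before running the plugin, the edge file can be sanity-checked with a small script. The sketch below is only a pre-check under stated assumptions: it assumes the two gene names on each line are separated by whitespace (the delimiter is not specified on this page), and the helper `parse_edges` and the example gene names are hypothetical.

```python
def parse_edges(lines, known_genes):
    """Collect undirected edges whose two genes both occur in known_genes.

    Lines that do not contain exactly two fields, self-loops, and edges
    with unknown genes are skipped (a choice made for this sketch).
    """
    edges = set()
    for line in lines:
        fields = line.split()  # assumption: whitespace-separated gene names
        if len(fields) != 2:
            continue
        gene_a, gene_b = fields
        if gene_a == gene_b:
            continue
        if gene_a not in known_genes or gene_b not in known_genes:
            continue
        # frozenset makes (A, B) and (B, A) the same undirected edge
        edges.add(frozenset((gene_a, gene_b)))
    return edges

example_lines = ["TP53\tMDM2", "MDM2\tTP53", "TP53\tUNKNOWN1", ""]
edges = parse_edges(example_lines, {"TP53", "MDM2", "CDKN1A"})
```

Here `known_genes` stands for the identifiers in the selected Gene Name Column; the duplicate reversed edge and the edge with an unmatched gene are both dropped, leaving a single edge.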
Specifies the file path of the function annotation file that should be used for the time-dependent functional enrichment analysis.
File Format:
- First column named 'Pathwayid', specifying the pathway IDs, e.g. from Gene Ontology and Consensus Pathway DataBase
- Second column with the same name as the Gene Name Column selected in the parameters, specifying the genes involved
- Third column named 'pathway', specifying the pathways involved
Sample file with annotations for human genes
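A file with this three-column layout could be generated along the following lines. This is a sketch under assumptions: the tab delimiter is a guess (the page does not name the separator, so compare with the sample file), and `write_annotation` and the column name "Gene names" are hypothetical.

```python
import csv
import io

def write_annotation(rows, gene_column, handle):
    """Write (pathway_id, gene, pathway_name) rows with the header described above."""
    # Assumption: tab-separated text; check the sample file for the real delimiter.
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["Pathwayid", gene_column, "pathway"])
    for pathway_id, gene, pathway_name in rows:
        writer.writerow([pathway_id, gene, pathway_name])

# One row per (pathway, gene) pair; the second header cell must match
# the name of the gene name column selected in the plugin parameters.
buffer = io.StringIO()
write_annotation(
    [("GO:0006915", "TP53", "apoptotic process")],
    gene_column="Gene names",
    handle=buffer,
)
annotation_text = buffer.getvalue()
```

To write to disk instead of memory, pass a file opened with `open(path, "w", newline="")` as `handle`.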
Defines the FDR cutoff that the enrichment analysis uses when analyzing biological functions at specific time points (default 0.05, i.e. 5%). The value of this parameter should lie between 0 and 1 (e.g., 0.05, 0.1, 0.2).
Specifies the minimum percentage of genes needed in the experimental data for a pathway to be analyzed (default 0). For instance, if 20% is specified, then at least 20 genes need to be present in the experimental data for a pathway of 100 genes. The value of this parameter should lie between 0 and 100.
Specifies the minimum number of significant genes (within the FDR cutoff) from the experimental data for a particular pathway to be reported (default 1). Anything below this number will be assigned a p-value of 1. The value of this parameter should be a positive integer.
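The minimum-percentage rule can be written out as a short check. The sketch below only restates the arithmetic of the example above (20% of a 100-gene pathway means at least 20 genes); the function name `pathway_is_analyzed` is hypothetical and this is not the plugin's own code.

```python
def pathway_is_analyzed(pathway_genes, data_genes, min_percent=0.0):
    """Return True if enough of the pathway's genes occur in the experimental data."""
    pathway = set(pathway_genes)
    if not pathway:
        return False  # choice made for this sketch: an empty pathway is never analyzed
    present = len(pathway & set(data_genes))
    return 100.0 * present / len(pathway) >= min_percent

# A pathway of 100 genes with a 20% threshold needs at least 20 genes in the data.
pathway = [f"gene{i}" for i in range(100)]
enough = pathway_is_analyzed(pathway, pathway[:20], min_percent=20)
too_few = pathway_is_analyzed(pathway, pathway[:19], min_percent=20)
```

With the default of 0, every non-empty pathway passes this filter regardless of how many of its genes were measured.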
The selected text column will be used as the gene identifiers in PECA analysis (default: first text column).
The selected expression/numerical columns that should be used as expression series 1 (typically mRNA concentration data). Expression series 1 comes before expression series 2 (typically protein concentration data), where expression series 1 represents degradation and 2 represents synthesis (default: first half of expression/numeric columns).
RNA ‘expression’ data can be normalized read counts, e.g. FPKM, RPKM, or TPM, from an RNA seq experiment, or signal intensities from a single channel microarray experiment, or actual concentration (if known).
The columns should be grouped by replicate, with the time points in ascending order within each replicate.
Order:
- time point 1 replicate 1
- …
- time point N replicate 1
- time point 1 replicate 2
- …
- time point N replicate 2
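The ordering listed above can be generated programmatically to check a column selection. This is a minimal sketch; `expected_column_order` is a hypothetical helper, not part of the plugin.

```python
def expected_column_order(n_timepoints, n_replicates):
    """List (time_point, replicate) pairs in the order the columns should appear.

    Replicate is the outer loop and time point the inner loop, so all time
    points of replicate 1 come first, then all time points of replicate 2.
    """
    return [
        (time_point, replicate)
        for replicate in range(1, n_replicates + 1)
        for time_point in range(1, n_timepoints + 1)
    ]

order = expected_column_order(n_timepoints=3, n_replicates=2)
labels = [f"time point {t} replicate {r}" for t, r in order]
```

For three time points and two replicates, `labels` starts with "time point 1 replicate 1" and ends with "time point 3 replicate 2", giving six columns in total.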
Specification for the data input form of Expression Series 1, i.e. what data transformation has been applied already (default: Raw).
- Raw: unprocessed, untransformed data.
- ln: natural log (base e) transformed data.
- log_2: log2 transformed data.
- log_10: log10 transformed data.
- log_custom: logX transformed data, where X is a specified positive real value.
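To make the meaning of each option concrete, the sketch below maps a stored value back to the raw scale for each data input form. It only illustrates what the options say about the data; whether and how the plugin back-transforms values internally is not stated on this page, and `to_raw` is a hypothetical helper.

```python
import math

def to_raw(value, data_form, custom_base=None):
    """Undo the transformation named by data_form and return the raw-scale value."""
    if data_form == "Raw":
        return value
    if data_form == "ln":
        return math.exp(value)
    if data_form == "log_2":
        return 2.0 ** value
    if data_form == "log_10":
        return 10.0 ** value
    if data_form == "log_custom":
        # The base must be positive; base 1 is excluded because log base 1 is undefined.
        if custom_base is None or custom_base <= 0 or custom_base == 1:
            raise ValueError("log_custom needs a positive base other than 1")
        return custom_base ** value
    raise ValueError(f"unknown data form: {data_form}")

raw_from_log2 = to_raw(3.0, "log_2")
raw_from_log10 = to_raw(2.0, "log_10")
raw_from_custom = to_raw(2.0, "log_custom", custom_base=5.0)
```

Choosing the wrong form would shift every value by the wrong scale, so the option should match the transformation actually applied to the selected columns.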
The selected expression/numerical columns that should be used as expression series 2 (typically protein concentration data) which comes after expression series 1 (default: second half of expression/numerical columns).
Protein ‘expression’ values can be protein intensities from mass spectrometry experiments (e.g. Label Free Quantification, LFQ) or spectral counts or actual concentrations (if known).
Same order as Expression Series 1. The number of columns should also match expression series 1.
Specification for the data input form of Expression Series 2 (default: Raw).
If checked, mRNA level inference will be performed: Expression Series 1 is treated as DNA concentration with every value set to 1, and Expression Series 2 is set to the given mRNA Expression Series (default: unchecked). If unchecked, Expression Series 1 and 2 are both set by the user as explained above.
If mRNA Level Inference is checked, these are the selected expression/numerical columns that should be used as mRNA expression data.
PECA model parameters are estimated using a sampling-based algorithm called MCMC (Markov chain Monte Carlo), which requires the parameters below. All values should be positive integers.
Defines the iterations to be thrown away at the beginning of MCMC run, i.e. the burn-in period (default: 1000).
Defines the interval at which MCMC iterations are recorded, i.e. the thinning interval (default: 10).
Defines the total number of post-burn-in samples to be recorded from MCMC (default: 1000).
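The three settings together determine how long a run takes. The sketch below estimates the total number of iterations under the usual MCMC convention that one sample is recorded every `thin` iterations after burn-in; that the plugin follows exactly this convention is an assumption, and `total_mcmc_iterations` is a hypothetical helper.

```python
def total_mcmc_iterations(burn_in=1000, thin=10, n_samples=1000):
    """Estimate total iterations: discarded burn-in plus thin iterations per kept sample."""
    for name, value in (("burn_in", burn_in), ("thin", thin), ("n_samples", n_samples)):
        if not isinstance(value, int) or value <= 0:
            raise ValueError(f"{name} should be a positive integer")
    return burn_in + thin * n_samples

# Defaults from this page: 1000 burn-in, interval 10, 1000 recorded samples.
default_total = total_mcmc_iterations()
```

With the defaults this comes to 11000 iterations, so doubling the interval roughly doubles the run length while leaving the number of recorded samples unchanged.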
Produces two matrices as output:
- Enrichment Analysis Matrix (contains an extra GO_EdgeCount column compared to PECA Core Enrichment Analysis Matrix):
For matrix 2:
The text column contains: the gene name column provided from Gene Name Column, GO_name, GO_id
- GO_name is the name of the Gene Ontology
- GO_id is the ID of the Gene Ontology
The numeric columns contain: MaxSig(Up), MaxSig(Down), Max(Both), GO_size, GO_size_background, GO_EdgeCount, Up(X), Down(X), Sig(X), where X indexes time points corresponding to signedCPSX
- MaxSig(Up) is the maximum value of -log10(Up(X)) for all X
- MaxSig(Down) is the maximum value of -log10(Down(X)) for all X
- Max(Both) is the maximum value of -log10(Sig(X)) for all X
- GO_size is the number of genes in the pathway
- GO_size_background is the number of genes in the pathway that appear in the experimental data
- GO_EdgeCount is the sum of the number of outward edges from all the pathways considered for the given gene
- Up(X) is the p-value calculated from the number of up-regulated genes
- Down(X) is the p-value calculated from the number of down-regulated genes
- Sig(X) is the p-value calculated from the number of up and down-regulated genes
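The three summary columns follow directly from the per-time-point p-values. The sketch below recomputes them from the definitions above; the p-values are made-up numbers for illustration, and `max_sig` is a hypothetical helper rather than the plugin's code.

```python
import math

def max_sig(p_values):
    """Return the maximum of -log10(p) over all time points.

    A p-value of exactly 0 would raise a ValueError here; how the plugin
    handles that case is not described on this page.
    """
    return max(-math.log10(p) for p in p_values)

# Hypothetical p-values for one pathway at three time points.
up = [0.5, 0.001, 0.2]      # Up(1), Up(2), Up(3)
down = [0.9, 0.8, 0.01]     # Down(1), Down(2), Down(3)
sig = [0.6, 0.004, 0.03]    # Sig(1), Sig(2), Sig(3)

max_sig_up = max_sig(up)      # MaxSig(Up)
max_sig_down = max_sig(down)  # MaxSig(Down)
max_both = max_sig(sig)       # Max(Both)
```

Because of the -log10 transform, larger values in these columns mean smaller p-values, so sorting a column in descending order brings the most strongly enriched pathways to the top.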
The output format is almost the same as PECA Core, but the values are different since PECA-N incorporates biological network data when inferring rate parameter changes.