Differential analysis - stjude/proteinpaint GitHub Wiki
The differential analysis app performs analyses tailored to each data type.
How this Plot Works
This plot dynamically selects and displays appropriate visualizations based on the data type being analyzed. Each analysis appears as a dedicated subplot, accessible via tabs along the header.
The plot is not available when:
- It is not supported by the dataset.
- The total number of samples in the groups exceeds 4,000. For gene expression analysis, the cutoff is 3,000 total samples.
- There are overlapping samples in the groups.
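The availability rules above can be expressed as a small check. This is a hypothetical Python sketch, not ProteinPaint code. The function name, the data-type strings, and the reading of the sample limit as a combined total across both groups are all assumptions.

```python
# Hypothetical sketch of the availability rules; names and limit semantics are assumptions.
SAMPLE_LIMIT = 4000
GENE_EXPRESSION_SAMPLE_LIMIT = 3000


def differential_analysis_available(case, control, data_type, dataset_supported=True):
    """Return (available, reason) for two groups given as collections of sample IDs."""
    if not dataset_supported:
        return False, "not supported by the dataset"
    case, control = set(case), set(control)
    # A sample may not belong to both groups.
    if case & control:
        return False, "groups share samples"
    # Assumed: the limit applies to the combined number of samples in both groups.
    limit = GENE_EXPRESSION_SAMPLE_LIMIT if data_type == "geneExpression" else SAMPLE_LIMIT
    if len(case) + len(control) > limit:
        return False, f"more than {limit} total samples"
    return True, None
```

For example, groups `["s1", "s2"]` and `["s2"]` are rejected because sample `s2` appears in both.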
How to Launch
Create at least one group in the Groups tab of the mass UI. Then create a custom variable from the Create variable using... button. A new button for the custom term will appear below; click it, then select Differential analysis from the pop-up menu. The case and control groups are indicated below the menu option.
Supported Analyses
Gene Expression
When performing differential gene expression analysis, both the volcano plot and gene set enrichment analysis plots are available. For more information, refer to the dedicated wiki pages for each plot.
DNA Methylation
Differential methylation analysis compares promoter-level M-values between case and control groups using limma. M-values are the base-2 logit transform of beta values, M = log2(beta / (1 - beta)). They are approximately normally distributed, which makes limma's linear modeling framework well suited to them.
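The beta-to-M conversion is a one-line formula. The minimal Python sketch below (function names are hypothetical) shows it together with its inverse. Beta values of exactly 0 or 1 have no finite M-value, so the sketch rejects them rather than guessing how the app handles them.

```python
import math


def beta_to_m(beta):
    """Convert a methylation beta value (0 < beta < 1) to an M-value: log2(beta / (1 - beta))."""
    if not 0.0 < beta < 1.0:
        raise ValueError("beta must lie strictly between 0 and 1")
    return math.log2(beta / (1.0 - beta))


def m_to_beta(m):
    """Inverse transform: beta = 2^M / (2^M + 1)."""
    return 2.0 ** m / (2.0 ** m + 1.0)
```

For example, `beta_to_m(0.8)` is approximately 2.0 (0.8 / 0.2 = 4), and a beta of 0.5 maps to an M-value of 0.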
The analysis operates at the promoter level: each data point in the volcano plot represents an ENCODE cCRE promoter region, not an individual CpG probe. Promoters are annotated with gene name(s) where available.
Key details:
- Results are displayed in the volcano plot. A positive log2 fold change indicates hypermethylation in the case group relative to the control group.
- Promoters are filtered out if either group has fewer than 3 non-NA samples (configurable via the Min samples per group setting) or if the promoter has zero variance.
- Promoters with any remaining NA values after filtering are dropped (NAs arise from different array types: 450K, EPIC, EPIC v2).
- Empirical Bayes moderation (eBayes) is applied to borrow strength across promoters for more stable variance estimates.
- P-values are adjusted using Benjamini-Hochberg FDR correction.
- Confounding variables (up to 2, continuous or discrete) are supported in the design matrix.
- Clicking a data point does not launch a box plot (unlike gene expression).
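The promoter filtering rules above can be sketched in Python as follows. This is an illustrative sketch, not the app's implementation. Several details are assumptions:

- the input layout: a dict mapping promoter IDs to per-sample M-values, with `None` or NaN for missing values;
- the function names;
- computing the zero-variance check over all non-NA values from both groups.

```python
import math
from statistics import pvariance


def is_na(value):
    """Treat None and float NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def filter_promoters(m_values, case_ids, control_ids, min_samples_per_group=3):
    """Hypothetical filter. m_values: {promoter_id: {sample_id: M-value or None}}."""
    kept = {}
    for promoter, row in m_values.items():
        # Samples absent from the row count as NA.
        case_vals = [row.get(s) for s in case_ids]
        control_vals = [row.get(s) for s in control_ids]
        n_case = sum(not is_na(v) for v in case_vals)
        n_control = sum(not is_na(v) for v in control_vals)
        # Rule 1: each group needs enough non-NA samples.
        if n_case < min_samples_per_group or n_control < min_samples_per_group:
            continue
        # Rule 2: drop promoters whose observed values do not vary.
        observed = [v for v in case_vals + control_vals if not is_na(v)]
        if len(observed) < 2 or pvariance(observed) == 0:
            continue
        # Rule 3: drop promoters that still contain any NA.
        if any(is_na(v) for v in case_vals + control_vals):
            continue
        kept[promoter] = row
    return kept
```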
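The eBayes step listed above shrinks each promoter's residual variance toward a common prior before computing a moderated t-statistic. The sketch follows the posterior-variance formula used by limma. In this hypothetical Python sketch, the prior degrees of freedom `d0` and prior variance `s0_sq` are passed in directly. limma estimates them from all promoters, and that estimation step is omitted here.

```python
import math


def moderated_t(coef, sigma, stdev_unscaled, df_residual, d0, s0_sq):
    """Hypothetical helper returning (posterior variance, moderated t, total df).

    coef: estimated log2 fold change for one promoter
    sigma: residual standard deviation from that promoter's linear fit
    stdev_unscaled: unscaled standard error of coef (depends only on the design)
    df_residual: residual degrees of freedom of the fit
    d0, s0_sq: prior degrees of freedom and prior variance (assumed given)
    """
    s2 = sigma ** 2
    # Weighted average of the prior variance and the promoter's own variance.
    s2_post = (d0 * s0_sq + df_residual * s2) / (d0 + df_residual)
    t = coef / (stdev_unscaled * math.sqrt(s2_post))
    return s2_post, t, d0 + df_residual
```

With `d0 = 0` the result reduces to the ordinary t-statistic. A larger `d0` pulls noisy variance estimates further toward the prior.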
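The Benjamini-Hochberg adjustment mentioned above is short enough to show in full. The sketch below (function name hypothetical) follows the standard step-up procedure and gives the same values as R's `p.adjust(p, method = "BH")`.

```python
def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values in the original input order."""
    n = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted
```

For example, `[0.01, 0.04, 0.03, 0.20]` is adjusted to approximately `[0.04, 0.0533, 0.0533, 0.20]`.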
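A design matrix with up to two confounders might be assembled as below. This is a hypothetical Python sketch with the following coding choices:

- continuous confounders enter as numeric columns;
- discrete confounders are dummy-coded against their first level (alphabetically);
- the coefficient of the `case` column then estimates the case-minus-control difference in mean M-values, adjusted for the confounders.

The function name, input layout, and coding choices are assumptions, not necessarily how the app builds its design.

```python
def design_matrix(samples, group, confounders=()):
    """Hypothetical builder.

    samples: list of sample IDs (row order)
    group: {sample_id: "case" or "control"}
    confounders: sequence of (name, {sample_id: value}), at most 2 entries
    Returns (column_names, rows).
    """
    if len(confounders) > 2:
        raise ValueError("at most 2 confounding variables are supported")
    columns = ["intercept", "case"]
    rows = [[1.0, 1.0 if group[s] == "case" else 0.0] for s in samples]
    for name, values in confounders:
        observed = [values[s] for s in samples]
        numeric = all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in observed)
        if numeric:
            # Continuous confounder: one numeric column.
            columns.append(name)
            for row, v in zip(rows, observed):
                row.append(float(v))
        else:
            # Discrete confounder: one indicator column per non-reference level;
            # the first level is the reference, absorbed by the intercept.
            levels = sorted(set(map(str, observed)))
            for level in levels[1:]:
                columns.append(f"{name}={level}")
                for row, v in zip(rows, observed):
                    row.append(1.0 if str(v) == level else 0.0)
    return columns, rows
```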
The volcano plot p-value table shows separate Promoter and Gene(s) columns for DNA methylation, reflecting the promoter-level nature of the analysis.