7. Interpreting Results - SjulsonLab/generalized_contrastive_PCA GitHub Wiki
7. Interpreting Results
This page explains how to interpret the outputs of generalized contrastive PCA (gcPCA), how to identify meaningful components, and how to visualize and validate results.
This guide focuses primarily on gcPCA v4, which is the recommended default method. Differences for other variants are briefly discussed where relevant.
Overview: What gcPCA Produces
After fitting gcPCA, the model returns:
- loadings_: gcPC feature weights
- objective_values_: gcPCA objective values (primary interpretation metric)
- Ra_scores_: projections of dataset A (Ra)
- Rb_scores_: projections of dataset B (Rb)
- Ra_values_: magnitude of variance in Ra
- Rb_values_: magnitude of variance in Rb
- objective_function_: method used (e.g., v4)
- null_objective_values_: permutation-based null distribution (if enabled)
These outputs form the basis of all result interpretation.
The Most Important Quantity: gcPCA Objective Values
For gcPCA v4, the objective function for each gcPC is:
$$\frac{R_a - R_b}{R_a + R_b}$$
where R_a and R_b denote the variance of Ra and Rb along that gcPC.
This produces values between −1 and 1:
- +1 → variance only in Ra (Dataset A)
- −1 → variance only in Rb (Dataset B)
- 0 → equal variance in both datasets
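The bound on the v4 objective follows directly from both variances being non-negative. This short derivation assumes R_a + R_b > 0 (the ratio is undefined for a direction with no variance in either dataset).

```latex
R_a, R_b \ge 0 \;\Longrightarrow\; -(R_a + R_b) \le R_a - R_b \le R_a + R_b
\;\Longrightarrow\; -1 \le \frac{R_a - R_b}{R_a + R_b} \le 1
```

The upper bound is reached only when R_b = 0, the lower bound only when R_a = 0, and the value is 0 exactly when R_a = R_b.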
Therefore:
- Top gcPCs (largest values) → enriched in Ra
- Bottom gcPCs (most negative) → enriched in Rb
- Middle gcPCs (near zero) → shared structure
This is the primary metric for interpreting gcPCA results.
Importantly:
Variance in a single dataset should not be interpreted alone.
The objective value determines whether a dimension is truly contrastive.
For example:
- A component may have high variance in Ra
- But even higher variance in Rb
- Result → negative objective value
This means the component is Rb-enriched, despite large variance in Ra.
This is a common source of misinterpretation.
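The example above can be checked numerically. This is a minimal sketch that evaluates the v4 ratio from two per-component variances; `v4_objective` is a hypothetical helper written for illustration and is not part of the gcPCA package.

```python
def v4_objective(var_a: float, var_b: float) -> float:
    """v4 ratio (R_a - R_b) / (R_a + R_b) for a single component (hypothetical helper)."""
    total = var_a + var_b
    if total <= 0:
        raise ValueError("component has no variance in either dataset")
    return (var_a - var_b) / total


# A component with large variance in Ra (4.0) but even larger variance in Rb (6.0)
print(v4_objective(4.0, 6.0))   # -0.2, so the component is Rb-enriched
```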
Important: Scores Are Unit-Normalized
The returned scores, Ra_scores_ and Rb_scores_, are unit-normalized.
This means they do not reflect the magnitude of the dimension.
To recover magnitude, multiply each dataset's scores by its values (Ra_scores_ by Ra_values_, Rb_scores_ by Rb_values_), for example:
Ra_projection = model.Ra_scores_ * model.Ra_values_
Rb_projection = model.Rb_scores_ * model.Rb_values_
Failing to do this can lead to incorrect interpretation of gcPC importance.
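To see why this matters, the sketch below mimics the normalization on random data. It assumes that each score column is the projection onto a gcPC divided by that column's norm, and that the matching values entry is that norm; the package's exact definition may differ, so treat this as an illustration only. The random orthonormal loadings are a stand-in for `model.loadings_`.

```python
import numpy as np

rng = np.random.default_rng(0)
Ra = rng.normal(size=(200, 5))                       # hypothetical dataset A (samples x features)
loadings = np.linalg.qr(rng.normal(size=(5, 3)))[0]  # stand-in for model.loadings_ (features x gcPCs)

projection = Ra @ loadings                      # raw projection onto each gcPC
Ra_values = np.linalg.norm(projection, axis=0)  # assumed per-gcPC magnitude
Ra_scores = projection / Ra_values              # unit-normalized scores (broadcast over columns)

print(np.linalg.norm(Ra_scores, axis=0))        # every column has norm 1
# Multiplying back recovers the projection, including its magnitude
print(np.allclose(Ra_scores * Ra_values, projection))
```

Because every score column has norm 1, comparing score spreads across gcPCs says nothing about which gcPC carries more variance; only the rescaled projection does.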
Recommended Interpretation Workflow
A typical workflow for interpreting gcPCA results:
- Plot objective values
- Identify top and bottom gcPCs
- Inspect variance in Ra and Rb
- Use scree plot to determine importance
- Examine feature weights (loadings)
- Plot gcPCA projections
- Perform clustering or downstream analysis
Each step is described below.
Step 1 — Plot Objective Values
Start by plotting:
import matplotlib.pyplot as plt
plt.plot(model.objective_values_)
plt.xlabel("gcPC")
plt.ylabel("Objective Value")
plt.show()
Interpretation:
- Large positive values → Ra-enriched
- Large negative values → Rb-enriched
- Near zero → shared structure
Typically:
- Inspect top 3–10
- Inspect bottom 3–10
These contain the strongest contrastive structure.
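Selecting the top and bottom gcPCs amounts to sorting the objective values. This is a minimal sketch on a made-up array; `select_extremes` and `k` are illustrative names, and in practice you would pass `model.objective_values_`.

```python
import numpy as np

def select_extremes(objective_values, k=3):
    """Return indices of the k most Ra-enriched and k most Rb-enriched gcPCs."""
    values = np.asarray(objective_values)
    order = np.argsort(values)[::-1]   # descending: most positive (Ra-enriched) first
    top = order[:k]                    # largest objective values
    bottom = order[::-1][:k]           # most negative objective values
    return top, bottom

# Hypothetical v4 objective values for 8 gcPCs
obj = np.array([0.91, 0.75, 0.40, 0.05, -0.02, -0.35, -0.70, -0.88])
top, bottom = select_extremes(obj, k=3)
print(top)     # [0 1 2]
print(bottom)  # [7 6 5]
```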
Step 2 — Plot Variance in Each Dataset
Next, inspect variance magnitude:
plt.plot(model.Ra_values_, label="Ra")
plt.plot(model.Rb_values_, label="Rb")
plt.legend()
This helps determine:
- Whether components are meaningful
- Whether differences are driven by variance magnitude
However:
Variance plots should not replace objective values.
Always interpret objective value first.
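Ranking components by variance in one dataset and ranking them by objective value can disagree. The sketch below uses made-up per-component variances (not model output) and computes the v4 ratio from them directly; it illustrates the point above and does not assume any particular relationship between `Ra_values_`, `Rb_values_`, and `objective_values_`.

```python
import numpy as np

# Hypothetical per-gcPC variance in each dataset
var_a = np.array([5.0, 1.0, 8.0, 0.5])
var_b = np.array([1.0, 1.0, 12.0, 4.0])

objective = (var_a - var_b) / (var_a + var_b)   # v4 ratio per component

print(np.argmax(var_a))       # 2: largest variance in Ra
print(objective[2])           # -0.2: yet this component is Rb-enriched
print(np.argmax(objective))   # 0: the most Ra-enriched component
```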
Step 3 — Scree Plot (Recommended)
Use the elbow method:
import numpy as np
import matplotlib.pyplot as plt
plt.plot(np.abs(model.objective_values_))
plt.xlabel("gcPC")
plt.ylabel("|Objective Value|")
plt.show()
Look for:
- Sharp drop-off
- Natural elbow
This identifies:
- Most meaningful gcPCs
- Noise-dominated gcPCs
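When the visual elbow is ambiguous, a simple numerical heuristic can serve as a cross-check. The sketch below places the cut at the largest drop between consecutive sorted |objective| values. This largest-gap rule is one common heuristic, not a method prescribed by gcPCA, and `largest_gap_elbow` is a hypothetical helper.

```python
import numpy as np

def largest_gap_elbow(objective_values):
    """Number of components that precede the largest drop in sorted |objective|."""
    mags = np.sort(np.abs(np.asarray(objective_values)))[::-1]  # descending magnitudes
    drops = mags[:-1] - mags[1:]                                 # size of each successive drop
    return int(np.argmax(drops)) + 1                             # keep components before the biggest drop

# Hypothetical objective values: three strong gcPCs, then a noise floor
obj = np.array([0.90, -0.85, 0.80, 0.12, -0.10, 0.08, -0.05])
print(largest_gap_elbow(obj))  # 3
```

Sorting by magnitude pools Ra-enriched and Rb-enriched gcPCs, matching the |objective| scree plot above; the permutation test in Step 4 is the more principled way to set the cut.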
Step 4 — Permutation Testing (Recommended)
gcPCA supports null distributions using shuffling:
model = gcPCA(method="v4", Nshuffle=1000)
model.fit(Ra, Rb)
Then compare:
null = model.null_objective_values_
real = model.objective_values_
This allows:
- Significance testing
- Threshold selection
- Noise filtering
Recommended:
- At least 1000 shuffles
- More for high-dimensional datasets
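The comparison between real and null values can be sketched as an empirical p-value per gcPC. This sketch assumes the null array has shape (Nshuffle, n_gcPCs), one row per shuffle; that layout of `null_objective_values_` is an assumption, so check it before use. Absolute values are compared so that strongly negative (Rb-enriched) gcPCs are tested as well as positive ones. All numbers below are synthetic.

```python
import numpy as np

def empirical_pvalues(real, null):
    """Two-sided empirical p-value for each gcPC.

    real: (n_gcpcs,) observed objective values.
    null: (n_shuffles, n_gcpcs) objective values from shuffled data (assumed layout).
    """
    real = np.abs(np.asarray(real))
    null = np.abs(np.asarray(null))
    n_shuffles = null.shape[0]
    exceed = (null >= real).sum(axis=0)        # shuffles at least as extreme, per gcPC
    return (exceed + 1) / (n_shuffles + 1)     # +1 avoids reporting p = 0

rng = np.random.default_rng(1)
real = np.array([0.8, 0.05, -0.7])                # hypothetical observed values
null = rng.uniform(-0.3, 0.3, size=(1000, 3))     # hypothetical null distribution
p = empirical_pvalues(real, null)
print(p < 0.05)   # first and third gcPCs pass; the near-zero middle one does not
```

With 1000 shuffles the smallest attainable p-value is 1/1001, which is one reason to use more shuffles when many gcPCs are tested. A stricter, family-wise threshold compares each real value against the maximum |null| value across gcPCs in each shuffle.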
Step 5 — Examine Feature Loadings
Loadings define which features drive each gcPC.
loadings = model.loadings_
Recommended:
Sort by magnitude:
np.argsort(np.abs(loadings[:, gcpc_index]))[::-1]
Interpret:
- Large magnitude features drive the contrast
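- Sign indicates direction: features with the same sign push the projection the same way, and features with opposite signs push it in opposite ways. The overall sign of a gcPC is arbitrary, so compare signs within a component rather than across components or fits.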
Visualization options:
- Heatmaps
- Sorted bar plots
- Gene ranking
- Neuron weight plots
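The sorting step above can be wrapped into a small ranking helper that keeps the signed weights alongside feature names. This is a minimal sketch; `top_features`, the loadings matrix, and the gene names are hypothetical, and in practice you would pass `model.loadings_` (features x gcPCs) and your own feature labels.

```python
import numpy as np

def top_features(loadings, gcpc_index, k=5, feature_names=None):
    """Return the k features with the largest |loading| on one gcPC, with signed weights."""
    weights = np.asarray(loadings)[:, gcpc_index]
    order = np.argsort(np.abs(weights))[::-1][:k]   # largest magnitude first
    if feature_names is None:
        feature_names = [f"feature_{i}" for i in range(len(weights))]
    return [(feature_names[i], float(weights[i])) for i in order]

# Hypothetical loadings: 4 features x 2 gcPCs
loadings = np.array([[ 0.10,  0.70],
                     [-0.80,  0.10],
                     [ 0.50, -0.60],
                     [ 0.30,  0.35]])
genes = ["GeneA", "GeneB", "GeneC", "GeneD"]
print(top_features(loadings, gcpc_index=0, k=2, feature_names=genes))
# [('GeneB', -0.8), ('GeneC', 0.5)]
```

The output lists GeneB and GeneC as the main drivers of gcPC 0, with opposite signs, so they contribute in opposite directions along that component.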
Step 6 — gcPCA Scatter Plots
Project data:
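plt.scatter(model.Ra_scores_[:, 0], model.Ra_scores_[:, 1], label="Ra", alpha=0.5)
plt.scatter(model.Rb_scores_[:, 0], model.Rb_scores_[:, 1], label="Rb", alpha=0.5)
plt.xlabel("gcPC 1")
plt.ylabel("gcPC 2")
plt.legend()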
This reveals:
- Dataset separation
- Continuous gradients
- Sub-structure
Multiply by magnitude if needed:
Ra_proj = model.Ra_scores_ * model.Ra_values_
Rb_proj = model.Rb_scores_ * model.Rb_values_
Sparse gcPCA Interpretation
Sparse gcPCA produces:
- Feature-selective gcPCs
- Easier interpretation
- Reduced dimensionality
However:
Choosing sparsity requires balancing:
- Interpretability
- Structure preservation
Common strategies:
- Compare multiple sparsity levels
- Evaluate projection stability
- Inspect reconstruction structure
- Monitor objective values
There is no single optimal sparsity level.
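One way to evaluate projection stability across sparsity levels is to correlate the projections that a dense and a sparse loading vector produce on the same data. In the sketch below the sparse vector is made by simple hard thresholding purely to have an example; that is not how sparse gcPCA computes its loadings, and `projection_stability` and all arrays are hypothetical.

```python
import numpy as np

def projection_stability(X, w_ref, w_sparse):
    """Absolute Pearson correlation between projections of X onto two loading vectors."""
    p_ref = X @ w_ref
    p_sparse = X @ w_sparse
    return abs(np.corrcoef(p_ref, p_sparse)[0, 1])   # abs: the sign of a gcPC is arbitrary

rng = np.random.default_rng(2)
X = rng.normal(size=(500, 6))                              # hypothetical data (samples x features)
w_dense = np.array([0.6, 0.5, 0.4, 0.3, 0.2, 0.1])         # hypothetical dense loading
w_sparse = np.where(np.abs(w_dense) >= 0.3, w_dense, 0.0)  # illustrative sparse version

print(np.count_nonzero(w_sparse))                  # 4 nonzero features
print(projection_stability(X, w_dense, w_sparse))  # close to 1: structure largely preserved
```

Repeating this across several sparsity levels, while also tracking the objective values, gives a concrete view of the interpretability versus structure-preservation trade-off.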
Common Pitfalls
1. Interpreting Scores as Magnitude
Scores are unit-normalized. Multiply them by Ra_values_ and Rb_values_ to recover magnitude.
2. Using Variance Alone
High variance in Ra does not imply Ra-enrichment.
Always judge enrichment by the objective values.
3. Ignoring Bottom gcPCs
Bottom gcPCs:
- Often contain meaningful structure
- Represent Rb-enriched dimensions
Always inspect:
- Top AND bottom gcPCs
4. Using Too Many gcPCs
Later gcPCs often:
- Capture noise
- Reduce interpretability
Use:
- Scree plot
- Permutation testing
5. Over-interpreting Small Objective Values
Values near zero:
- Shared structure
- Often not contrastive
These typically should not be prioritized.
Interpretation Examples
Neural Data
Top gcPCs may reflect:
- Replay structure
- Task-specific firing patterns
- State-dependent covariance
Bottom gcPCs may reflect:
- Baseline activity
- Rest-specific structure
Gene Expression
Top gcPCs may reflect:
- Differential gene modules
- Cell-state-specific covariance
- Disease-specific programs
Bottom gcPCs may reflect:
- Control-specific structure
- Baseline expression modules
Interpretation for Other gcPCA Variants
gcPCA v2
Objective:
$$\frac{R_a}{R_b}$$
Range: $[0, \infty)$
- >1 → Ra enriched
- <1 → Rb enriched
gcPCA v3
Objective:
$$\frac{R_a - R_b}{R_b}$$
Range: $[-1, \infty)$
- >0 → Ra enriched
- <0 → Rb enriched
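All three objectives are functions of the same ratio r = R_a / R_b (assuming R_b > 0): v2 = r, v3 = r - 1, and dividing numerator and denominator of the v4 ratio by R_b gives v4 = (r - 1)/(r + 1). The sketch below uses a hypothetical helper to show that the enrichment thresholds agree: v2 > 1, v3 > 0, and v4 > 0 all mean Ra-enriched.

```python
def variants_from_ratio(r: float) -> dict:
    """Map the variance ratio r = R_a / R_b (r >= 0) to the v2, v3 and v4 objectives."""
    return {
        "v2": r,                      # R_a / R_b
        "v3": r - 1.0,                # (R_a - R_b) / R_b
        "v4": (r - 1.0) / (r + 1.0),  # (R_a - R_b) / (R_a + R_b)
    }

print(variants_from_ratio(3.0))  # {'v2': 3.0, 'v3': 2.0, 'v4': 0.5}
print(variants_from_ratio(1.0))  # {'v2': 1.0, 'v3': 0.0, 'v4': 0.0}
print(variants_from_ratio(0.0))  # {'v2': 0.0, 'v3': -1.0, 'v4': -1.0}
```

Because the mapping is monotonic in r, all three variants rank components in the same order; they differ in scale, and only v4 is bounded on both sides.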
Full interpretation differences are described in the manuscript.
Summary
Recommended interpretation workflow:
- Plot objective values
- Identify top and bottom gcPCs
- Inspect variance plots
- Use scree plot
- Run permutation testing
- Examine loadings
- Visualize projections
- Perform downstream analysis
gcPCA objective values should always be the primary interpretation metric.
Links to Other Pages
1. Quickstart Guide
2. Installation
3. Conceptual Overview
4. Mathematical Formulation
5. Code Reference
6. Input Data Guidelines