7. Interpreting Results - SjulsonLab/generalized_contrastive_PCA GitHub Wiki

7. Interpreting Results

This page explains how to interpret the outputs of generalized contrastive PCA (gcPCA), how to identify meaningful components, and how to visualize and validate results.

This guide focuses primarily on gcPCA v4, which is the recommended default method. Differences for other variants are briefly discussed where relevant.

Overview: What gcPCA Produces

After fitting gcPCA, the model returns:

loadings_ — gcPC feature weights
objective_values_ — gcPCA objective values (primary interpretation metric)
Ra_scores_ — projections of dataset A (Ra)
Rb_scores_ — projections of dataset B (Rb)
Ra_values_ — magnitude of variance in Ra
Rb_values_ — magnitude of variance in Rb
objective_function_ — method used (e.g., v4)
null_objective_values_ — permutation-based null distribution (if enabled)

These outputs form the basis of all result interpretation.

The Most Important Quantity: gcPCA Objective Values

For gcPCA v4, the objective function is:

\[ \frac{R_a - R_b}{R_a + R_b} \]

This produces values between −1 and 1:

+1 → variance only in Ra (Dataset A)
−1 → variance only in Rb (Dataset B)
0 → equal variance in both datasets

Therefore:

Top gcPCs (largest values) → enriched in Ra
Bottom gcPCs (most negative) → enriched in Rb
Middle gcPCs (near zero) → shared structure

This is the primary metric for interpreting gcPCA results.

Importantly:

Variance in a single dataset should not be interpreted alone.
The objective value determines whether a dimension is truly contrastive.

For example:

A component may have high variance in Ra
But even higher variance in Rb
Result → negative objective value

This means the component is Rb-enriched, despite large variance in Ra.

This is a common source of misinterpretation.
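
The case above can be checked with a line of arithmetic. The sketch below is illustrative only: `v4_objective` is a hypothetical helper (not part of the gcPCA package), and the variances 4.0 and 9.0 are made-up numbers rather than model output.

```python
def v4_objective(var_a, var_b):
    """Return (var_a - var_b) / (var_a + var_b), the v4-style contrast."""
    return (var_a - var_b) / (var_a + var_b)

# Made-up variances along one dimension: large in Ra, even larger in Rb.
var_a, var_b = 4.0, 9.0
value = v4_objective(var_a, var_b)
print(round(value, 3))  # -0.385
```

Although the variance in Ra is substantial, the negative value marks the dimension as Rb-enriched.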

Important: Scores Are Unit-Normalized

The returned scores:

Ra_scores_
Rb_scores_

are unit-normalized.

This means:

They do not reflect the magnitude of the dimension.

To recover magnitude, multiply each set of scores by the matching values:

Ra_scores_ by Ra_values_
Rb_scores_ by Rb_values_

Example:

Ra_projection = model.Ra_scores_ * model.Ra_values_
Rb_projection = model.Rb_scores_ * model.Rb_values_

Failing to do this can lead to incorrect interpretation of gcPC importance.
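
The rescaling can be tried on synthetic arrays. This sketch assumes that "unit-normalized" means each score column has unit Euclidean norm and that the values array is 1-D with one entry per gcPC; `raw`, `scores`, and `values` are stand-ins for illustration, not gcPCA output.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for model.Ra_scores_ and model.Ra_values_ (synthetic data).
raw = rng.normal(size=(50, 3)) * np.array([5.0, 2.0, 0.5])
values = np.linalg.norm(raw, axis=0)   # per-gcPC magnitude, shape (3,)
scores = raw / values                  # unit-norm columns, shape (50, 3)

# Every score column has norm 1, so the magnitude information is gone.
print(np.linalg.norm(scores, axis=0))

# Broadcasting (n_samples, n_gcPCs) * (n_gcPCs,) rescales each column.
projection = scores * values
```

Under these assumptions the product restores the original scale of each column, which is why the element-wise multiplication is enough.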

Recommended Interpretation Workflow

A typical workflow for interpreting gcPCA results:

Plot objective values
Identify top and bottom gcPCs
Inspect variance in Ra and Rb
Use a scree plot to judge importance
Run permutation testing
Examine feature weights (loadings)
Plot gcPCA projections
Perform clustering or downstream analysis

The main steps are described below.

Step 1 — Plot Objective Values

Start by plotting:

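import matplotlib.pyplot as plt

# One point per gcPC, in the order returned by the model
plt.plot(model.objective_values_, marker="o")
plt.xlabel("gcPC")
plt.ylabel("Objective Value")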

Interpretation:

Large positive values → Ra-enriched
Large negative values → Rb-enriched
Near zero → shared structure

Typically:

Inspect top 3–10
Inspect bottom 3–10

These contain the strongest contrastive structure.
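
Picking the top and bottom gcPCs can be done by sorting. The helper below is a minimal sketch: `top_and_bottom` is a hypothetical name and the example values are made up; in practice the input would be `model.objective_values_`.

```python
import numpy as np

def top_and_bottom(objective_values, k=3):
    """Return indices of the k largest and k most negative objective values."""
    values = np.asarray(objective_values, dtype=float)
    order = np.argsort(values)          # ascending
    bottom = order[:k]                  # most negative first
    top = order[::-1][:k]               # most positive first
    return top, bottom

# Made-up objective values, for illustration only.
example = [0.9, 0.6, 0.1, 0.0, -0.2, -0.7, -0.95]
top, bottom = top_and_bottom(example, k=2)
print(top, bottom)  # [0 1] [6 5]
```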

Step 2 — Plot Variance in Each Dataset

Next, inspect variance magnitude:

plt.plot(model.Ra_values_, label="Ra")
plt.plot(model.Rb_values_, label="Rb")
plt.xlabel("gcPC")
plt.ylabel("Variance magnitude")
plt.legend()

This helps determine:

Whether components are meaningful
Whether differences are driven by variance magnitude

However:

Variance plots should not replace objective values.

Always interpret objective value first.

Step 3 — Scree Plot (Recommended)

Use the elbow method:

import numpy as np

plt.plot(np.abs(model.objective_values_))
plt.xlabel("gcPC")
plt.ylabel("|Objective Value|")

Look for:

Sharp drop-off
Natural elbow

This identifies:

Most meaningful gcPCs
Noise-dominated gcPCs
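
If the objective values are ordered from most positive to most negative, their absolute values in gcPC order tend to form a U shape rather than a single decay. One way to get a conventional scree curve is to rank gcPCs by absolute value first. The sketch below does that and cuts at the largest drop; this is a crude heuristic offered as an assumption, not part of gcPCA, and `elbow_by_largest_drop` is a hypothetical helper.

```python
import numpy as np

def elbow_by_largest_drop(objective_values):
    """Rank gcPCs by |objective value| and cut at the largest drop."""
    values = np.asarray(objective_values, dtype=float)
    order = np.argsort(np.abs(values))[::-1]   # strongest contrast first
    ranked = np.abs(values)[order]
    drops = ranked[:-1] - ranked[1:]           # gap between neighbours
    n_keep = int(np.argmax(drops)) + 1         # keep everything before the gap
    return order[:n_keep], ranked

# Made-up objective values, for illustration only.
example = [0.9, 0.1, 0.05, -0.02, -0.1, -0.85]
kept, ranked = elbow_by_largest_drop(example)
print(sorted(kept.tolist()))  # [0, 5]
```

Plotting `ranked` gives a monotone curve on which an elbow is easier to see; visual inspection should still take precedence over the automatic cut.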

Step 4 — Permutation Testing (Recommended)

gcPCA supports null distributions using shuffling:

model = gcPCA(method="v4", Nshuffle=1000)
model.fit(Ra, Rb)

Then compare:

null = model.null_objective_values_
real = model.objective_values_

This allows:

Significance testing
Threshold selection
Noise filtering

Recommended:

At least 1000 shuffles
More for high-dimensional datasets
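
One possible way to turn the null distribution into per-gcPC p-values is sketched below. It assumes the null array has shape (n_shuffles, n_gcPCs), which should be checked against the actual `null_objective_values_` array; `empirical_p_values` is a hypothetical helper and the data are synthetic.

```python
import numpy as np

def empirical_p_values(real, null):
    """Two-sided empirical p-value per gcPC from a shuffled null.

    real: shape (n_gcpcs,); null: shape (n_shuffles, n_gcpcs) (assumed layout).
    """
    real = np.asarray(real, dtype=float)
    null = np.asarray(null, dtype=float)
    exceed = np.abs(null) >= np.abs(real)      # broadcasts over shuffles
    # +1 in numerator and denominator keeps p-values strictly above zero.
    return (exceed.sum(axis=0) + 1) / (null.shape[0] + 1)

rng = np.random.default_rng(0)
null = rng.uniform(-0.2, 0.2, size=(1000, 4))  # synthetic null, not gcPCA output
real = np.array([0.8, 0.1, -0.05, -0.7])       # synthetic objective values
p = empirical_p_values(real, null)
print(p < 0.05)
```

With 1000 shuffles the smallest attainable p-value is 1/1001, which is one reason to use more shuffles when a stricter threshold is needed.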

Step 5 — Examine Feature Loadings

Loadings define which features drive each gcPC.

loadings = model.loadings_

Recommended:

Sort by magnitude:

np.argsort(np.abs(loadings[:, gcpc_index]))[::-1]

Interpret:

Large magnitude features drive the contrast
Sign indicates direction

Visualization options:

Heatmaps
Sorted bar plots
Gene ranking
Neuron weight plots
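
The sorting step can be wrapped so that it returns feature names together with signed weights. This is a sketch under the assumption that loadings have shape (n_features, n_gcPCs), as the indexing above suggests; `top_features` and the feature names are hypothetical.

```python
import numpy as np

def top_features(loadings, feature_names, gcpc_index, n=5):
    """Return (name, weight) pairs for the n largest-magnitude loadings."""
    weights = np.asarray(loadings)[:, gcpc_index]
    order = np.argsort(np.abs(weights))[::-1][:n]
    return [(feature_names[i], float(weights[i])) for i in order]

# Synthetic loadings: 4 features x 2 gcPCs, with hypothetical feature names.
loadings = np.array([[0.1, 0.7],
                     [-0.8, 0.0],
                     [0.5, -0.7],
                     [0.3, 0.1]])
names = ["f0", "f1", "f2", "f3"]
print(top_features(loadings, names, gcpc_index=0, n=2))
# [('f1', -0.8), ('f2', 0.5)]
```

Keeping the sign in the output makes it clear which features pull the gcPC in opposite directions.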

Step 6 — gcPCA Scatter Plots

Plot the first two gcPCs for each dataset:

plt.scatter(model.Ra_scores_[:, 0], model.Ra_scores_[:, 1], label="Ra")
plt.scatter(model.Rb_scores_[:, 0], model.Rb_scores_[:, 1], label="Rb")
plt.legend()

This reveals:

Dataset separation
Continuous gradients
Sub-structure

Multiply by magnitude if needed:

Ra_proj = model.Ra_scores_ * model.Ra_values_
Rb_proj = model.Rb_scores_ * model.Rb_values_

Sparse gcPCA Interpretation

Sparse gcPCA produces:

Feature-selective gcPCs
Easier interpretation
Reduced dimensionality

However:

Choosing sparsity requires balancing:

Interpretability
Structure preservation

Common strategies:

Compare multiple sparsity levels
Evaluate projection stability
Inspect reconstruction structure
Monitor objective values

There is no single optimal sparsity level.
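
When comparing sparsity levels, a simple summary is the fraction of features that keep a nonzero weight in each gcPC. The sketch below uses synthetic loading matrices and a hypothetical helper `nonzero_fraction`; it makes no assumption about how the sparse gcPCA implementation exposes its loadings.

```python
import numpy as np

def nonzero_fraction(loadings, tol=1e-12):
    """Fraction of features with a non-negligible weight, per gcPC."""
    loadings = np.asarray(loadings, dtype=float)
    return (np.abs(loadings) > tol).mean(axis=0)

# Synthetic loadings at two hypothetical sparsity levels (6 features x 2 gcPCs).
dense = np.array([[0.5, 0.1], [0.4, -0.6], [0.3, 0.2],
                  [0.2, 0.5], [0.1, -0.4], [0.6, 0.3]])
sparse = np.array([[0.7, 0.0], [0.0, -0.8], [0.0, 0.0],
                   [0.0, 0.6], [0.0, 0.0], [0.7, 0.0]])
for label, mat in [("dense", dense), ("sparse", sparse)]:
    print(label, nonzero_fraction(mat))
```

Tracking this fraction alongside the objective values at each sparsity level shows how much contrast is given up for each gain in interpretability.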

Common Pitfalls

1. Interpreting Scores as Magnitude

Scores are normalized.

Use:

Ra_values_
Rb_values_

to recover magnitude.

2. Using Variance Alone

High variance in Ra does not imply Ra-enrichment.

Always interpret using:

objective values

3. Ignoring Bottom gcPCs

Bottom gcPCs:

Often contain meaningful structure
Represent Rb-enriched dimensions

Always inspect:

Top AND bottom gcPCs

4. Using Too Many gcPCs

Later gcPCs often:

Capture noise
Reduce interpretability

Use:

Scree plot
Permutation testing

5. Over-interpreting Small Objective Values

Values near zero:

Shared structure
Often not contrastive

These typically should not be prioritized.
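
A small helper can make the three-way reading explicit. The cutoff of 0.2 below is an arbitrary, hypothetical value chosen for illustration; a threshold derived from the permutation null would be a better-founded choice. `label_components` is not part of the gcPCA package.

```python
import numpy as np

def label_components(objective_values, threshold=0.2):
    """Label each gcPC as Ra-enriched, Rb-enriched, or shared."""
    values = np.asarray(objective_values, dtype=float)
    labels = np.full(values.shape, "shared", dtype=object)
    labels[values >= threshold] = "Ra-enriched"
    labels[values <= -threshold] = "Rb-enriched"
    return labels.tolist()

# Made-up v4 objective values, for illustration only.
print(label_components([0.9, 0.15, -0.05, -0.6]))
# ['Ra-enriched', 'shared', 'shared', 'Rb-enriched']
```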

Interpretation Examples

Neural Data

Top gcPCs may reflect:

Replay structure
Task-specific firing patterns
State-dependent covariance

Bottom gcPCs may reflect:

Baseline activity
Rest-specific structure

Gene Expression

Top gcPCs may reflect:

Differential gene modules
Cell-state-specific covariance
Disease-specific programs

Bottom gcPCs may reflect:

Control-specific structure
Baseline expression modules

Interpretation for Other gcPCA Variants

gcPCA v2

Objective:

\[ R_a / R_b \]

Range:

\[ [0, \infty) \]

 >1 → Ra-enriched
 <1 → Rb-enriched
 =1 → equal variance in both datasets

gcPCA v3

Objective:

\[ (R_a - R_b) / R_b \]

Range:

\[ [-1, \infty) \]

 >0 → Ra-enriched
 <0 → Rb-enriched
 =0 → equal variance in both datasets

Full interpretation differences are described in the manuscript.
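
To see how the three scales relate, the formulas on this page can be evaluated side by side. The sketch treats R_a and R_b as scalar variances along a single dimension, which is a simplification; `objectives` is a hypothetical helper and the variances are made up.

```python
def objectives(var_a, var_b):
    """Evaluate the v2, v3, and v4 formulas for one pair of variances."""
    v2 = var_a / var_b
    v3 = (var_a - var_b) / var_b
    v4 = (var_a - var_b) / (var_a + var_b)
    return v2, v3, v4

# Made-up variances: Ra-enriched, balanced, and Rb-enriched cases.
for var_a, var_b in [(9.0, 1.0), (1.0, 1.0), (1.0, 9.0)]:
    v2, v3, v4 = objectives(var_a, var_b)
    print(f"v2={v2:.3f}  v3={v3:.3f}  v4={v4:.3f}")
```

For the same variances, v3 equals v2 minus 1 and v4 equals (v2 − 1) / (v2 + 1), so all three agree on which dataset is enriched. Only v4 is symmetric: swapping the datasets flips its sign, whereas v2 and v3 compress Rb-enriched dimensions into a narrow interval.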

Summary

Recommended interpretation workflow:

Plot objective values
Identify top and bottom gcPCs
Inspect variance plots
Use scree plot
Run permutation testing
Examine loadings
Visualize projections
Perform downstream analysis

gcPCA objective values should always be the primary interpretation metric.

Links to Other Pages

1. Quickstart Guide
2. Installation
3. Conceptual Overview
4. Mathematical Formulation
5. Code Reference
6. Input Data Guidelines