7. Interpreting Results - SjulsonLab/generalized_contrastive_PCA GitHub Wiki

7. Interpreting Results

This page explains how to interpret the outputs of generalized contrastive PCA (gcPCA), how to identify meaningful components, and how to visualize and validate results.

This guide focuses primarily on gcPCA v4, which is the recommended default method. Differences for other variants are briefly discussed where relevant.


Overview: What gcPCA Produces

After fitting, the gcPCA model exposes the following attributes:

  • loadings_ — gcPC feature weights
  • objective_values_ — gcPCA objective values (primary interpretation metric)
  • Ra_scores_ — projections of dataset A (Ra)
  • Rb_scores_ — projections of dataset B (Rb)
  • Ra_values_ — magnitude of variance in Ra
  • Rb_values_ — magnitude of variance in Rb
  • objective_function_ — method used (e.g., v4)
  • null_objective_values_ — permutation-based null distribution (if enabled)

These outputs form the basis of all result interpretation.


The Most Important Quantity: gcPCA Objective Values

For gcPCA v4, the objective function is:

\[ \frac{R_a - R_b}{R_a + R_b} \]

This produces values between −1 and 1:

  • +1 → variance only in Ra (Dataset A)
  • −1 → variance only in Rb (Dataset B)
  • 0 → equal variance in both datasets

Therefore:

  • Top gcPCs (largest values) → enriched in Ra
  • Bottom gcPCs (most negative) → enriched in Rb
  • Middle gcPCs (near zero) → shared structure

This is the primary metric for interpreting gcPCA results.

Importantly:

Variance in a single dataset should not be interpreted alone.
The objective value determines whether a dimension is truly contrastive.

For example:

  • A component may have high variance in Ra
  • But even higher variance in Rb
  • Result → negative objective value

This means the component is Rb-enriched, despite large variance in Ra.

This is a common source of misinterpretation.
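
As a small numeric illustration of this case, the sketch below evaluates the v4 formula on made-up variance values. The function name `v4_objective` and the numbers are hypothetical and chosen only for illustration; they are not part of the gcPCA package.

```python
import numpy as np

def v4_objective(ra_var, rb_var):
    """Evaluate (R_a - R_b) / (R_a + R_b) elementwise."""
    ra_var = np.asarray(ra_var, dtype=float)
    rb_var = np.asarray(rb_var, dtype=float)
    return (ra_var - rb_var) / (ra_var + rb_var)

# Hypothetical variances along three directions.
ra_var = np.array([10.0, 5.0, 2.0])
rb_var = np.array([30.0, 5.0, 0.0])

values = v4_objective(ra_var, rb_var)
print(values)  # first: Rb-enriched, second: shared, third: variance only in Ra
```

The first direction has the largest Ra variance of the three, yet its objective value is negative because the Rb variance along it is larger still.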


Important: Scores Are Unit-Normalized

The returned scores:

  • Ra_scores_
  • Rb_scores_

are unit-normalized.

This means:

They do not reflect the magnitude of the dimension.

To recover magnitude, multiply the scores of each gcPC by the matching entry of:

  • Ra_values_ (for Ra_scores_)
  • Rb_values_ (for Rb_scores_)

Example:

Ra_projection = model.Ra_scores_ * model.Ra_values_
Rb_projection = model.Rb_scores_ * model.Rb_values_

Failing to do this can lead to incorrect interpretation of gcPC importance.
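
The rescaling above relies on NumPy broadcasting. The sketch below uses synthetic arrays in place of a fitted model. It assumes that the scores have shape (n_samples, n_gcPCs) with unit-norm columns and that the values hold one entry per gcPC; these shapes are assumptions made for illustration, so check them against your own fitted model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for model.Ra_scores_ and model.Ra_values_ (assumed shapes).
raw = rng.normal(size=(100, 4))
ra_scores = raw / np.linalg.norm(raw, axis=0)   # unit-norm columns
ra_values = np.array([8.0, 4.0, 2.0, 1.0])      # one magnitude per gcPC

# Broadcasting multiplies column j of the scores by ra_values[j].
ra_projection = ra_scores * ra_values

column_norms = np.linalg.norm(ra_projection, axis=0)
print(column_norms)  # matches ra_values under the assumptions above
```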


Recommended Interpretation Workflow

A typical workflow for interpreting gcPCA results:

  1. Plot objective values
  2. Identify top and bottom gcPCs
  3. Inspect variance in Ra and Rb
  4. Use scree plot to determine importance
  5. Run permutation testing
  6. Examine feature weights (loadings)
  7. Plot gcPCA projections
  8. Perform clustering or downstream analysis

The main steps are described below.


Step 1 — Plot Objective Values

Start by plotting:

import matplotlib.pyplot as plt

# model is a fitted gcPCA object
plt.plot(model.objective_values_)
plt.xlabel("gcPC")
plt.ylabel("Objective Value")

Interpretation:

  • Large positive values → Ra-enriched
  • Large negative values → Rb-enriched
  • Near zero → shared structure

Typically:

  • Inspect top 3–10
  • Inspect bottom 3–10

These contain the strongest contrastive structure.
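
One way to pull out the top and bottom gcPCs programmatically is sketched below. The helper `top_and_bottom` and the example values are hypothetical; the function does not assume the values are already sorted.

```python
import numpy as np

def top_and_bottom(objective_values, k=3):
    """Return indices of the k largest and k smallest objective values."""
    values = np.asarray(objective_values, dtype=float)
    order = np.argsort(values)      # ascending
    bottom = order[:k]              # most negative first
    top = order[::-1][:k]           # most positive first
    return top, bottom

# Hypothetical objective values in descending order.
objective_values = np.array([0.9, 0.6, 0.1, 0.0, -0.2, -0.7, -0.8])
top, bottom = top_and_bottom(objective_values, k=2)
print(top, bottom)
```

With a fitted model, `model.objective_values_` would take the place of the hypothetical array.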


Step 2 — Plot Variance in Each Dataset

Next, inspect variance magnitude:

plt.plot(model.Ra_values_, label="Ra")
plt.plot(model.Rb_values_, label="Rb")
plt.xlabel("gcPC")
plt.ylabel("Variance magnitude")
plt.legend()

This helps determine:

  • Whether components are meaningful
  • Whether differences are driven by variance magnitude

However:

Variance plots should not replace objective values.

Always interpret objective value first.


Step 3 — Scree Plot (Recommended)

Use the elbow method:

import numpy as np
import matplotlib.pyplot as plt

plt.plot(np.abs(model.objective_values_))
plt.xlabel("gcPC")
plt.ylabel("|Objective Value|")

Look for:

  • Sharp drop-off
  • Natural elbow

This identifies:

  • Most meaningful gcPCs
  • Noise-dominated gcPCs
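
Because top gcPCs have the largest values and bottom gcPCs the most negative, the absolute values tend to be large at both ends of the list. One option, sketched below, is to sort gcPCs by |objective value| first so that the curve is non-increasing and an elbow is easier to see. The helper `rank_by_abs_objective` and the example values are hypothetical.

```python
import numpy as np

def rank_by_abs_objective(objective_values):
    """Sort gcPCs by |objective value|, largest first."""
    values = np.asarray(objective_values, dtype=float)
    order = np.argsort(-np.abs(values), kind="stable")
    return order, np.abs(values)[order]

# Hypothetical objective values in descending order.
objective_values = np.array([0.9, 0.6, 0.1, 0.0, -0.2, -0.7, -0.8])
order, magnitudes = rank_by_abs_objective(objective_values)
print(order)       # gcPC indices, strongest contrast first
print(magnitudes)  # non-increasing, suitable for a scree-style plot
```

Plotting `magnitudes` instead of the unsorted absolute values mixes top and bottom gcPCs on one axis, so keep `order` to map each point back to its gcPC.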

Step 4 — Permutation Testing (Recommended)

gcPCA supports null distributions using shuffling:

model = gcPCA(method="v4", Nshuffle=1000)
model.fit(Ra, Rb)

Then compare:

null = model.null_objective_values_
real = model.objective_values_

This allows:

  • Significance testing
  • Threshold selection
  • Noise filtering

Recommended:

  • At least 1000 shuffles
  • More for high-dimensional datasets
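
A minimal sketch of turning the null distribution into empirical p-values follows. It assumes the null array has shape (n_shuffles, n_gcPCs), which is an assumption about the layout rather than something documented here, and it uses synthetic numbers in place of a fitted model. The helper `empirical_p_values` is hypothetical.

```python
import numpy as np

def empirical_p_values(real, null):
    """Two-sided empirical p-value per gcPC.

    real: shape (n_gcpcs,)
    null: shape (n_shuffles, n_gcpcs)  (assumed layout)
    """
    real = np.asarray(real, dtype=float)
    null = np.asarray(null, dtype=float)
    exceed = np.abs(null) >= np.abs(real)   # broadcast over shuffles
    # Add-one correction so that p is never exactly zero.
    return (exceed.sum(axis=0) + 1) / (null.shape[0] + 1)

rng = np.random.default_rng(0)
real = np.array([0.9, 0.05, -0.8])              # hypothetical observed values
null = rng.uniform(-0.3, 0.3, size=(999, 3))    # hypothetical null values
p = empirical_p_values(real, null)
print(p)
```

The smallest attainable p-value is 1 / (n_shuffles + 1), which is one reason to use at least 1000 shuffles.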

Step 5 — Examine Feature Loadings

Loadings define which features drive each gcPC.

loadings = model.loadings_

Recommended:

Sort by magnitude:

# Feature indices for one gcPC, largest |loading| first
order = np.argsort(np.abs(loadings[:, gcpc_index]))[::-1]

Interpret:

  • Large magnitude features drive the contrast
  • Sign indicates direction

Visualization options:

  • Heatmaps
  • Sorted bar plots
  • Gene ranking
  • Neuron weight plots
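
The ranking step can be wrapped into a small helper that also reports feature names and signed weights. This is a sketch on a made-up loadings matrix of shape (n_features, n_gcPCs); `top_features` and the feature names are hypothetical.

```python
import numpy as np

def top_features(loadings, gcpc_index, feature_names, n=3):
    """Return (name, weight) pairs for the n largest-|loading| features."""
    weights = np.asarray(loadings)[:, gcpc_index]
    order = np.argsort(np.abs(weights))[::-1][:n]
    return [(feature_names[i], float(weights[i])) for i in order]

# Hypothetical loadings: 4 features x 2 gcPCs.
loadings = np.array([
    [ 0.1,  0.7],
    [-0.8,  0.1],
    [ 0.5, -0.7],
    [ 0.3,  0.0],
])
feature_names = ["f0", "f1", "f2", "f3"]

ranked = top_features(loadings, gcpc_index=0, feature_names=feature_names, n=2)
print(ranked)
```

Keeping the signed weight alongside the name preserves the direction information that the magnitude-based sort discards.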

Step 6 — gcPCA Scatter Plots

Plot the projected data on the first two gcPCs:

plt.scatter(model.Ra_scores_[:, 0], model.Ra_scores_[:, 1], label="Ra")
plt.scatter(model.Rb_scores_[:, 0], model.Rb_scores_[:, 1], label="Rb")
plt.legend()

This reveals:

  • Dataset separation
  • Continuous gradients
  • Sub-structure

Multiply by magnitude if needed:

Ra_proj = model.Ra_scores_ * model.Ra_values_
Rb_proj = model.Rb_scores_ * model.Rb_values_

Sparse gcPCA Interpretation

Sparse gcPCA produces:

  • Feature-selective gcPCs
  • Easier interpretation
  • Reduced dimensionality

However:

Choosing sparsity requires balancing:

  • Interpretability
  • Structure preservation

Common strategies:

  • Compare multiple sparsity levels
  • Evaluate projection stability
  • Inspect reconstruction structure
  • Monitor objective values

There is no single optimal sparsity level.
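
When comparing sparsity levels, a simple first check is how many features each gcPC actually uses. The sketch below counts non-zero loadings per gcPC for two made-up loadings matrices; it does not call the sparse gcPCA fitting routine, and the helper `nonzero_counts`, the level names, and the tolerance are hypothetical.

```python
import numpy as np

def nonzero_counts(loadings, tol=1e-12):
    """Number of features with |loading| > tol, per gcPC (column)."""
    return (np.abs(np.asarray(loadings)) > tol).sum(axis=0)

# Hypothetical loadings from two sparsity levels (4 features x 2 gcPCs).
loadings_by_level = {
    "mild":   np.array([[0.6, 0.0], [0.5, 0.7], [0.0, 0.7], [0.6, 0.1]]),
    "strong": np.array([[1.0, 0.0], [0.0, 0.7], [0.0, 0.7], [0.0, 0.0]]),
}

summary = {name: nonzero_counts(mat).tolist()
           for name, mat in loadings_by_level.items()}
print(summary)
```

Such counts only describe interpretability; they should be read together with the objective values at each sparsity level.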


Common Pitfalls

1. Interpreting Scores as Magnitude

Scores are normalized.

Use:

  • Ra_values_
  • Rb_values_

to recover magnitude.


2. Using Variance Alone

High variance in Ra does not imply Ra-enrichment.

Always interpret using:

  • objective values

3. Ignoring Bottom gcPCs

Bottom gcPCs:

  • Often contain meaningful structure
  • Represent Rb-enriched dimensions

Always inspect:

  • Top AND bottom gcPCs

4. Using Too Many gcPCs

gcPCs beyond the strongest top and bottom components often:

  • Capture noise
  • Reduce interpretability

Use:

  • Scree plot
  • Permutation testing

5. Over-interpreting Small Objective Values

Values near zero typically indicate:

  • Shared structure
  • Often not contrastive

These typically should not be prioritized.


Interpretation Examples

Neural Data

Top gcPCs may reflect:

  • Replay structure
  • Task-specific firing patterns
  • State-dependent covariance

Bottom gcPCs may reflect:

  • Baseline activity
  • Rest-specific structure

Gene Expression

Top gcPCs may reflect:

  • Differential gene modules
  • Cell-state-specific covariance
  • Disease-specific programs

Bottom gcPCs may reflect:

  • Control-specific structure
  • Baseline expression modules

Interpretation for Other gcPCA Variants

gcPCA v2

Objective:

\[ \frac{R_a}{R_b} \]

Range:

\[ [0, \infty) \]

  • >1 → Ra enriched
  • <1 → Rb enriched

gcPCA v3

Objective:

\[ \frac{R_a - R_b}{R_b} \]

Range:

\[ [-1, \infty) \]

  • >0 → Ra enriched
  • <0 → Rb enriched

Full interpretation differences are described in the manuscript.
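
For a single fixed direction, the three formulas are related by simple algebra: with r = R_a / R_b, the v3 value is r - 1 and the v4 value is (r - 1) / (r + 1). The sketch below only illustrates how the thresholds line up (1 for v2, 0 for v3 and v4); it makes no claim about whether the variants return the same gcPCs. The function names and ratios are hypothetical.

```python
import numpy as np

def v2_to_v3(ratio):
    """(R_a - R_b) / R_b expressed through the ratio R_a / R_b."""
    return np.asarray(ratio, dtype=float) - 1.0

def v2_to_v4(ratio):
    """(R_a - R_b) / (R_a + R_b) expressed through the ratio R_a / R_b."""
    ratio = np.asarray(ratio, dtype=float)
    return (ratio - 1.0) / (ratio + 1.0)

ratios = np.array([0.0, 0.5, 1.0, 3.0])   # hypothetical R_a / R_b values
v3_values = v2_to_v3(ratios)
v4_values = v2_to_v4(ratios)
print(v3_values)
print(v4_values)
```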


Summary

Recommended interpretation workflow:

  1. Plot objective values
  2. Identify top and bottom gcPCs
  3. Inspect variance plots
  4. Use scree plot
  5. Run permutation testing
  6. Examine loadings
  7. Visualize projections
  8. Perform downstream analysis

gcPCA objective values should always be the primary interpretation metric.

Links to Other Pages

1. Quickstart Guide
2. Installation
3. Conceptual Overview
4. Mathematical Formulation
5. Code Reference
6. Input Data Guidelines