Analyze and Visualize Interaction Pair Co‐occurrence - k-ngo/CATMD GitHub Wiki

Overview and Methodology

What It Does

This tool analyzes how frequently pairs of interactions (e.g., H-bond A–B and Salt Bridge C–D) co-occur in the same simulation frames. It visualizes these pairwise co-occurrence frequencies as a matrix-style percentage heatmap.

How It Works

  • Objective: Identify patterns of correlated interactions by quantifying simultaneous presence across trajectory frames.
  • Process:
    • Loads interaction presence matrices generated during the ▶️ (Run First) Extract Pairwise Interaction from Trajectory step.
    • Filters out interactions whose frequency falls below pct_hide_threshold (e.g., those present in fewer than 40% of frames).
    • Computes pairwise co-occurrence percentages between all interactions.
    • Visualizes results using a symmetric heatmap and a colorbar.
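
The filtering and co-occurrence steps above can be sketched with NumPy. This is a minimal illustration, not the tool's actual implementation: the function name co_occurrence_percent, the boolean frames-by-interactions layout of the presence matrix, and the example labels are all assumptions made for the sketch.

```python
import numpy as np

def co_occurrence_percent(presence, labels, pct_hide_threshold=40.0):
    """Return (kept_labels, matrix) of pairwise co-occurrence percentages.

    presence: array-like of shape (n_frames, n_interactions); truthy where
    an interaction is present in a frame (assumed layout).
    """
    presence = np.asarray(presence, dtype=bool)
    n_frames = presence.shape[0]

    # Frequency of each interaction as a percentage of frames.
    freq = presence.mean(axis=0) * 100.0

    # Keep interactions present in at least pct_hide_threshold % of frames.
    keep = freq >= pct_hide_threshold
    kept = presence[:, keep].astype(np.int64)
    kept_labels = [label for label, k in zip(labels, keep) if k]

    # (kept.T @ kept)[i, j] counts frames where both i and j are present.
    matrix = (kept.T @ kept) / n_frames * 100.0

    # The heatmap diagonal is described as always 100%.
    np.fill_diagonal(matrix, 100.0)
    return kept_labels, matrix


# Toy data: 5 frames, 3 interactions (hypothetical labels).
presence = [
    [1, 1, 0],
    [1, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
    [1, 1, 0],
]
labels = ["ARG123-GLU45", "LYS10-ASP77", "PHE30-TRP88"]
kept_labels, matrix = co_occurrence_percent(presence, labels, 40.0)
print(kept_labels)
print(matrix)
```

In this toy input the third interaction appears in only 1 of 5 frames (20%) and is dropped, while the first two are both present in 3 of 5 frames, giving an off-diagonal value of 60%.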

Configuration and Inputs

Prerequisites

  • Requires a loaded trajectory.
  • Requires interaction CSVs generated using the same sel1_name and sel2_name from the extraction step.

Key Configuration Options

  • Selection Labels:

    • sel1_name, sel2_name: Labels used to locate and match interaction data files.
  • Interaction Mode:

    • interaction_mode: Scope of interaction analysis.
      • interchain + intrachain: All interactions.
      • interchain: Between different chains.
      • intrachain: Within the same chain.
  • Interaction Type Filter:

    • interaction_types: Specify which interaction types to include in the analysis. Choose from:
      • All, H-Bonds, Salt Bridges, Hydrophobic, π-Stacking, Cation–π.
  • Contact Filtering:

    • pct_hide_threshold: Only include interactions present in at least this percentage of frames.
  • Parallel Processing:

    • num_threads: Number of CPU threads to use for co-occurrence calculation. Use -1 to use all available threads.
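
As a rough illustration of how these options fit together, the sketch below collects them in a plain dictionary and resolves num_threads = -1 to the number of available CPU threads. The option names come from the list above; the dictionary layout, the example values, and the helper resolve_num_threads are hypothetical and may not match how the tool stores or handles its settings.

```python
import os

# Hypothetical example values; only the option names come from the wiki.
config = {
    "sel1_name": "protein",
    "sel2_name": "ligand",
    "interaction_mode": "interchain + intrachain",
    "interaction_types": "All",
    "pct_hide_threshold": 40,
    "num_threads": -1,
}

def resolve_num_threads(num_threads):
    """Translate -1 into 'all available threads'; reject other values < 1."""
    if num_threads == -1:
        # os.cpu_count() can return None on some platforms; fall back to 1.
        return os.cpu_count() or 1
    if num_threads < 1:
        raise ValueError("num_threads must be -1 or a positive integer")
    return num_threads

n_workers = resolve_num_threads(config["num_threads"])
print(f"Using {n_workers} worker(s)")
```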

Output

  • Co-occurrence Heatmap:

    • Matrix where each cell shows the percentage of frames in which both interactions in a pair are present.
    • Diagonal cells are always set to 100%.
    • Interactions are labeled by residue pairs (e.g., ARG123–GLU45).
  • Colorbar:

    • Gradient scale from 0% to 100% co-occurrence.
    • Independent from the main heatmap for modular layout.
  • Saved Files:

    • interaction_co_occurrence_heatmap_<sel1>_<sel2>.png: Heatmap image.
    • interaction_co_occurrence_<sel1>_<sel2>.csv: Raw co-occurrence matrix (in %).
  • Console Output:

    • Lists excluded interactions below the threshold.
    • Shows progress during multiprocessing steps.
    • Prints a legend of interaction-type abbreviations when all interaction types are included.
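
The saved CSV can be post-processed outside the tool, for example to rank the strongest pairs. The sketch below assumes the file has a header row of interaction labels and repeats each label in the first column of its row; that layout is an assumption, so check it against an actual output file. The helper top_pairs and the inline sample data are hypothetical.

```python
import csv
import io

def top_pairs(csv_file, n=5):
    """Return the n highest off-diagonal (label_a, label_b, pct) entries."""
    reader = csv.reader(csv_file)
    header = next(reader)
    col_labels = header[1:]  # first header cell is the index column
    pairs = []
    for i, row in enumerate(reader):
        row_label, values = row[0], row[1:]
        # The matrix is symmetric, so read only the upper triangle (j > i).
        for j in range(i + 1, len(col_labels)):
            pairs.append((row_label, col_labels[j], float(values[j])))
    pairs.sort(key=lambda p: p[2], reverse=True)
    return pairs[:n]

# Inline stand-in for interaction_co_occurrence_<sel1>_<sel2>.csv.
sample = io.StringIO(
    ",A,B,C\n"
    "A,100.0,60.0,12.5\n"
    "B,60.0,100.0,45.0\n"
    "C,12.5,45.0,100.0\n"
)
ranked = top_pairs(sample, n=2)
for a, b, pct in ranked:
    print(f"{a} + {b}: {pct:.1f}%")
```

For a real file, pass an open file handle instead of the StringIO object, e.g. `open(path, newline="")`.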

Interpreting the Results

  • High Co-occurrence:

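    • High values (close to 100%) mean the two interactions are present together in nearly all frames. This may reflect synchronized or cooperative behavior, but it can also arise simply because both interactions are individually persistent.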
  • Low Co-occurrence:

    • Low percentages suggest mutually exclusive, transient, or unrelated interactions.
  • Thresholding Effect:

    • The pct_hide_threshold setting directly controls the number of interactions visualized. Raising it filters out noise; lowering it increases detail.
  • Sidechain Indicator:

    • Residue labels with * indicate sidechain-originating interactions (inherited from extraction data).
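
One way to judge whether a high value reflects coupling rather than two individually persistent interactions is to compare it with the co-occurrence expected if the two were independent. This comparison is not something the tool is documented to report; the sketch below is an optional check, and the per-interaction frequencies it needs would have to come from the extraction data, since the heatmap diagonal is fixed at 100%. Both function names are hypothetical.

```python
def expected_if_independent(pct_a, pct_b):
    """Co-occurrence (%) expected if the two interactions were independent."""
    return pct_a * pct_b / 100.0

def enrichment(observed_pct, pct_a, pct_b):
    """Ratio of observed to expected co-occurrence (1.0 = as expected)."""
    expected = expected_if_independent(pct_a, pct_b)
    if expected == 0:
        raise ValueError("at least one interaction is never present")
    return observed_pct / expected

# Two persistent interactions (95% each) that co-occur in 90% of frames:
persistent = enrichment(90.0, 95.0, 95.0)

# Two moderate interactions (50% each) that co-occur in 45% of frames:
coupled = enrichment(45.0, 50.0, 50.0)

print(f"persistent pair: {persistent:.2f}x expected")
print(f"moderate pair:   {coupled:.2f}x expected")
```

In the first case the 90% co-occurrence is almost exactly what independence predicts (90.25%), so it says little about coupling. In the second, 45% is well above the 25% expected by chance, which is a stronger hint that the two interactions tend to appear together.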

Example Scenarios

Ligand Binding Fingerprint Mapping

  • Scenario: Determine if certain ligand–residue contacts frequently appear together.
  • Selections:
    • sel1 = 'protein'
    • sel2 = 'resname LIG'
  • Observation: Persistent co-occurrence of ligand-sidechain contacts suggests cooperative binding pocket regions.

Allosteric Pathway Detection

  • Scenario: Explore whether contacts in different protein regions fluctuate together.
  • Selections:
    • sel1 = 'resid 10-50'
    • sel2 = 'resid 150-190'
  • Observation: Non-adjacent contacts with high co-occurrence may indicate long-range coupling.

Usage Tips

  • For Full Interaction Scope:

    • Set interaction_types = 'All' to compute comprehensive co-occurrence across interaction classes.
  • Legend Readability:

    • When showing all interaction types, refer to the printed legend for abbreviation meanings.
  • Matrix Size Handling:

    • Increase fig_width_factor or fig_height_factor to avoid overcrowded axes for large datasets.
