Identify Major Interaction Events or Conformational Deviations - k-ngo/CATMD GitHub Wiki

Identify Major Interaction Events or Conformational Deviations

Overview and Methodology

What It Does

This script automatically identifies frames in a simulation trajectory that deviate significantly from the rest, revealing rare events such as ligand unbinding, large conformational changes, unfolding, or simulation artifacts.

How It Works

  • Objective: Score each simulation frame for its structural similarity (or dissimilarity) to the rest of the trajectory using machine learning models.
  • Process:
    • Feature Extraction: Extracts structural data such as CA positions, pairwise distances, or dihedral angles.
    • Model Application: Uses either Isolation Forest or One-Class SVM to detect frames that deviate from the learned distribution.
    • Anomaly Scoring:
      • Isolation Forest: Higher scores mean a frame is easier to isolate and is therefore more anomalous.
      • One-Class SVM: After sign inversion, positive scores indicate likely anomalies and negative scores indicate normal frames.
    • Visualization: A time series plot marks anomalous frames, and the top anomalies are rendered in 3D.
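The scoring step above can be sketched with scikit-learn. This is a minimal illustration on synthetic data, not the script's actual code: the function name `anomaly_scores`, the use of `decision_function`, and the mapping of `contamination` onto the One-Class SVM `nu` parameter are all assumptions. The sign flip reflects scikit-learn's convention that `decision_function` is positive for inliers, so negating it gives "higher = more anomalous".

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.svm import OneClassSVM


def anomaly_scores(features, method="isolation_forest", contamination=0.05, seed=0):
    """Return one score per frame; higher means more anomalous (hypothetical helper)."""
    if method == "isolation_forest":
        model = IsolationForest(contamination=contamination, random_state=seed)
    else:
        # Assumption: reuse contamination as nu, the upper bound on the
        # fraction of training frames treated as outliers.
        model = OneClassSVM(kernel="rbf", gamma="scale", nu=contamination)
    model.fit(features)
    # decision_function is positive for inliers and negative for outliers,
    # so negate it to obtain "higher = more anomalous".
    return -model.decision_function(features)


# Synthetic stand-in for a (n_frames, n_features) feature matrix.
rng = np.random.default_rng(0)
features = rng.normal(0.0, 1.0, size=(200, 6))
features[150:155] += 8.0  # five frames pushed far away from the rest

if_scores = anomaly_scores(features, "isolation_forest")
svm_scores = anomaly_scores(features, "one_class_svm")
print("Highest Isolation Forest scores at frames:", np.argsort(if_scores)[-5:])
```

In a real run, `features` would be replaced by the matrix built from CA positions, pairwise-distance summaries, or dihedral angles.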

Configuration and Inputs

Prerequisites

  • Requires a loaded trajectory.

Key Configuration Options

  • Selections:

    • group_sel, group_name: Atom selection and label (e.g., protein, Protein).
  • Features:

    • use_ca_positions: Use C-alpha atom positions.
    • use_pairwise_distances: Use inter-residue center-of-mass distances, summarized per frame as mean, standard deviation, and percentiles to reduce the memory footprint.
    • use_dihedrals: Use phi/psi backbone torsions.
  • Detection Methods:

    • Isolation Forest: Tree-based model for efficient anomaly detection.
    • One-Class SVM: Boundary-based model, good for subtle outliers.
    • contamination: Expected fraction of anomalous frames in the trajectory (e.g., 0.01 to 0.05).
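The per-frame reduction described for use_pairwise_distances can be sketched as follows. This is a rough illustration under stated assumptions: the helper name `pairwise_distance_summary`, the input layout (one center of mass per residue per frame), and the chosen percentiles (5, 25, 50, 75, 95) are hypothetical and may not match what the script uses.

```python
import numpy as np


def pairwise_distance_summary(coms, percentiles=(5, 25, 50, 75, 95)):
    """Reduce each frame's pairwise distances to a few summary statistics.

    coms: array of shape (n_frames, n_residues, 3) with per-residue
    centers of mass. Returns shape (n_frames, 2 + len(percentiles)).
    """
    coms = np.asarray(coms, dtype=float)
    n_res = coms.shape[1]
    i, j = np.triu_indices(n_res, k=1)  # each unique residue pair once
    rows = []
    for frame in coms:  # process one frame at a time to keep memory low
        d = np.linalg.norm(frame[i] - frame[j], axis=1)
        rows.append([d.mean(), d.std(), *np.percentile(d, percentiles)])
    return np.array(rows)


# Toy input: 3 residues on a line, 1 unit apart, in 2 identical frames.
coms = np.array([[[0, 0, 0], [1, 0, 0], [2, 0, 0]]] * 2, dtype=float)
summary = pairwise_distance_summary(coms)
print(summary.shape)
```

Each frame then contributes a fixed number of columns regardless of system size, instead of one column per residue pair.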

Outputs

  • Anomaly Timeline (*_Anomaly_Timeline.png): Time series plot of anomaly scores with colored bands marking outliers.
  • Anomaly List (*_Anomalies.csv): Table of anomalous frames with corresponding time and score.
  • 3D Visuals: 3D structure previews of the top anomalous frames rendered inline using py3Dmol.
  • Logs: Anomaly count, score range, and detection summaries printed in the terminal.

Interpreting the Results

Anomaly Score Plot

  • X-axis: Simulation time (in user-defined units).
  • Y-axis: Model-derived anomaly score (higher = more unusual).
  • Red Bands: Highlight anomalous frames that deviate from the structural norm.

CSV Output

  • Allows further inspection or filtering of anomalous frames.
  • Useful for downstream clustering, trajectory trimming, or snapshot extraction.
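As one way to do that filtering, the sketch below ranks rows by score with Python's standard `csv` module. The column names `Frame`, `Time`, and `Score`, the helper `top_anomalies`, and the example values are made up for illustration; check the header of the actual *_Anomalies.csv before reusing it.

```python
import csv
import io

# Hypothetical file contents; real column names and values may differ.
example_csv = """Frame,Time,Score
120,12.0,0.31
121,12.1,0.08
560,56.0,0.72
"""


def top_anomalies(handle, n=2, score_column="Score"):
    """Return the n rows with the highest anomaly score."""
    rows = list(csv.DictReader(handle))
    rows.sort(key=lambda row: float(row[score_column]), reverse=True)
    return rows[:n]


top = top_anomalies(io.StringIO(example_csv))
frames = [int(row["Frame"]) for row in top]
print(frames)
```

For a file on disk, `io.StringIO(example_csv)` would be replaced by `open(path, newline="")`.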

3D Visualizations

  • Offers direct inspection of frames with the highest anomaly scores.
  • Useful for detecting simulation errors or rare but biologically relevant events.

Example Scenarios

Ligand Unbinding Detection

  • Scenario: Ligand leaves the binding pocket at some point.
  • Observation: A sharp spike in anomaly score followed by a sustained deviation.
  • Interpretation: Detected as an outlier due to major rearrangement or loss of interaction contacts.

Transient Loop Opening

  • Scenario: A loop briefly unfolds and re-folds.
  • Observation: One or a few isolated anomalies appear.
  • Interpretation: Suggests reversible fluctuations that may be functionally important.
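The two scenarios above differ mainly in how long the flagged frames persist. A small post-processing sketch can make that distinction explicit by merging consecutive anomalous frames into events and labeling each by length. The helpers `group_anomalous_frames` and `classify`, and the thresholds `max_gap` and `min_sustained`, are hypothetical and not part of the script.

```python
def group_anomalous_frames(frames, max_gap=1):
    """Merge anomalous frame indices into (start, end) events.

    Frames separated by at most max_gap are treated as one event.
    """
    events = []
    for f in sorted(frames):
        if events and f - events[-1][1] <= max_gap:
            events[-1][1] = f  # extend the current event
        else:
            events.append([f, f])  # start a new event
    return [tuple(e) for e in events]


def classify(event, min_sustained=10):
    """Label an event by how many frames it spans."""
    start, end = event
    return "sustained" if end - start + 1 >= min_sustained else "transient"


# Illustrative frame indices: two brief blips and one long run.
flagged = [40, 41, 300] + list(range(700, 730))
events = group_anomalous_frames(flagged)
labels = [classify(e) for e in events]
print(list(zip(events, labels)))
```

A sustained event is more consistent with something like ligand unbinding, while transient events point toward brief, reversible fluctuations; visual inspection of the frames is still needed to confirm either.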

Simulation Instability or Glitch

  • Scenario: A frame contains abnormally large or distorted geometry.
  • Observation: An extreme anomaly score, possibly with visible distortion in 3D.
  • Interpretation: May indicate simulation artifact, poor equilibration, or input error.

Usage Tips

  • Feature Balance:

    • Start with use_ca_positions=True.
    • Add use_pairwise_distances for more resolution, especially in small systems.
    • Use use_dihedrals for secondary structure transitions or flexible loops.
  • Contamination Tuning:

    • Use higher contamination (~0.05) for noisy trajectories or exploratory scans.
    • Use lower contamination (~0.01) for cleaner simulations to focus on major deviations.
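To get a feel for what these contamination values mean in practice, the sketch below flags the top fraction of frames by score. It is a simplified stand-in for how a contamination setting translates into a number of flagged frames, not the models' internal thresholding; `flag_top_fraction` and the example scores are hypothetical.

```python
def flag_top_fraction(scores, contamination):
    """Return sorted indices of the highest-scoring fraction of frames."""
    n_flag = max(1, round(contamination * len(scores)))
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(order[:n_flag])


# Illustrative scores for 100 frames: a flat baseline with three peaks.
scores = [0.1] * 100
scores[10], scores[50], scores[90] = 0.9, 0.8, 0.7

strict = flag_top_fraction(scores, 0.01)  # only the largest deviation
loose = flag_top_fraction(scores, 0.05)   # also picks up baseline frames
print(strict, len(loose))
```

With the higher setting, frames from the baseline are flagged alongside the real peaks, which is the trade-off to keep in mind for exploratory scans.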