Identify Major Interaction Events or Conformational Deviations - k-ngo/CATMD GitHub Wiki
Overview and Methodology
What It Does
This script automatically identifies frames in a simulation trajectory that deviate significantly from the rest, revealing rare events such as ligand unbinding, large conformational changes, unfolding, or simulation artifacts.
How It Works
- Objective: Score each simulation frame for its structural similarity (or dissimilarity) to the rest of the trajectory using machine learning models.
- Process:
  - Feature Extraction: Extracts structural data such as CA positions, pairwise distances, or dihedral angles.
  - Model Application: Uses either Isolation Forest or One-Class SVM to detect frames that deviate from the learned distribution.
  - Anomaly Scoring:
    - Isolation Forest: Higher scores mean a frame is easier to isolate and is therefore more anomalous.
    - One-Class SVM: After sign inversion, a positive score marks a likely anomaly and a negative score marks a normal frame.
  - Visualization: A time series plot marks anomalous frames, and the top anomalies are rendered in 3D.
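The scoring step can be sketched with scikit-learn as follows. This is a minimal illustration, not the script's actual code: the function name `score_frames`, the `method` strings, the `seed` argument, and the synthetic feature matrix are assumptions. It negates scikit-learn's `decision_function` (which is positive for inliers) so that higher values mean more anomalous, matching the convention described above.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.svm import OneClassSVM


def score_frames(features, method="Isolation Forest", contamination=0.05, seed=0):
    """Return (scores, is_anomaly) for a (n_frames, n_features) matrix.

    Hypothetical helper: scores are oriented so that higher = more anomalous.
    """
    X = np.asarray(features, dtype=float)
    if method == "Isolation Forest":
        model = IsolationForest(contamination=contamination, random_state=seed)
    elif method == "One-Class SVM":
        # nu bounds the fraction of training frames treated as outliers
        model = OneClassSVM(kernel="rbf", gamma="scale", nu=contamination)
    else:
        raise ValueError(f"unknown method: {method!r}")
    model.fit(X)
    # scikit-learn: decision_function > 0 for inliers, < 0 for outliers.
    # Negate so that positive values indicate likely anomalies.
    scores = -model.decision_function(X)
    is_anomaly = model.predict(X) == -1
    return scores, is_anomaly


# Synthetic stand-in for per-frame features: 200 frames, 6 features,
# with frame 150 shifted far away from the rest.
rng = np.random.default_rng(0)
demo_features = rng.normal(size=(200, 6))
demo_features[150] += 8.0

if_scores, if_flags = score_frames(demo_features, "Isolation Forest")
svm_scores, svm_flags = score_frames(demo_features, "One-Class SVM")
print("Most anomalous frame (Isolation Forest):", int(np.argmax(if_scores)))
```

In a real run the feature matrix would come from the trajectory (CA positions, distance statistics, or dihedrals) rather than random numbers.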
Configuration and Inputs
Prerequisites
- Requires a loaded trajectory.
Key Configuration Options
- Selections:
  - group_sel, group_name: Atom selection and label (e.g., protein, Protein).
- Features:
  - use_ca_positions: Use C-alpha atom positions.
  - use_pairwise_distances: Use inter-residue center-of-mass distances (reduced to summary statistics such as mean, standard deviation, and percentiles to limit memory use).
  - use_dihedrals: Use phi/psi backbone torsions.
- Detection Methods:
  - Isolation Forest: Tree-based model for efficient anomaly detection.
  - One-Class SVM: Boundary-based model, good for subtle outliers.
  - contamination: Expected fraction of anomalies.
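To illustrate the statistical reduction mentioned for use_pairwise_distances, the sketch below collapses all inter-residue distances in a frame into a handful of summary values. It is an assumption-laden example: the function name `summarize_pairwise_distances`, the particular percentile set, and the input layout (precomputed residue centers of mass with shape `(n_frames, n_residues, 3)`) are illustrative choices, not taken from the script.

```python
import numpy as np


def summarize_pairwise_distances(centers, percentiles=(5, 25, 50, 75, 95)):
    """Reduce per-frame pairwise distances to [mean, std, *percentiles].

    centers: array of shape (n_frames, n_residues, 3), e.g. residue
    centers of mass. The percentile set is a hypothetical default.
    """
    centers = np.asarray(centers, dtype=float)
    n_frames, n_res, _ = centers.shape
    # Indices of the unique residue pairs (upper triangle, no diagonal)
    i, j = np.triu_indices(n_res, k=1)
    features = np.empty((n_frames, 2 + len(percentiles)))
    for f in range(n_frames):
        d = np.linalg.norm(centers[f, i] - centers[f, j], axis=1)
        features[f, 0] = d.mean()
        features[f, 1] = d.std()
        features[f, 2:] = np.percentile(d, percentiles)
    return features


# One frame with three "residues" forming a 3-4-5 right triangle
demo_centers = np.array([[[0.0, 0.0, 0.0], [3.0, 0.0, 0.0], [0.0, 4.0, 0.0]]])
demo_stats = summarize_pairwise_distances(demo_centers)
print(demo_stats.shape)
```

For N residues the full distance vector has N(N-1)/2 entries per frame, whereas this summary keeps a fixed number of values regardless of system size, which is where the memory saving comes from.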
Outputs
- Anomaly Timeline (*_Anomaly_Timeline.png): Time series plot of anomaly scores with colored bands marking outliers.
- Anomaly List (*_Anomalies.csv): Table of anomalous frames with corresponding time and score.
- 3D Visuals: 3D structure previews of the top anomalous frames rendered inline using py3Dmol.
- Logs: Anomaly count, score range, and detection summaries printed in the terminal.
Interpreting the Results
Anomaly Score Plot
- X-axis: Simulation time (in user-defined units).
- Y-axis: Model-derived anomaly score (higher = more unusual).
- Red Bands: Highlight anomalous frames that deviate from the structural norm.
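One way to think about the red bands is as runs of consecutive anomalous frames merged into time intervals. The sketch below shows that grouping with the standard library only; the function name `anomaly_bands` and its inputs are hypothetical, and the script may build its bands differently.

```python
def anomaly_bands(times, is_anomaly):
    """Merge consecutive anomalous frames into (start_time, end_time) bands."""
    bands = []
    start = None      # start time of the band currently being built
    prev_time = None  # time of the previous frame
    for t, flagged in zip(times, is_anomaly):
        if flagged and start is None:
            start = t                        # a new band opens here
        elif not flagged and start is not None:
            bands.append((start, prev_time))  # band closed at previous frame
            start = None
        prev_time = t
    if start is not None:
        bands.append((start, prev_time))      # band runs to the last frame
    return bands


demo_times = [0, 1, 2, 3, 4, 5, 6, 7]
demo_flags = [False, True, True, False, False, True, False, True]
demo_bands = anomaly_bands(demo_times, demo_flags)
print(demo_bands)
```

Each resulting interval could then be shaded on a plot, for example with matplotlib's `axvspan`.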
CSV Output
- Allows further inspection or filtering of anomalous frames.
- Useful for downstream clustering, trajectory trimming, or snapshot extraction.
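As a sketch of such downstream filtering, the example below reads an anomaly table and returns the highest-scoring frames. The column names `Frame`, `Time`, and `Score` are assumptions for illustration; check the header of the actual *_Anomalies.csv file and adjust accordingly. The in-memory sample data is made up.

```python
import csv
import io


def top_anomalies(csv_file, n=5, min_score=None):
    """Return up to n rows with the highest score, optionally above min_score.

    Assumes (hypothetically) the columns Frame, Time, and Score.
    """
    rows = []
    for row in csv.DictReader(csv_file):
        record = {
            "frame": int(row["Frame"]),
            "time": float(row["Time"]),
            "score": float(row["Score"]),
        }
        if min_score is None or record["score"] >= min_score:
            rows.append(record)
    rows.sort(key=lambda r: r["score"], reverse=True)
    return rows[:n]


# Made-up sample standing in for a real *_Anomalies.csv file
sample = io.StringIO(
    "Frame,Time,Score\n"
    "12,1.2,0.08\n"
    "340,34.0,0.31\n"
    "341,34.1,0.27\n"
)
top_frames = [r["frame"] for r in top_anomalies(sample, n=2)]
print(top_frames)
```

With a real file, pass an open file handle (`open(path, newline="")`) instead of the `StringIO` object.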
3D Visualizations
- Offers direct inspection of frames with the highest anomaly scores.
- Useful for detecting simulation errors or rare but biologically relevant events.
Example Scenarios
Ligand Unbinding Detection
- Scenario: Ligand leaves the binding pocket at some point.
- Observation: A sharp spike in anomaly score followed by a sustained deviation.
- Interpretation: Detected as an outlier due to major rearrangement or loss of interaction contacts.
Transient Loop Opening
- Scenario: A loop briefly unfolds and re-folds.
- Observation: A single or few isolated anomalies appear.
- Interpretation: Suggests reversible fluctuations that may be functionally important.
Simulation Instability or Glitch
- Scenario: A frame contains abnormally large or distorted geometry.
- Observation: An extreme anomaly score, possibly with visible distortion in 3D.
- Interpretation: May indicate simulation artifact, poor equilibration, or input error.
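The three scenarios differ mainly in how long an anomalous stretch lasts and how extreme its peak score is. The heuristic below labels each run of flagged frames along those lines. It is only a rough triage aid under assumed thresholds (`sustained_len` and `extreme_score` are hypothetical parameters, as is the function name), and any label should be confirmed by inspecting the frames in 3D.

```python
from itertools import groupby


def triage_anomalies(is_anomaly, scores, sustained_len=10, extreme_score=None):
    """Label each run of consecutive anomalous frames.

    Labels: "extreme" (peak score at or above extreme_score),
    "sustained" (run of at least sustained_len frames),
    otherwise "transient". Thresholds are illustrative assumptions.
    """
    events = []
    frame = 0  # index of the first frame in the current run
    paired = zip(is_anomaly, scores)
    for flagged, run in groupby(paired, key=lambda pair: bool(pair[0])):
        run = list(run)
        if flagged:
            peak = max(score for _, score in run)
            if extreme_score is not None and peak >= extreme_score:
                label = "extreme"
            elif len(run) >= sustained_len:
                label = "sustained"
            else:
                label = "transient"
            events.append(
                {"start": frame, "length": len(run), "peak": peak, "label": label}
            )
        frame += len(run)
    return events


# Made-up flags: a 2-frame blip, a 12-frame stretch, and one extreme frame
demo_flags = [0] * 5 + [1] * 2 + [0] * 5 + [1] * 12 + [0] * 3 + [1]
demo_scores = [0.3 if f else 0.1 for f in demo_flags]
demo_scores[-1] = 0.9
demo_events = triage_anomalies(demo_flags, demo_scores, extreme_score=0.8)
for event in demo_events:
    print(event)
```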
Usage Tips
- Feature Balance:
  - Start with use_ca_positions=True.
  - Add use_pairwise_distances for more resolution, especially in small systems.
  - Use use_dihedrals for secondary structure transitions or flexible loops.
- Contamination Tuning:
  - Use a higher contamination (~0.05) for noisy trajectories or exploratory scans.
  - Use a lower value (~0.01) for cleaner simulations to focus on major deviations.
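The effect of the contamination setting can be pictured as choosing how many of the highest-scoring frames get flagged. The sketch below is a simplified stand-in, not the models' internal thresholding: the function name `flag_top_fraction` and the made-up scores are assumptions, and real detectors derive their cut-off from the fitted model rather than from a plain ranking.

```python
def flag_top_fraction(scores, contamination):
    """Return sorted indices of the highest-scoring fraction of frames.

    Simplified illustration of how contamination controls the number
    of flagged frames (higher score = more anomalous).
    """
    if not 0.0 < contamination <= 0.5:
        raise ValueError("contamination must be in (0, 0.5]")
    n_flag = max(1, round(contamination * len(scores)))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:n_flag])


# Made-up scores for 200 frames, increasing with frame index
demo_scores = [i / 200 for i in range(200)]
exploratory = flag_top_fraction(demo_scores, 0.05)  # flags 10 frames
strict = flag_top_fraction(demo_scores, 0.01)       # flags 2 frames
print(len(exploratory), len(strict))
```

Dropping contamination from 0.05 to 0.01 on a 200-frame trajectory reduces the flagged set from 10 frames to 2, which is why the lower setting focuses attention on the most pronounced deviations.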