Protein‐Lipid Contact Time Series - k-ngo/CATMD GitHub Wiki

Protein–Lipid Contact Time Series

Overview and Methodology

What It Does

This script detects and quantifies contacts between protein residues and lipid atoms over the course of a molecular dynamics (MD) simulation. It outputs a time series contact matrix where each row corresponds to a protein residue and each column represents a frame in the trajectory. A contact is registered when any atom of a protein residue is within a user-defined cutoff distance from any lipid atom.

How It Works

  • Objective: Identify residues that are in contact with lipids over time and map out these interactions as a binary matrix.
  • Process:
    • For each trajectory frame:
      • Select protein residues and lipids that are spatially close based on a distance cutoff.
      • Optionally limit contact detection to specific lipid atoms (e.g., headgroups).
      • Calculate distance matrices between protein and lipid atoms.
      • Assign a contact (1) to a residue if any of its atoms falls within the cutoff distance of any selected lipid atom; otherwise assign 0.
    • Parallel processing is available for speed-up across frames.
    • The resulting data is a residue-by-time contact matrix, which can be used for further analysis or visualization.
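The per-frame steps above can be sketched in plain Python. This is a minimal illustration, not the script's actual implementation: the function names `residue_contacts` and `contact_matrix`, the tuple-based input layout, and the use of `<=` for the cutoff comparison are all assumptions made for the example.

```python
import math

def residue_contacts(protein_atoms, lipid_coords, cutoff):
    """Return {residue_label: 0 or 1} for a single frame.

    protein_atoms: list of (residue_label, (x, y, z)) tuples
    lipid_coords:  list of (x, y, z) tuples, already restricted to the
                   lipid atoms of interest (e.g. headgroup atoms only)
    cutoff:        contact distance in angstroms
    """
    contacts = {}
    for label, xyz in protein_atoms:
        # Every residue gets an entry, even if it never makes contact.
        contacts.setdefault(label, 0)
        if contacts[label]:
            continue  # already flagged by another atom of this residue
        # A residue is in contact if ANY of its atoms lies within the
        # cutoff of ANY lipid atom.
        if any(math.dist(xyz, lip) <= cutoff for lip in lipid_coords):
            contacts[label] = 1
    return contacts

def contact_matrix(frames, cutoff):
    """Stack per-frame results into rows (residues) by columns (frames)."""
    labels, per_frame = [], []
    for protein_atoms, lipid_coords in frames:
        result = residue_contacts(protein_atoms, lipid_coords, cutoff)
        for label in result:
            if label not in labels:
                labels.append(label)
        per_frame.append(result)
    return {label: [fr.get(label, 0) for fr in per_frame] for label in labels}

# Two toy frames: residue A10 is near the lipid atom in frame 0 only.
frames = [
    ([("A10", (0.0, 0.0, 0.0)), ("G11", (10.0, 0.0, 0.0))], [(3.0, 0.0, 0.0)]),
    ([("A10", (0.0, 0.0, 8.0)), ("G11", (10.0, 0.0, 0.0))], [(3.0, 0.0, 0.0)]),
]
matrix = contact_matrix(frames, cutoff=4.0)
print(matrix)
```

A production script would normally compute the distances with vectorized array operations on the trajectory library's coordinates; the nested loops here are only meant to make the "any atom within the cutoff" rule explicit.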

Configuration and Inputs

Prerequisites

  • Requires a loaded trajectory.

Key Configuration Options

  • Selections:

    • protein_sel: Atom selection string for the protein (e.g., 'segid VSD').
    • protein_name: Label used in the output file naming and title.
    • lipid_sel: Atom selection string for lipids (e.g., 'segid MEMB').
    • lipid_name: Label used in output and title.
    • lipid_atom: Optional filter for lipid atoms to include in distance computation (e.g., 'P').
  • Contact Parameters:

    • contact_cutoff: Distance threshold (Å) to define a contact between protein and lipid atoms.
  • Timing and Trajectory Control:

    • time_total: Duration of the trajectory in physical units (used for x-axis labeling).
    • begin_frame, end_frame, step: Frame range and sampling interval.
    • show_progress: Enables progress updates during processing.
  • Parallelization:

    • num_threads: Number of CPU threads for multiprocessing. Set to -1 to use all available cores.
  • Filtering and Output Control:

    • ts_min_fraction: Minimum fraction of trajectory time a residue must contact lipids to be included in the output.
    • Frame-by-frame residue labels are dynamically assigned based on the active selection and are auto-matched.
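As a rough illustration of how the timing options might combine, the sketch below maps `begin_frame`, `end_frame`, and `step` to sampled frame indices and converts them to times using `time_total`. The helper `frame_times` is hypothetical, and it assumes that `end_frame` is exclusive, that frames are evenly spaced, and that the last trajectory frame corresponds to `time_total`; the script itself may use different conventions.

```python
def frame_times(n_frames_total, time_total, begin_frame=0, end_frame=None, step=1):
    """Return (frame_indices, times) for the sampled frames.

    Assumptions for this sketch: end_frame is exclusive, frames are
    evenly spaced, frame 0 is at time 0, and the last frame of the
    trajectory is at time_total.
    """
    if end_frame is None:
        end_frame = n_frames_total
    indices = list(range(begin_frame, min(end_frame, n_frames_total), step))
    # Avoid division by zero for a single-frame trajectory.
    last = max(n_frames_total - 1, 1)
    times = [i * time_total / last for i in indices]
    return indices, times

# 101 frames covering 1000 time units, sampled every 25th frame.
indices, times = frame_times(n_frames_total=101, time_total=1000.0,
                             begin_frame=0, end_frame=101, step=25)
print(indices)
print(times)
```

The returned times are what would label the columns of the contact matrix and the x-axis of a heatmap.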

Output

  • DataFrame:

    • Rows: Protein residue labels in the format X123:SEGID, where X is the one-letter residue code.
    • Columns: Time points corresponding to trajectory frames.
    • Values: 1 if contact occurs, 0 otherwise.
  • Saved Files:

    • CSV: Binary contact matrix saved to figures/{protein_name}_{lipid_name}_Contact_Time_Series.csv.
    • PNG (optional, only if plotting is included): Heatmap figure saved to figures/{protein_name}_{lipid_name}_Contact_Time_Series.png.
  • Console Output:

    • Frame-by-frame debug summaries including:
      • Number of lipid residues near protein.
      • Count of contacting residues.
      • z-axis ranges of protein and lipid atoms.
      • Error messages for frames with missing atoms or contacts.
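The label format and file naming described above can be reproduced with the standard library alone. The sketch below is illustrative: the helpers `residue_label`, `output_path`, and `write_contact_csv`, the `residue` header cell, and the fallback to `X` for unrecognized residue names are assumptions, not details confirmed by the script.

```python
import csv
import io

# Standard three-letter to one-letter codes for the 20 common amino acids.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}

def residue_label(resname, resid, segid):
    """Build a label such as 'R123:VSD' from residue name, number, and segment."""
    one = THREE_TO_ONE.get(resname.upper(), "X")  # 'X' fallback is an assumption
    return f"{one}{resid}:{segid}"

def output_path(protein_name, lipid_name, ext="csv"):
    """Follow the documented naming pattern for saved files."""
    return f"figures/{protein_name}_{lipid_name}_Contact_Time_Series.{ext}"

def write_contact_csv(handle, times, rows):
    """Write one header row of time points, then one row per residue."""
    writer = csv.writer(handle)
    writer.writerow(["residue"] + list(times))  # header cell name is hypothetical
    for label, values in rows.items():
        writer.writerow([label] + list(values))

# Write to an in-memory buffer instead of disk to keep the example side-effect free.
buffer = io.StringIO()
write_contact_csv(buffer, [0.0, 10.0], {residue_label("ARG", 123, "VSD"): [1, 0]})
csv_text = buffer.getvalue()
print(output_path("VSD", "MEMB"))
print(csv_text)
```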

Interpreting the Results

  • Contact Matrix:

    • Each 1 indicates that a residue was in contact with lipids during a specific frame.
    • A row of all 0s means the residue never came within the cutoff of the selected lipid atoms, either because it was too distant or because the relevant lipid atoms were excluded by filtering.
    • Residues that contact lipids for a smaller fraction of the trajectory than ts_min_fraction are omitted to improve clarity.
  • Debug Info:

    • Useful for diagnosing issues such as missing selections, incorrect atom names, or overly strict cutoffs.
    • z-axis range information can be used to assess membrane embedding depth or drift during simulation.
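To make the ts_min_fraction rule concrete, the following sketch keeps only residues whose contact fraction reaches the threshold. The function name `filter_by_fraction` and the choice of `>=` (rather than `>`) at the boundary are assumptions for illustration.

```python
def filter_by_fraction(matrix, ts_min_fraction):
    """Keep residues whose fraction of contact frames is at least ts_min_fraction.

    matrix: {residue_label: [0/1 per frame]}
    """
    kept = {}
    for label, row in matrix.items():
        # Fraction of sampled frames in which the residue contacts lipids.
        fraction = sum(row) / len(row) if row else 0.0
        if fraction >= ts_min_fraction:
            kept[label] = row
    return kept

matrix = {
    "R123:VSD": [1, 1, 1, 0],  # in contact for 75% of frames
    "L45:VSD":  [0, 1, 0, 0],  # in contact for 25% of frames
    "G7:VSD":   [0, 0, 0, 0],  # never in contact
}
filtered = filter_by_fraction(matrix, ts_min_fraction=0.5)
print(filtered)
```

With a threshold of 0.5, only the first residue survives; lowering the threshold to 0.25 would also retain the second.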

Example Scenarios

Membrane-Binding Domain Analysis

  • Observation: Peripheral residues show sporadic contact with the lipid bilayer.
  • Interpretation: Suggests transient membrane binding or surface association.

Transmembrane Helix Stability

  • Observation: Stable contacts across all frames for central helices.
  • Interpretation: Indicative of consistent embedding within the membrane core.

Drug-Induced Reorganization

  • Observation: The contact pattern changes after a specific simulation time point.
  • Interpretation: A bound compound may be altering the protein-lipid interaction network.
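One way to tell these scenarios apart numerically is to summarize each residue's row of the contact matrix. The sketch below is a post-processing idea, not part of the script: the helpers `occupancy`, `count_switches`, and `occupancy_shift` are hypothetical, and the toy rows are invented solely to show the three patterns.

```python
def occupancy(row):
    """Fraction of frames in which the residue is in contact."""
    return sum(row) / len(row) if row else 0.0

def count_switches(row):
    """Number of times the contact state changes between consecutive frames."""
    return sum(1 for a, b in zip(row, row[1:]) if a != b)

def occupancy_shift(row, split_index):
    """Occupancy after split_index minus occupancy before it."""
    return occupancy(row[split_index:]) - occupancy(row[:split_index])

# Toy rows, invented for illustration only.
stable   = [1, 1, 1, 1, 1, 1, 1, 1]  # embedded helix: always in contact
sporadic = [0, 1, 0, 0, 1, 0, 1, 0]  # peripheral residue: transient contact
shifted  = [1, 1, 1, 1, 0, 0, 0, 0]  # contact lost after frame 4

for name, row in [("stable", stable), ("sporadic", sporadic), ("shifted", shifted)]:
    print(name, occupancy(row), count_switches(row), occupancy_shift(row, 4))
```

High occupancy with few switches points to stable embedding, low occupancy with many switches to transient association, and a large occupancy shift around a chosen time point to a reorganization event.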

Usage Tips

  • Atom Filtering:

    • Use lipid_atom='P' (the phosphate phosphorus) or a similar headgroup atom to restrict contact detection to lipid headgroups when headgroup interactions are the focus.
  • Performance:

    • Increase num_threads for long trajectories to speed up analysis.
    • Lower ts_min_fraction to retain more residues, but at the cost of visual clarity.
  • Edge Cases:

    • If no contacts are detected, the cutoff may be too strict or the selections may be incorrect.
    • Use debug outputs to verify spatial proximity and atom selections.
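The num_threads convention and the per-frame parallelism can be sketched as follows. This is only an illustration under stated assumptions: it uses a thread pool from `concurrent.futures` as a simple stand-in for the script's multiprocessing, and `resolve_workers`, `map_frames`, and the stand-in `analyze_frame` function are hypothetical names.

```python
import os
from concurrent.futures import ThreadPoolExecutor

def resolve_workers(num_threads):
    """Translate num_threads into a worker count; -1 means all available cores."""
    if num_threads == -1:
        return os.cpu_count() or 1
    return max(1, num_threads)

def map_frames(analyze_frame, frame_indices, num_threads=1):
    """Apply analyze_frame to each frame index, returning results in frame order."""
    workers = resolve_workers(num_threads)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map preserves input order, so columns stay aligned with frames.
        return list(pool.map(analyze_frame, frame_indices))

def analyze_frame(index):
    """Stand-in for per-frame contact detection; returns a contact count."""
    return 0 if index < 2 else 1

results = map_frames(analyze_frame, range(5), num_threads=2)
print(results)

# Simple edge-case check: warn if no frame produced any contact.
if not any(results):
    print("No contacts detected: check the cutoff and the atom selections.")
```

For CPU-bound distance calculations in pure Python, a process pool would usually scale better than threads; the thread pool is used here only because it keeps the example short and portable.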