Protein‐Lipid Contact Time Series - k-ngo/CATMD GitHub Wiki
Protein–Lipid Contact Time Series
Overview and Methodology
What It Does
This script detects and quantifies contacts between protein residues and lipid atoms over the course of a molecular dynamics (MD) simulation. It outputs a time-series contact matrix in which each row corresponds to a protein residue and each column to a frame in the trajectory. A contact is registered when any atom of a protein residue lies within a user-defined cutoff distance of any lipid atom.
How It Works
- Objective: Identify residues that are in contact with lipids over time and map out these interactions as a binary matrix.
- Process:
  - For each trajectory frame:
    - Select protein residues and lipids that are spatially close based on a distance cutoff.
    - Optionally limit contact detection to specific lipid atoms (e.g., headgroups).
    - Calculate distance matrices between protein and lipid atoms.
    - Assign a contact (1) to a residue if any of its atoms lies within the cutoff distance of a lipid atom; otherwise assign 0.
  - Parallel processing is available to speed up the calculation across frames.
  - The result is a residue-by-time contact matrix, which can be used for further analysis or visualization.
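The per-frame logic above can be sketched in plain Python. This is a minimal illustration, not the script's actual implementation: the function names `frame_contacts` and `contact_matrix`, the toy coordinates, and the choice of an inclusive cutoff (`<=`) are all assumptions made for the example.

```python
from math import dist


def frame_contacts(residue_atoms, lipid_atoms, cutoff):
    """Return {residue_label: 0 or 1} for a single frame.

    residue_atoms: dict mapping a residue label to a list of (x, y, z) positions.
    lipid_atoms:   list of (x, y, z) lipid atom positions (already filtered if desired).
    cutoff:        contact distance in the same units as the coordinates (Å here).
    """
    contacts = {}
    for label, atoms in residue_atoms.items():
        # A residue is in contact if ANY of its atoms is within the cutoff
        # of ANY lipid atom (inclusive comparison is an assumption).
        hit = any(dist(a, q) <= cutoff for a in atoms for q in lipid_atoms)
        contacts[label] = int(hit)
    return contacts


def contact_matrix(frames, cutoff):
    """Stack per-frame results into a residue-by-frame matrix (dict of lists)."""
    labels = sorted({lab for residues, _ in frames for lab in residues})
    matrix = {lab: [] for lab in labels}
    for residue_atoms, lipid_atoms in frames:
        row = frame_contacts(residue_atoms, lipid_atoms, cutoff)
        for lab in labels:
            # Residues absent from a frame are recorded as no contact.
            matrix[lab].append(row.get(lab, 0))
    return matrix


# Toy two-frame "trajectory" with hypothetical coordinates in Å.
frames = [
    ({"R10:VSD": [(0.0, 0.0, 0.0)], "L11:VSD": [(20.0, 0.0, 0.0)]}, [(3.0, 0.0, 0.0)]),
    ({"R10:VSD": [(0.0, 0.0, 0.0)], "L11:VSD": [(20.0, 0.0, 0.0)]}, [(18.0, 0.0, 0.0)]),
]
matrix = contact_matrix(frames, cutoff=4.0)
```

In this toy input the single lipid atom moves from one residue to the other, so each residue is in contact in exactly one of the two frames. A real analysis would compute the distances with vectorized array operations rather than nested loops.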
Configuration and Inputs
Prerequisites
- Requires a loaded trajectory.
Key Configuration Options
- Selections:
  - `protein_sel`: Atom selection string for the protein (e.g., `'segid VSD'`).
  - `protein_name`: Label used in the output file names and title.
  - `lipid_sel`: Atom selection string for lipids (e.g., `'segid MEMB'`).
  - `lipid_name`: Label used in the output file names and title.
  - `lipid_atom`: Optional filter for lipid atoms to include in the distance computation (e.g., `'P'`).
- Contact Parameters:
  - `contact_cutoff`: Distance threshold (Å) that defines a contact between protein and lipid atoms.
- Timing and Trajectory Control:
  - `time_total`: Duration of the trajectory in physical units (used for x-axis labeling).
  - `begin_frame`, `end_frame`, `step`: Frame range and sampling interval.
  - `show_progress`: Enables progress updates during processing.
- Parallelization:
  - `num_threads`: Number of CPU threads for multiprocessing. Set to `-1` to use all available cores.
- Filtering and Output Control:
  - `ts_min_fraction`: Minimum fraction of trajectory time a residue must be in contact with lipids to be included in the output.
  - Residue labels are assigned dynamically, frame by frame, from the active selection and are matched automatically.
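As an illustration of how these options fit together, the sketch below collects them in a dictionary and resolves two of them. The option names come from the list above; every value is a placeholder, and the dictionary layout, the helper names `resolve_num_threads` and `selected_frames`, and the exclusive `end_frame` convention are assumptions rather than the script's documented behavior.

```python
import os

# Option names are taken from the list above; all values are placeholders.
config = {
    "protein_sel": "segid VSD",
    "protein_name": "VSD",
    "lipid_sel": "segid MEMB",
    "lipid_name": "Membrane",
    "lipid_atom": "P",        # optional filter on lipid atom names
    "contact_cutoff": 4.0,    # Å, hypothetical value
    "time_total": 1000.0,     # physical duration, hypothetical value and unit
    "begin_frame": 0,
    "end_frame": 1000,
    "step": 10,
    "show_progress": True,
    "num_threads": -1,        # -1 means "use all available cores"
    "ts_min_fraction": 0.1,
}


def resolve_num_threads(num_threads):
    """Map -1 to the number of available cores; otherwise use the given value."""
    if num_threads == -1:
        return os.cpu_count() or 1
    return max(1, num_threads)


def selected_frames(begin_frame, end_frame, step):
    """Frame indices to analyze, assuming Python-style slicing (end exclusive)."""
    return list(range(begin_frame, end_frame, step))


frames = selected_frames(config["begin_frame"], config["end_frame"], config["step"])
workers = resolve_num_threads(config["num_threads"])
```

With these placeholder values, every tenth frame of the first thousand is analyzed, and the worker count follows the machine's core count.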
Output
- DataFrame:
  - Rows: Protein residue labels in the format `X123:SEGID`, where `X` is the one-letter residue code.
  - Columns: Time points corresponding to trajectory frames.
  - Values: `1` if contact occurs, `0` otherwise.
- Saved Files:
  - CSV: Binary contact matrix saved to `figures/{protein_name}_{lipid_name}_Contact_Time_Series.csv`.
  - PNG (optional, written only if plotting is included): Heatmap figure saved to `figures/{protein_name}_{lipid_name}_Contact_Time_Series.png`.
- Console Output:
  - Frame-by-frame debug summaries including:
    - Number of lipid residues near the protein.
    - Count of contacting residues.
    - z-axis ranges of protein and lipid atoms.
    - Error messages for frames with missing atoms or contacts.
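The row-label format and file naming described above can be illustrated with a short sketch. The helpers `residue_label`, `output_path`, and `to_csv_text` are hypothetical, the `residue` header name is an assumption, and the standard-library `csv` module stands in for whatever DataFrame export the script actually uses.

```python
import csv
import io

# Standard three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def residue_label(resname, resid, segid):
    """Build a label such as R123:VSD from residue name, number, and segment ID."""
    # Names outside the table (e.g. force-field variants) fall back to "X" here.
    code = THREE_TO_ONE.get(resname.upper(), "X")
    return f"{code}{resid}:{segid}"


def output_path(protein_name, lipid_name, ext="csv"):
    """File name pattern from the list above."""
    return f"figures/{protein_name}_{lipid_name}_Contact_Time_Series.{ext}"


def to_csv_text(matrix, times):
    """Serialize {label: [0/1, ...]}: one row per residue, one column per time point."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["residue", *times])
    for label, row in matrix.items():
        writer.writerow([label, *row])
    return buf.getvalue()


label = residue_label("ARG", 123, "VSD")
path = output_path("VSD", "POPC")
text = to_csv_text({label: [1, 0, 1]}, times=[0.0, 0.5, 1.0])
```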
Interpreting the Results
- Contact Matrix:
  - Each `1` indicates that a residue was in contact with lipids during a specific frame.
  - Absence of contacts (all `0`s for a residue) means the residue was either farther than the cutoff from every lipid atom considered or excluded by filtering.
  - Residues in contact with lipids for a smaller fraction of the trajectory than `ts_min_fraction` are omitted to improve clarity.
- Debug Info:
  - Useful for diagnosing issues such as missing selections, incorrect atom names, or overly strict cutoffs.
  - z-axis range information can be used to assess membrane embedding depth or drift during the simulation.
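The effect of `ts_min_fraction` on which residues appear in the output can be sketched as follows. The helper names are hypothetical, and whether the script compares with `>=` or `>` is not stated in this page, so the inclusive comparison below is an assumption.

```python
def contact_fraction(row):
    """Fraction of frames in which a residue is in contact (row is a list of 0/1)."""
    return sum(row) / len(row) if row else 0.0


def filter_by_fraction(matrix, ts_min_fraction):
    """Keep residues whose contact fraction reaches the threshold."""
    # Assumption: inclusive threshold; the script may use a strict comparison.
    return {
        label: row
        for label, row in matrix.items()
        if contact_fraction(row) >= ts_min_fraction
    }


matrix = {
    "R10:VSD": [1, 1, 1, 0],  # in contact for 3 of 4 frames
    "L11:VSD": [0, 0, 0, 1],  # in contact for 1 of 4 frames
    "K12:VSD": [0, 0, 0, 0],  # never in contact
}
kept = filter_by_fraction(matrix, ts_min_fraction=0.5)
```

With a threshold of 0.5, only the first residue survives; lowering the threshold to 0.25 would also retain the second.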
Example Scenarios
Membrane-Binding Domain Analysis
- Observation: Peripheral residues show sporadic contact with the lipid bilayer.
- Interpretation: Suggests transient membrane binding or surface association.
Transmembrane Helix Stability
- Observation: Stable contacts across all frames for central helices.
- Interpretation: Indicative of consistent embedding within the membrane core.
Drug-Induced Reorganization
- Observation: The contact pattern changes after a specific simulation time point.
- Interpretation: A bound compound may be altering the protein-lipid interaction network.
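One simple way to quantify a change in contact pattern such as the one in the last scenario is to compare each residue's contact occupancy before and after a chosen frame. This is only an illustrative post-processing sketch on the binary matrix; `occupancy_shift` and the split index are hypothetical and not part of the script.

```python
def occupancy(row):
    """Fraction of frames with a contact in a list of 0/1 values."""
    return sum(row) / len(row) if row else 0.0


def occupancy_shift(matrix, split_index):
    """Per-residue change in occupancy: value after the split minus value before it."""
    return {
        label: occupancy(row[split_index:]) - occupancy(row[:split_index])
        for label, row in matrix.items()
    }


matrix = {
    "R10:VSD": [1, 1, 1, 1, 0, 0, 0, 0],  # loses lipid contact halfway through
    "F20:VSD": [1, 1, 1, 1, 1, 1, 1, 1],  # stable contact throughout
}
shift = occupancy_shift(matrix, split_index=4)
```

Residues with a strongly negative or positive shift are candidates for closer inspection; a value near zero indicates no change between the two windows.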
Usage Tips
- Atom Filtering:
  - Use `lipid_atom='P'` or similar to restrict contact detection to lipid headgroup atoms, so that registered contacts reflect headgroup interactions rather than proximity to any lipid atom.
- Performance:
  - Increase `num_threads` for long trajectories to speed up analysis.
  - Lower `ts_min_fraction` to retain more residues, at the cost of visual clarity.
- Edge Cases:
  - No detected contacts may indicate an overly strict cutoff or incorrect selections.
  - Use the debug output to verify spatial proximity and atom selections.
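When no contacts are detected, a quick check of the closest protein-lipid approach helps separate an empty selection from a cutoff that is too strict. The sketch below is a hypothetical stand-alone diagnostic on raw coordinates, not a feature of the script; `min_distance` and `diagnose` are illustrative names.

```python
from math import dist


def min_distance(protein_atoms, lipid_atoms):
    """Smallest protein-lipid atom distance, or None if either selection is empty."""
    if not protein_atoms or not lipid_atoms:
        return None
    return min(dist(p, q) for p in protein_atoms for q in lipid_atoms)


def diagnose(protein_atoms, lipid_atoms, cutoff):
    """Return a short message explaining why contacts may be missing."""
    d = min_distance(protein_atoms, lipid_atoms)
    if d is None:
        return "empty selection: check protein_sel, lipid_sel, and lipid_atom"
    if d > cutoff:
        return f"closest approach {d:.1f} Å exceeds cutoff {cutoff:.1f} Å"
    return "contacts expected"


# Hypothetical coordinates in Å: one protein atom, one lipid atom 6 Å away.
message = diagnose([(0.0, 0.0, 0.0)], [(0.0, 0.0, 6.0)], cutoff=4.0)
```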