Molecular Dynamics - dkoes/docs GitHub Wiki

A good overview of molecular dynamics in general: https://www.livecomsjournal.org/article/5957-best-practices-for-foundations-in-molecular-simulations-article-v1-0

Amber

Our group typically uses Amber for performing molecular dynamics. Amber describes both a set of force fields and a suite of MD programs. The Amber website provides extensive documentation of program usage, including tutorials for many common tasks, manuals for major releases of Amber and AmberTools (these are the authoritative resource for using the software), and an active mailing list for debugging and support. Although we use a shared script for automating the process of setting up and running Amber molecular dynamics (prepareamber.py, described below), it is still recommened that you read at least the introductory Amber documentation before getting started so that you understand what you are doing, and that you refer to it for debugging purposes when you encounter problems.

prepareamber.py

We maintain a shared script, prepareamber.py, to automate the process of setting up and running Amber MDs as much as possible.

Installation

First clone the git repository.

git clone https://github.com/dkoes/md-scripts.git

If you did this in your home directory you will be able to run prepareamber.py by typing ~/md-scripts/prepareamber.py (the ~ is a shorthand for your home directory).

You will need to install prepareamber.py's dependencies in a conda environment.

module load anaconda
conda config --add channels conda-forge
conda create -n "amber"
conda activate amber
conda install openmm openff-toolkit plumbum cython xmltodict typing_extensions

On the cluster, setup your amber environment with

eval "$(conda shell.bash hook)"  # this is needed in a slurm script
conda activate amber
module load amber/24

Running the script

prepareamber.py --help will give you a list of options (and if it fails, it will let you know anything you missed in the previous step, which you should go back and address). The most basic invocation just provides the script one or more structures you want to simulate, i.e. prepareamber.py -s rec.pdb LIG.sdf. All the structures you provide will be simulated together, so if you want to perform separate simulations of a set of ligands with a particular protein, those will require separate invocations of the script.

Additionally, note that separate structures are provided to the script as separate files (so if you retrieve a protein-ligand complex from the PDB, you will need to separate the protein and ligand into different files before invoking the script). The reason this decision was made is that one of the major things the script does is parametrize small molecules for which it does not already have parameters, but proteins may themselves have residues for which parameters may be missing, and it would be incorrect to use the same process for parameter assignment for a protein as for a small molecule. Therefore the script will decline to parameterize any undefined residues in a structure it has identified as a protein, and will instead report an error with the name of the undefined residue.

Finally, it's important to pay attention to protonation states. By default, the script strips out any hydrogens that are present and then adds them back to small molecules with OpenBabel and to proteins with Leap. Pass -noh if you don't want it to touch your hydrogens (e.g. if you have a ligand crystal structure from the PDB and are confident in its protonation state), but keep in mind that because of Amber naming conventions for hydrogens in proteins, in most cases you really do want the protein's hydrogens to be removed before processing. In that case you can strip out the protein hydrogens before beginning with OpenBabel or the pdb4amber program provided with Amber, and pass the prepared protein with ligand structures containing the desired hydrogens to prepareamber.py with the -noh option.

Troubleshooting prepareamber.py

ImportError: Check that obabel is on your path
ImportError: Check that AMBER binaries are on your path
ImportError: Check match_atomname (one of the antechamber-related AmberTools packages) has been built - starting with AmberTools18 it is not built by default, but can be built manually from $AMBERHOME/AmberTools/src/antechamber/
Antechamber failed. Check XXX structure

ImportError: Check that obabel is on your path

Find the obabel install location. If you cannot find it, install it. Then add the directory it's in to your PATH if it isn't there already. On the cluster add /net/pulsar/home/koes/dkoes/local/bin to your PATH environment variable. (Back)

ImportError: Check that AMBER binaries are on your path

Find the AMBER install location. On the cluster, look in /usr/local, and on a group-owned workstation, try /home/dkoes/build. Set AMBERHOME=/path/to/amber, source $AMBERHOME/amber.sh, and echo "source $AMBERHOME/amber.sh" >> ~/.bashrc. Confirm that which pmemd.cuda returns the executable in $AMBERHOME/bin. (Back)

ImportError: Check match_atomname has been built

See this thread for more information. Patch things up by doing the following: cd $AMBERHOME/AmberTools/src/antechamber. Then make match_atomname and cp match_atomname $AMBERHOME/bin. If your environment variables have been properly set up, which match_atomname should now return that executable. On the cluster add /net/pulsar/home/koes/dkoes/local/bin to your PATH environment variable (Back)

Antechamber failed. Check XXX structure.

Antechamber is the Amber program that is used to parametrize small molecules. It can fail for various reasons.

First, verify that the file it failed with actually contains a small molecule - if it's a peptide, a nonstandard residue that's part of a protein, a nucleic acid, etc., it probably shouldn't be parametrized with antechamber in the first place. If it's a peptide, check the initial file to see why it isn't being correctly identified as one (for example, sometimes the residue names are missing). If it's anything other than a normal small molecule or a normal protein, you may need to get parameters from somewhere, and provide them to the script via the --libs argument. If you're simulating a structure that our group has simulated before, check on Slack - somebody should have the parameters you need. A good first place to check for missing parameters is The University of Manchester's AMBER parameter database. Querying the literature is also an option. In some cases, doing an ab initio calculation with Gaussian (available on the cluster) may be required.
If you are working with a small molecule and really want to parametrize it with antechamber, a common problem is making sure it's been protonated correctly and the correct net charge has been provided to antechamber. By default, the script strips out hydrogens present in the input, and adds them back to the ligand with OpenBabel. It attempts to calculate a net charge to pass to antechamber by calculating Gasteiger charges, and if that fails, it attempts to rerun antechamber with common net charge values. If you are confident that your starting hydrogens are correct, you should run the script with -noh (with the caveat that you might want to strip out the protein hydrogens first if you are simulating a complex, e.g. with pdb4amber). You can also specify a desired net charge by passing --net_charge to prepareamber.
Always sanity check your structures. If antechamber fails, visualize the structure it failed with and make sure things look normal. This is especially true if all of the above fails, but as a rule you should always check your data, and that's exactly what your structure is. Another common problem is incorrect bonding or geometry, which can be introduced as the structure is processed by different pieces of software. For example, some software infers bonds based on distance, and may incorrectly introduce a bond when the ligand's geometry is strained. It's often easier to find these kinds of problems quickly by looking at the structure in PyMol than by staring at the file in your text editor.
(Back)

Post-processing trajectories

In order to analyze trajectories, they need to be processed so that the protein is aligned and water is wrapped into the periodic box. Additionally, we typically downsample to every 10th frame. By convention, we store post-processed trajectories as dcd files. In order to process a trajectory with suffix foo, provide the following input to cpptraj:

parm foo.prmtop
trajin foo_md3.nc 1 last 10
autoimage
rms first @CA
trajout foo.dcd dcd nobox

If you are running cpptraj interactively you will have to also type run. Do not call run multiple times in a single session. It will apply all the commands you've typed during the session, including those before the previous run you typed. If you want to efficiently process multiple trajectories, you will want to create a script like this:

#!/bin/bash

#convert prepare_amber nc to dcd, takes basename
#prints out commands to send to cpptraj

echo "parm ${1}.prmtop"
echo "trajin ${1}_md3.nc 1 last 10"
echo "autoimage"
echo "rms first @CA"
echo "trajout ${1}.dcd dcd nobox"

which you can then run with

ptrajit foo | cpptraj

For a much reduced file size you can strip waters, but this requires an updated topology too (stripped.prmtop will be created):

parm foo.prmtop
trajin foo_md3.nc 1 last 10
autoimage
strip :WAT outprefix stripped
rms first @CA
trajout stripped.dcd dcd nobox

Running on the Cluster

Full production simulations (md3) should be run on the cluster. See https://github.com/dkoes/docs/wiki/Using-the-Cluster----GPU