AMHGo Model And Umbrella Sampling - adavtyan/awsemmd GitHub Wiki

Introduction

This page introduces the use of AMH-Go, a non-additve structure based model included in the AWSEM simulation package, and the use of umbrella sampling over a collective coordinate for the purposes of performing free energy profile and expectation value calculations. AMH-Go is a non-additive structure based model, which is described in detail in: Eastwood, Michael P., and Peter G. Wolynes. "Role of explicitly cooperative interactions in protein folding funnels: a simulation study." The Journal of Chemical Physics 114 (2001): 4702.

Example: running an AMH-Go simulation

Obtain a structure of your protein of interest from the PDB (www.pdb.org) or elsewhere.

The structure should have coordinates for at least the backbone atoms and C-beta for all residues that you are interested in simulating. For the purposes of illustration, we will use Ubiquitin (pdb id: 1UBQ). Download the pdb file to a local directory. Use some tools, or edit the pdb file using a text editor, so that it only includes the chain that you want to simulate. In this case we will be simulating chain A, the only chain in this particular pdb file. If you are following along in this tutorial, rename this modified file to 1ubqA.pdb.

Generate the input files and copy the necessary parameter files to the current working directory.

Next, run the AWSEM script to start generating the input files:

PdbCoords2Lammps.sh 1ubqA 1ubqA

The first argument is the name of the pdb file that you generated and modified but without the ".pdb" extension. The second argument will serve as the pre/postfix for the input files that are generated by the script. In this case, we have chosen to use the same name for both, but this is not necessary. Note that many of the scripts, if run with no arguments, will show the basic syntax, such as:

$ PdbCoords2Lammps.sh

> /home/ns24/opt/script/PdbCoords2Lammps.sh pdb_id project_name

The files generated include coord, data, seq and in files. The coord file is a temporary file used only for constructing the data file and may be safely deleted. The in file is the LAMMPS input script. The seq file contains the one letter amino acid sequence, and the data file contains generic information about the number of atoms and bonds in the simulation, the number of atom and bond types, the dimensions of the simulation box, the masses of the atoms, the atom IDS, chain IDs, residue IDs, atom types, atom charges and x, y and z coordinates of the atoms as well as bond coefficients and a list of bonds between atoms.

Next, copy amh-go.gamma and fix_backbone_coeff.data from the parameters directory to your local directory. Edit the fix_backbone_coeff.data file so that it contains only the following blocks, and make sure that none of the blocks have a "-" after the closing bracket, as this will inactivate the corresponding potential:

[Chain]
10.0 10.0 30.0
2.45798 2.50665 2.44973

[Chi]
20.0 -0.83

[Epsilon]
1.0

[Rama]
2.0
5
 1.3149  15.398 0.15   1.74 0.65 -2.138
1.32016 49.0521 0.25  1.265 0.45  0.318
 1.0264 49.0954 0.65 -1.041 0.25  -0.78
    2.0   419.0  1.0  0.995  1.0  0.820
    2.0  15.398  1.0   2.25  1.0  -2.16

[Rama_P]
3
 0.0    0.0 1.0   0.0  1.0   0.0
2.17 105.52 1.0 1.153 0.15  -2.4
2.15 109.09 1.0  0.95 0.15 0.218
 0.0    0.0 1.0   0.0  2.0   0.0
 0.0    0.0 1.0   0.0  2.0   0.0

[SSWeight]
0 0 0 1 1 0
0 0 0 0 0 0

[ABC]
0.483 0.703 -0.186
0.444 0.235 0.321
0.841 0.893 -0.734

[AMH-Go]
1.0
2.5
8.0

In the [AMH-Go] block, the first parameter is the scaling of this term relative to the other terms. The second parameter is the "p-value", or non-additivity exponent. Higher p-values give high barriers and more cooperative folding. Values in the range of 2-3 have been found to give realistic levels of cooperativity. The third and final parameter is the cutoff for defining native contacts between residues, and is in units of Angstroms. For the purposes of this tutorial, we will use the values p=2.5 and a cutoff of 8 angstroms.

The amh-go.gamma file contains information about the strength of interactions between residues of different identities and sequence separation classes. The default file is consistent with those parameters used in the original study where the model was introduced and subsequent studies. The model is a "vanilla" model is the sense that the strength of interaction between residues does not depend on their identity. This can be easily changed by modifying the amhgo-gamma file, but that is beyond the immediate scope of this tutorial.

We now need to generate the amh-go.gro file, which will be used to determine which residues are in contact in the native state. Make sure that you renumber the PDB file that you will be using to generate the amh-go.gro file such that the first residue index is 1. Then, the file can be generated using:

python /path/to/awsemmd/tools/frag_mem_tools/Pdb2Gro.py 1ubqA amh-go.gro

where, again, this assumes that you have a file called 1ubqA.pdb and creates a file callled amh-go.gro. The gro file contains the number of atoms, residue IDs, residue types, atom types, atom IDs, and x, y and z coordinates of the atoms.

Finally, it is necesary to generate an "ssweight" file, which contains information about how to bias the secondary structure using the Ramachandran potentials. Since we don't want to provide any extra bias to the native structure in this case, we can just use a file containing 76 (the number of residues in 1UBQ chain A) of "0.0 0.0", meaning that the extra bias to an alpha or beta conformation are both zero for all residues.

$ for i in {1..76}; do echo "0.0 0.0"; done > ssweight

Run a simulation.

You are now ready to run your first simulation with the AMH-Go model. By issuing the command

$ lmp_serial < 1ubqA.in

you should see output that looks something like

LAMMPS (9 Oct 2012)
Scanning data file ...
  3 = max bonds/atom
Reading data file ...
  orthogonal box = (-10000 -10000 -10000) to (10000 10000 10000)
  1 by 1 by 1 MPI processor grid
  228 atoms
  302 bonds
Finding 1-2 1-3 1-4 neighbors ...
  5 = max # of 1-2 neighbors
  5 = max # of special neighbors
76 atoms in group alpha_carbons
76 atoms in group beta_atoms
76 atoms in group oxygens
Chain flag on
Chi flag on
Rama flag on
Rama_P flag on
SSWeight flag on
ABC flag on
AMH-Go flag on

Setting up run ...
Memory usage per processor = 0.891418 Mbytes
Step Temp E_pair E_mol TotEng Press Volume 
       0          300    69.685911    2804.7907    2656.0812    -19470.21    37318.056 
    1000     309.6291    3.2541839    154.84377    -57.57167    377.89729    37318.056 
    2000    317.92045    4.5022927    145.15843   -46.009096    -513.3088    34061.896 
    3000    289.04461    4.2000368    136.29229    -80.36671   -53.054637    34061.896 
    4000    284.42523    3.7390817    122.59595   -81.326705   -701.07713    32570.479 
    5000    338.64666    4.4512755    122.41484   -38.536759     533.1893     35143.81 
    6000    284.88008    1.6497933    105.21846   -101.70821    708.35035    37609.352 
    7000    309.85035    2.2742725    107.32425   -71.231702      319.249    36253.327 
    8000    288.93468    4.5076069    114.54767   -69.546086     144.3742    37828.582 
    9000    318.63316    3.7868783    101.90289   -60.801079   -147.73095    37141.499 
   10000    317.62052    2.9372909    106.25488   -54.292209    172.91441    36277.457 
Loop time of 5.3855 on 1 procs for 10000 steps with 228 atoms

Pair  time (%) = 0.740053 (13.7416)
Bond  time (%) = 0.0497167 (0.923159)
Neigh time (%) = 0.00521493 (0.0968328)
Comm  time (%) = 0.00129962 (0.0241318)
Outpt time (%) = 0.00178814 (0.0332029)
Other time (%) = 4.58743 (85.1811)

Nlocal:    228 ave 228 max 228 min
Histogram: 1 0 0 0 0 0 0 0 0 0
Nghost:    0 ave 0 max 0 min
Histogram: 1 0 0 0 0 0 0 0 0 0
Neighs:    10891 ave 10891 max 10891 min
Histogram: 1 0 0 0 0 0 0 0 0 0
FullNghs:  21782 ave 21782 max 21782 min
Histogram: 1 0 0 0 0 0 0 0 0 0

Total # of neighbors = 21782
Ave neighs/atom = 95.5351
Ave special neighs/atom = 2.64912
Neighbor list builds = 10
Dangerous builds = 0

Example: applying a collective coordinate bias

Add a fix qbias command to your input file.

In order to sample different parts of the landscape, it is often useful to apply biases to your simulation Hamiltonian. In this case, we will apply a bias to the Q coordinate. Low values of Q correspond to unfolded structures, while high values of Q correspond to folded, near-native structures. To apply a bias to your molecule, you can add code like this to your input file:

# apply a qbias
fix	      qbias alpha_carbons qbias fix_qbias_coeff.data # apply a Q-biasing potential as specified in fix_qbias_coeff.data
fix_modify    qbias energy no # do not add the energy from the Q-biasing potential to the total potential energy
variable      biasinge equal f_qbias

See the LAMMPS manual for general information about how "fixes" work. In this case, the first line applies a fix with the label "qbias" to the "alpha_carbons" group, which is of type "qbias" and takes as input the file "fix_qbias_coeff.data". The qbias fix assumes that you have a file named "rnative.dat" in the current directory which contains a square matrix of native CA-CA distances. Qbias fix uses that matrix to calculate the instantaneous Q value of the simulated structure, the biasing energy, and the biasing forces at every timestep. The procedure for generating rnative.dat is described below. The second line, starting with "fix_modify", prevents the potential energy coming from the bias from being automatically added to the total potential energy for the purposes of output. Whether or not you want to automatically include the biasing energy into the total potential energy depends on how you will be subtracting out the bias later. A complete explanation of this is beyond the scope of this tutorial, but most WHAM-like algorithms explicitly calculate the biasing energy internally and add it to the unbiased total potential energy, so including it here would be tantamount to double-counting. The last line creates a variable so that the biasing energy can be output separately if so desired.

The fix_qbias_coeff.data file has a format similar to that of fix_backbone_coeff.data discussed previously. An example fix_qbias_coeff.data file looks like:

[QBias_Exp]
KQBIAS
Q0
2
0.15

In this case KQBIAS should be replaced by a floating point number that dictates the strength of the biasing potential. For models with larger barriers it is necessary to apply larger biasing potentials, but applying too larger of a biasing potentials will reduce the overlap between adjacent sampling "umbrellas" and thereby impact the quality of the unbiasing calculation. In general it is necessary to check whether or not your sampling is sufficient by (at least) checking for significant overlap between sampling umbrellas. Values of KQBIAS between 100 and 1000 are appropriate for different models/proteins/purposes when using AWSEM. The Q0 parameter should replaced by a number between 0 and 1 and specifies the center of the biasing potential. The next two lines indicate that the biasing potential is harmonic (an exponent of 2) and that the width of the gaussian well used for biasing increases with increasing sequence separation according to |i-j|^0.15.

Compute collective coordinates on the fly using compute functions

AWSEM and LAMMPS can be used to compute some collective coordinates on-the-fly during the simulation using LAMMPS' compute functionality. An example block of input code that computes several collective coordinates including a Q value, radius of gyration, contact Q, the number of total contacts, and the total potential energy, is given below (replace SAMPLETIME with how often you want to output to the wham.dat file, e.g., 1000):

# output wham.dat, a file with collective variables that can be used to compute pmfs
compute     qw alpha_carbons qwolynes rnative.dat 2 0.15
variable    qw equal c_qw
compute     rg alpha_carbons gyration
variable    rg equal c_rg
compute     qo alpha_carbons qonuchic cutoff 12.0 nativecoords.dat 1.2
variable    qo equal c_qo
compute     tc beta_atoms totalcontacts 6.5 2
variable    tc equal c_tc
variable    step equal step
variable    E_P equal pe

fix         wham all print SAMPLETIME "${step}        ${qw} ${rg} ${qo} ${tc} ${E_P}" file wham.dat screen no title "# timestep qw           rg          qo            tc            energy"

In order to use these compute functions, it is necessary to prepare the rnative.dat and nativecoords.dat files. This can be done using:

python /path/to/awsemmd/tools/create_project_tools/GetCACADistancesFile.py 1ubqA rnative.dat
python /path/to/awsemmd/tools/create_project_tools/GetCACoordinatesFromPDB.py 1ubqA nativecoords.dat

Note:Use GetCACADistanceFile_multiChain.py for multi-chain simulation. The result of running a simulation with code like that above in the input file should be a file called wham.dat in the directory where the simulation was run.

Setup multiple directories, Run all simulations, Collect the data

Once these extra items have been added to the input file, you are ready to run umbrella sampling by specifying an array of Q0 values over a set of directories. All of these simulations can be run in parallel, and once the simulations have finished you will want to collect the wham.dat files, and perhaps other files such as energy.log, in order to compute 1D and 2D free energy profiles and expectation value plots. Performing these calculations is beyond the scope of this tutorial, but you may find the following tools to be useful: WHAM.jar pyMBAR Grossfield WHAM