AWSEM_IDP - adavtyan/awsemmd GitHub Wiki

Introduction

This wiki page introduces the installation and usage of AWSEM-IDP, a new branch of AWSEM for simulating intrinsically disordered proteins (IDPs) with higher efficiency and accuracy. Compared to the standard AWSEM, AWSEM-IDP has three major improvements: lower weights for the secondary structure terms, a fragment memory library built from experiments and/or all-atom simulations, and a novel radius of gyration (Rg) potential that controls the global size of the protein chain. For a detailed description of AWSEM-IDP, please refer to the following paper and its supporting information:

Hao Wu, Peter G. Wolynes, and Garegin A. Papoian, "AWSEM-IDP: A Coarse-Grained Force Field for Intrinsically Disordered Proteins", The Journal of Physical Chemistry B, 2018, https://pubs.acs.org/doi/10.1021/acs.jpcb.8b05791

Installation

Note that of the three new features of AWSEM-IDP, only the Rg potential is implemented at the installation stage. The other two are supplied as external parameters and files when generating a simulation project. As a result, installing AWSEM-IDP is exactly the same as installing the standard AWSEM with the serial version of LAMMPS. Please follow the home wiki for a detailed installation guide. The code for the Rg potential (fix_spring_rg_papoian.cpp, fix_spring_rg_papoian.h) has already been merged into the AWSEM master branch as a general-purpose tool, not one limited to IDP simulations.

AWSEM-IDP simulation pipeline

The general pipeline for running an AWSEM-IDP simulation is similar to that of the standard AWSEM. All three new AWSEM-IDP features should be used with caution, depending on the specific IDP being studied. Here the IDP PaaA2 (PDB ID: 3ZBE) is taken as an example. All files mentioned below can be found in trunk/examples/paaa2_IDP_example.

Prepare general input files

We use one structure (5AAA-1.pdb) selected from the NMR ensemble as the initial PDB structure. As in the standard AWSEM, we create a simulation template directory, change to it, and run the following command to generate the data file, sequence file and input file:
```
bash PdbCoords2Lammps.sh 5AAA-1 paaa2
```
This command should create data.paaa2, paaa2.in, paaa2.coord and paaa2.seq. Note that the .coord file is not needed for the rest of the simulation.
Then we copy all the parameter files from trunk/parameters to this directory, including anti_HB, anti_NHB, anti_one, burial_gamma.dat, fix_backbone_coeff.data, gamma.dat, para_HB, para_one, uniform.gamma.

Implement AWSEM-IDP new features

Adjust secondary structure terms weight

The default weight for the helix term (the first parameter in the [helix] section in fix_backbone_coeff.data) is 1.5. This value is too large for some IDPs and leads to artificially high helical propensity. Therefore, in AWSEM-IDP it is reduced to 1.2. This new weight has been benchmarked against experiments and all-atom simulations for PaaA2 and the H4 histone tail.
In IDP simulations, the SSBias term should be turned off because it assigns artificially stable secondary structures to certain regions. To do so, we recommend leaving [SSWeight] on and generating an ssweight file, but setting both columns to 0.0 on every line. Please refer to the SSBias wiki for the detailed procedure.
For some IDPs, it is also possible to further tune the weights of the beta hydrogen bonding terms, namely [Dssp_Hdrgn] and [P_AP] in fix_backbone_coeff.data. Such tuning should be benchmarked against reliable experimental or all-atom simulation data, and is beyond the scope of this wiki page.
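The all-zero ssweight file described above can be generated with a few lines of Python. This is a minimal sketch, not part of the AWSEM tools: it assumes the file holds one line per residue with two whitespace-separated columns, and that it is named ssweight. Both the function names and the default file name are illustrative, so check the SSBias wiki for the exact layout expected by your AWSEM version.

```python
def zero_ssweight_text(n_residues: int) -> str:
    """Return ssweight content with both columns set to 0.0 for every residue."""
    if n_residues <= 0:
        raise ValueError("n_residues must be a positive integer")
    # Assumed layout: one line per residue, two whitespace-separated columns.
    return "".join("0.0 0.0\n" for _ in range(n_residues))


def write_zero_ssweight(n_residues: int, path: str = "ssweight") -> None:
    """Write the all-zero table to `path` (the default file name is an assumption)."""
    with open(path, "w") as handle:
        handle.write(zero_ssweight_text(n_residues))
```

For the 71-residue PaaA2 example, calling `write_zero_ssweight(71)` in the simulation directory would produce 71 lines of `0.0 0.0`.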

Generate fragment memory library

Unlike the standard AWSEM, AWSEM-IDP relies on an experimental structural ensemble and/or all-atom simulations of the same IDP to build the fragment memory library for higher accuracy. A repository of experimental and simulated IDP structures can be found in the pE-DB database.

NMR data are normally a good source of experimental structural ensembles. In the current PaaA2 example, we use the ensemble of 50 NMR structures from Sterckx, Structure 2014 (pE-DB ID: 5AAA, PDB ID: 3ZBE) as fragment memories.
All-atom simulations with explicit solvent can also be a reliable source of fragment memories. One should select an appropriate force field to ensure the accuracy of the IDP simulation; a99SB-disp from D. E. Shaw Research or CHARMM36m from the Alex MacKerell lab should suffice. Efficient enhanced-sampling methods, such as replica exchange or parallel continuous simulated tempering, should also be used. Please refer to Wu, JPCB 2018; Lin, JMB 2018; Chen, JPCB 2017; and Chen, JPCB 2016 for detailed descriptions of using all-atom simulations with AWSEM.
After obtaining suitable fragment memory structures, we convert all of these .pdb structures to .gro files (GROMACS format) using awsemmd/tools/frag_mem_tools/Pdb2Gro.py:
```
# Convert each of the 50 NMR models (5AAA-1 ... 5AAA-50) to a .gro file
for i in $(seq 1 50)
do
    python Pdb2Gro.py 5AAA-$i 5AAA-$i.gro
done
```
Then create a ./fraglib directory and put all the generated .gro files in it.
Prepare a .mem file listing all the fragment memories. Please refer to the FragmentMemory wiki for a detailed description. The file names listed under [Memories] must match the names of the .gro files placed in ./fraglib. An example paaa2.mem file is shown below:
```
[Target]
paaa2

[Memories]
./fraglib/paaa2_1.gro 1 1 71 1
./fraglib/paaa2_2.gro 1 1 71 1
...
./fraglib/paaa2_50.gro 1 1 71 1
```
Here we use all 50 structures from the NMR experiments and, for simplicity, assign an equal weight of 1 to each structure. One can also perform a clustering analysis on the structural ensemble and calculate an appropriate weight for each structure, as done in AAWSEM.
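Writing 50 nearly identical lines by hand is error-prone, so the .mem file can also be generated. The sketch below reproduces the layout of the paaa2.mem example; the function name `build_mem_text` is hypothetical, and the two `1 1` columns are copied from the example on the assumption that every memory spans the whole chain starting at residue 1 (see the FragmentMemory wiki for the column definitions). Per-structure weights, for example from a clustering analysis, can be passed in place of the default equal weights.

```python
def build_mem_text(target, gro_paths, chain_length, weights=None):
    """Return the text of a .mem file with one full-length memory per .gro file."""
    if weights is None:
        weights = [1.0] * len(gro_paths)  # equal weights, as in the example
    if len(weights) != len(gro_paths):
        raise ValueError("need exactly one weight per memory file")
    lines = ["[Target]", target, "", "[Memories]"]
    for path, weight in zip(gro_paths, weights):
        # Columns follow the paaa2.mem example: file, 1, 1, chain length, weight.
        lines.append(f"{path} 1 1 {chain_length} {weight:g}")
    return "\n".join(lines) + "\n"


# File names must match what is actually stored in ./fraglib.
paths = [f"./fraglib/paaa2_{i}.gro" for i in range(1, 51)]
mem_text = build_mem_text("paaa2", paths, 71)
```

The resulting `mem_text` can then be written to paaa2.mem with an ordinary `open(...).write(...)` call.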
Edit the [Fragment_Memory_Table] section in fix_backbone_coeff.data. Note that the weight of the fragment memory term is not normalized, i.e. one should adjust this weight according to the number of fragment memories included. For the 50 memories used here, we set the weight to 0.002. A sample [Fragment_Memory_Table] is shown below:
```
[Fragment_Memory_Table]
0.002
paaa2.mem
uniform.gamma
0 50 0.1
1.0
0
0.15
```

Use the Rg potential

The Rg potential is implemented as a new LAMMPS fix style, spring/rg/papoian (source files fix_spring_rg_papoian.cpp and fix_spring_rg_papoian.h), which is modified from the spring/rg fix style in LAMMPS. This fix should be added to the .in file alongside fix_backbone. Below is an example that applies the Rg potential to all the atoms of the alpha_carbons group in PaaA2:

```
# New fix type: fix spring/rg/papoian: fix radius of gyration with Papoian potential
# Input: ID group-ID spring/rg/papoian N rg0 gamma D alpha beta
# N: residue number
# rg0: rg value at bottom of well
# gamma: rg0 correction factor
# D: well depth
# alpha: width between two maximum
# beta: well width
# fix vrg alpha_carbons spring/rg/papoian vrg_n vrg_rg0 vrg_gamma vrg_d vrg_alpha vrg_beta
fix vrg alpha_carbons spring/rg/papoian 71 23 1.0 -0.2 0.001 0.003
```

The detailed definitions and suggested ranges of the Rg potential parameters can be found in the AWSEM-IDP paper. In general, N is the number of residues and Rg0 is the average Rg from experiments or all-atom simulations. The remaining parameters may need tuning; the above example can serve as a good starting point.
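To obtain a value for Rg0, the Rg of each reference structure can be computed from its C-alpha coordinates and then averaged over the ensemble. The following is a minimal sketch under the assumption that all alpha carbons carry the same mass, so that Rg is the root-mean-square distance of the atoms from their centroid. The function names are illustrative, and reading the coordinates from the PDB or .gro files is left out.

```python
import math


def radius_of_gyration(coords):
    """Rg of equal-mass points: RMS distance from their centroid."""
    n = len(coords)
    if n == 0:
        raise ValueError("need at least one coordinate")
    cx = sum(x for x, _, _ in coords) / n
    cy = sum(y for _, y, _ in coords) / n
    cz = sum(z for _, _, z in coords) / n
    mean_sq = sum((x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2
                  for x, y, z in coords) / n
    return math.sqrt(mean_sq)


def mean_rg(ensemble):
    """Average Rg over a list of structures (each a list of xyz tuples)."""
    if not ensemble:
        raise ValueError("ensemble is empty")
    return sum(radius_of_gyration(c) for c in ensemble) / len(ensemble)
```

The result is in the same length unit as the input coordinates, so make sure that unit matches the one used in the LAMMPS input before passing the value to the fix.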
It is also recommended to compute and output the Rg value during simulations:
```
compute rg alpha_carbons gyration
thermo_style custom step temp press vol emol epair ke pe etotal c_rg
```
By doing this we can track whether the Rg potential works as intended. If it does not, the parameters should be adjusted.
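Once c_rg is part of the thermo output, its values can be pulled out of the LAMMPS log for plotting or averaging. The sketch below assumes the usual log layout, in which each run prints a header line starting with `Step`, followed by numeric rows and a closing line starting with `Loop time`. The function name `extract_thermo_column` is hypothetical and not part of AWSEM or LAMMPS.

```python
def extract_thermo_column(log_text, column="c_rg"):
    """Collect the values of one thermo column from the text of a LAMMPS log."""
    values = []
    header = None
    for line in log_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "Step" and column in fields:
            header = fields  # start of a thermo block
            continue
        if header is None:
            continue
        if fields[0] == "Loop":
            header = None  # end of the thermo block
            continue
        if len(fields) != len(header):
            continue  # skip warnings or other interleaved text
        try:
            row = [float(f) for f in fields]
        except ValueError:
            continue
        values.append(row[header.index(column)])
    return values
```

A typical use is `extract_thermo_column(open("constT.log").read())`, after which the mean of the returned list can be compared against the target Rg0.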

Run AWSEM-IDP simulation

Here we describe a general pipeline for simulating the structural ensemble of a single-chain IDP at constant temperature.

After preparing all the files mentioned above, we first run an unfolding simulation at high temperature to create a random initial configuration:
```
lmp_serial -in unfold.in -log unfold.log
```
Note that in this unfolding simulation we do not need to use the fragment memory or the Rg potential: the final snapshot obtained in this step serves purely as a random chain. Please refer to unfold.in and fix_backbone_coeff_unfold.data for details.
Then we anneal the system from the previous high temperature to the target temperature of our study (300 K in this example):
```
lmp_serial -in annealing.in -log annealing.log
```
From this step on, we need to add the fragment memory and the Rg potential to guide the IDP simulation.
After annealing we run a short constant-temperature simulation as equilibration:
```
lmp_serial -in equil.in -log equil.log
```

Finally we conduct the production run:
```
lmp_serial -in constT.in -log constT.log
```

The above procedure is normally repeated with different initial velocity distributions. The final ensemble consists of snapshots from all the production runs.
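One way to organize such repeats is to give each replica its own directory and random seed. The sketch below only builds the command lines and does not launch anything. It relies on the LAMMPS `-var` command-line switch and assumes, hypothetically, that the input files have been edited so that their `velocity` command reads the seed from a variable named `seed` (for example `${seed}`); the example input files may instead hard-code the seed. The directory names `run_1`, `run_2`, ... and the function name are illustrative as well.

```python
import random

# The four stages of the pipeline, in the order described above.
STEPS = ["unfold", "annealing", "equil", "constT"]


def replica_plan(n_replicas, base_seed=12345):
    """Map each replica directory to its ordered list of lmp_serial commands."""
    rng = random.Random(base_seed)  # reproducible choice of seeds
    seeds = rng.sample(range(1, 1_000_000), n_replicas)  # distinct seeds
    plan = {}
    for k, seed in enumerate(seeds, start=1):
        plan[f"run_{k}"] = [
            ["lmp_serial", "-in", f"{step}.in", "-log", f"{step}.log",
             "-var", "seed", str(seed)]
            for step in STEPS
        ]
    return plan
```

Each command list can be passed to `subprocess.run(cmd, cwd=run_dir, check=True)` after copying the simulation template directory to `run_dir`, so that the output files of different replicas do not overwrite each other.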

Besides the IDP-specific parameters, all the general settings in this example (timestep, neighbor list, thermostat, etc.) are also subject to change, depending on the purpose of the study and the particular IDP simulated.

Limitation

The current version of AWSEM-IDP can only simulate pure IDP systems (which may contain multiple chains). It does not work on hybrid systems consisting of IDPs and ordered proteins.
The current version of AWSEM-IDP is not compatible with most other AWSEM branches, including AWSEM-membrane, AWSEM-electrostatic, AWSEM-DNA, etc. It should only be used within the scope of the standard AWSEM.
Like the standard AWSEM and its other branches, AWSEM-IDP can currently only run with the serial version of LAMMPS. Trying to compile and run AWSEM-IDP with parallel LAMMPS (i.e. lmp_mpi) will fail in most cases, and is not guaranteed to give a reasonable speedup even when it runs.