Protein DNA Simulations - adavtyan/awsemmd GitHub Wiki

Introduction

Note: a newer version of the tools is available under tools/awsem_3spn2_HaoWu.

AWSEM is a protein force field originally designed for studying protein structure prediction, folding dynamics, binding-interface prediction, folding in membranes, and more. Because the coarse-grained DNA model 3SPN.2 has been made available for the LAMMPS MD package, we built a protein-DNA hybrid simulation platform by implementing both AWSEM and 3SPN.2 in LAMMPS.

This tutorial walks you through building a protein-DNA simulation step by step. There are three steps:

  1. Build the DNA data file.
  2. Build the protein data file.
  3. Merge them.

You can download the tutorial package (fisDNA_example.tar.gz) from the examples page: https://github.com/adavtyan/awsemmd/tree/master/examples.

You will also need Python, pylab, and Ruby installed on your computer to complete this tutorial. For the Python packages, we recommend installing Anaconda (https://www.continuum.io/downloads; choose the Python 2.7 version), which provides the scientific Python environment this tutorial requires, including numpy and scipy. Ruby must be installed separately: https://www.ruby-lang.org/en/downloads/.
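Before starting, you can confirm the prerequisites with a small standalone Python check. This is a sketch, not part of the tutorial package: it assumes that successfully importing numpy, scipy, and pylab is a sufficient test for the Python side, and that Ruby is installed as an executable named `ruby` on your PATH. The function name `missing_prerequisites` is hypothetical.

```python
import importlib

try:  # Python 3.3+
    from shutil import which
except ImportError:  # Python 2.7 fallback
    from distutils.spawn import find_executable as which

# Assumption: importing these modules covers the Python side of the tutorial.
PYTHON_MODULES = ["numpy", "scipy", "pylab"]
# Assumption: Ruby is available as an executable named "ruby" on the PATH.
EXECUTABLES = ["ruby"]


def missing_prerequisites(modules=PYTHON_MODULES, executables=EXECUTABLES):
    """Return modules that fail to import plus executables not found on PATH."""
    missing = []
    for name in modules:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    for exe in executables:
        if which(exe) is None:
            missing.append(exe)
    return missing


if __name__ == "__main__":
    problems = missing_prerequisites()
    if problems:
        print("Missing: " + ", ".join(problems))
    else:
        print("All prerequisites found.")
```

Run it with the same Python interpreter you will use for the tutorial scripts, since each interpreter has its own installed packages.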

1. Build DNA data file

To build a 3SPN.2 DNA model, the DNA data file is created with both the 3SPN.2 and X3DNA toolkits. Before building the model, we need to download these packages, extract them, and set up the paths so that your shell can find them. This tutorial was tested on Linux with a Bash shell (prompt $). First, move to the working folder:

$ cd buildDna/

where x3dna-v2.2.tar.gz and USER-3SPN2.tar.gz are already provided. Extract (untar) these files with the following commands:

$ tar -zxvf x3dna-v2.2.tar.gz

and

$ tar -zxvf USER-3SPN2.tar.gz

which create the folders x3dna-v2.2 and USER-3SPN2, respectively.

Next, set up the paths needed to use the x3dna tools. Our default shell here is Bash, so you can set them by entering these two lines at the command line:

$ export X3DNA="YOUR_LOCAL_FOLDER/x3dna-v2.2"

$ export PATH="YOUR_LOCAL_FOLDER/x3dna-v2.2/bin:$PATH"

The first line sets the environment variable X3DNA, which the x3dna tools need in order to run; the second puts the x3dna executables and scripts on your PATH so you can call them from the command line. We highly recommend adding both lines to your .bashrc (or .profile, etc.); otherwise the settings apply only to the current session and are lost when you close it.
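To verify the two settings from a new session, a short Python check can inspect the environment. This is a hypothetical helper (`check_x3dna_env` is not part of x3dna); it assumes only what the two export lines above establish: X3DNA names an existing folder, and that folder's bin subfolder is on PATH.

```python
import os


def check_x3dna_env(env=None):
    """Return a list of problems with the X3DNA and PATH settings (empty if none)."""
    env = os.environ if env is None else env
    problems = []
    x3dna = env.get("X3DNA")
    if not x3dna:
        problems.append("X3DNA is not set")
        return problems
    if not os.path.isdir(x3dna):
        problems.append("X3DNA does not point to a directory: " + x3dna)
    # Compare normalized paths so a trailing slash does not cause a false alarm.
    bin_dir = os.path.normpath(os.path.join(x3dna, "bin"))
    path_entries = [os.path.normpath(p) for p in env.get("PATH", "").split(os.pathsep) if p]
    if bin_dir not in path_entries:
        problems.append(bin_dir + " is not on PATH")
    return problems


if __name__ == "__main__":
    issues = check_x3dna_env()
    print("\n".join(issues) if issues else "X3DNA environment looks fine.")
```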

Once this is done, we can look at the script genConf.sh, which runs all the x3dna and 3SPN.2 binaries and scripts needed to build a DNA data file. The DNA can have an arbitrary sequence, specified in dnaSeq.txt; the first line of that file gives the number of base pairs and the second line the sequence itself:

$ cat dnaSeq.txt

54

AAATTTGTTTGAATTTTGAGCAAATTTAAATTTGTTTGAATTTTGAGCAAATTT
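When you write your own sequence, a mismatch between the count and the sequence length is an easy mistake to make. The sketch below checks dnaSeq.txt before you run genConf.sh. It assumes the layout shown above (a count line followed by the whole sequence on one line, using only A, C, G, and T); `read_dna_seq` is a hypothetical helper, not part of the tutorial package.

```python
VALID_BASES = set("ACGT")


def read_dna_seq(text):
    """Parse dnaSeq.txt contents into (count, sequence) and sanity-check them.

    Assumption: the first non-empty line is the base-pair count and the
    second non-empty line is the full sequence, as in the example file.
    """
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if len(lines) < 2:
        raise ValueError("expected a count line and a sequence line")
    count = int(lines[0])
    seq = lines[1].upper()  # accept lowercase input, report uppercase
    bad = sorted(set(seq) - VALID_BASES)
    if bad:
        raise ValueError("unexpected characters in sequence: " + "".join(bad))
    if len(seq) != count:
        raise ValueError("count is %d but sequence has %d bases" % (count, len(seq)))
    return count, seq
```

For the example file, `read_dna_seq(open("dnaSeq.txt").read())` should return 54 together with the sequence string, because the sequence shown above is exactly 54 bases long.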

Make the shell script executable by typing

$ chmod +x genConf.sh

Now let's execute ./genConf.sh.

$ ./genConf.sh

The average twist in this sequence is 34.516981

Time used: 00:00:00:00

################################################################
Pair coefficients for 3SPN.2 representation of B-DNA
pair_coeff 1 1 3spn2 0.239006 4.500000
pair_coeff 2 2 3spn2 0.239006 6.200000
pair_coeff 3 3 3spn2 0.239006 5.400000
pair_coeff 4 4 3spn2 0.239006 7.100000
pair_coeff 5 5 3spn2 0.239006 4.900000
pair_coeff 6 6 3spn2 0.239006 6.400000
pair_coeff 7 7 3spn2 0.239006 5.400000
pair_coeff 8 8 3spn2 0.239006 7.100000
pair_coeff 9 9 3spn2 0.239006 4.900000
pair_coeff 10 10 3spn2 0.239006 6.400000
pair_coeff 11 11 3spn2 0.239006 5.400000
pair_coeff 12 12 3spn2 0.239006 7.100000
pair_coeff 13 13 3spn2 0.239006 4.900000
pair_coeff 14 14 3spn2 0.239006 6.400000
################################################################
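If you capture this output (for example with `./genConf.sh > genConf.log`) and want to reuse or compare the pair coefficients, a short parser helps. This hypothetical helper assumes every relevant line has the form `pair_coeff i j 3spn2 a b`, exactly as printed above, and simply records the two trailing numbers (presumably an energy followed by a distance); it does not try to say which DNA site each type number represents.

```python
def parse_pair_coeffs(text):
    """Collect 'pair_coeff i j 3spn2 a b' lines into {(i, j): (a, b)}.

    Lines that do not match this shape (banners, comments, other output)
    are ignored.
    """
    coeffs = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 6 and fields[0] == "pair_coeff" and fields[3] == "3spn2":
            i, j = int(fields[1]), int(fields[2])
            coeffs[(i, j)] = (float(fields[4]), float(fields[5]))
    return coeffs
```

Applied to the output above, it should return 14 entries, one for each type paired with itself from (1, 1) to (14, 14).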

That's it!

Once this message is printed to the screen, you have a DNA data file, a set of DNA list files, and several intermediate files.

In principle, the DNA data file (bdna_curv_conf.in) and the DNA list files (in00_bond.list, in00_angl.list, and in00_dihe.list) can be used as they are to set up a DNA-only simulation. To include protein in the simulation, however, these files need some adjustment. The script gen_multi_dna_awsem_v2.py does this: it reads the original data file and list files and converts them into a format suitable for merging with the protein.
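Before running the conversion, it can save time to confirm that genConf.sh actually produced the input files. The sketch below uses the file names given in this tutorial; `missing_files` and the two list names are hypothetical, and you should edit the lists if your run produces different names. The same helper works after the conversion, using the second list.

```python
import os

# File names as given in this tutorial; edit the lists if your run differs.
GENCONF_OUTPUTS = ["bdna_curv_conf.in", "in00_bond.list", "in00_angl.list", "in00_dihe.list"]
CONVERTED_OUTPUTS = ["dna_premerge.data", "new_bond.list", "new_angl.list", "new_dihe.list"]


def missing_files(names, folder="."):
    """Return the names that do not exist as regular files inside folder."""
    return [name for name in names if not os.path.isfile(os.path.join(folder, name))]


if __name__ == "__main__":
    missing = missing_files(GENCONF_OUTPUTS)
    print("Missing: " + ", ".join(missing) if missing else "genConf.sh outputs present.")
```

Run from buildDna/, an empty result means all four inputs for gen_multi_dna_awsem_v2.py are in place.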

$ python gen_multi_dna_awsem_v2.py bdna_curv_conf.in in00_bond.list in00_angl.list in00_dihe.list 567 1 0 0 0 dna_premerge.data

Note that 567 is an offset applied to the indices of the DNA atoms in the list files.
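To illustrate what an index offset does, here is a hypothetical example (not the code of gen_multi_dna_awsem_v2.py). It assumes each bond, angle, or dihedral entry is a tuple of 1-based atom indices numbered within the DNA alone, and that the offset equals the number of atoms placed before the DNA in the merged system. Under that assumption, DNA atom 1 becomes atom 568.

```python
def shift_indices(entries, offset):
    """Add offset to every atom index in each entry (a bond, angle, or dihedral)."""
    return [tuple(index + offset for index in entry) for entry in entries]


# Hypothetical DNA bond (atoms 1-2) and angle (atoms 1-2-3), numbered within the DNA.
dna_bonds = [(1, 2)]
dna_angles = [(1, 2, 3)]
print(shift_indices(dna_bonds, 567))   # [(568, 569)]
print(shift_indices(dna_angles, 567))  # [(568, 569, 570)]
```

Shifting every index by the same amount keeps the connectivity within the DNA intact while preventing its atom IDs from colliding with those of the atoms that come first.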

The outputs are:

  1. dna_premerge.data
  2. new_bond.list
  3. new_angl.list
  4. new_dihe.list

These files will be used later.

2. Build protein data file

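See the AWSEM tutorial (https://github.com/adavtyan/awsemmd/wiki) for how to make the protein data file. The merge command below expects the protein files (data.fis and fis.seq) in the buildProtein folder.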

3. Merge

Once your protein and DNA data files (summarized below) are ready, merging them into one data file is relatively easy.

Protein data file: data.fis

DNA data file: dna_premerge.data

$ cd ../merge

$ python merge.py ../buildProtein/data.fis ../buildDna/dna_premerge.data ../buildProtein/fis.seq

This produces data.merge, a single data file in which the protein and DNA information is merged.
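As a quick sanity check on the merge, you can compare the atom counts in the file headers. This sketch assumes that data.fis, dna_premerge.data, and data.merge all have a standard LAMMPS data-file header (a title line, then count lines such as `567 atoms`, ending at the first section name such as `Masses` or `Atoms`), and that merge.py keeps every protein and DNA atom without adding any. Both helper names are hypothetical.

```python
COUNT_KEYWORDS = ("atoms", "bonds", "angles", "dihedrals", "impropers")


def header_counts(text):
    """Read 'N atoms', 'N bonds', ... from the header of a LAMMPS data file.

    The first line (the title) is skipped; the header is taken to end at the
    first line that does not start with a number, i.e. a section name.
    """
    counts = {}
    for raw in text.splitlines()[1:]:
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line:
            continue
        fields = line.split()
        try:
            float(fields[0])
        except ValueError:
            break  # reached the first section keyword
        if len(fields) == 2 and fields[1] in COUNT_KEYWORDS:
            counts[fields[1]] = int(fields[0])
    return counts


def atoms_add_up(protein_text, dna_text, merged_text):
    """Assumption: the merged file holds exactly the protein atoms plus the DNA atoms."""
    expected = header_counts(protein_text).get("atoms", 0) + header_counts(dna_text).get("atoms", 0)
    return header_counts(merged_text).get("atoms") == expected
```

For example, pass the contents of ../buildProtein/data.fis, ../buildDna/dna_premerge.data, and data.merge (read with `open(...).read()`) to `atoms_add_up`; a False result suggests looking more closely at the merged file.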