Monomer Structure Prediction - adavtyan/awsemmd GitHub Wiki

Introduction

This page introduces the AWSEM simulation package and the use of simulated annealing to predict the structure of a monomer. The designed protein Top7 (PDB ID: 1QYS) is the target for this example.

EXAMPLE: PREDICTING STRUCTURE OF MONOMER

Files needed from Protein Data Bank

• FASTA sequence (ID.fasta.text from PDB); rename it to ID.fasta

• PDB file (ID.pdb)

Check the sequence in the FASTA file and edit it as needed so that it contains only the portion of the sequence that has coordinates in the PDB file.

For example, the first 2 residues and the last 12 residues in the FASTA file of Top7 (PDB ID: 1QYS) were cut off to match the sequence in the PDB file.
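The trimming step can also be scripted. The following is a minimal sketch, assuming a single-record FASTA file and that you already know how many residues to drop at each end; the function names are hypothetical and are not part of the AWSEM tools.

```python
def trim_sequence(sequence: str, n_start: int, n_end: int) -> str:
    """Drop n_start residues from the start and n_end residues from the end."""
    if n_start < 0 or n_end < 0:
        raise ValueError("residue counts must be non-negative")
    if n_start + n_end > len(sequence):
        raise ValueError("cannot trim more residues than the sequence contains")
    return sequence[n_start:len(sequence) - n_end]


def trim_fasta(fasta_text: str, n_start: int, n_end: int) -> str:
    """Trim a single-record FASTA text; the header line is kept unchanged."""
    lines = [ln.strip() for ln in fasta_text.splitlines() if ln.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("expected a FASTA header line starting with '>'")
    header = lines[0]
    sequence = "".join(lines[1:])  # the sequence may be wrapped over several lines
    return header + "\n" + trim_sequence(sequence, n_start, n_end) + "\n"
```

For the Top7 case described above, the call would be trim_fasta(text, 2, 12), where text is the content of the downloaded FASTA file.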

Secondary structure bias - ssweight file

Generate ssweight file using the JPRED prediction tool

• Go to the JPRED homepage: http://www.compbio.dundee.ac.uk/www-jpred/

• Feed the sequence from the FASTA file into JPRED. Choose to continue carrying out a Jpred prediction. When the prediction is complete, view it in "ViewSimple".

• Copy the JPRED prediction into a new text file called IDjpred.

• Run the following command to generate the ssweight file:

python /awsemmd/tools/create_project_tools/GenSsweight.py IDjpred ssweight

Generate ssweight file using the STRIDE server

• Access the STRIDE web interface: http://webclu.bio.wzw.tum.de/stride/

• Output the result (in plain text format) and save it as ssweight.stride

• Run the following command to generate the ssweight file from the STRIDE assignment:

python /awsemmd/tools/create_project_tools/stride2ssweight.py > ssweight

Files needed for computing qw and qo

rnative.dat can be generated by the following command:

python /awsemmd/tools/create_project_tools/GetCACADistancesFile.py ID rnative.dat

nativecoords.dat can be generated by the following command:

python /awsemmd/tools/create_project_tools/GetCACoordinatesFromPDB.py ID nativecoords.dat
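To illustrate what these native-structure reference files are based on, here is a minimal sketch that extracts C-alpha coordinates from PDB text and computes the pairwise CA-CA distances. It is an illustration only and is not the implementation of GetCACADistancesFile.py or GetCACoordinatesFromPDB.py; the function names are hypothetical, and the on-disk layout of rnative.dat and nativecoords.dat is deliberately not reproduced.

```python
import math


def ca_coordinates(pdb_text, chain=None):
    """Return (x, y, z) tuples for C-alpha atoms, using fixed PDB columns."""
    coords = []
    for line in pdb_text.splitlines():
        # ATOM records only, so calcium HETATM entries named CA are skipped.
        if not line.startswith("ATOM") or line[12:16].strip() != "CA":
            continue
        if chain is not None and line[21] != chain:
            continue
        if line[16] not in (" ", "A"):  # keep only the first alternate location
            continue
        coords.append((float(line[30:38]),
                       float(line[38:46]),
                       float(line[46:54])))
    return coords


def ca_distance_matrix(coords):
    """Return the symmetric matrix of CA-CA distances (same units as input)."""
    return [[math.dist(a, b) for b in coords] for a in coords]
```

A possible use is ca_distance_matrix(ca_coordinates(open("ID.pdb").read(), chain="A")), assuming the chain of interest is labelled A.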

Fragment memory library generation tool

Obtaining and Preparing Protein Database

• You can obtain your own database of structures with the desired resolution and maximum mutual sequence identity by using the PISCES Protein Sequence Culling Server (http://dunbrack.fccc.edu/PISCES.php). The server gives you a FASTA file as output.

• To generate the fragment library you need a database of well-defined structures in BLASTable format and a FASTA file that contains the sequences of those structures. The FASTA file should have the same prefix as the database. If you already have a FASTA file, you can convert it to BLASTable format using the makeblastdb executable:

makeblastdb -in database-prefix.fasta -out database-prefix -dbtype prot

Output: database-prefix.phr, database-prefix.pin, and database-prefix.psq
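Before generating the fragment library, it can help to confirm that the FASTA file and the three database files share the same prefix and are all present. The following is a minimal sketch of such a check; the function name is hypothetical, and it assumes the database was written as a single volume with exactly the file names listed above.

```python
from pathlib import Path

# Extensions taken from the makeblastdb output listed above, plus the FASTA file.
BLAST_EXTENSIONS = (".fasta", ".phr", ".pin", ".psq")


def missing_database_files(prefix, directory="."):
    """Return the expected database file names that are not in the directory."""
    expected = [prefix + ext for ext in BLAST_EXTENSIONS]
    return [name for name in expected
            if not (Path(directory) / name).is_file()]
```

An empty list means all four files were found; anything else names the files that still need to be generated.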

Generating Fragment Library

You can now run the following script to generate the fragment library for a single-chain simulation.

python /awsemmd/tools/frag_mem_tools/prepFragsLAMW_index.py database-prefix ID.fasta 20 1/0

Here 20 is a typical number of memories per position. The last argument, written 1/0 above, is either 1 (homologs excluded) or 0 (homologs allowed). Use the homologs-excluded option for de novo structure prediction, in which all sequence homologs are excluded from the search.

The above script produces the ./fraglib/ directory and the fragsLAMW.mem file as outputs. The fragsLAMW.mem file contains a [Memories] section with a one-line description of each memory found. The coordinate files (with .gro extensions) are also generated by the script and can be found in the ./fraglib/ directory.

[Memories]
./fraglib/2q3xa.gro 1 1462 6 1
./fraglib/3s3ea.gro 2 169 7 1
./fraglib/3l48a.gro 2 783 8 1
./fraglib/1q3oa.gro 1 646 9 1
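Because each line of the [Memories] section starts with the path to a .gro file, the paths can be checked with a short script before a run. This is a minimal sketch with hypothetical function names; it reads only the first column and makes no assumption about the meaning of the numeric columns that follow.

```python
from pathlib import Path


def read_memory_paths(mem_text):
    """Return the .gro paths listed in the [Memories] section of the text."""
    paths = []
    in_memories = False
    for raw in mem_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("["):
            # A bracketed line starts a new section.
            in_memories = (line == "[Memories]")
            continue
        if in_memories:
            paths.append(line.split()[0])  # first column is the file path
    return paths


def missing_memory_files(mem_text, base_dir="."):
    """Return listed paths that do not exist relative to base_dir."""
    return [p for p in read_memory_paths(mem_text)
            if not (Path(base_dir) / p).is_file()]
```

Calling missing_memory_files(open("fragsLAMW.mem").read()) from the simulation directory returns an empty list when every listed .gro file can be found.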

When running homolog-excluded simulations, turn on Memory or Memory Table in fix_backbone_coeff.data; the latter uses precomputed tables for routine energy and force computations and is much faster than Memory.

Generating project

Run the following script to generate the data file, sequence file, and input file.

PdbCoords2Lammps.sh ID project_name

It produces three files as outputs: data.project_name, project_name.seq, and project_name.in.

Preparing template directory

Below is the list of files needed to run monomer structure prediction:

• anti_HB; anti_NHB; anti_one; para_HB; para_one; gamma.dat; burial_gamma.dat; uniform gamma; fix_backbone_coeff.data. These files can be obtained from the /parameters/ directory of the AWSEM source code.

• ssweight

• rnative.dat, nativecoords.dat

• fragsLAMW.mem with correct paths to the .gro files in /fraglib/

• data.project_name; project_name.seq; project_name.in
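The checklist above can be verified with a short script before starting a run. The sketch below is a minimal example with hypothetical function names; by default it covers only the files generated in the earlier steps, and the parameter files copied from the /parameters/ directory can be passed in through the extra argument.

```python
from pathlib import Path

# Files produced in the earlier steps of this page.
GENERATED_FILES = ["ssweight", "rnative.dat", "nativecoords.dat", "fragsLAMW.mem"]


def project_files(project_name):
    """Names of the three files written for a given project name."""
    return ["data." + project_name, project_name + ".seq", project_name + ".in"]


def missing_template_files(directory, project_name, extra=()):
    """Return required file names that are absent from the template directory."""
    required = GENERATED_FILES + project_files(project_name) + list(extra)
    return [name for name in required
            if not (Path(directory) / name).is_file()]
```

For example, missing_template_files(".", "project_name", extra=["gamma.dat", "fix_backbone_coeff.data"]) lists whatever is still missing from the current directory.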

Run simulated annealing simulation for monomer structure prediction

• Run a short equilibration at a constant temperature above the folding temperature (Tf) to unfold the protein into a long extended chain.

• Run a long simulation (typically 10 million time steps of 2 fs each) while slowly lowering the temperature from above Tf to below Tf.
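The arithmetic behind the annealing run can be sketched as follows. The page does not specify the shape of the temperature ramp, so the linear schedule here is an assumption, and the function names and the example temperatures are hypothetical.

```python
def total_time_ns(n_steps, timestep_fs=2.0):
    """Total simulated time in nanoseconds (1 ns = 1e6 fs)."""
    return n_steps * timestep_fs / 1e6


def linear_anneal_temperature(step, n_steps, t_start, t_end):
    """Temperature at a given step for a linear ramp (assumed schedule)."""
    if not 0 <= step <= n_steps:
        raise ValueError("step must lie between 0 and n_steps")
    return t_start + (t_end - t_start) * step / n_steps


# 10 million steps of 2 fs correspond to 20 ns of simulated time.
print(total_time_ns(10_000_000))
```

With a hypothetical ramp from 600 to 200 (in whatever temperature units the simulation uses), linear_anneal_temperature(5_000_000, 10_000_000, 600, 200) gives the midpoint value of 400.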