Monomer Structure Prediction - adavtyan/awsemmd GitHub Wiki

Introduction

This page introduces the AWSEM simulation package and shows how to use simulated annealing to predict the structure of a monomer. The designed protein Top7 (PDB ID: 1QYS) is the target for this example.

EXAMPLE: PREDICTING STRUCTURE OF MONOMER

Files needed from Protein Data Bank

• FASTA sequence file (ID.fasta.text from the PDB); rename it to ID.fasta

• PDB file (ID.pdb)

Check the sequence in the FASTA file and edit it where necessary, so that it contains only the portion of the sequence that has coordinates in the PDB file.

For example, the first 2 residues and the last 12 residues in the FASTA file of Top7 (PDB ID: 1QYS) were removed to match the sequence in the PDB file.
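The trimming step above can be sketched in Python. This is a minimal illustration and not part of the AWSEM tools: the function names `read_fasta_sequence` and `trim_sequence` are hypothetical, the sequence is a toy one, and the number of residues to cut has to be found by comparing the FASTA sequence with the residues actually present in the PDB file.

```python
def read_fasta_sequence(fasta_text):
    """Return the sequence of a single-record FASTA string, without the header."""
    lines = [ln.strip() for ln in fasta_text.splitlines()]
    return "".join(ln for ln in lines if ln and not ln.startswith(">"))


def trim_sequence(sequence, n_start, n_end):
    """Drop n_start residues from the N-terminus and n_end from the C-terminus."""
    if n_start < 0 or n_end < 0 or n_start + n_end > len(sequence):
        raise ValueError("trim lengths do not fit the sequence")
    # Slice with an explicit end index so that n_end == 0 keeps the full tail.
    return sequence[n_start:len(sequence) - n_end]


# Toy record for illustration only (this is NOT the Top7 sequence).
toy_fasta = ">TOY:A\nMGACDEFG\nHIKLW\n"
trimmed = trim_sequence(read_fasta_sequence(toy_fasta), 2, 3)
print(trimmed)
```

For the Top7 case described above, the call would be `trim_sequence(sequence, 2, 12)`, with the result written back to ID.fasta under the original header line.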

Secondary structure bias - ssweight file

Generate ssweight file using JPRED prediction tool

  1. Go to JPRED homepage: http://www.compbio.dundee.ac.uk/www-jpred/

  2. Feed the sequence from the FASTA file into JPRED. Choose to continue carrying out a Jpred prediction. When the prediction is complete, view it in “ViewSimple”.

  3. Copy the JPRED prediction into a new text file, called IDjpred

  4. Run the following command to generate the ssweight file:

python /awsemmd/tools/create_project_tools/GenSswight.py IDjpred ssweight
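To make the idea of a secondary structure bias file concrete, here is a rough Python sketch of the kind of conversion such a script performs. It rests on an assumption that is not stated on this page: that ssweight holds one line per residue with two numbers, a helix weight followed by a strand weight. The function name `ss_string_to_ssweight_lines` is hypothetical; check the output of GenSswight.py itself for the authoritative format.

```python
def ss_string_to_ssweight_lines(ss_string):
    """Map a per-residue secondary structure string to two-column weight lines.

    ASSUMPTION: column 1 is the helix weight and column 2 the strand weight.
    'H' is treated as helix, 'E' as strand, anything else as coil.
    """
    weights = {"H": "1.0 0.0", "E": "0.0 1.0"}
    return [weights.get(ch, "0.0 0.0") for ch in ss_string.strip()]


# Toy prediction string for illustration only.
for line in ss_string_to_ssweight_lines("HH-E"):
    print(line)
```

The output has exactly one line per residue, so its length should equal the length of the trimmed sequence in ID.fasta.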

Generate ssweight file using STRIDE server

• Access the Stride Web interface: http://webclu.bio.wzw.tum.de/stride/

• Output the result (in plain text format) and save as ssweight.stride

• Run the following command to generate the ssweight file from the STRIDE assignment:

python /awsemmd/tools/create_project_tools/stride2ssweight.py > ssweight

Files needed for computing qw and qo

rnative.dat can be generated by the following command:

python /awsemmd/tools/create_project_tools/GetCACADistancesFile.py ID rnative.dat

nativecoords.dat can be generated by the following command:

python /awsemmd/tools/create_project_tools/GetCACoordinatesFromPDB.py ID nativecoords.dat
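Both files are derived from the C-alpha atoms of the native structure. The Python sketch below shows the underlying computation only: it reads CA coordinates from fixed-column PDB ATOM records and builds the pairwise CA-CA distance matrix. The function names are hypothetical, and the exact layout that the AWSEM scripts write to rnative.dat and nativecoords.dat is not reproduced here.

```python
import math


def ca_coordinates(pdb_lines, chain_id=None):
    """Collect (x, y, z) of every CA atom, optionally restricted to one chain."""
    coords = []
    for line in pdb_lines:
        # Fixed-column PDB format: atom name in columns 13-16, chain ID in column 22.
        if not line.startswith("ATOM") or line[12:16].strip() != "CA":
            continue
        if chain_id is not None and line[21] != chain_id:
            continue
        # x, y, z occupy columns 31-38, 39-46 and 47-54.
        coords.append((float(line[30:38]), float(line[38:46]), float(line[46:54])))
    return coords


def ca_distance_matrix(coords):
    """Return the symmetric matrix of Euclidean CA-CA distances."""
    return [[math.dist(a, b) for b in coords] for a in coords]
```

With a real structure, the lines would come from the PDB file, for example `ca_distance_matrix(ca_coordinates(open("ID.pdb")))`. The sketch does not handle alternate locations or multi-model files.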

Fragment memory library generation tool

Obtaining and Preparing Protein Database

• You can obtain your own database of structures with desired resolution and maximum mutual sequence identity by using the PISCES Protein Sequence Culling Server (http://dunbrack.fccc.edu/PISCES.php). The server will give you a FASTA file as output.

• To generate the fragment library you need a database of well-defined structures in BLASTable format and a FASTA file that contains the sequences of those structures. The FASTA file should have the same prefix as the database. If you already have a FASTA file, you can convert it to BLASTable format using the makeblastdb executable:

makeblastdb -in database-prefix.fasta -out database-prefix -dbtype prot

Output: database-prefix.phr, database-prefix.pin, and database-prefix.psq
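If the database is built from a script, the same call can be wrapped in Python. The sketch below uses only the makeblastdb options shown above; the helper names are hypothetical, and the check for the three output files assumes a small, single-volume database as described on this page.

```python
import os
import subprocess


def makeblastdb_command(prefix):
    """Build the makeblastdb call for prefix.fasta, keeping the same prefix."""
    return ["makeblastdb", "-in", prefix + ".fasta", "-out", prefix, "-dbtype", "prot"]


def missing_blast_files(prefix):
    """List the expected database files that are not present on disk."""
    expected = [prefix + ext for ext in (".phr", ".pin", ".psq")]
    return [path for path in expected if not os.path.exists(path)]


def build_blast_database(prefix):
    """Run makeblastdb and fail loudly if the expected files did not appear."""
    subprocess.run(makeblastdb_command(prefix), check=True)
    missing = missing_blast_files(prefix)
    if missing:
        raise FileNotFoundError("makeblastdb output not found: " + ", ".join(missing))
```

Calling `build_blast_database("database-prefix")` requires makeblastdb to be on the PATH and database-prefix.fasta to exist in the working directory.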

Generating Fragment Library

You can now run the following script to generate the fragment library for a single-chain simulation.

python /awsemmd/tools/frag_mem_tools/prepFragsLAMW_index.py database-prefix ID.fasta 20 1/0

Here 20 is a typical number of memories per position. The last argument selects whether homologs are excluded (1) or allowed (0). Use the homolog-excluded option for de novo structure prediction, in which all sequence homologs are excluded from the search.

The above script produces the ./fraglib/ directory and the fragsLAMW.mem file. The fragsLAMW.mem file contains a [Memories] section with a one-line description of each memory found. The coordinate files (with .gro extensions) are also generated by the script and can be found in the ./fraglib/ directory. For example:

[Memories]
./fraglib/2q3xa.gro 1 1462 6 1
./fraglib/3s3ea.gro 2 169 7 1
./fraglib/3l48a.gro 2 783 8 1
./fraglib/1q3oa.gro 1 646 9 1
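A listing like the one above is easy to inspect with a few lines of Python, for instance to see which .gro files a memory file refers to. This is a sketch with hypothetical function names. It assumes each memory line has a path followed by four numeric fields, as in the listing, and it keeps those fields in order without interpreting them.

```python
def parse_memories(mem_text):
    """Return (gro_path, n1, n2, n3, weight) tuples from the [Memories] section."""
    entries = []
    in_memories = False
    for raw in mem_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("["):
            # Any bracketed header starts a new section; only [Memories] is read.
            in_memories = line == "[Memories]"
            continue
        if in_memories:
            fields = line.split()
            if len(fields) != 5:
                raise ValueError("unexpected memory line: " + line)
            path, n1, n2, n3, weight = fields
            entries.append((path, int(n1), int(n2), int(n3), float(weight)))
    return entries


def referenced_gro_files(entries):
    """Return the distinct .gro paths in order of first appearance."""
    seen = []
    for entry in entries:
        if entry[0] not in seen:
            seen.append(entry[0])
    return seen


sample = """[Memories]
./fraglib/2q3xa.gro 1 1462 6 1
./fraglib/3s3ea.gro 2 169 7 1
./fraglib/3l48a.gro 2 783 8 1
./fraglib/1q3oa.gro 1 646 9 1
"""
memories = parse_memories(sample)
print(len(memories), referenced_gro_files(memories)[0])
```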

When running homolog-excluded simulations, turn on Memory or Memory Table in fix_backbone_coeff.data; the latter uses precomputed tables for routine energy and force computations and is much faster than Memory.

Generating project

Run the following script to generate the data file, sequence file, and input file.

PdbCoords2Lammps.sh ID project_name

It produces three output files: data.project_name, project_name.seq, and project_name.in.

Preparing template directory

Below is the list of files needed to run monomer structure prediction:

• anti_HB; anti_NHB; anti_one; para_HB; para_one; gamma.dat; burial_gamma.dat; uniform gamma; fix_backbone_coeff.data. These files can be obtained from the /parameters/ directory of the AWSEM source code.

• ssweight

• rnative.dat, nativecoords.dat

• fragsLAMW.mem, with correct paths to the .gro files in /fraglib/

• data.project_name; project_name.seq; project_name.in
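Before launching a run, it can help to confirm that the template directory is complete. The following Python sketch reports which of a caller-supplied list of file names are absent. The function name `missing_template_files` is hypothetical, and the demo uses a throwaway directory with only a few of the names from the list above.

```python
import os
import tempfile


def missing_template_files(template_dir, required_names):
    """Return the required names that do not exist inside template_dir."""
    return [
        name
        for name in required_names
        if not os.path.exists(os.path.join(template_dir, name))
    ]


# Demo on a temporary directory that contains only an empty ssweight file.
with tempfile.TemporaryDirectory() as demo_dir:
    open(os.path.join(demo_dir, "ssweight"), "w").close()
    demo_missing = missing_template_files(
        demo_dir, ["ssweight", "rnative.dat", "nativecoords.dat"]
    )
print(demo_missing)
```

In practice the list passed in would contain every file named above, using the exact file names found in the /parameters/ directory and the names produced for your project.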

Run simulated annealing simulation for monomer structure prediction

• Run a short equilibration at a constant temperature above the folding temperature (Tf) to unfold the protein into a long extended chain.

• Run a long simulation (typically 10 million time steps of 2 fs each) while slowly lowering the temperature from above Tf to below Tf.
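The annealing protocol above can be put into numbers with a short Python sketch. Two things here are assumptions rather than statements from this page: the temperature is taken to decrease linearly with the step count, and the start and stop temperatures are placeholder values that must be replaced by values above and below the Tf of your system.

```python
def annealing_temperature(step, n_steps, t_start, t_stop):
    """Temperature at a given step for a linear ramp from t_start to t_stop."""
    if not 0 <= step <= n_steps:
        raise ValueError("step must lie between 0 and n_steps")
    return t_start + (t_stop - t_start) * step / n_steps


def simulated_time_ns(n_steps, timestep_fs):
    """Total simulated time in nanoseconds (1 ns = 1e6 fs)."""
    return n_steps * timestep_fs / 1e6


N_STEPS = 10_000_000            # from the text: 10 million steps
TIMESTEP_FS = 2                 # from the text: 2 fs per step
T_START, T_STOP = 600.0, 200.0  # HYPOTHETICAL values, above and below Tf

print(simulated_time_ns(N_STEPS, TIMESTEP_FS))
print(annealing_temperature(N_STEPS // 2, N_STEPS, T_START, T_STOP))
```

With the step count and time step quoted above, the annealing run covers 20 ns of simulation time in the model's own time units.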