Fragment Memory - adavtyan/awsemmd GitHub Wiki

Introduction

AWSEM uses experimentally available knowledge of resolved structures to bias local-in-sequence interactions in a protein. One way to do this is to use short (9 residues or fewer) overlapping fragments called memories. Memories can be chosen based on sequence similarity or on any other desired criterion, such as the existence of known homologues.

The corresponding energy term is called Fragment Memory. To include the Fragment Memory term in your simulations you will first need to generate a Fragment Library for each protein of interest.

This document explains how to generate a Fragment Library with the sequence alignment method, using the Python tools located in the frag_mem_tools directory, and how to use the Fragment Memory term.

Generating Fragment Library

What you need

Scripts from the awsemmd/tools/frag_mem_tools/ directory
Sequence of interest in FASTA format (the file can contain multiple sequences)
Protein database in BLASTable format and the corresponding FASTA file (see below)
psiblast executable (ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/)
wget package (http://www.gnu.org/software/wget/)
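Before running anything, it can help to confirm that the executables from the list above are on your PATH. The sketch below uses only the standard library (shutil.which); the tool names are the ones mentioned on this page (makeblastdb ships with BLAST+ alongside psiblast), and the helper name missing_tools is hypothetical.

```python
import shutil

# Executables this page relies on; makeblastdb is distributed with BLAST+.
REQUIRED_TOOLS = ("psiblast", "makeblastdb", "wget")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the names from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing executables:", ", ".join(missing))
    else:
        print("All required executables found on PATH.")
```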

Obtaining and preparing protein database

To generate a Fragment Library you need a database of well-defined structures in BLASTable format and a FASTA file containing the sequences of those structures. The FASTA file should have the same prefix as the database. If you already have a FASTA file, you can convert it to BLASTable format with the makeblastdb executable, which is distributed with psiblast in the BLAST+ package.

makeblastdb -in database_name.fasta -out database_name -dbtype prot

This should generate three files: database_name.phr, database_name.pin and database_name.psq. Newer BLAST+ releases may write additional index files alongside these.
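A quick way to check that a database prefix is complete is to look for the three index files together with the FASTA file sharing the same prefix. This is a minimal sketch assuming the FASTA file uses the .fasta extension as in the makeblastdb example; extra files written by newer BLAST+ releases are ignored, and databases large enough to be split into numbered volumes are not handled. The function name is hypothetical.

```python
from pathlib import Path

# Index files makeblastdb writes for a protein database; any additional
# files produced by newer BLAST+ releases are not checked here.
BLAST_PROTEIN_SUFFIXES = (".phr", ".pin", ".psq")


def missing_database_files(prefix):
    """List the expected files that do not exist for a database prefix.

    ASSUMPTION: the matching FASTA file is named <prefix>.fasta.
    """
    expected = [Path(prefix + suffix) for suffix in BLAST_PROTEIN_SUFFIXES]
    expected.append(Path(prefix + ".fasta"))
    return [str(path) for path in expected if not path.exists()]
```

For example, missing_database_files("database_name") returns an empty list once makeblastdb has run next to database_name.fasta.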

To build your own database of structures with the desired resolution and maximum mutual sequence identity, you can use the PISCES Protein Sequence Culling Server (http://dunbrack.fccc.edu/PISCES.php), which culls sequences from the entire PDB. Set the following parameters:

Maximum percentage identity - choose according to your problem; recommended value 80
Minimum resolution - should be 0.0
Maximum resolution - typically 2.0 or 3.0
Maximum R-value - typically 0.25
Minimum chain length - typically 40
Maximum chain length - typically 10000
Skip non-X-ray entries? - must answer Yes
Skip CA-only entries? - must answer Yes
How do you want to cull PDB? - must answer By chains

You can also download precompiled lists and FASTA files from the PISCES website.
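If you start from a precompiled list, you may want to re-apply the resolution, R-value, length and X-ray criteria locally. The sketch below assumes the list is whitespace-separated with ID, length, method, resolution and R-factor as its first five columns; check this against the actual file before relying on it. It does not reproduce the sequence-identity culling, which PISCES performs itself, and the matching FASTA file would need the same filtering. All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ChainEntry:
    chain_id: str      # PDB ID followed by the chain identifier
    length: int        # number of residues
    method: str        # experimental method, e.g. "XRAY" (assumed label)
    resolution: float  # in angstroms
    r_factor: float


# Typical values taken from the parameter list above.
CRITERIA = {
    "min_resolution": 0.0,
    "max_resolution": 2.0,
    "max_r_factor": 0.25,
    "min_length": 40,
    "max_length": 10000,
}


def parse_cull_list(lines):
    """Parse a whitespace-separated chain list into ChainEntry records.

    ASSUMPTION: the first five columns are ID, length, method, resolution
    and R-factor. Header lines and rows with non-numeric values are skipped.
    """
    entries = []
    for line in lines:
        fields = line.split()
        if len(fields) < 5:
            continue
        try:
            entries.append(ChainEntry(fields[0], int(fields[1]), fields[2],
                                      float(fields[3]), float(fields[4])))
        except ValueError:
            continue  # header line, or values such as "NA"
    return entries


def passes(entry, criteria=CRITERIA):
    """Apply the X-ray, resolution, R-factor and chain-length cut-offs."""
    return (entry.method == "XRAY"
            and criteria["min_resolution"] <= entry.resolution <= criteria["max_resolution"]
            and entry.r_factor <= criteria["max_r_factor"]
            and criteria["min_length"] <= entry.length <= criteria["max_length"])
```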

Running scripts

Once you have everything you need, run either prepFragsLAMW_index.py or MultCha_prepFrags_index.py to generate the Fragment Library for the target sequence. Use the latter for multi-chain simulations; for single-chain simulations the former is the better choice. Both scripts take the same arguments and are called as follows:

python prepFragsLAMW_index.py database-prefix target_sequence.fasta n_mem homologs_excluded_flag

where database-prefix is the shared prefix of the BLAST database and its FASTA file, n_mem sets the desired number of memories per position (typically 20), and homologs_excluded_flag is either 1 or 0. If the flag is 1, all sequence homologs of the target are excluded from the search.
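When preparing libraries for several targets, it can be convenient to build the command from Python. This sketch only assembles the argument list in the order shown in the calling convention above and runs it with subprocess; script_dir and the helper names are hypothetical. It uses the interpreter running the helper by default, so pass python= explicitly if the scripts need a different one.

```python
import os
import subprocess
import sys

SINGLE_CHAIN_SCRIPT = "prepFragsLAMW_index.py"
MULTI_CHAIN_SCRIPT = "MultCha_prepFrags_index.py"


def frag_command(db_prefix, fasta, n_mem=20, exclude_homologs=False,
                 multichain=False, script_dir="", python=sys.executable):
    """Build the argument list for one of the fragment-library scripts.

    script_dir (hypothetical) is the directory holding the frag_mem_tools
    scripts; "" means the current working directory.
    """
    script = MULTI_CHAIN_SCRIPT if multichain else SINGLE_CHAIN_SCRIPT
    return [python, os.path.join(script_dir, script), db_prefix, fasta,
            str(n_mem), "1" if exclude_homologs else "0"]


def run_frag_script(*args, **kwargs):
    """Run the chosen script; raises CalledProcessError if it fails."""
    subprocess.run(frag_command(*args, **kwargs), check=True)
```

For example, frag_command("database_name", "target_sequence.fasta") selects the single-chain script with 20 memories per position and homologs kept.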

Running simulations

The scripts above generate a fragsLAMW.mem file containing a [Memories] section with a one-line description of each memory found:

[Memories]
./fraglib/3c3pa.gro 1 78 6 1
./fraglib/2vsoe.gro 2 832 8 1
./fraglib/3rota.gro 2 194 7 1
./fraglib/2hn1a.gro 1 151 8 1
./fraglib/1knza.gro 1 41 9 1
./fraglib/3hr6a.gro 3 349 7 1
.............................

Each line lists, in order:

the coordinate file of the structure aligned to the fragment
the starting position of the memory in the sequence of interest
the starting position of the memory in the aligned sequence
the fragment length
the fragment weight
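To inspect a generated library, for instance to see how many memories cover each residue compared with the requested n_mem, the [Memories] lines can be parsed into records. This is a minimal sketch assuming five whitespace-separated columns in the order listed above, 1-based positions, and that a memory spans its start position plus length minus one residues; the names are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Memory:
    gro_file: str      # coordinate file of the aligned structure
    target_start: int  # memory position in the sequence of interest
    memory_start: int  # memory position in the aligned sequence
    length: int        # fragment length
    weight: float      # fragment weight


def parse_memories(lines):
    """Parse the entries of a [Memories] section into Memory records."""
    memories = []
    in_section = False
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("["):
            in_section = line == "[Memories]"
            continue
        if in_section:
            name, target, mem, length, weight = line.split()[:5]
            memories.append(Memory(name, int(target), int(mem),
                                   int(length), float(weight)))
    return memories


def coverage(memories):
    """Count how many memories cover each target residue.

    ASSUMPTION: positions are 1-based and a memory covers residues
    target_start .. target_start + length - 1.
    """
    counts = Counter()
    for m in memories:
        counts.update(range(m.target_start, m.target_start + m.length))
    return counts
```

For example, coverage(parse_memories(open("fragsLAMW.mem"))) gives a per-residue count that should be close to n_mem away from the chain ends.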

The scripts also generate the coordinate files (with .gro extensions) referenced in these lines; they can be found in the ./fraglib/ directory.

To include the Fragment Memory term in your simulations:

copy the fraglib directory and the fragsLAMW.mem file (which you can rename) into your simulation directory
turn on the Fragment_Memory or Fragment_Memory_Table term in fix_backbone_coeff.data (see Getting Started with AWSEM)
substitute the name of your .mem file in that term's section
make sure you have the desired gamma file and the right strength for the term (the first parameter in the section, typically 0.02 for 20 memories per position)
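The copy step in the list above can be scripted, together with a check that every .gro file named in the .mem file is present. Because the example entries use paths such as ./fraglib/3c3pa.gro, this sketch assumes those paths are resolved relative to the simulation directory. It requires Python 3.8 or later for dirs_exist_ok, and the function names are hypothetical.

```python
import shutil
from pathlib import Path


def install_fragment_library(source_dir, sim_dir, mem_name="fragsLAMW.mem",
                             new_mem_name=None):
    """Copy fraglib/ and the .mem file from source_dir into sim_dir.

    new_mem_name optionally renames the .mem file; use the same name in
    fix_backbone_coeff.data. Returns the path of the copied .mem file.
    """
    source, target = Path(source_dir), Path(sim_dir)
    target.mkdir(parents=True, exist_ok=True)
    shutil.copytree(source / "fraglib", target / "fraglib", dirs_exist_ok=True)
    dest_mem = target / (new_mem_name or mem_name)
    shutil.copy2(source / mem_name, dest_mem)
    return dest_mem


def missing_coordinate_files(mem_path):
    """Return .gro paths listed in the .mem file that do not exist.

    ASSUMPTION: relative paths are resolved against the .mem file's directory.
    """
    mem_path = Path(mem_path)
    missing = []
    for line in mem_path.read_text().splitlines():
        fields = line.split()
        if fields and fields[0].endswith(".gro"):
            if not (mem_path.parent / fields[0]).exists():
                missing.append(fields[0])
    return missing
```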

The difference between [Fragment_Memory] and [Fragment_Memory_Table] is that the latter uses precomputed tables for the routine energy and force computations and is therefore much faster. The last line under [Fragment_Memory_Table] sets the table range (r0, rn) and grid step (dr).
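The general idea behind a table-based term can be illustrated with a small sketch: a function of distance is evaluated once on the grid r0, r0 + dr, ..., rn, which gives (rn - r0)/dr + 1 points, and later lookups interpolate linearly between neighbouring grid values. This is not the awsemmd implementation. The Gaussian well used as the tabulated function is an assumption for illustration, and forces would be tabulated the same way.

```python
import math


def gaussian_well(r, r_mem, sigma):
    """Illustrative well centred on a memory distance r_mem (assumed form)."""
    return -math.exp(-((r - r_mem) ** 2) / (2.0 * sigma ** 2))


def build_table(func, r0, rn, dr):
    """Tabulate func on the grid r0, r0 + dr, ..., rn."""
    n_points = int(round((rn - r0) / dr)) + 1
    return [func(r0 + k * dr) for k in range(n_points)]


def lookup(table, r, r0, dr):
    """Linearly interpolate a tabulated value at distance r.

    The interval index is clamped, so r outside [r0, rn] is extrapolated
    from the first or last interval.
    """
    x = (r - r0) / dr
    k = min(max(int(x), 0), len(table) - 2)
    frac = x - k
    return table[k] * (1.0 - frac) + table[k + 1] * frac
```

With hypothetical settings r0 = 2.0, rn = 20.0 and dr = 0.01, the table holds 1801 points; a smaller dr makes lookups more accurate at the cost of memory.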

Lastly, you will need to adjust the neighbor skin distance in the *.in file (see the LAMMPS neighbor command documentation). The Fragment Memory potential typically has a larger cutoff than the other terms, so for memories of length 9 you will need a setting such as neighbor 24 bin in the input file.
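If you generate input files from a template, the skin distance can be updated programmatically. This sketch rewrites the first argument (the skin) of every LAMMPS neighbor command in the text, keeps the style argument (e.g. bin), and leaves neigh_modify lines untouched; the function name is hypothetical.

```python
import re

# A LAMMPS "neighbor <skin> <style>" command at the start of a line.
NEIGHBOR_RE = re.compile(r"^([ \t]*neighbor[ \t]+)(\S+)([ \t]+\S+)", re.MULTILINE)


def set_neighbor_skin(input_text, skin):
    """Return input_text with the skin of every neighbor command replaced."""
    new_text, n = NEIGHBOR_RE.subn(
        lambda m: f"{m.group(1)}{skin}{m.group(3)}", input_text)
    if n == 0:
        raise ValueError("no 'neighbor' command found in the input text")
    return new_text
```

For example, set_neighbor_skin(text, 24) turns a line "neighbor 5 bin" into "neighbor 24 bin".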