Adding Evolutionary Restraints to AWSEM (AWSEM ER) - adavtyan/awsemmd GitHub Wiki
Introduction
This guide will walk you through adding evolutionary restraints to your AWSEM simulation. This guide assumes you know the basics of running simulations with AWSEM already. The current implementation uses the AMH-Go term to add the restraints. The idea is to add evolutionarily inferred contacts from a method like DCA, GREMLIN, or RaptorX to the simulation. The guide is broken down into three parts:
- Obtain predicted contacts from a server such as GREMLIN or RaptorX.
- Convert the predicted contacts to go_rnative*.dat files for AWSEM to read in.
- Ensure the proper flags are turned on in the fix_backbone_coeff.data file.
Note: The quality of the GREMLIN/RaptorX contact prediction is heavily influenced by the number of homologous sequences available for the target protein sequence. GREMLIN outputs the number of sequences used for the calculation on the results page, whereas RaptorX provides an alignment of all the sequences used in the calculation. A typical guideline for the number of sequences necessary for an accurate prediction is 5L, where L is the length of the target protein in amino acids. More sequences results in a more accurate prediction.
Predicting contacts using GREMLIN or RaptorX servers
You can choose to predict contacts from either of the following servers.
http://openseq.org/submit.php
GREMLIN:- Submit the target sequence in the "Starting alignment" box, use default settings initially. If the protein has few sequences you can also try Generating the MSA using Jackhmmer instead of HHblits.
- Wait for the GREMLIN analysis to finish. This typically takes under an hour but it depends on the protein size, the number of sequences, and the number of other jobs running.
- Once the GREMLIN analysis is complete, download the contact predictions. The contact predictions are obtained via the "Text file of contact predictions" link. This file will serve as input for the GremlinToGo_rnative.py script.
http://raptorx6.uchicago.edu/ContactMap/
RaptorX:- Submit the target sequence in the "Sequence" box and your email, you don't need to select any of the other options.
- Wait for the RaptorX analysis to finish. This typically takes under an hour but it depends on the protein size, the number of sequences, and the number of other jobs running.
- Once the RaptorX analysis is complete, download the contact predictions (contactmap.txt) from the email. This file will serve as input for the RaptorXToGo_rnative.py script.
Converting the predicted contacts to go_rnative*.dat files
In order to use the predicted contacts in an AWSEM simulation the files need to be converted to the proper format. For a typical AMH-Go simulation, the software calculates the distance between each ij pair of residues based on a gro file of the structure. In the case of structure prediction, we don't have a gro file of the structure to use. To overcome this, the AMH-Go term reads in go_rnative files when the evolutionary restraint flag is on. A go_rnative file contains an NxN matrix of distances based on the predicted contacts. If a pair of amino acids is predicted as a contact, the distance is set to a distance based on the types of interacting amino acids (somewhere between 4 and 10 Angstroms). If a pair of amino acids are not in contact, the distance is arbitrarily set to 99 Angstroms, which is far above the cutoff distance we set (12.0 Angstroms). The cutoff distance in this case isn't particularly meaningful when the evolutionary restraint flag is activated.
Scripts to generate go_rnative files from predicted contacts are provided in the create_project_tools folder. If the contacts come from the GREMLIN server, the GremlinToGo_rnative.py script should be used. For the contacts predicted via RaptorX contact prediction, use the RaptorXToGo_rnative.py script.
Script usage
python RaptorXToGo_rnative.py predicted_contacts_file [probability_cutoff]
python GremlinToGo_rnative.py predicted_contacts_file sequence_file [probability_cutoff]
note: These scripts currently rely on the CACAmediandist.dat, CACBmediandist.dat, and CBCBmediandist.dat files. These files are also provided in the create_project_tools folder. Make sure these files are in the $PATH.
The scripts will output three files to the current directory: go_rnativeCACA.dat, go_rnative_CACB.dat, go_rnative_CBCB.dat. These files have slightly different distances for interactions between C-alpha C-alpha interactions, C-aplha C-beta interactions, and C-beta C-beta interactions respectively. Each of these files is necessary for the evolutionary restraint term. The go_rnative*dat files should be copied to the simulation directory.
Activating the evolutionary restraint term in AWSEM
To run simulations with evolutionary restraints the AMH-Go term of AWSEM must be on, with the frustration_censoring_flag set to 2. This flag will result in the software reading in the go_rnative*.dat files instead of calculating the distances based on a gro file. When the frustration_censoring_flag set to 2, the software automatically looks for files named go_rnativeCACA.dat, go_rnativeCACB.dat, and go_rnativeCBCB.dat.
For reference, example fix_backbone_coeff_CoEv-contact.data and fix_backbone_coeff_AWSEM-ER.data files are given in the awsemmd/parameters/ directory. Note that the strength of the contact term ([Water]) is decreased when the evolutionary restraint flag was activated.