Rosetta implementation - m3g/xlff GitHub Wiki
Implementing statistical potential into Rosetta
Have a question or a suggestion? Contact: [email protected]
Reference:
A. Ferrari, F.C.Gozzo, L.Martínez, Statistical potential for structural modeling using chemical cross-linking/mass spectrometry distance constraints, Bioinformatics, 2019.
https://doi.org/10.1093/bioinformatics/btz013
How to
The statistical potential for cross-linking / mass spectrometry derived constraints is implemented in Rosetta numerically. This is done by taking advantage of a Rosetta constraint function called ETABLE. ETABLE describes any function as a table of values, which can be interpolated to approximate the true behavior of the function (see General explanation below).
Although the ETABLE func is part of the default Rosetta scoring functions, it needs to be enabled. To do so, two files need to be modified.
- Go to the scoring functions directory
$ cd ~rosetta_path/main/source/src/core/scoring/func/
- copy to this directory the two files provided here in code directory
$ cp ~/files/code/FuncFactory.cc ~/files/code/EtableFunc.cc ~rosetta_path/main/source/src/core/scoring/func/
This will replace both files in the directory. If you want to keep the default files, rename both of them first.
The FuncFactory.cc file only adds two lines to the default file, specifying that EtableFunc.cc should be considered in Rosetta applications. EtableFunc.cc provides the definitions of f(x) and its derivative, which are not present in the default file; that is, they are user-defined.
After copying those two files, it is necessary to recompile Rosetta.
- Go to
$ cd ~rosetta_path/main/source
and run
$ ./scons.py [parameters]
to do it.
If no error message pops up, you are ready to run a job using the statistical potential.
The next section explains the general format of a constraint file. A file with the options used to run the SalBIII protein structure prediction is provided in
$~/files/example/flags
General explanation
The general input format in a constraint file applying ETABLE func is:
AtomPair [Atom1] [ResID1] [Atom2] [ResID2] ETABLE [min] [max] [many numbers]
In this format each "[ ]" refers to a user-defined variable.
For example:
[Atom1] = CB
[ResID1] = 1 or 2, etc
[min] = the minimum value of x for which [many numbers] has been computed
[max] = the maximum value of x for which [many numbers] has been computed
[many numbers] = values of func for x from [min] to [max] spaced out by 0.1
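As a minimal sketch of how one such line could be assembled: the helper name `etable_line`, the four-decimal number formatting, and the assumption that the grid includes both [min] and [max] are illustrative choices, not taken from the repository.

```python
def etable_line(atom1, res1, atom2, res2, x_min, x_max, values, step=0.1):
    """Format one AtomPair constraint line that uses the ETABLE func.

    Hypothetical helper: assumes `values` holds f(x) on a grid from x_min to
    x_max (both included) with spacing `step`.
    """
    # Number of grid points implied by [min], [max] and the 0.1 spacing.
    expected = int(round((x_max - x_min) / step)) + 1
    if len(values) != expected:
        raise ValueError("expected %d values, got %d" % (expected, len(values)))
    numbers = " ".join("%.4f" % v for v in values)
    return "AtomPair %s %d %s %d ETABLE %.1f %.1f %s" % (
        atom1, res1, atom2, res2, x_min, x_max, numbers)


if __name__ == "__main__":
    # Toy values only; real curves come from the provided [many numbers] files.
    print(etable_line("CB", 1, "CB", 14, 3.0, 3.3, [0.5, 0.4, 0.3, 0.2]))
```

The length check is there because a table whose size does not match [min], [max] and the spacing would silently shift the curve along x.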
Some common links have their statistical potential curves defined. The [many numbers] file for each residue pair can be found in $/files/xl
A script that creates a constraint file for Rosetta applications is also available as "xl_generator.py" in $file/xl. Use:
$ python2.7 xl_generator.py $input_filename [yes/no]
Provide a file ($input_filename) in the required format. Example:
observed LYS A 123 SER A 14 short
observed SER A 15 ASP A 24 zl
observed GLU A 17 ASP A 44 long
Type yes if you have used shorter links (DSG / 1,3-propanediamine). Type no if only BS3/DSS and/or 1,6-hexanediamine were used; in that case, column 8 can be omitted in the input file.
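To make the eight-column input format concrete, here is a small parsing sketch. It is not the code of xl_generator.py; the names `CrossLink` and `parse_crosslinks` are hypothetical, and a missing column 8 is simply stored as `None` rather than mapped to a linker.

```python
from collections import namedtuple

# One record per input line: status, residue 1 (name, chain, number),
# residue 2 (name, chain, number), and the optional linker label (column 8).
CrossLink = namedtuple(
    "CrossLink", "status res1 chain1 num1 res2 chain2 num2 linker")


def parse_crosslinks(text):
    """Parse lines such as 'observed LYS A 123 SER A 14 short'."""
    links = []
    for raw in text.splitlines():
        fields = raw.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) == 7:
            fields.append(None)  # column 8 omitted
        if len(fields) != 8:
            raise ValueError("expected 7 or 8 columns: %r" % raw)
        status, r1, c1, n1, r2, c2, n2, linker = fields
        links.append(CrossLink(status, r1, c1, int(n1), r2, c2, int(n2), linker))
    return links


if __name__ == "__main__":
    example = "observed LYS A 123 SER A 14 short\nobserved SER A 15 ASP A 24 zl\n"
    for link in parse_crosslinks(example):
        print(link)
```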
[min] and [max] are defined based on the statistics of the CATH S40 non-redundant database. Their values are tabulated as follows:
Linker_name          Link_Type  L[XL] (Å)  [min] (Å)  [max] (Å)
DSS/BS3              KK         11.5       3.0        17.8
DSS/BS3              KS         11.5       3.0        15.8
DSS/BS3              SS         11.5       3.0        13.4
1,6-hexanediamine    EE         11.5       3.0        15.1
1,6-hexanediamine    DE         11.5       3.0        14.3
1,6-hexanediamine    DD         11.5       3.0        13.5
DSG                  KK         7.7        3.0        15.2
DSG                  KS         7.7        3.0        12.4
DSG                  SS         7.7        3.0        10.0
1,3-propanediamine   EE         7.7        3.0        11.6
1,3-propanediamine   DE         7.7        3.0        10.7
1,3-propanediamine   DD         7.7        3.0        9.8
zero-length          KD         0.0        3.0        9.7
zero-length          KE         0.0        3.0        10.5
zero-length          SD         0.0        3.0        7.0
zero-length          SE         0.0        3.0        7.7
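The table can be turned into a lookup keyed by spacer arm length and link type. This is only a sketch: the names `X_MAX`, `link_type` and `x_range` are hypothetical, and the ordering of the letters in a link type (K, S, D, E) is inferred from the codes used in the table.

```python
X_MIN = 3.0  # [min] is 3.0 in every row of the table

# (L[XL], link type) -> [max], copied from the table above.
X_MAX = {
    (11.5, "KK"): 17.8, (11.5, "KS"): 15.8, (11.5, "SS"): 13.4,
    (11.5, "EE"): 15.1, (11.5, "DE"): 14.3, (11.5, "DD"): 13.5,
    (7.7, "KK"): 15.2, (7.7, "KS"): 12.4, (7.7, "SS"): 10.0,
    (7.7, "EE"): 11.6, (7.7, "DE"): 10.7, (7.7, "DD"): 9.8,
    (0.0, "KD"): 9.7, (0.0, "KE"): 10.5, (0.0, "SD"): 7.0, (0.0, "SE"): 7.7,
}

ONE_LETTER = {"LYS": "K", "SER": "S", "ASP": "D", "GLU": "E"}
ORDER = "KSDE"  # letter order that reproduces the table's codes (KS, DE, KD, ...)


def link_type(res1, res2):
    """Return the two-letter link type for a residue pair, e.g. SER + LYS -> 'KS'."""
    letters = sorted((ONE_LETTER[res1], ONE_LETTER[res2]), key=ORDER.index)
    return "".join(letters)


def x_range(res1, res2, l_xl):
    """Return ([min], [max]) for a residue pair and spacer arm length L[XL]."""
    return X_MIN, X_MAX[(l_xl, link_type(res1, res2))]


if __name__ == "__main__":
    print(x_range("SER", "LYS", 11.5))
```

A pair that is not listed for the given linker length raises `KeyError`, which makes unsupported combinations visible instead of silently assigning a range.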
Additional comments
- L[XL] stands for the spacer arm length of the cross-linker. For example, L[XL] = 11.5 Å refers to DSS or 1,6-hexanediamine, L[XL] = 7.7 Å refers to DSG or 1,3-propanediamine, and L[XL] = 0 Å refers to zero-length species.
- Between [min] and [max], f(x) is approximated by the tabulated value closest to the measured Euclidean distance x. Above x = [max] a linear penalty is applied, that is, f(x) = x - x_max if x > x_max, where x_max = [max]. The derivative is computed as the slope of the line through the two tabulated values closest to x.
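The evaluation rule described in the last comment can be sketched in Python as follows. This is an illustration of the description, not a transcription of EtableFunc.cc: the function name `etable_eval` is hypothetical, the slope of 1 above [max] is just the derivative of the stated penalty, and since the behavior below [min] is not described here, the sketch raises an error in that case.

```python
def etable_eval(x, x_min, x_max, values, step=0.1):
    """Return (f(x), df/dx) for a tabulated potential.

    `values` holds f on a grid from x_min to x_max with spacing `step`.
    """
    if x > x_max:
        # Linear penalty above [max]; its derivative is 1 (assumed here).
        return x - x_max, 1.0
    if x < x_min:
        # Not specified in the text above, so this sketch refuses to guess.
        raise ValueError("x is below [min]")
    pos = (x - x_min) / step
    # f(x): the tabulated value closest to x.
    nearest = min(max(int(round(pos)), 0), len(values) - 1)
    # df/dx: slope of the segment between the two tabulated values around x.
    lower = min(max(int(pos), 0), len(values) - 2)
    slope = (values[lower + 1] - values[lower]) / step
    return values[nearest], slope


if __name__ == "__main__":
    table = [1.0, 0.5, 0.0, 0.5]  # toy values for x = 3.0, 3.1, 3.2, 3.3
    print(etable_eval(3.12, 3.0, 3.3, table))
    print(etable_eval(4.3, 3.0, 3.3, table))
```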