SCHEMA Energy - TUM-CBR/pymol-plugins GitHub Wiki

This page describes how to use the "SCHEMA Energy" tool. Before using the tool, we advise that you familiarize yourself with SCHEMA, as this tool is merely a front end for configuring and running a SCHEMA analysis.

We describe how the tool works by way of example:

Step 1

SCHEMA needs a structure in order to identify contacts between residues. Any structure that has been loaded into PyMOL can be selected. However, SCHEMA only supports a single chain in its analysis, so only one chain can be selected. We will use "1gxm" for this example, which can be fetched from the PyMOL command line with:

fetch 1gxm

After retrieving the structure, click the "Refresh" button and select one of the chains. In this case we will use "1gxm/A/A".

Step 2

SCHEMA needs a list of parent sequences that will be used to build your library. The parents can be provided in FASTA format by pasting them into the "Input Sequences (fasta)" text box of the application. The sequences below can be used for the demo:

>1GXM_A Family 10 polysaccharide lyase from Cellvibrio cellulosa [Cellvibrio japonicus]
GLVPRGSHMTGRMLTLDGNPAANWLNNARTKWSASRADVVLSYQQNNGGWPKNLDYNSVGNGGGGNESGTIDNGATITEM
VFLAEVYKSGGNTKYRDAVRKAANFLVNSQYSTGALPQFYPLKGGYSDHATFNDNGMAYALTVLDFAANKRAPFDTDVFS
DNDRTRFKTAVTKGTDYILKAQWKQNGVLTVWCAQHGALDYQPKKARAYELESLSGSESVGVLAFLMTQPQTAEIEQAVR
AGVAWFNSPRTYLEGYTYDSSLAATNPIVPRAGSKMWYRFYDLNTNRGFFSDRDGSKFYDITQMSLERRTGYSWGGNYGT
SIINFAQKVGYL
>AAW84085.1 pectate lyase [uncultured bacterium]
MAKILTLDGNPAASWFNKSRTKWNSSRADIVLSYQQSNGGWPKNLDYNSVSAGNGGSDSGTIDNGATITEMVYLAEIYKN
GGNTKYRDAVRRAANFLVSSQYSTGALPQFYPLKGGYHDHATFNDNGMAYALTVLDFAVNKRAPFDNDIFSDSDRAKFKT
AVAKGVDYILKAQWKQNGKLTAWCAQHGALDYQPKKGRAYELESLSGKESVGILAFLMTQPQTAQIEAAVKAGVNWFASP
NTYLANYTYDSSKASTNPIVYKKGSRMWYRFYDLYTNRGFFSDRDGSKFYDITQMSEERRTGYSWGGSWGEVIISFAQKV
GYL
>AAW84052.1 pectate lyase [uncultured bacterium]
MELPVTGAWATWQTATVEIDLVQGNNLLKLSAITADGLANIDSLKIDGAQTKAGVCSTVASSSSSSVASSIKSSSSSSSS
SSTTTVKTLTLDGNPAANWFNKSRTKWNTSRADVVLSYQQSNGGWPKNLDYNSVSAGNGGSDSGTIDNGATITEMVYLAE
VYKNGNNTKYRDAVRRAANFIVSSQYSTGALPQFYPLKGGYADHATFNDNGMAYALTVLDFAVNKRAPFDTDVFSDSDRA
KFKTAVAKGVDYILKAQWKQNGKLTVWCAQHGATDYQPKKARAYELESLSGSESVGVLAFLMTQPQTAQIEAAVKAGVAW
FNSPNTYLNNYTYDSSKASTNPIVAKSGSKMWYRFYDLNTNRGFFSDRDGSKFYDITQMSEERRTGYSWGGDYGTSIISF
AQKVGYL
>ACY24852.1 Pel10A pectate lyase [uncultured organism]
MDGIATENTNAGYTGNGYTNSNNVQGSAIEWAVNAPNSSRYTLTFRFANGGTANRNGSLLINGGSNGNYTMQLPATGGWT
TWQTTSIEIDLVQGNNLLKLSSLTTDGLANIDSLKIEGAQTKAGICSGIASSSASSIKSSSSSSNSSASNTGTLLTLDGN
PAASWLNKSKNKWGTDKADTVLSYQQTNGGWPKNLDYNSVGAGSGGSESGTIDNGATITEMVYLAEIYKNGKNTKYRDAV
RKAANFLVSSQYSTGALPQFYPLKGGYADHATFNDNGMAYALTVLDFAANNRAPFDTDVFSDTDRNKFKTAVTKGTAYIL
KAQWKQNGRLTVWCAQHGATDYLPKKARAYELESLSGSESVGILAFLMTQPQTAEIEQAIRAGVAWFNSPNTYLDGYTYD
SAQATTNPIVKKSGSKMWYRFYDLNTNRGFFSDRDGSKFYDITQMSEERRTGYSWGGAYGNSIIPFAQKVGYL
>KAJ3038074.1 hypothetical protein HDV00_001034 [Rhizophlyctis rosea]
MVKLLALGCALLLGAVSVNAQVPVYGQCGGQGYTGSTVCASGSVCTFSNDWYSQCLPGTASTTVKTTTTTKAVTTTKAAT
TSTVKATTTTKASTSTTTSSGSIATILPQSGNPMVNWFTKARTKWSTSLANRILSYQQSHGGWPKNIDYASVANGSGGSE
LGTFDNGATNTEMIFLAEQYKSGGNTKFRDAVRKGASYILSAQYSTGGWPQFYPLKGGYADYVTFNDDAMAHTLTLLNSA
VNKVAPFDTDIFTDADRTKFKTAIDKGVAYILKAQYKQNGVLTVWCAQHDKDTYAPKPARAYELESLSGLESVGILSFLM
TQPQTSAISTAVKAGLAWYRSPKTYLDGYTYVSGQNEPIVAKAGSKMWYRFYDLTTNRGFFSDRDGGKYYDIMQISEERR
TGYQWAGSYGDTIGKYASSVGL

Please read about SCHEMA to determine which criteria to use when selecting parents. To briefly summarize: (1) ensure all parents have similar lengths and (2) ensure the parents share a high degree of sequence identity.
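Both criteria can be checked quickly before pasting the parents into the tool. The sketch below is a hypothetical helper, not part of the plugin: `parse_fasta`, `length_spread` and `percent_identity` are illustrative names, identity is computed on two rows of an alignment you already have, and no threshold for "similar" or "a lot" is implied.

```python
from __future__ import annotations


def parse_fasta(text: str) -> dict[str, str]:
    """Return {header: sequence} for FASTA text; headers lose their leading '>'."""
    records: dict[str, str] = {}
    header: str | None = None
    chunks: list[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            header, chunks = line[1:].strip(), []
        elif header is not None:
            # Sequence lines are wrapped (80 columns in the demo), so join them.
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)
    return records


def length_spread(records: dict[str, str]) -> float:
    """Length difference between the longest and shortest parent, as a fraction of the longest."""
    lengths = [len(seq) for seq in records.values()]
    return (max(lengths) - min(lengths)) / max(lengths)


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two aligned rows, ignoring columns where either row has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(a == b for a, b in pairs) / len(pairs)
```

Pasting the demo sequences into `parse_fasta` yields one record per parent; `length_spread` then gives a single number to compare against whatever tolerance you settle on from the SCHEMA literature.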

After providing the parents, you must tell SCHEMA which of these sequences it should align with the sequence of the structure. This is done by selecting an option from the "Select Structure Sequence" combobox, which should have been updated with the names of all the parents in the provided FASTA input. We use "1GXM_A Family 10 ..." as the sequence to align with the structure.
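To illustrate what aligning the chosen parent with the structure's sequence involves, here is a textbook Needleman-Wunsch global alignment. This is a swapped-in technique for illustration: the plugin may use a different aligner, and the match, mismatch and linear gap scores are arbitrary assumptions (real protein alignments normally use a substitution matrix and affine gaps).

```python
from __future__ import annotations


def global_align(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2) -> tuple[str, str]:
    """Needleman-Wunsch global alignment of a and b with a linear gap penalty."""
    n, m = len(a), len(b)

    def pair_score(x: str, y: str) -> int:
        return match if x == y else mismatch

    # score[i][j] is the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]),
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )

    # Trace back from the bottom-right cell, following any optimal move.
    out_a: list[str] = []
    out_b: list[str] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))
```

The gapped rows returned here are what later make the position translation in Step 3 non-trivial: residues opposite a gap have no partner.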

Step 3

The next step is to provide the comma-separated list of assembly points that SCHEMA will evaluate. These must be entered in the "Shuffling Points" field. For example:

326,345,478,572
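The shuffling points divide the sequence into blocks that are exchanged between parents. The sketch below parses the field and splits a residue range into blocks. It assumes that each point is the first residue of a new block and that you supply the first and last residue numbers yourself; `parse_shuffling_points` and `blocks_from_points` are hypothetical names, and the plugin's own block convention may differ.

```python
from __future__ import annotations


def parse_shuffling_points(text: str) -> list[int]:
    """Parse a comma-separated list such as "326,345,478,572" into sorted ints."""
    points = [int(tok) for tok in text.split(",") if tok.strip()]
    if len(set(points)) != len(points):
        raise ValueError("duplicate shuffling point")
    return sorted(points)


def blocks_from_points(points: list[int], first: int, last: int) -> list[tuple[int, int]]:
    """Split the inclusive residue range [first, last] into blocks.

    Assumption: each shuffling point p is the first residue of a new block.
    """
    for p in points:
        if not first < p <= last:
            raise ValueError(f"shuffling point {p} lies outside ({first}, {last}]")
    bounds = [first, *sorted(points), last + 1]
    return [(start, end - 1) for start, end in zip(bounds, bounds[1:])]
```

With this convention the four example points always produce five blocks, whatever the actual residue range of the chain is.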

Note: The positions are relative to the structure's sequence, so you should use "Display -> Sequence" in PyMOL to find the values you wish to select. Bear in mind that the SCHEMA algorithm works with positions in the parents' multiple sequence alignment (MSA), so the positions you provide are translated into MSA positions. Because there is no one-to-one correspondence between MSA positions and structure positions, not every position is a valid input for this tool.
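The translation can be pictured as two hops: structure residue to the chosen parent (through the structure/parent alignment), then parent residue to MSA column. The sketch below assumes that two-hop scheme, 0-based MSA columns, and a list of structure residue numbers in sequence order; `residue_columns` and `structure_to_msa` are hypothetical names, not the plugin's API.

```python
from __future__ import annotations


def residue_columns(aligned_row: str) -> list[int]:
    """0-based alignment column of every residue (gaps skipped) in an aligned row."""
    return [col for col, ch in enumerate(aligned_row) if ch != "-"]


def structure_to_msa(struct_aln: str, parent_aln: str, parent_msa_row: str, resi: list[int]) -> dict[int, int]:
    """Map structure residue numbers to MSA columns via the chosen parent.

    struct_aln / parent_aln: pairwise alignment of structure vs. chosen parent.
    parent_msa_row: the same parent's row in the parents' MSA.
    resi: residue numbers of the structure, in sequence order.
    """
    if len(struct_aln) != len(parent_aln):
        raise ValueError("pairwise alignment rows must have the same length")
    if len(resi) != len(struct_aln.replace("-", "")):
        raise ValueError("need one residue number per structure residue")
    parent_cols = residue_columns(parent_msa_row)
    if len(parent_cols) != len(parent_aln.replace("-", "")):
        raise ValueError("parent differs between pairwise alignment and MSA")
    mapping: dict[int, int] = {}
    s_idx = p_idx = 0
    for s, p in zip(struct_aln, parent_aln):
        if s != "-" and p != "-":
            # Structure residue resi[s_idx] sits opposite the parent's p_idx-th residue.
            mapping[resi[s_idx]] = parent_cols[p_idx]
        if s != "-":
            s_idx += 1
        if p != "-":
            p_idx += 1
    return mapping
```

For example, `structure_to_msa("AC-DE", "ACQD-", "A-CQD", [10, 11, 12, 13])` maps residues 10, 11 and 12 to columns 0, 2 and 4. Residue 13 has no column because it faces a gap in the parent, and MSA columns 1 and 3 have no structure residue, which is why some inputs cannot be used.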

Step 4

Finally, choose the "Energy Scoring" and "Substitution Scoring" methods you wish to use. To run SCHEMA as originally published, select "SCHEMA classic" for both options. Below is a brief overview of the options:

Energy Scoring

This determines how to assign the disruption value for two residues that are in contact:

SCHEMA classic: Use the value "1" if two residues are in contact.
Simplified Physics: A crude approximation of van der Waals interactions that counts the atoms of the two residues that are close to each other and uses that count as the score.
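The two energy options can be sketched from atom coordinates as follows. The 4.5 Å heavy-atom cutoff is the value commonly used for SCHEMA contacts, but here it is an assumption rather than a value read from the plugin; likewise, counting close atom pairs is our assumed reading of "counts the atoms that are close". All names are hypothetical.

```python
from __future__ import annotations

import math
from itertools import product

CONTACT_CUTOFF = 4.5  # Å; assumed heavy-atom cutoff, not taken from the plugin


def close_atom_pairs(res_a: list[tuple[float, float, float]],
                     res_b: list[tuple[float, float, float]],
                     cutoff: float = CONTACT_CUTOFF) -> int:
    """Number of (atom in res_a, atom in res_b) pairs within `cutoff` Å."""
    return sum(1 for p, q in product(res_a, res_b) if math.dist(p, q) <= cutoff)


def classic_weight(res_a, res_b, cutoff: float = CONTACT_CUTOFF) -> int:
    """'SCHEMA classic': 1 if any atom pair is within the cutoff, else 0."""
    return 1 if close_atom_pairs(res_a, res_b, cutoff) > 0 else 0


def simplified_physics_weight(res_a, res_b, cutoff: float = CONTACT_CUTOFF) -> int:
    """'Simplified Physics' (assumed form): the number of close atom pairs."""
    return close_atom_pairs(res_a, res_b, cutoff)
```

The classic weight only records whether a contact exists, while the count-based weight lets tightly packed residue pairs contribute more.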

Substitution Scoring

This determines the penalty that will be paid for a particular substitution of residues.

SCHEMA classic: Use the value "1" if the substituted residue differs from the original residue.
BLOSUM 80 & 62: Use the corresponding BLOSUM matrix to assign a value depending on the residues being substituted.
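A sketch of the two substitution options follows. The table holds only a few BLOSUM62 entries for illustration; a full matrix could be loaded with Biopython's `substitution_matrices.load("BLOSUM62")` from `Bio.Align`. How the plugin turns a BLOSUM score into a penalty is not stated here, so `matrix_substitution` uses a hypothetical conversion (identity score minus substitution score).

```python
from __future__ import annotations

# A few BLOSUM62 entries for illustration only; pairs are stored in one order.
BLOSUM62_EXCERPT: dict[tuple[str, str], int] = {
    ("A", "A"): 4, ("L", "L"): 4, ("I", "I"): 4, ("W", "W"): 11,
    ("I", "L"): 2, ("A", "W"): -3,
}


def classic_substitution(original: str, substituted: str) -> int:
    """'SCHEMA classic': 1 for any change of residue, 0 otherwise."""
    return 0 if original == substituted else 1


def matrix_substitution(original: str, substituted: str,
                        matrix: dict[tuple[str, str], int]) -> int:
    """Hypothetical penalty: how far the substitution scores below the identity score."""
    score = matrix.get((original, substituted), matrix.get((substituted, original)))
    if score is None:
        raise KeyError(f"no score for {original}/{substituted}")
    return matrix[(original, original)] - score
```

Under this conversion a conservative change such as I to L costs 2, while A to W costs 7; the classic option charges 1 for both.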

Step 5

Now simply hit the "Run SCHEMA Energy" button and wait for the procedure to complete.
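For orientation, SCHEMA as originally published scores a chimera by its disruption E: a contact (i, j) counts as broken when the pair of residues the chimera carries at i and j does not occur at those positions in any parent. The sketch below computes that on aligned sequences with unit weights. It is not the plugin's implementation; `make_chimera` and `schema_disruption` are hypothetical names, and the contact list is assumed to come from the structure (for example via a distance cutoff).

```python
from __future__ import annotations


def make_chimera(parents: list[str], blocks: list[tuple[int, int]], choice: list[int]) -> str:
    """Concatenate block k taken from parent choice[k]; blocks are 0-based inclusive ranges."""
    return "".join(parents[k][start:end + 1] for (start, end), k in zip(blocks, choice))


def schema_disruption(chimera: str, parents: list[str], contacts: list[tuple[int, int]]) -> int:
    """Classic SCHEMA E: contacts whose residue pair appears at (i, j) in no parent."""
    broken = 0
    for i, j in contacts:
        pair = (chimera[i], chimera[j])
        if all((parent[i], parent[j]) != pair for parent in parents):
            broken += 1
    return broken
```

With parents "AAAA" and "BBBB" and blocks (0, 1) and (2, 3), the chimera "AABB" breaks the two contacts that span the block boundary, while a contact inside a block, such as (0, 1), is never broken because its residue pair comes from a single parent.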