Cascade BLAST - TUM-CBR/pymol-plugins GitHub Wiki
Cascade BLAST
This tool helps identify organisms that use a different enzyme for one (or more) steps of an enzymatic cascade. Below we describe how this tool is used with an example.
Defining the Cascade
The first step is to define the enzymatic cascade to be considered. For this article, we construct an example based on methanogenesis as described in Metabolism of Methanogens. We use "Methanosarcina mazei (taxid:2209)" as our methanogen. To that end, we prepare a FASTA file with the amino acid sequences of the enzymes involved in methanogenesis for that organism:
>Formyl_Transferase
MTVKIAVLVSGRGSNLQAIIDSIEKGYIKNAAVNVVISNKADAYALERAKNHGISAVFLDSRGRDRAEYDREILKVLRQYDTDLLLLAGYFRLLGSEIINAYRNRILNIHPSLLPAFKGLHAQKQAFEYGVKVAGCTVHFVDEGLDSGPIIIQSCVPVLTGDTEETLTDRILEQEHIIYPEAVRLFVEGKLKVEGRNVTAPDVL
>Formylmethanofuran_Dehydrogenase
IEVGVENLDFNDAKFDTAVKFHGHVCPGISIGYRVAMLAAERFKDRSEDEELVAVVENRSCAVDAIQAINGCTCGKGNLIFKEHGKHVYTFYKRGSEKALRISLKPDALPQDSRHTALFAKLRAGTASPEEEKEFQASHDAKSQRILEMPEEELFRVSEVNIEPPEKAIIYPTVICSKCGEGFMEPLGRVKNGEIVCISC
>Methenyl-H4MPT_Cyclohydrolase
LISVNEMGSYVIEEMLDWSEDLKTEVIKLENGATIIDCGIKAEGGYEAGMYLARLCLADLADLKYSTFDLNGIKWPAIQVATDNPVIACMASQYAGWRISVGNYFGMGSGPARALGLKPKELYEEIGYEDDFEAAVLVMESDKLPDEKVVEYIAKHCSVDPENVMIAVAPTASIAGSVQISARVVETGIHKFESIGFDINCIKSGYGIAPIAPIVGNDVQCMGSTNDCVIYCGETNYTVSFEGELAKLEDFVRKVPSTTSDDFGKPFYQTFKAANFDFFKVDAGMFAPARVTVNDLKNKKTISSGGLYPEILLESFGIR
>F420-dependent_Hydrogenase
KIKLGHVHLSGCTGCLVSTADNNLXXXXFIKILDNYADLVYSLTLADVRHVPEMDVALVEGSVCIQDHESVEEIRETREKAKIVVALGSCACYGNITRFSRGGQHNQPQHESFLPIGDLIDVDVYIPGCPPSPELIRNVAVMAYLLLEGD
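Before loading the file, it can help to confirm that it parses into the expected records. The following is a minimal sketch in plain Python, not part of the plugin; the function name `parse_fasta` is hypothetical and the sequences in the example are truncated for brevity.

```python
from typing import Dict, Iterable


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA text into a {record name: sequence} mapping."""
    records: Dict[str, str] = {}
    name = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            name = line[1:].strip()
            if not name:
                raise ValueError("FASTA header without a name")
            if name in records:
                raise ValueError(f"duplicate record name: {name}")
            records[name] = ""
        else:
            if name is None:
                raise ValueError("sequence data before the first header")
            records[name] += line  # sequences may span several lines
    return records


# Truncated stand-ins for the first two records of the cascade file.
example = """>Formyl_Transferase
MTVKIAVLVSG
RGSNLQAII
>Formylmethanofuran_Dehydrogenase
IEVGVENLDF
"""

records = parse_fasta(example.splitlines())
for record_name, sequence in records.items():
    print(record_name, len(sequence))
```

For the full file shown above, the same check should report four records, one per enzyme of the cascade.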
We then open that FASTA file using the "Select File" button of the application:
Once the file has been loaded, a table is displayed with all of the enzymes as the first row and all of the steps as columns. Combo boxes allow one to tick which enzymes are considered for each step. Multiple enzymes can be assigned to a single step, which is useful when one wishes to identify organisms that use none of the provided enzymes for that step. In our current example, however, each enzyme is involved in only a single step, so we tick accordingly. It is also possible to double-click the cell that contains a step's name to change it. The name of a step does not influence the behavior of the application, but it is helpful when looking at the results. After doing all this, we end up with a table like:
Finally, we configure the remaining parameters:
- For the domain, we select "Archea (taxId:2157)". This option determines the domain in which organisms will be searched.
- The number of steps will be 4 for this example. You can increase/decrease this value for longer/shorter cascades.
- The "minimum identity target" defines the stopping condition of the search: the search continues until it finds organisms whose genomes contain genes with an identity below that value for all of the enzymes in our cascade.
- You must enter an "email". It is provided to NCBI when querying its databases, and you will be notified at that address in case of excessive resource usage. The program should throttle its requests so that it does not exceed the usage quota established by NCBI.
This is how these parameters look:
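Regarding the request throttling mentioned in the last item of the list, the idea can be sketched as follows. This is an illustration only and not the plugin's actual implementation; the class name `Throttle` and the 10-second interval are assumptions chosen for the example, so check NCBI's current usage guidelines for the real limits. The clock and sleep functions are injectable so that the example runs instantly with a fake clock.

```python
import time
from typing import Callable, Optional


class Throttle:
    """Enforce at least `min_interval` seconds between successive requests."""

    def __init__(
        self,
        min_interval: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None  # time of the last allowed request

    def wait(self) -> float:
        """Block until the next request is allowed; return the seconds slept."""
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            delay = max(0.0, self._last + self.min_interval - now)
        if delay > 0:
            self._sleep(delay)
        self._last = now + delay
        return delay


class FakeClock:
    """Stand-in clock so the example does not actually sleep."""

    def __init__(self) -> None:
        self.now = 0.0

    def time(self) -> float:
        return self.now

    def sleep(self, seconds: float) -> None:
        self.now += seconds


fake = FakeClock()
# Assumed interval of 10 seconds; a placeholder, not an NCBI-confirmed value.
throttle = Throttle(min_interval=10.0, clock=fake.time, sleep=fake.sleep)
delays = [throttle.wait() for _ in range(3)]
print(delays)
```

In real use one would construct `Throttle(min_interval=...)` with the default clock and call `wait()` immediately before each request sent to NCBI.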
After hitting the "Create" button, a new window opens showing a progress monitor, which indicates that a search is taking place. You can look at some (very technical) details by hitting the "Details" button, which displays information about the searches being carried out. You can double-click any cell to see its full value. In particular, clicking a cell such as the one highlighted below:
will show you the state of the request being sent to NCBI:
The "Open Copy" button can be used to view the current, still incomplete state of the search while it is running.
Viewing Results
After the search completes, using "Option 1" of the main application or the "Open Copy" button of the search window brings you to the results inspector. A sample result set from the previous section is available (Methanogenesis); you can use it to open the viewer immediately. The results inspector looks like this:
This view provides controls in the first row to filter organisms by three criteria: "any", "less" and "greater". Suppose we want to find organisms that lack the gene coding for the "F420-dependent Hydrogenase". We can set the filter to include only organisms whose identity for that enzyme is less than 60%:
We can then hit the "Filter" button to get the filtered results:
We can further refine our search. Suppose we want organisms that do contain the gene for "Formyl Transferase". We can then ask the program to include only organisms with an identity above 85% for that enzyme, as shown below:
We once again hit the "Filter" button to obtain new results:
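The two filters applied above can also be expressed as a small predicate, which may make the meaning of "any", "less" and "greater" more concrete. This is a sketch under stated assumptions and not the plugin's code: the function name `matches`, the data layout, the organism names and all identity values are invented for illustration.

```python
from typing import Dict, Tuple

# (mode, threshold in percent); the threshold is ignored for mode "any".
Criterion = Tuple[str, float]


def matches(identities: Dict[str, float], criteria: Dict[str, Criterion]) -> bool:
    """Return True if an organism satisfies every per-step criterion.

    Assumes `identities` holds one identity value (in percent) for every
    step named in `criteria`.
    """
    for step, (mode, threshold) in criteria.items():
        if mode not in ("any", "less", "greater"):
            raise ValueError(f"unknown filter mode: {mode}")
        value = identities[step]
        if mode == "less" and not value < threshold:
            return False
        if mode == "greater" and not value > threshold:
            return False
    return True


# Made-up identity values for three hypothetical organisms.
results = {
    "organism_a": {"Formyl_Transferase": 92.0, "F420-dependent_Hydrogenase": 41.0},
    "organism_b": {"Formyl_Transferase": 70.0, "F420-dependent_Hydrogenase": 35.0},
    "organism_c": {"Formyl_Transferase": 95.0, "F420-dependent_Hydrogenase": 88.0},
}

# Likely lacks the hydrogenase (< 60%) but has the transferase (> 85%).
criteria = {
    "F420-dependent_Hydrogenase": ("less", 60.0),
    "Formyl_Transferase": ("greater", 85.0),
}

selected = [name for name, ids in results.items() if matches(ids, criteria)]
print(selected)
```

Steps left at "any" impose no constraint, so only the steps with an explicit "less" or "greater" filter narrow the list.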
Finally, we can save these results for later consultation, which avoids rerunning the slow process of constructing the result set. Use the "Save Results" button to save a copy. This copy can later be opened using the "Select File" button: