Custom model analysis - Gardner-BinfLab/deltaBS GitHub Wiki

Basic concept

For custom model analysis, we first need to build custom HMMs for as many proteins in our query proteome as possible. Some proteins won't be represented in the Uniref90 database, and in our analyses we exclude these from custom analysis. If you prefer, you could instead build a custom HMM from the query protein sequence alone in these cases. This choice depends on the context of your question: whether it is acceptable for results to be biased towards giving your query proteome functional calls and your comparator loss-of-function calls.

We search each protein in the query proteome against Uniref90 to find possible homologs. These hits are then filtered by sequence identity to exclude those that are too dissimilar, and an HMM is built from the aligned hits for each protein.
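The identity filter described above can be sketched roughly as follows. This is a minimal illustration, not the code in buildCustomModels.pl: the function names, the 40% cutoff and the choice to count identity only over columns where both sequences have a residue are all assumptions made for the example.

```python
# Hypothetical sketch of filtering aligned hits by identity to the query.
# The threshold and the identity definition are assumptions, not deltaBS values.

GAP_CHARS = {"-", "."}


def percent_identity(query_row: str, hit_row: str) -> float:
    """Percent identity over columns where both aligned rows have a residue."""
    if len(query_row) != len(hit_row):
        raise ValueError("aligned rows must have equal length")
    compared = 0
    matches = 0
    for q, h in zip(query_row.upper(), hit_row.upper()):
        if q in GAP_CHARS or h in GAP_CHARS:
            continue  # skip columns with a gap in either row
        compared += 1
        if q == h:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def filter_hits(query_row: str, hits: dict, min_identity: float = 40.0) -> dict:
    """Keep only hits whose identity to the query meets the cutoff."""
    return {
        name: row
        for name, row in hits.items()
        if percent_identity(query_row, row) >= min_identity
    }
```

For example, `filter_hits("MKT-AYL", {"hit1": "MKTGAYL", "hit2": "MRS-CYV"})` keeps only `hit1`, because `hit2` matches the query at 2 of the 6 compared columns.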

Both proteomes are then searched against these custom HMM databases using hmmsearch. Scores for matches of proteins to the appropriate model are then extracted. Any ortholog pairs that have not been scored by custom models are then searched against domain models in Pfam in a classic Pfam-based workflow.
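As an illustration of the score extraction step, the sketch below reads hmmsearch results in the tabular format written by the `--tblout` option, where the first column is the target sequence, the third is the query model and the sixth is the full-sequence bit score. It assumes `--tblout` output is what is being parsed, which this page does not state; the function name and the sample line are made up for the example.

```python
# Hypothetical sketch: collect full-sequence bit scores from hmmsearch --tblout
# output. Assumes the standard whitespace-separated tblout column order.


def parse_tblout(lines):
    """Return {(model_name, protein_name): best full-sequence bit score}."""
    scores = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.split()
        protein = fields[0]       # target name (the sequence searched)
        model = fields[2]         # query name (the HMM)
        score = float(fields[5])  # full-sequence bit score
        key = (model, protein)
        if key not in scores or score > scores[key]:
            scores[key] = score
    return scores


# Made-up example line in tblout layout, for illustration only.
example = [
    "# target name  accession  query name  accession  E-value  score  bias ...",
    "protA - modelA - 1e-50 170.2 0.1 1e-50 170.0 0.1 1.0 1 0 0 1 1 1 1 hypothetical protein",
]
scores = parse_tblout(example)
```

The score for a protein against its own custom model can then be looked up with a key such as `("modelA", "protA")`.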

Testing

Run:

  ./buildCustomModels.pl -d ../data -v -t 

to test the custom model building script on some sequences retrieved from Uniref90.

Note

This custom model building workflow differs slightly from the one in our methods manuscript. Instead of creating a file of filtered sequences after sequence identity filtering and re-aligning them using mafft, sequences are taken directly from the jackhmmer alignment and printed to a new, gap-free alignment, which makes the workflow faster.
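This page does not spell out what "gap-free" means here. One plausible reading is that, once the filtered-out sequences are removed from the jackhmmer alignment, any columns left containing only gap characters are dropped. The sketch below implements that reading only; the function name is hypothetical and the actual script may behave differently.

```python
# Hypothetical sketch: drop alignment columns that contain only gap characters
# after filtering. This is one interpretation of "gap-free", not confirmed
# against buildCustomModels.pl.


def drop_all_gap_columns(rows: dict) -> dict:
    """Remove columns in which every remaining sequence has a gap."""
    if not rows:
        return {}
    lengths = {len(row) for row in rows.values()}
    if len(lengths) != 1:
        raise ValueError("aligned rows must have equal length")
    width = lengths.pop()
    gap_chars = {"-", "."}
    # Keep a column if at least one sequence has a residue there.
    keep = [
        i for i in range(width)
        if any(row[i] not in gap_chars for row in rows.values())
    ]
    return {name: "".join(row[i] for i in keep) for name, row in rows.items()}
```

For example, `drop_all_gap_columns({"query": "MK--TA", "hit1": "MR-.SA"})` returns `{"query": "MKTA", "hit1": "MRSA"}`.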