Protein connection graph
PathwayMatcher allows the user to generate a connection graph as an additional output when executing the pathway search and analysis. The graph can use genes, proteins or proteoforms as vertices, with the command line arguments -gg, -gu and -gp respectively.
The connection graph is defined by a set of vertices and edges, where vertices represent genes, proteins or proteoforms.
The edges represent connections/relations between proteins according to the data model in the Reactome database.
Proteins are referenced only by their UniProt[1] accession. Genes follow the HUGO gene nomenclature[2]. The proteoforms follow the Simple format explained here.
There is a connection between two proteins when:
- (Protein1)--(Complex)--(Protein2): Both are components of the same complex.
- (Protein1)--(Reaction)--(Protein2): Both participate in the same reaction.
- (Protein1)--(Set)--(Protein2): Both are members of the same entity set.
These connections are undirected: the two proteins are simply related to each other.
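The rule above can be sketched in a few lines of Python. This is only an illustration of the idea, not PathwayMatcher's implementation: the `containers` mapping and the `undirected_edges` function are hypothetical names, and the toy data reuses identifiers from the example edge table on this page.

```python
from itertools import combinations

# Hypothetical toy data: container id (reaction, complex or set) -> proteins found in it.
containers = {
    "R-HSA-5675373": ["P27361", "P28482", "P28562"],
    "R-HSA-1535906": ["O43524", "P84022"],
}

def undirected_edges(containers):
    """Connect every pair of proteins that share a container.

    Each pair is stored in sorted order, so (A, B) and (B, A) collapse
    into a single undirected edge.
    """
    edges = set()
    for container_id, proteins in containers.items():
        for a, b in combinations(sorted(set(proteins)), 2):
            edges.add((a, b, container_id))
    return edges

edges = undirected_edges(containers)
for edge in sorted(edges):
    print(edge)
```

Sorting each pair before storing it is one simple way to make the edge set independent of the order in which the proteins are listed.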
Proteins can participate with multiple roles in a chemical reaction:
- input (reactant)
- output (product)
- catalyst
- regulator
Proteins participate independently or as components of a complex or entity set:
- (Reaction)--(Protein)
- (Reaction)--(Complex)--(Protein)
- (Reaction)--(Complex)--(Complex)--(Protein)
- (Reaction)--(Set)--(Protein)
- (Reaction)--(Set)--(Set)--(Protein)
- (Reaction)--(Complex)--(Set)--(Protein)
- (Reaction)--(Complex)--(Set)--(Set)--(Complex)--(Protein)
- ...
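Because complexes and sets can be nested to any depth, collecting the proteins that take part in a reaction amounts to a recursive traversal. The following is a minimal sketch under that assumption; the `children` mapping, the entity names and the `proteins_in` function are hypothetical and do not reflect how PathwayMatcher queries Reactome.

```python
# Hypothetical toy hierarchy: an entity with children is a reaction, complex or set;
# an entity without children is treated as a protein.
children = {
    "Reaction1": ["ComplexA", "P1"],
    "ComplexA": ["SetB", "P2"],
    "SetB": ["P3", "P4"],
}

def proteins_in(entity, children, seen=None):
    """Return all proteins reachable from an entity through nested complexes and sets."""
    if seen is None:
        seen = set()
    if entity in seen:  # guard against visiting the same entity twice
        return set()
    seen.add(entity)
    if entity not in children:  # leaf: a protein
        return {entity}
    result = set()
    for child in children[entity]:
        result |= proteins_in(child, children, seen)
    return result

print(sorted(proteins_in("Reaction1", children)))
```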
For the genes and proteoforms, the connections function in a similar way, replacing the protein by the respective gene or proteoform.
Finally, there are two types of edges: internal and external.
- Internal edges are connections between proteins of the input list.
- External edges are connections between a protein in the input list and a protein not in the input list.
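The distinction can be expressed as a small helper. This is an illustrative sketch only; `classify_edge` and `input_ids` are hypothetical names, and the accessions are borrowed from the examples on this page.

```python
def classify_edge(id1, id2, input_ids):
    """Classify an edge relative to the input list.

    Returns "internal" when both endpoints are in the input list,
    "external" when exactly one is, and None when neither is
    (such an edge would not be reported).
    """
    in1, in2 = id1 in input_ids, id2 in input_ids
    if in1 and in2:
        return "internal"
    if in1 or in2:
        return "external"
    return None

# Hypothetical input list.
input_ids = {"P27361", "P28482"}
print(classify_edge("P27361", "P28482", input_ids))
print(classify_edge("P27361", "P28562", input_ids))
```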
The graph is defined in three files: vertices.tsv, internalEdges.tsv and externalEdges.tsv. The format chosen to represent these graphs is compatible with the iGraph System notation [3] for graphs. By default, the files are saved in the directory where PathwayMatcher is located. To save them in a different directory, use the command line argument -o.
vertices.tsv is a tab-separated file (.tsv) with two columns, one vertex (protein) per row:
- id: UniProt accession of the protein
- name: Colloquial name of the protein
Example:
id name
P35070 Probetacellulin
P21359 Neurofibromin
Q8IV61 Ras guanyl-releasing protein 3
The edge files (internalEdges.tsv and externalEdges.tsv) are tab-separated files (.tsv) with six columns, one edge (connection) per row:
- id1: UniProt accession of one protein in the connection
- id2: UniProt accession of the second protein in the connection
- type: Where the two proteins meet (Complex or Reaction)
- container_id: Id of the complex or reaction
- role1: Role of the first protein in the connection
- role2: Role of the second protein in the connection
Example:
id1 id2 type container_id role1 role2
P27361 P28482 Reaction R-HSA-5675373 input output
P27361 P28562 Reaction R-HSA-5675373 input catalyst
P27361 P28562 Reaction R-HSA-5675373 output catalyst
O43524 P84022 Complex R-HSA-1535906 component component
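As a sketch of how these files might be read, the snippet below parses the example rows with Python's standard `csv` module and groups the role pairs by unordered protein pair. The helper name `read_edges` and the in-memory sample are assumptions made for illustration; only the column names come from the format described above.

```python
import csv
import io
from collections import defaultdict

# Sample rows copied from the example above; columns are joined with tabs explicitly.
rows = [
    ("id1", "id2", "type", "container_id", "role1", "role2"),
    ("P27361", "P28482", "Reaction", "R-HSA-5675373", "input", "output"),
    ("P27361", "P28562", "Reaction", "R-HSA-5675373", "input", "catalyst"),
    ("P27361", "P28562", "Reaction", "R-HSA-5675373", "output", "catalyst"),
    ("O43524", "P84022", "Complex", "R-HSA-1535906", "component", "component"),
]
sample_tsv = "\n".join("\t".join(row) for row in rows)

def read_edges(handle):
    """Read an edge file into a list of dicts keyed by the header row."""
    return list(csv.DictReader(handle, delimiter="\t"))

edges = read_edges(io.StringIO(sample_tsv))

# The same pair of proteins can appear in several rows with different roles,
# so collect the role pairs per unordered protein pair.
roles_by_pair = defaultdict(list)
for edge in edges:
    pair = frozenset((edge["id1"], edge["id2"]))
    roles_by_pair[pair].append((edge["role1"], edge["role2"]))

for pair, roles in roles_by_pair.items():
    print(sorted(pair), roles)
```

To read a real output file, pass an open file handle instead of the in-memory sample, for example `open("internalEdges.tsv", newline="")`, adjusting the path to the output directory in use.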
The interaction networks produced by PathwayMatcher can be processed with Cytoscape[4] for further analysis. The following tutorials are useful for this purpose:
- Load the network
- Selecting hub nodes
- Selecting subnetworks
- Creating subnetworks
- Applying layouts
- Setting styles
- Saving results
The networks can also be processed programmatically, e.g. using igraph. Examples of handling and plotting in the R programming language are available in the scripts we used to generate the figures of the publication here.
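For readers who prefer not to depend on a graph library, a basic analysis such as finding the most connected vertices can be sketched with the Python standard library alone. The names `edge_list`, `neighbours` and `hubs` are hypothetical, and the pairs are taken from the example edge table on this page; a library such as igraph offers the same operation along with many more.

```python
from collections import defaultdict

# Protein pairs taken from the example edge table (id1, id2 columns only).
edge_list = [
    ("P27361", "P28482"),
    ("P27361", "P28562"),
    ("P27361", "P28562"),  # same pair, different roles: counted once below
    ("O43524", "P84022"),
]

def neighbours(edge_list):
    """Build an undirected adjacency map; repeated pairs collapse into one neighbour."""
    adjacency = defaultdict(set)
    for a, b in edge_list:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return adjacency

adjacency = neighbours(edge_list)
degree = {vertex: len(adjacent) for vertex, adjacent in adjacency.items()}

# Vertices with the highest number of distinct neighbours.
max_degree = max(degree.values())
hubs = sorted(vertex for vertex, d in degree.items() if d == max_degree)
print(hubs, max_degree)
```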
[1] UniProt: the universal protein knowledgebase. Nucleic Acids Res. 45: D158-D169 (2017)
[2] HUGO gene nomenclature
[3] Ferres L., Parush A., Li Z., Oppacher Y., Lindgaard G. (2006) Representing and Querying Line Graphs in Natural Language: The iGraph System. In: Butz A., Fisher B., Krüger A., Olivier P. (eds) Smart Graphics. SG 2006. Lecture Notes in Computer Science, vol 4073. Springer, Berlin, Heidelberg
[4] Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, Amin N, Schwikowski B, Ideker T. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Research 2003 Nov; 13(11):2498-504