Introduction to GOATOOLS package - ricket-sjtu/bioinformatics GitHub Wiki
One fundamental task in gene ontology analysis is to retrieve the ontology and analyze its hierarchical structure. GOATOOLS is a Python package for such tasks.
1. Install goatools
pip install goatools
2. Load the OBO file
from goatools.obo_parser import GODag
g = GODag("go-basic.obo")
The object g behaves like a dictionary: each key is a GO term ID, and each value is a GOTerm object with the following attributes:
- GOTerm.id: string, the identifier of the ontology term.
- GOTerm.name: string, the description of the ontology term.
- GOTerm.namespace: string, BP, CC, or MF.
- GOTerm.parents: a list containing all the parents (each parent is a GOTerm object).
- GOTerm.children: a list containing all the children (each child is a GOTerm object).
- GOTerm.level: int, the shortest distance from the root node.
- GOTerm.depth: int, the longest distance from the root node.
- GOTerm.is_obsolete: bool, True or False.
- GOTerm.alt_ids: a list containing the alternative identifiers.
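The difference between level and depth is easiest to see on a term that can be reached from the root by more than one path. The following is a minimal sketch on a made-up toy ontology, not goatools code: the names TOY_PARENTS, level() and depth() are hypothetical and simply apply the two definitions above.

```python
from functools import lru_cache

# Hypothetical toy ontology: each term maps to its direct parents.
TOY_PARENTS = {
    "root": [],
    "A": ["root"],
    "B": ["A"],
    "C": ["root", "B"],  # reachable by a short path and a long path
}

@lru_cache(maxsize=None)
def level(term):
    """Shortest distance from the root (0 for the root itself)."""
    parents = TOY_PARENTS[term]
    return 0 if not parents else 1 + min(level(p) for p in parents)

@lru_cache(maxsize=None)
def depth(term):
    """Longest distance from the root (0 for the root itself)."""
    parents = TOY_PARENTS[term]
    return 0 if not parents else 1 + max(depth(p) for p in parents)

for name in TOY_PARENTS:
    print(name, "level =", level(name), "depth =", depth(name))
```

Term C is a direct child of the root, so its level is 1, but the path root, A, B, C gives it a depth of 3.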
All these attributes can be accessed using the following approach:
for myterm in g.values():
    print(myterm.id, myterm.name)
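Because each entry in parents is itself a GOTerm object, the hierarchy can be walked upward by following parents repeatedly. The sketch below is one way to collect all ancestors of a term. The function all_ancestors() is a hypothetical helper, and the terms are stand-in objects with made-up IDs rather than a loaded GODag, so the example runs without an OBO file.

```python
from types import SimpleNamespace

def all_ancestors(term):
    """Return the IDs of every ancestor reachable through .parents."""
    seen = set()
    stack = list(term.parents)
    while stack:
        parent = stack.pop()
        if parent.id not in seen:
            seen.add(parent.id)
            stack.extend(parent.parents)
    return seen

# Stand-in objects with made-up IDs, used here instead of real GOTerm objects.
root = SimpleNamespace(id="T:1", name="root", parents=[])
mid = SimpleNamespace(id="T:2", name="middle", parents=[root])
leaf = SimpleNamespace(id="T:3", name="leaf", parents=[mid, root])

print(sorted(all_ancestors(leaf)))  # prints ['T:1', 'T:2']
```

The function only relies on the id and parents attributes, so it should accept any GOTerm taken from g in the same way.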
GODag can also load the following optional attributes:
optional_attributes = ["def", "defn", "synonym", "relationship", "xref", "subset", "comment"]
For example, to load one of the optional attributes, relationship, pass it to the GODag() constructor:
g = GODag("go.obo", optional_attrs=["relationship"])
for go_id, term in g.items():
    # term.relationship maps a relationship type to the related GOTerm objects
    regulated = term.relationship.get("positively_regulates", [])
    for r in regulated:
        print("{id1} ({name1}) (+)-> {id2} ({name2})".format(
            id1=go_id, name1=term.name, id2=r.id, name2=r.name))
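The same loop can be wrapped in a small function that returns the edges for any relationship type. This is a hedged sketch: relationship_edges() is a hypothetical helper, and the terms are stand-in objects with made-up IDs. It assumes only that each term has an id and a relationship dictionary mapping a relationship type to related terms. Collecting the pairs in a set avoids duplicates if the same term is stored under more than one key, which appears to happen in a GODag when a term has alternative IDs.

```python
from types import SimpleNamespace

def relationship_edges(dag, rel_type):
    """Collect sorted (source_id, target_id) pairs for one relationship type."""
    edges = set()
    for term in dag.values():
        # getattr guards against terms that carry no relationship dictionary
        related = getattr(term, "relationship", {}).get(rel_type, ())
        for target in related:
            edges.add((term.id, target.id))
    return sorted(edges)

# Stand-in terms with made-up IDs; a loaded GODag would be passed the same way.
t1 = SimpleNamespace(id="T:1", name="process", relationship={})
t2 = SimpleNamespace(id="T:2", name="activator",
                     relationship={"positively_regulates": [t1]})
# The same term stored under a second key, as with an alternative ID.
toy_dag = {"T:1": t1, "T:2": t2, "T:2-alt": t2}

print(relationship_edges(toy_dag, "positively_regulates"))  # prints [('T:2', 'T:1')]
```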
For more information about relationships, please refer to ontology relationships.