Introduction to GOATOOLS package - ricket-sjtu/bioinformatics GitHub Wiki
A fundamental task when working with the Gene Ontology (GO) is to load the ontology and analyze its hierarchical structure. GOATOOLS is a Python package for such tasks.
1. Install the package: pip install goatools
2. Load the OBO file (go-basic.obo must already be downloaded from the Gene Ontology website to the working directory)
from goatools.obo_parser import GODag
g = GODag("go-basic.obo")
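GODag does all of the OBO parsing for you. To show roughly what it reads, here is a minimal sketch, not GOATOOLS's own parser, that collects tag-value pairs from [Term] stanzas and skips everything else. The TOY_OBO text, its made-up EX: identifiers, and the parse_terms helper are hypothetical and exist only for illustration.

```python
# A toy OBO fragment with made-up "EX:" identifiers (not real GO terms).
TOY_OBO = """\
format-version: 1.2

[Term]
id: EX:0000001
name: root process
namespace: biological_process

[Term]
id: EX:0000002
name: child process
namespace: biological_process
alt_id: EX:0000099
is_a: EX:0000001 ! root process

[Typedef]
id: part_of
name: part of
"""


def parse_terms(text):
    """Return one dict per [Term] stanza; header lines and other stanzas are skipped."""
    terms = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # A new stanza begins; only [Term] stanzas are collected.
            current = {"is_a": [], "alt_id": []} if line == "[Term]" else None
            if current is not None:
                terms.append(current)
            continue
        if current is None or ":" not in line:
            continue
        tag, value = line.split(":", 1)
        value = value.split(" ! ")[0].strip()  # drop a trailing "! comment"
        if tag in ("is_a", "alt_id"):
            current[tag].append(value)  # these tags may repeat within a stanza
        else:
            current[tag] = value
    return terms


for term in parse_terms(TOY_OBO):
    print(term["id"], term["name"], term["is_a"])
```

Repeatable tags such as is_a and alt_id are kept as lists, while single-valued tags overwrite; the real OBO format has more tags and edge cases than this sketch handles.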
The object g is a GODag, a subclass of Python's dict: each key is a GO term ID and each value is a GOTerm object with the following attributes:
- GOTerm.id: string, the identifier of the ontology term.
- GOTerm.name: string, the description of the ontology term.
- GOTerm.namespace: string, one of biological_process, cellular_component, or molecular_function (BP, CC, or MF).
- GOTerm.parents: the direct (is_a) parents of the term, each a GOTerm object.
- GOTerm.children: the direct children of the term, each a GOTerm object.
- GOTerm.level: int, the length of the shortest path from the root node.
- GOTerm.depth: int, the length of the longest path from the root node.
- GOTerm.is_obsolete: bool, True if the term is obsolete.
- GOTerm.alt_ids: the alternative identifiers of the term.
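Because a term can have several parents, level and depth can differ. The sketch below computes both on a hypothetical toy DAG (the PARENTS mapping and the level/depth helpers are illustrative, not GOATOOLS code), following the definitions above: the root has level and depth 0, level takes the minimum over parents and depth the maximum, each plus one.

```python
from functools import lru_cache

# Hypothetical toy DAG: term -> its is_a parents. "R" is the root.
PARENTS = {
    "R": [],
    "A": ["R"],
    "B": ["R"],
    "C": ["A"],
    "D": ["C", "B"],  # D is reachable via B (short path) or via C and A (long path)
}


@lru_cache(maxsize=None)
def level(term):
    """Length of the shortest path from the root to `term`."""
    parents = PARENTS[term]
    return 0 if not parents else 1 + min(level(p) for p in parents)


@lru_cache(maxsize=None)
def depth(term):
    """Length of the longest path from the root to `term`."""
    parents = PARENTS[term]
    return 0 if not parents else 1 + max(depth(p) for p in parents)


for t in sorted(PARENTS):
    print(t, level(t), depth(t))
```

Term D has level 2 (R, B, D) but depth 3 (R, A, C, D). Memoization with lru_cache keeps each term from being recomputed when many paths share ancestors.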
All of these attributes are accessed as ordinary Python attributes, for example:
for myterm in g.values():
    print(myterm.id, myterm.name)
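One subtlety when looping over g.values(): to my understanding, GODag also stores each alternative ID as a key pointing to the same GOTerm, so a term with alt_ids can appear more than once. The sketch below imitates that layout with a hypothetical Term dataclass (a stand-in, not the real GOTerm) and keeps only primary-ID entries before counting terms per namespace.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Term:
    """Hypothetical stand-in for a GOTerm with a few of its attributes."""
    id: str
    name: str
    namespace: str
    alt_ids: set = field(default_factory=set)


t1 = Term("EX:0000001", "root process", "biological_process")
t2 = Term("EX:0000002", "child process", "biological_process", {"EX:0000099"})
t3 = Term("EX:0000003", "root component", "cellular_component")

# Index each term under its primary ID and under every alternative ID,
# so the same object appears more than once among the values.
dag = {}
for t in (t1, t2, t3):
    dag[t.id] = t
    for alt in t.alt_ids:
        dag[alt] = t

# Keep only primary-ID entries, then count terms per namespace.
unique_terms = [t for key, t in dag.items() if key == t.id]
per_namespace = Counter(t.namespace for t in unique_terms)
print(len(dag), len(unique_terms), dict(per_namespace))
```

The dictionary holds four keys but only three distinct terms; filtering on key == t.id is a cheap way to visit each term once.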
Some attributes are not loaded by default. The following optional attributes can be requested when the OBO file is loaded:
optional_attributes = ["def", "defn", "synonym", "relationship", "xref", "subset", "comment"]
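In an OBO file, a relationship is written as a line such as "relationship: regulates GO:..." inside a term's stanza, and the relationship example in this section treats term.relationship as a dictionary keyed by relationship type. The sketch below builds that kind of mapping from raw stanza lines; it works on IDs only (GOATOOLS resolves targets to GOTerm objects), and the STANZA lines, EX: identifiers, and parse_relationships helper are hypothetical.

```python
from collections import defaultdict

# Toy [Term] stanza lines with made-up "EX:" IDs (not real GO data).
STANZA = [
    "id: EX:0000010",
    "name: regulation of toy process",
    "is_a: EX:0000001 ! root process",
    "relationship: regulates EX:0000002 ! child process",
    "relationship: positively_regulates EX:0000003 ! another process",
]


def parse_relationships(lines):
    """Map each relationship type to the set of target IDs in one stanza."""
    rels = defaultdict(set)
    for line in lines:
        tag, _, value = line.partition(": ")
        if tag != "relationship":
            continue  # is_a and other tags are not relationships here
        rel_type, target = value.split()[:2]  # e.g. "regulates", "EX:0000002"
        rels[rel_type].add(target)
    return dict(rels)


print(parse_relationships(STANZA))
```

Note that is_a lines are deliberately ignored: they populate the parents attribute, while other link types end up under relationship.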
For example, to load the relationship attribute, pass it to the GODag() constructor through the optional_attrs argument:
from goatools.obo_parser import GODag

g = GODag("go.obo", optional_attrs={"relationship"})
for go_id, term in g.items():
    # g also stores alternative IDs as keys; skip them so each term is printed once
    if go_id != term.id:
        continue
    # term.relationship maps a relationship type to the related GOTerm objects
    for r in term.relationship.get("positively_regulates", ()):
        print("{id1} ({name1}) (+)-> {id2} ({name2})".format(
            id1=term.id, name1=term.name, id2=r.id, name2=r.name))
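The loop above follows relationships forward, from a regulating term to the terms it regulates. To ask the reverse question (which terms positively regulate a given term), one option is to invert the mapping. This is a plain-Python sketch on a hypothetical FORWARD dictionary of IDs; reverse_index is an illustrative helper, not a GOATOOLS function.

```python
from collections import defaultdict

# Hypothetical forward map: term ID -> {relationship type -> set of target IDs}.
FORWARD = {
    "EX:0000010": {"positively_regulates": {"EX:0000003"}},
    "EX:0000011": {"positively_regulates": {"EX:0000003"}, "regulates": {"EX:0000002"}},
    "EX:0000012": {"negatively_regulates": {"EX:0000003"}},
}


def reverse_index(forward, rel_type):
    """Map each target ID to the sorted source IDs linked to it by `rel_type`."""
    rev = defaultdict(set)
    for source, rels in forward.items():
        for target in rels.get(rel_type, ()):
            rev[target].add(source)
    return {target: sorted(sources) for target, sources in rev.items()}


print(reverse_index(FORWARD, "positively_regulates"))
```

Building the index once costs a single pass over all terms, after which each reverse lookup is a dictionary access instead of a full scan.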
For more information on relationship types, refer to the GO documentation on ontology relationships.