Introduction to the GOATOOLS package

A fundamental task when working with the Gene Ontology (GO) is to load the ontology and analyze its hierarchical structure. GOATOOLS is a Python package for such tasks.

1. Install goatools

pip install goatools

2. Load the OBO file

from goatools.obo_parser import GODag
# The OBO file must already exist at this path (here, the working directory).
g = GODag("go-basic.obo")

The object g behaves like a dictionary: each key is a GO term ID, and each value is a GOTerm object with the following attributes:

  • GOTerm.id: string, the identifier of the ontology term.
  • GOTerm.name: string, the description of the ontology term.
  • GOTerm.namespace: string, one of biological_process (BP), cellular_component (CC), or molecular_function (MF).
  • GOTerm.parents: a collection of the direct parents (each parent is a GOTerm object); recent goatools releases store it as a set rather than a list.
  • GOTerm.children: a collection of the direct children (each child is a GOTerm object), stored the same way.
  • GOTerm.level: int, the shortest distance from root node.
  • GOTerm.depth: int, the longest distance from root node.
  • GOTerm.is_obsolete: bool, True if the term is marked obsolete.
  • GOTerm.alt_ids: a collection of the term's alternative identifiers.
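The difference between level and depth only shows up when a term can be reached from the root by paths of different lengths. The following is a minimal sketch on a hypothetical toy DAG (the node names and the PARENTS table are made up and are not real GO terms); it does not use goatools and only illustrates the two definitions given above.

```python
from functools import lru_cache

# Hypothetical toy DAG: each node maps to its direct parents.
PARENTS = {
    "root": [],
    "A": ["root"],
    "B": ["root"],
    "C": ["A"],
    "D": ["B", "C"],  # reachable via root->B->D and via root->A->C->D
}


@lru_cache(maxsize=None)
def level(term):
    """Shortest distance from the root (the root itself is 0)."""
    parents = PARENTS[term]
    return 0 if not parents else 1 + min(level(p) for p in parents)


@lru_cache(maxsize=None)
def depth(term):
    """Longest distance from the root (the root itself is 0)."""
    parents = PARENTS[term]
    return 0 if not parents else 1 + max(depth(p) for p in parents)


print(level("D"), depth("D"))  # prints: 2 3
```

Node D has level 2 (through B) but depth 3 (through A and C), which is why the two attributes can differ for the same term.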

All these attributes can be accessed using the following approach:

for myterm in g.values():
  print(myterm.id, myterm.name)
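A loop like the one above can visit the same term more than once, because goatools appears to register a term under its alternative IDs as well as its primary ID. The sketch below counts distinct, non-obsolete terms per namespace while skipping such repeats. The helper name count_by_namespace is hypothetical, and the stand-in objects (with made-up IDs) only exist so the example runs without an OBO file; with a real ontology, the loaded GODag object g would be passed instead.

```python
from collections import Counter
from types import SimpleNamespace


def count_by_namespace(dag):
    """Count distinct, non-obsolete terms per namespace in a GODag-like mapping."""
    seen = set()
    counts = Counter()
    for term in dag.values():
        # The same term may be stored under several keys (primary and alternative IDs).
        if term.id in seen or term.is_obsolete:
            continue
        seen.add(term.id)
        counts[term.namespace] += 1
    return counts


# Stand-in terms with made-up IDs; only the attributes used above are defined.
t1 = SimpleNamespace(id="GO:0000001", namespace="biological_process", is_obsolete=False)
t2 = SimpleNamespace(id="GO:0000002", namespace="molecular_function", is_obsolete=False)
t3 = SimpleNamespace(id="GO:0000003", namespace="biological_process", is_obsolete=True)
toy_dag = {
    "GO:0000001": t1,
    "GO:0000009": t1,  # alternative ID pointing at the same term
    "GO:0000002": t2,
    "GO:0000003": t3,
}

print(dict(count_by_namespace(toy_dag)))
```

Deduplicating on term.id rather than on the dictionary key keeps the counts stable regardless of how many alternative IDs a term has.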

The following optional attributes are not loaded by default, but can be requested when the ontology is read:

optional_attributes = ["def", "defn", "synonym", "relationship", "xref", "subset", "comment"]

For example, to load the optional relationship attribute, pass it to the GODag() constructor through the optional_attrs argument:

from goatools.obo_parser import GODag

# Load the ontology with the optional "relationship" attribute populated.
g = GODag("go.obo", optional_attrs=["relationship"])
for go_id, term in g.items():
  # term.relationship maps a relationship type to the related GOTerm objects.
  for r in term.relationship.get("positively_regulates", ()):
    print("{id1} ({name1}) (+)-> {id2} ({name2})".format(
        id1=go_id, name1=term.name, id2=r.id, name2=r.name))
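When the relationships are needed for further analysis rather than printing, it can help to collect them as pairs of IDs. The following is a minimal sketch under that assumption: relationship_edges is a hypothetical helper, not part of goatools, and the stand-in terms use made-up IDs so the example runs without an OBO file. It relies only on the relationship attribute described above, so a GODag loaded with that attribute could be passed in place of toy_dag.

```python
from types import SimpleNamespace


def relationship_edges(dag, rel_type):
    """Return sorted, de-duplicated (source_id, target_id) pairs for one relationship type."""
    edges = set()
    for term in dag.values():
        # getattr guards against terms loaded without the optional attribute.
        targets = getattr(term, "relationship", {}).get(rel_type, ())
        for target in targets:
            edges.add((term.id, target.id))
    return sorted(edges)


# Stand-in terms with made-up IDs and names.
b = SimpleNamespace(id="GO:0000002", name="process B", relationship={})
a = SimpleNamespace(
    id="GO:0000001",
    name="positive regulation of process B",
    relationship={"positively_regulates": [b]},
)
toy_dag = {"GO:0000001": a, "GO:0000002": b, "GO:0000009": a}  # last key is an alternative ID

print(relationship_edges(toy_dag, "positively_regulates"))
# prints: [('GO:0000001', 'GO:0000002')]
```

Collecting the pairs in a set means a term that is reachable under several keys contributes each edge only once.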

For more information on relationships, please refer to ontology relationships.