Aligning BioNLP’13 GE event types and elements with the ontologies - linkedannotation/blah2015 GitHub Wiki

Terminology

Event types: the event types defined in BioNLP’13, e.g. Gene Expression, Positive Regulation.

Event elements: the arguments of an event, e.g. Theme, Cause.

Background

One of the main goals of the hackathon is to link different types of resources. The ultimate goal of bio-text-mining, in turn, is to automatically retrieve information and present it in a format that biologists can understand.

The event types in the BioNLP tasks were originally defined according to the Gene Ontology, whereas the elements of events were defined more in accordance with linguistic thematic roles. Over time, more event types and elements have been added, and the event types may have drifted away from their original definitions. We want to know whether the current datasets for bio-relation extraction can be converted to a format that agrees better with biological ontologies. A data-driven relation extraction approach could then extract bio-relations and normalize them against the ontologies.

Aligning the event types and elements with the ontologies will:

  • Link the corpora with the ontologies;
  • Reveal the information granularity that the current format can express;
  • Compare the internal hierarchy of the current format with the ontology to link linguistic thematic roles with semantic roles;
  • Identify the misalignments between the two logics;
  • Use existing relation extraction corpora for knowledge retrieval with higher information granularity.

This information may assist with:

  • extraction of information at higher granularity
  • event normalization
  • integration of curated knowledge in biological semantic resources with NLP-based knowledge retrieval.

Ontologies

Ontology selection starts with those that are general and systematic. For now, I start with the Gene Ontology (GO) and the Systems Biology Ontology (SBO).

The principles of aligning event thematic roles with ontologies

The alignment is between the current types and elements defined in the BioNLP’13 GE task (http://bionlp.dbcls.jp/projects/bionlp-st-ge-2013/wiki/IEevaluation), without the assistance of specific information from the corpus. Based on the information in the definition, find the most specific ontological class possible. Therefore, if an event could be linked with an ontological class in Molecular Function.

The guidelines of using the alignment to convert current annotations

Alignments

BioModels.net qualifiers are used for describing the relations between event

Gene expression

As Gene Expression is a series of events rather than a single activity at the molecular level, it is annotated with gene expression (GO:0010467) under Biological Process.

Transcription

As Transcription is a series of events rather than a single activity at the molecular level, it is annotated with transcription, DNA-templated (GO:0006351) under Biological Process.

Protein Catabolism

Protein Catabolism is also a series of events rather than a single activity at the molecular level. It is annotated with protein catabolic process (GO:0030163) under Biological Process.
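
The three alignments listed so far could be recorded as a simple lookup table and applied to event annotations. The following is only a minimal sketch under stated assumptions, not the method used in this work: the names `GO_ALIGNMENT` and `normalize_event` are hypothetical, the event-type spellings with underscores follow the BioNLP shared-task standoff convention, and the example event line is made up for illustration rather than taken from the corpus.

```python
# Alignments stated in this section:
# event type -> (GO ID, GO label, GO aspect).
GO_ALIGNMENT = {
    "Gene_expression": ("GO:0010467", "gene expression", "biological_process"),
    "Transcription": ("GO:0006351", "transcription, DNA-templated", "biological_process"),
    "Protein_catabolism": ("GO:0030163", "protein catabolic process", "biological_process"),
}


def normalize_event(event_line):
    """Parse one event line in BioNLP shared-task standoff style
    (e.g. 'E1<TAB>Gene_expression:T5 Theme:T1') and attach the aligned
    GO class. Event types without an alignment get None (hypothetical helper)."""
    event_id, body = event_line.rstrip("\n").split("\t", 1)
    trigger, *arguments = body.split()
    event_type, trigger_id = trigger.split(":", 1)
    go = GO_ALIGNMENT.get(event_type)
    return {
        "id": event_id,
        "type": event_type,
        "trigger": trigger_id,
        # Each argument becomes a (role, filler ID) pair, e.g. ("Theme", "T1").
        "arguments": [tuple(arg.split(":", 1)) for arg in arguments],
        "go_id": go[0] if go else None,
        "go_label": go[1] if go else None,
    }


if __name__ == "__main__":
    # Hypothetical example line, not taken from the corpus.
    print(normalize_event("E1\tGene_expression:T5 Theme:T1"))
```

Event types that are not yet aligned simply come back with `go_id` set to `None`, so the table can be extended one event type at a time.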

Binding

In Gene Ontology, Binding could be either