Domain model - aus-plant-phenomics-network/appn-implementation GitHub Wiki
Principles
APPN aims to ensure that data collected in any phenotyping study at any of its nodes can be correctly interpreted and reused by any human user or software application.
In practice, the data collected by each node may reflect the structures and standards used by different equipment providers, or the outputs of different software pipelines, and these formats may need to remain unchanged. This results in data heterogeneity that APPN must accommodate in its data packaging and publication processes.
APPN will achieve this by assisting all nodes in mapping their data into a consistent yet highly flexible format. This format requires standardisation in two areas: concepts (i.e. semantics) and representation (i.e. syntax).
For the first, APPN will adopt a conceptual framework for plant phenotyping data. This framework will give all nodes and all users of APPN data a common language, and a shared understanding of the elements involved, for describing and communicating information about any study. This conceptual framework is a domain model, a generalised conceptualisation of the structure of a plant phenotyping study. This section outlines APPN’s planned domain model and how it is expected to evolve as technologies and needs change. The APPN Central Data Team will develop and support tools that enable each node to describe their data in terms of this domain model.
For the second, APPN will adopt international best practices for gathering domain-compliant data and digital assets from each study, representing each part with human- and machine-readable metadata, and delivering the results as a FAIR data package. As a result, any APPN node or user should be able to open any APPN data package and understand and correctly handle all the data that were collected as part of a study. It also becomes significantly more likely that international users and users from other research domains will be able to understand these data for their own use. See: Data and Metadata Standards.
See Data classes for a list of data classes used by APPN.
MIAPPE
The Minimum Information About a Plant Phenotyping Experiment (MIAPPE) model is the product of collaborative standards development from multiple groups around the world working with plant phenotyping data. It is described as follows:
“MIAPPE is a Minimum Information (MI) standard for plant phenotyping. It defines a list of attributes that might be necessary to fully describe a phenotyping experiment, following the model originally established for microarray data. Not all of the elements listed in an MI must be reported in each case. An MI document should rather be considered as a checklist and consulted by a person describing or depositing the data to ensure the inclusion of all important data characteristics, i.e. what is meaningful for the interpretation and potential replication of the research.”
Figure 1: Partial overview of MIAPPE classes and relationships
As with other Minimum Information models, MIAPPE does not provide a complete and authoritative structure for handling all data relevant to its domain. Rather, it maps out the main concepts that need to be agreed by practitioners to ensure that the most significant aspects of any plant phenotyping experiment can be communicated in a way that other practitioners will be able to understand.
The heart of MIAPPE is a set of classes (represented by blue circles in Fig. 1 in Papoutsoglou et al. 2020, Enabling reusability of plant phenomic datasets with MIAPPE 1.1). Each class represents a set of identifiable objects or concepts that share a common definition. Practitioners should readily recognise examples of many of these classes in the data they collect. They should also be able to recognise when two or more examples should be considered to be references to the “same” thing. Each distinct instance of each of these classes is something which we can describe and to which we can attach an identifier as a label so we can refer to it and retrieve the appropriate description. Instances of these classes can be linked using standard relationships represented by MIAPPE properties. Indeed, relationship properties are an important part of the metadata that describes each instance. For example, the Observation Units for a Study are normally plants each of which has a genotype identified as its Biological Material.
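The idea of identified instances linked by relationship properties can be sketched in a few lines of Python. This is only an illustrative sketch: the class names follow the MIAPPE concepts mentioned above, but the field names, the `Registry` helper, and all identifiers are hypothetical and are not taken from the MIAPPE specification or from any APPN tool.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class BiologicalMaterial:
    identifier: str      # label used to refer to and retrieve this instance
    description: str     # simplified stand-in for the real descriptive fields


@dataclass(frozen=True)
class ObservationUnit:
    identifier: str
    # Relationship property: the link is stored as the target's identifier.
    biological_material_id: str


@dataclass
class Study:
    identifier: str
    title: str
    observation_unit_ids: list[str] = field(default_factory=list)


class Registry:
    """Hypothetical lookup table from identifier to described instance."""

    def __init__(self):
        self._items = {}

    def add(self, item):
        existing = self._items.get(item.identifier)
        if existing is not None and existing != item:
            # Same identifier, different description: not the "same" thing.
            raise ValueError(f"conflicting descriptions for {item.identifier}")
        self._items[item.identifier] = item
        return item

    def get(self, identifier):
        return self._items[identifier]


def materials_for_study(registry, study):
    """Follow Study -> Observation Unit -> Biological Material links."""
    units = (registry.get(uid) for uid in study.observation_unit_ids)
    return {registry.get(u.biological_material_id).identifier for u in units}


# All identifiers below are made up for illustration.
registry = Registry()
registry.add(BiologicalMaterial("material:line-42", "example wheat genotype"))
registry.add(ObservationUnit("plant:001", "material:line-42"))
registry.add(ObservationUnit("plant:002", "material:line-42"))
study = registry.add(
    Study("study:demo", "Example study", ["plant:001", "plant:002"])
)
```

Here two plants are separate Observation Units, yet both resolve to a single Biological Material instance because they refer to it by the same identifier.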
Other ontologies
MIAPPE and PPEO (the Plant Phenotyping Experiment Ontology, an OWL representation of MIAPPE) lack several classes that are necessary for documenting a plant phenotyping study. The following ontologies will be used to provide these classes in forms that are expected to be widely reusable. In many cases, multiple ontologies offer different classes for the same or closely similar concepts. In such cases, APPN will seek to model metadata so that a single derived class asserts its relationship to each of the relevant classes.
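One way to picture a derived class that asserts several class relationships is a small subclass table that is expanded into every type an instance can be treated as. The sketch below is an assumption-laden illustration: `appn:PlantImage` is a hypothetical class invented for this example, and its two parents were chosen only to show the pattern, not because APPN has adopted them.

```python
# Direct "subclass of" links, written as prefixed names.
# appn:PlantImage is hypothetical; the schema.org chain is the published one.
SUBCLASS_OF = {
    "appn:PlantImage": ["schema:ImageObject", "prov:Entity"],
    "schema:ImageObject": ["schema:MediaObject"],
    "schema:MediaObject": ["schema:CreativeWork"],
    "schema:CreativeWork": ["schema:Thing"],
}


def all_types(cls):
    """Return cls plus every class reachable through SUBCLASS_OF links."""
    seen = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(SUBCLASS_OF.get(current, []))
    return seen


plant_image_types = all_types("appn:PlantImage")
```

An instance typed only as the derived class can then be recognised by software that knows either parent vocabulary.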
RO-Crate
RO-Crate is a general packaging model for data assets. It relies on JSON-LD metadata to document and describe these assets, with schema.org classes as the core of its class model.
APPN has adopted RO-Crate as its metadata serialisation format and will build on the schema.org class hierarchy to define the data classes needed to represent MIAPPE, along with other elements that MIAPPE does not properly address.
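As a rough illustration of what such a package description might look like, the sketch below builds a minimal RO-Crate 1.1 metadata document as a Python dictionary and serialises it to JSON-LD. The metadata descriptor and root dataset follow the RO-Crate 1.1 structure; the `appn` namespace URL, the `appn:Study` class, and every parameter value are placeholders assumed for this example and do not describe APPN's actual terms.

```python
import json

# Placeholder namespace for APPN-specific classes (an assumption).
APPN_NS = "https://example.org/appn/terms#"


def build_crate_metadata(name, description, date_published, license_url, file_path):
    """Return a minimal RO-Crate 1.1 metadata document as a dict."""
    return {
        "@context": [
            "https://w3id.org/ro/crate/1.1/context",
            {"appn": APPN_NS},  # extra prefix for the hypothetical class below
        ],
        "@graph": [
            {
                # Metadata descriptor: identifies the root dataset.
                "@id": "ro-crate-metadata.json",
                "@type": "CreativeWork",
                "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
                "about": {"@id": "./"},
            },
            {
                # Root dataset, typed as both a schema.org Dataset and a
                # hypothetical APPN class built on top of it.
                "@id": "./",
                "@type": ["Dataset", "appn:Study"],
                "name": name,
                "description": description,
                "datePublished": date_published,
                "license": {"@id": license_url},
                "hasPart": [{"@id": file_path}],
            },
            {
                # One data entity contained in the package.
                "@id": file_path,
                "@type": "File",
                "encodingFormat": "text/csv",
            },
        ],
    }


metadata = build_crate_metadata(
    name="Example study",
    description="Illustrative package only",
    date_published="2024-01-01",
    license_url="https://creativecommons.org/licenses/by/4.0/",
    file_path="observations.csv",
)
serialised = json.dumps(metadata, indent=2)
```

Writing `serialised` to a file named `ro-crate-metadata.json` in the package root would give the crate its metadata file.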
Documents
The following specification documents are relevant and will guide implementation.
- Linked Data Patterns - design patterns for linked data
- MIAPPE Checklist Data Model - Excel overview of MIAPPE classes and properties
- Plant Phenotyping Experiment Ontology (PPEO) - OWL representation
- PROV-O: The PROV Ontology - W3C Recommendation
- Semantic Sensor Network Ontology - W3C Editor's Draft - includes draft SOSA-PROV alignments