Data principles

FAIR

The FAIR (Findable, Accessible, Interoperable and Reusable) Principles derive from a 2016 paper by Wilkinson et al., "The FAIR Guiding Principles for scientific data management and stewardship". They are now recognised internationally as a prerequisite for best-practice management of research data.

See: https://ardc.edu.au/resource/fair-data/

It is important to note that the FAIR Principles do not require open access to data, although open access is highly desirable whenever practicable. Making every dataset FAIR ensures that any user or software with permission to access the data can understand its characteristics and use it appropriately, both now and in the future.

Data engineering

APPN is committed to maximising the FAIRness of its data by adopting and adapting data engineering principles at every stage of plant phenotyping.

See WorldFAIR (D1.4) Second Policy Brief, pp. 7-9:

There is an urgent need for a shift from a ‘bibliographic’ data stewardship practice to a data engineering practice! The most fundamental recommendation to emerge from the WorldFAIR project is the following: to support the requirements of 21st Century science, we need to enable a transformation in our practice for data stewardship and move from a bibliographic approach to a data engineering approach.

In the bibliographic model, data is treated like a book in a library: a dataset is deposited in an appropriate domain specialist or generalist repository as a data package, with a persistent identifier and discovery metadata in an extended form of Dublin Core. This is, of course, better than nothing: the repository and the data stewards involved have performed an important service in ensuring that the data was not (to all intents and purposes) lost on a research group server or a personal hard drive. For such data to be reused, however, the dataset must be downloaded, and the significant task of data wrangling remains, often with inadequate, non-standard or only implicit information about the data and semantics. This is precisely the issue highlighted in the PWC report on the opportunity costs of not having FAIR data [53], and it falls well short of the EOSC and FAIR vision of machine-actionable data. If we persist with the bibliographic model, we will not achieve the ‘web of FAIR data and services’ promised by the EOSC.

Open-by-default

APPN aims to deliver nationally significant data collections that support crop research and plant science, and to enable plant trait data to be integrated into transdisciplinary models for areas such as sustainable food production and landscape management.

Data collected by APPN facilities may need to be subject to access limitations, whether owing to commercial considerations, pending research publication or other sensitivities. Where feasible, such restrictions should be handled as time-limited embargoes, but some data may never be widely shared.

Despite these restrictions, APPN will seek to apply the FAIR Principles and data engineering principles to every dataset, maximising its value to all users with the relevant permissions.

Unified graph

APPN aims to support rich interoperability and integration across data from all APPN studies, with plant phenotyping data collected elsewhere, and with data from other earth science and life science domains that overlap with crop science.

This is achieved through a consistent set of approaches for delivering linked data.
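This page does not specify which linked-data approaches APPN uses. As a hedged illustration only, the sketch below assumes an RDF-style model in which each study publishes (subject, predicate, object) triples and reuses shared identifiers (IRIs), so that data from separate studies joins into one graph. The `example.org` namespace, the identifiers and the `merge`/`objects` helpers are hypothetical placeholders, not APPN's actual vocabulary or tooling.

```python
# Hypothetical sketch of a "unified graph": studies publish triples and reuse
# the same identifiers (IRIs), so their statements merge without wrangling.
# All IRIs are illustrative placeholders under example.org (assumption).

EX = "https://example.org/appn/"  # hypothetical namespace

# Triples from a hypothetical glasshouse study
study_a = {
    (EX + "genotype/G001", EX + "hasTrait", EX + "trait/leafArea"),
    (EX + "genotype/G001", EX + "label", "Cultivar X"),
}

# Triples from a hypothetical field study that reuses the same genotype IRI
study_b = {
    (EX + "genotype/G001", EX + "hasTrait", EX + "trait/canopyHeight"),
    (EX + "genotype/G002", EX + "hasTrait", EX + "trait/canopyHeight"),
}


def merge(*graphs):
    """Union of triple sets: duplicate triples collapse, shared IRIs connect."""
    unified = set()
    for g in graphs:
        unified |= g
    return unified


def objects(graph, subject, predicate):
    """Return every object for a subject/predicate pair, sorted for stable output."""
    return sorted(o for s, p, o in graph if s == subject and p == predicate)


graph = merge(study_a, study_b)
traits = objects(graph, EX + "genotype/G001", EX + "hasTrait")
print(traits)
```

Because both studies use the same genotype IRI, querying the merged graph returns traits recorded in either study with no manual joining. In practice, an RDF library, shared community vocabularies and a query language such as SPARQL would take the place of plain Python sets.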