EMX upload format - erwinwinder/molgenis GitHub Wiki

Introduction

EMX (entity model extensible) is a flexible spreadsheet format to easily upload any tabular data using Excel or a zipfile of tab-delimited *.txt files. This works because you can tell MOLGENIS the 'model' of your data via a special sheet named 'attributes'.

Minimal example (download)

For example, if you want to upload an Excel with sheet 'patients':

displayName firstName lastName birthdate birthplace
john doe john doe 1976-13-03 new york
jane doe jane doe metropolis
papa doe papa doe new york

Then you must provide a model of your 'patients' via Excel with sheet named 'attributes':

name entity dataType idAttribute nillable description
displayName patients TRUE FALSE name
firstName patients first name
lastName patients family name
birthdate patients date day of birth
birthplace patients place of birth

'entity' should show the name of your data sheet. Each attribute the column headers in your data. Default dataType is 'string' so you only need to provide non-string values (int, date, decimal, etc). And you must always provide one idAttribute that has 'nillable' = 'FALSE'.

You can first upload the 'model' and then the 'data'. Or you can put the both into one file and upload in one go. What you prefer :-) [todo: provide example files for download]

Advanced data example (download)

Lets assume we want to upload multiple data sheets, with relations between them:

Cities:

cityName
new york
metropolis

Patients:

displayName firstName lastName birthdate birthplace children disease
john doe john doe 1976-13-03 new york none
jane doe jane doe metropolis none
pape doe papa doe new york john doe, jane doe cardio

Notes: birthplace refers to elements in the cityName values in the cities table. children contains comma separated values referring to another patient via displayName (trailing spaces will be removed).

Users:

userName active displayName firstName lastName
jdoe TRUE john doe john doe
jdoe2 jane doe jane doe
pdoe papa doe papa doe

Note: users looks similar patients, i.e. they are also persons having 'displayName', 'firstName', and 'lastName'. We will use this in the model below.

Advanced model example

To model the data advanced data example, again you need to provide the 'attributes' (i.e., columns, properties). Optionally, you can also describe entities (i.e., classes, tables), and packages (i.e, models and submodels) which gives you some advanced options.

'Attributes' sheet (required)

The example below defines the model for entities 'city', 'patient' and 'user'. Note that 'users' had some attributes shared with 'patients' so we will use 'object orientation' to say that both 'user' and 'patient' are both a special kind of 'persons'. This will be defined using the 'extends' relation defined in the 'entities' sheet below.

name entity dataType nillable refEntity idAttribute description
cityName cities FALSE TRUE unique city name
displayName persons FALSE TRUE unique name
firstName persons first name
lastName persons family name
birthdate patients date day of birth
birthplace patients xref cities place of birth
disease patients disese description
userName users FALSE TRUE unique login name
active users bool whether user is active

'Entities' sheet (optional)

In most cases the 'attributes' sheet is all you need. However, in some cases you may want to add more details on the 'entity'. Here we wanted to show use of 'abstract' (i.e., interfaces) to create model class 'persons' and 'extends' (i.e., subclass, inheritance) to define that 'user' and 'patient' have the same attributes as 'persons'. When data model become larger, or when many data sheets are loaded then the 'package' construct enables you to group your (meta)data.

name package extends abstract description
cities hospital list of cities
persons hospital true person defines general attributes like firstName, lastName
users hospital persons users extends persons, meaning it 'inherits' attribute definition
patients hospital persons patient extends person, adding patientNumber

'Packages' sheet (optional)

Package allow you to create several models in your system, describe the and next them using the 'parent' relationship. For example:

name description parent
root my main package
hospital sub package holding entities to describe all kinds of persons root

EMX model reference documentation

Minimally, you need to provide the 'attributes' of your model (i.e., columns, properties). Optionally, you can also add metadata on entities (i.e., classes, tables), and packages (i.e, models and submodels)

'Attributes'

Required columns:

  • entity : name of the entity this attribute is part of
  • attribute : name of attribute, unique per entity

Optional columns (can be omitted):

  • dataType: defines the data type (default: string)
    • string : character string of <255 characters
    • text : character string of unlimited length (usually <2Gb)
    • int : natural numbers like -1, 0, 3. Optionally use rangeMin and rangeMax
    • long : non-decimal number of type long
    • decimal : decimal numbers like -1.3, 0.5, 3.75 (float precision)
    • bool : yes/no choice
    • date : date in yyyy-mm-dd format
    • datetime : date in yyyy-mm-dd hh:mm:ss
    • xref : cross reference to another entity; requires refEntity to be provided
    • mref : many-to-many relation to another entity; requires refEntity to be provided
    • compound : way to assemble complex entities from building blocks (will be shown as tree in user interface); requires refEntity to be provided
  • refEntity : used in combination with xref, mref or compound. Should refer to an entity.
  • nillable : whether the column may be left empty. Default: false
  • idAttribute : whether this field is the unique key for the entity. Default: false. Use 'AUTO' for auto generated (string) identifiers.
  • description : free text documentation describing the attribute
  • rangeMin : used to set range in case of int attributes
  • rangeMax : used to set range in case of int attributes
  • lookupAttribute : true/false to indicate that the attribute should appear in the xref/mref search dropdown in the dataexplorer
  • label : optional human readable name of the attribute
  • aggregateable : true/false to indicate if the user can use this atrribute in an aggregate query
  • labelAttribute : true/false to indicate that the value of this attribute should be used as label for the entity (in the dataexplorer when used in xref/mref)
  • readOnly true/false to indicate a readOnly attribute
  • tags : ability to tag the data referring to the tags sections, described below

Wish list

'Entities' options

Required columns:

  • entity : unique name of the entity. If packages are provided, name must be unique within a package.

Optional columns:

  • extends : reference to another entity that is extended
  • package : name of the group this entity is part of
  • abstract : indicate if data can be provided for this entity (abstract entities are only used for data modeling purposes but cannot accept data)
  • description : free text description of the entity
  • tags : ability to tag the data referring to the tags sections, described below
  • validationExpression : javascript validation expression that must return a bool. Must return true if valid and false if invalid. See for a syntax description the section Javascript Expressions

'Packages' Options

Required columns:

  • name : unique name of the package. If parent package is provided the name is unique within the parent.

Optional columns:

  • description : free text description of the package
  • parent : use when packages is a sub-package of another package
  • tags : mechanism to add flexible meta data such as ontology references, hyperlinks

BETA feature: 'tags'

'Tags' sheet (optional, BETA)

Optionally, additional information can be provided beyond the standard meta data described above. Therefore all meta-data elements can be tagged in simple or advanced ways (equivalent to using RDF triples). For example, above in the packages example there is a 'homepage' tag provided. For example:

identifier label objectIRI relationLabel codeSystem relationIRI
like like
homepage http://www.molgenis.org http://www.molgenis.org homepage
docs http://some.url http://www.molgenis.org Documentation and Help EDAM http://edamontology.org/topic_3061

'Tags' options

Required columns:

  • identifier : unique name of this tag, such that it can be referenced
  • label : the human readable label of the tag (e.g. the 'like' tag as shown above).

Optional columns:

  • objectIRI: url to the value object (will become an hyperlink in the user interface)
  • relationLabel: human readible label of the relation, e.g. 'Documentation and Help'
  • relationIRI: url to the relation definition, e.g. http://edamontology.org/topic_3061
  • codeSystem: name of the code system used, e.g. EDAM

Change or change documentation:

  • 'required' and 'unique' (and xref_entity?) property for attribute?
  • create separate 'unit' list?
  • can we hload dataset without entity / attributes (auto load?)
  • create 'category'
    • code, label, ismissing, description, ontology
    • use decorator to automatically produce identifier (optional)
  • validation rules