EMX upload format - erwinwinder/molgenis GitHub Wiki
Introduction
EMX (entity model extensible) is a flexible spreadsheet format to easily upload any tabular data using Excel or a zipfile of tab-delimited *.txt files. This works because you can tell MOLGENIS the 'model' of your data via a special sheet named 'attributes'.
Minimal example (download)
For example, if you want to upload an Excel with sheet 'patients':
| displayName | firstName | lastName | birthdate | birthplace |
|---|---|---|---|---|
| john doe | john | doe | 1976-13-03 | new york |
| jane doe | jane | doe | metropolis | |
| papa doe | papa | doe | new york |
Then you must provide a model of your 'patients' via Excel with sheet named 'attributes':
| name | entity | dataType | idAttribute | nillable | description |
|---|---|---|---|---|---|
| displayName | patients | TRUE | FALSE | name | |
| firstName | patients | first name | |||
| lastName | patients | family name | |||
| birthdate | patients | date | day of birth | ||
| birthplace | patients | place of birth |
'entity' should show the name of your data sheet. Each attribute the column headers in your data. Default dataType is 'string' so you only need to provide non-string values (int, date, decimal, etc). And you must always provide one idAttribute that has 'nillable' = 'FALSE'.
You can first upload the 'model' and then the 'data'. Or you can put the both into one file and upload in one go. What you prefer :-) [todo: provide example files for download]
Advanced data example (download)
Lets assume we want to upload multiple data sheets, with relations between them:
Cities:
| cityName |
|---|
| new york |
| metropolis |
Patients:
| displayName | firstName | lastName | birthdate | birthplace | children | disease |
|---|---|---|---|---|---|---|
| john doe | john | doe | 1976-13-03 | new york | none | |
| jane doe | jane | doe | metropolis | none | ||
| pape doe | papa | doe | new york | john doe, jane doe | cardio |
Notes: birthplace refers to elements in the cityName values in the cities table. children contains comma separated values referring to another patient via displayName (trailing spaces will be removed).
Users:
| userName | active | displayName | firstName | lastName |
|---|---|---|---|---|
| jdoe | TRUE | john doe | john | doe |
| jdoe2 | jane doe | jane | doe | |
| pdoe | papa doe | papa | doe |
Note: users looks similar patients, i.e. they are also persons having 'displayName', 'firstName', and 'lastName'. We will use this in the model below.
Advanced model example
To model the data advanced data example, again you need to provide the 'attributes' (i.e., columns, properties). Optionally, you can also describe entities (i.e., classes, tables), and packages (i.e, models and submodels) which gives you some advanced options.
'Attributes' sheet (required)
The example below defines the model for entities 'city', 'patient' and 'user'. Note that 'users' had some attributes shared with 'patients' so we will use 'object orientation' to say that both 'user' and 'patient' are both a special kind of 'persons'. This will be defined using the 'extends' relation defined in the 'entities' sheet below.
| name | entity | dataType | nillable | refEntity | idAttribute | description |
|---|---|---|---|---|---|---|
| cityName | cities | FALSE | TRUE | unique city name | ||
| displayName | persons | FALSE | TRUE | unique name | ||
| firstName | persons | first name | ||||
| lastName | persons | family name | ||||
| birthdate | patients | date | day of birth | |||
| birthplace | patients | xref | cities | place of birth | ||
| disease | patients | disese description | ||||
| userName | users | FALSE | TRUE | unique login name | ||
| active | users | bool | whether user is active |
'Entities' sheet (optional)
In most cases the 'attributes' sheet is all you need. However, in some cases you may want to add more details on the 'entity'. Here we wanted to show use of 'abstract' (i.e., interfaces) to create model class 'persons' and 'extends' (i.e., subclass, inheritance) to define that 'user' and 'patient' have the same attributes as 'persons'. When data model become larger, or when many data sheets are loaded then the 'package' construct enables you to group your (meta)data.
| name | package | extends | abstract | description |
|---|---|---|---|---|
| cities | hospital | list of cities | ||
| persons | hospital | true | person defines general attributes like firstName, lastName | |
| users | hospital | persons | users extends persons, meaning it 'inherits' attribute definition | |
| patients | hospital | persons | patient extends person, adding patientNumber |
'Packages' sheet (optional)
Package allow you to create several models in your system, describe the and next them using the 'parent' relationship. For example:
| name | description | parent |
|---|---|---|
| root | my main package | |
| hospital | sub package holding entities to describe all kinds of persons | root |
EMX model reference documentation
Minimally, you need to provide the 'attributes' of your model (i.e., columns, properties). Optionally, you can also add metadata on entities (i.e., classes, tables), and packages (i.e, models and submodels)
'Attributes'
Required columns:
- entity : name of the entity this attribute is part of
- attribute : name of attribute, unique per entity
Optional columns (can be omitted):
- dataType: defines the data type (default: string)
- string : character string of <255 characters
- text : character string of unlimited length (usually <2Gb)
- int : natural numbers like -1, 0, 3. Optionally use rangeMin and rangeMax
- long : non-decimal number of type long
- decimal : decimal numbers like -1.3, 0.5, 3.75 (float precision)
- bool : yes/no choice
- date : date in yyyy-mm-dd format
- datetime : date in yyyy-mm-dd hh:mm:ss
- xref : cross reference to another entity; requires refEntity to be provided
- mref : many-to-many relation to another entity; requires refEntity to be provided
- compound : way to assemble complex entities from building blocks (will be shown as tree in user interface); requires refEntity to be provided
- refEntity : used in combination with xref, mref or compound. Should refer to an entity.
- nillable : whether the column may be left empty. Default: false
- idAttribute : whether this field is the unique key for the entity. Default: false. Use 'AUTO' for auto generated (string) identifiers.
- description : free text documentation describing the attribute
- rangeMin : used to set range in case of int attributes
- rangeMax : used to set range in case of int attributes
- lookupAttribute : true/false to indicate that the attribute should appear in the xref/mref search dropdown in the dataexplorer
- label : optional human readable name of the attribute
- aggregateable : true/false to indicate if the user can use this atrribute in an aggregate query
- labelAttribute : true/false to indicate that the value of this attribute should be used as label for the entity (in the dataexplorer when used in xref/mref)
- readOnly true/false to indicate a readOnly attribute
- tags : ability to tag the data referring to the tags sections, described below
Wish list
'Entities' options
Required columns:
- entity : unique name of the entity. If packages are provided, name must be unique within a package.
Optional columns:
- extends : reference to another entity that is extended
- package : name of the group this entity is part of
- abstract : indicate if data can be provided for this entity (abstract entities are only used for data modeling purposes but cannot accept data)
- description : free text description of the entity
- tags : ability to tag the data referring to the tags sections, described below
- validationExpression : javascript validation expression that must return a bool. Must return true if valid and false if invalid. See for a syntax description the section Javascript Expressions
'Packages' Options
Required columns:
- name : unique name of the package. If parent package is provided the name is unique within the parent.
Optional columns:
- description : free text description of the package
- parent : use when packages is a sub-package of another package
- tags : mechanism to add flexible meta data such as ontology references, hyperlinks
BETA feature: 'tags'
'Tags' sheet (optional, BETA)
Optionally, additional information can be provided beyond the standard meta data described above. Therefore all meta-data elements can be tagged in simple or advanced ways (equivalent to using RDF triples). For example, above in the packages example there is a 'homepage' tag provided. For example:
| identifier | label | objectIRI | relationLabel | codeSystem | relationIRI |
|---|---|---|---|---|---|
| like | like | ||||
| homepage | http://www.molgenis.org | http://www.molgenis.org | homepage | ||
| docs | http://some.url | http://www.molgenis.org | Documentation and Help | EDAM | http://edamontology.org/topic_3061 |
'Tags' options
Required columns:
- identifier : unique name of this tag, such that it can be referenced
- label : the human readable label of the tag (e.g. the 'like' tag as shown above).
Optional columns:
- objectIRI: url to the value object (will become an hyperlink in the user interface)
- relationLabel: human readible label of the relation, e.g. 'Documentation and Help'
- relationIRI: url to the relation definition, e.g. http://edamontology.org/topic_3061
- codeSystem: name of the code system used, e.g. EDAM
Change or change documentation:
- 'required' and 'unique' (and xref_entity?) property for attribute?
- create separate 'unit' list?
- can we hload dataset without entity / attributes (auto load?)
- create 'category'
- code, label, ismissing, description, ontology
- use decorator to automatically produce identifier (optional)
- validation rules