Template types - epimorphics/dclib GitHub Wiki

Templates are expressed in JSON or YAML syntax. This page uses JSON as the lowest common denominator, but YAML has the advantage of allowing comments and multi-line literals.

Templates can be built out of other templates. This provides both reuse and structuring of optional processing.

Template types

Common

All template types allow basic metadata:

{
    "name" : "t1",
    "description" : "I am a template",
    "required" : ["c1", ...],
    "requiredColumns" : ["c1", ...]
}

The required field describes which columns or variable bindings must be present in the data for the template to apply. The field itself is optional. It is used to test whether a template is applicable when setting up processing, and also to apply templates conditionally, e.g. in hierarchy processing.

The requiredColumns field (also optional) gives column names which must be present in the data for the template to apply. A data set may pass the requiredColumns test but still contain null values in the corresponding columns, whereas required tests each row to ensure that there is a value present for that column.
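As an illustration of the difference between the two fields, here is a minimal YAML sketch. The template name and the column names id and label are hypothetical, chosen only for this example.

```yaml
# Hypothetical metadata block: the names and columns are illustrative only.
name: "area-template"
description: "Describes one area per row"
# Row-level test: the template applies to a row only if that row
# has a value for "id".
required: ["id"]
# Data-set-level test: the data must have columns called "id" and "label",
# but individual rows may still have empty cells in those columns.
requiredColumns: ["id", "label"]
```

Under this reading, a row with an empty label cell would still be eligible for the template, while a row with an empty id cell would not.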

Composite

Provides a top level packaging of a set of templates.

{       
    "type"  : "Composite",
    "bind" : {"var" : "value"},
    "prefixes" : {"prefix" : "uri", ...},
    "onlyIf" : "{pattern}",
    "oneOffs" : [...],   # array or inline template object
    "templates" : [...],   # array or inline template object
    "referenced" : [...],  
    "sources" : [ {source}, ....]
}

The value for each of oneOffs and templates can be a simple template object or an array of templates. Each entry in the array can be an inline template or a name (found in referenced or the wider execution environment).

The oneOffs templates are each run once on the overall data set. They are useful for generating metadata about the overall collection. The environment in which they run will contain a peek at the first row of column values.

The templates are run on each row of data. Any templates that don't match are skipped silently, but it is an error if no template matches a row.

The referenced templates are there to resolve template names used in the other templates.

The optional onlyIf guard causes the template to be skipped if the result of the pattern is anything other than boolean true.
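To show how these pieces might fit together, the following YAML sketch packages one one-off template and two per-row templates. It is not taken from a real dclib configuration: the template names, the example.com URIs, the column names (id, label, title) and the exact pattern syntax are all assumptions. See Pattern language for the patterns that are actually supported.

```yaml
# Illustrative sketch only: names, URIs and patterns are assumptions.
type: "Composite"
name: "concept-import"
prefixes:
  skos: "http://www.w3.org/2004/02/skos/core#"
oneOffs:
  - "collection-metadata"          # run once; resolved by name
templates:                         # tried against every row
  - "concept-row"                  # resolved by name
  - required: ["otherId"]          # inline template, hypothetical column
    "@id": "<http://example.com/id/other/{otherId}>"
    "<skos:prefLabel>": "{label}"
referenced:                        # definitions for the names used above
  - name: "collection-metadata"
    "@id": "<http://example.com/id/collection>"
    "<skos:prefLabel>": "{title}"  # value peeked from the first row
  - name: "concept-row"
    required: ["id", "label"]
    "@id": "<http://example.com/id/concept/{id}>"
    "<skos:prefLabel>": "{label}"
```

In this sketch a row with neither an id nor an otherId value would match no template, which the description above treats as an error.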

Bindings

Both Composite and Let (parameterized) templates allow you to bind variables which can be used in the subsequent template (and any templates that it calls).

In both cases the bindings can either be an object of the form:

{
  "variable-name" : "pattern",
  "variable-name2" : "pattern2"
}

Alternatively, the bindings can be an array of such binding objects. In an array, each object can see the bindings created by objects earlier in the array.
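The array form is useful when one binding builds on another. A minimal sketch, assuming that a bound variable is referenced with the same brace syntax as a column value and that a binding can hold a URI pattern (both are assumptions, as are the variable and column names):

```yaml
# Hypothetical bindings: variable names, column names and patterns
# are illustrative only.
bind:
  - base: "http://example.com/id"      # first object in the array
  - areaUri: "<{base}/area/{id}>"      # later object: can refer to "base"
```

Had both entries been placed in a single object, the text above gives no guarantee that base would be visible when areaUri is evaluated.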

Resource map

{
    "@id" : "uri-pattern", 
    "uri-pattern" : "value-pattern",
    ...
}

Generates a single resource with a set of property values. All the keys except @id are treated as URI patterns defining the property to attach. Value patterns can generate multiple values.

See Pattern language for an explanation of the patterns that can be expressed.
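For example, a resource map for a row with id and name columns might look like the sketch below. The example.com URI and the column names are hypothetical, and the pattern syntax shown (braces for column values, angle brackets for URIs) is an assumption based on the snippets on this page.

```yaml
# Illustrative resource map: the subject URI and columns are hypothetical.
"@id": "<http://example.com/id/area/{id}>"
# Every other key is a URI pattern naming the property to attach.
"<http://www.w3.org/2000/01/rdf-schema#label>": "{name}"
"<http://www.w3.org/2004/02/skos/core#notation>": "{id}"
```

Each row would then produce one resource carrying a label and a notation.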

Nested hierarchy

{
    "type" : "Hierarchy",
    "parentLink" : "<org:subOrganizationOf>" ,
    "childLink" : "<org:hasSubOrganization>" ,
    "topLink" : "<skos:hasTopConcept>" ,
    "invTopLink" : "<skos:topConceptOf>" ,
    "0" : { template for top level entries } ,
    "1" : { template for second level entries } ,
    ...
}

Use this for hierarchies like organization trees or SKOS concept schemes. For any level in the hierarchy a set of columns provides the information to describe it (name, id etc). Different columns are used for different levels, with the parent-level columns left blank on lower-level rows. For example:

| id1 | name1    | id2 | name2  |
|-----|----------|-----|--------|
| 1   | region 1 |     |        |
|     |          | 1a  | area a |
|     |          | 1b  | area b |
| 2   | region 2 |     |        |
|     |          | 2a  | area a |

Note that it's possible to structure the layout and templates so that some information is always in the same columns at every level. There just needs to be at least one column per level to signal which level a row is at.
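A hierarchy template for the example layout above could be sketched as follows. This is an assumption-laden illustration: the example.com URIs are hypothetical, the org and skos prefixes are assumed to be declared in an enclosing Composite, and using required to select the level is one possible reading of how conditional application works in hierarchy processing.

```yaml
# Illustrative sketch for the id1/name1/id2/name2 layout shown above.
type: "Hierarchy"
parentLink: "<org:subOrganizationOf>"
childLink: "<org:hasSubOrganization>"
"0":                       # top level: rows where id1 has a value
  required: ["id1"]
  "@id": "<http://example.com/id/region/{id1}>"
  "<skos:prefLabel>": "{name1}"
"1":                       # second level: rows where id2 has a value
  required: ["id2"]
  "@id": "<http://example.com/id/area/{id2}>"
  "<skos:prefLabel>": "{name2}"
```

With this layout, the row for area 1a would be expected to be linked to region 1, the most recent top-level row before it.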

Parameterized template

{
    "type" : "Let",
    "bind" : {"parameter-name" : "pattern", ...},
    "template" : "template-name"
}

Applies the named template in a context in which the variable parameter-name is bound to a pattern value, which may be simply a column value. This allows a template to be reused on data with different column names.
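As a sketch of that reuse, suppose a template named labelled-thing (a hypothetical name) is written in terms of a variable called label. Two Let templates could then apply it to data sets whose label columns are named differently. The column names name1 and title are also assumptions.

```yaml
# Two hypothetical Let templates reusing the same named template.
- type: "Let"
  bind:
    label: "{name1}"       # this data set keeps its label in "name1"
  template: "labelled-thing"
- type: "Let"
  bind:
    label: "{title}"       # another data set keeps it in "title"
  template: "labelled-thing"
```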

Data Cube map

TBD: give the URI of a DSD in the metadata and somewhere provide the mapping from component to column.

Top level

The top level of a specification can be an array of templates, all of which are required.