Pattern language - epimorphics/dclib GitHub Wiki

Patterns are used to express computed values that can be bound to other variables (e.g. in a parameterized template) or given as the value of RDF properties (in a Resource map).

Value patterns

URI pattern:

"<http://example.com/{pattern}/{pattern}>"

If, after pattern substitution, this looks like a qname then any prefixes will be expanded.

Reverse property pattern:

"^<http://example.com/{pattern}/{pattern}>"

Literal pattern:

"I {pattern} am a literal"
"lang {pattern} string@en"
"non-lang string@@en"
"non-lang string\\@en"
"foo^^xsd:string"
"foo^^<http://example.com/literal>"
"not typed string\\^^rest of string"

Any of the special characters ('<', '>', '^', '{', '}') may be used in a plain string by means of the '\' escape character. To put a '\' in a JSON string you need "\\" because \ is also used at the JSON level. So for example:

"\\<foo>"

is a plain literal "<foo>", not a URI resource.

The backslash escaping only applies to those special characters; in other cases the backslash appears in the string as a literal \. This is useful when defining things like regular expressions.
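The two levels of escaping can be checked outside dclib. The following is a minimal sketch in Python, using only the standard json module (Python is used purely for illustration and is not part of dclib). It shows the string that is left once the JSON layer has been decoded, which is what a pattern processor would then see:

```python
import json

# The text exactly as it would appear in a JSON template file: "\\<foo>"
json_source = r'"\\<foo>"'

# JSON decoding consumes one level of escaping, leaving a single backslash.
pattern = json.loads(json_source)

print(pattern)       # \<foo>
print(len(pattern))  # 6
```

The decoded string starts with a single backslash followed by '<', which is the escaped form described above, so the value is read as a plain literal rather than a URI.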

Pattern expressions

The {pattern} parts of the above examples are all expressions which use Jexl syntax, such as:

column
'string'
42
object.method(expr, ...)
fn(expr,expr,...)

In the simple cases these are Jexl expressions.

It is also possible to create Jexl scripts as a nested block:

"my { { statement; ... statement } } script"

Scripts are easy to write when using the YAML surface syntax.

Values

The values which can be accessed in the expression are provided from:

  • global context values
  • the cell values from the current row of the input file
  • values bound by any calling templates

In all cases the values are normally wrapped objects which support some useful method operations listed below.

The source value of a cell will be automatically coerced to a number or date if it syntactically looks like the corresponding type. To override this use an explicit asString() call in the pattern. This processing follows the linked CSV conventions and supports:

  • xsd:integer [0-9]+
  • xsd:decimal [0-9]+\\.[0-9]+
  • xsd:double [0-9]+(\\.[0-9]+)?[eE][-+]?[0-9]+(\\.[0-9]+)?
  • xsd:dateTime
  • xsd:date
  • xsd:time
  • xsd:gYearMonth
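To make the numeric part of this list concrete, here is a hedged sketch in Python of how a cell's lexical form could be classified. It is not dclib's implementation. It assumes the patterns are applied to the whole cell value, that they are tried in the order listed, and that the doubled backslash in the list stands for a single regex escape. The function name guess_type is hypothetical, and the date and time types are left out.

```python
import re

# Lexical patterns from the list above, written with a single backslash
# (assumption: the doubled backslash in the list is an escaping artefact).
LEXICAL_TYPES = [
    ("xsd:integer", re.compile(r"[0-9]+")),
    ("xsd:decimal", re.compile(r"[0-9]+\.[0-9]+")),
    ("xsd:double", re.compile(r"[0-9]+(\.[0-9]+)?[eE][-+]?[0-9]+(\.[0-9]+)?")),
]


def guess_type(cell):
    """Return the first type whose pattern matches the whole cell, else None."""
    for type_name, pattern in LEXICAL_TYPES:
        if pattern.fullmatch(cell):
            return type_name
    return None  # treated as a plain string in this sketch


for cell in ["42", "3.14", "1.5e3", "SW1A 1AA"]:
    print(cell, "->", guess_type(cell))
```

Under these assumptions a signed value such as "-5" matches none of the numeric patterns and so would stay a string.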

The expression evaluator can also handle raw Java values. To extract the raw Java value from a wrapped object use x.value. So, for example, to use arithmetic on a cell that looks like a number you would use a pattern such as {x.value + 1}.

Column names are normalized: they are folded to lower case, and white space sequences and punctuation are replaced with "_".
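As a rough illustration of this rule, the sketch below normalizes a few headers in Python. It is an approximation, not dclib's code: the function name normalize_column_name is hypothetical, and it assumes that each run of white space or punctuation collapses to a single "_", which the text above does not spell out.

```python
import re


def normalize_column_name(name):
    """Lower-case the name, then replace each run of non-word characters with '_'."""
    # \W matches white space and punctuation but not letters, digits or '_'.
    return re.sub(r"\W+", "_", name.lower())


print(normalize_column_name("Start Date"))   # start_date
print(normalize_column_name("Post-Code"))    # post_code
print(normalize_column_name("Unit  Price"))  # unit_price
```

A header such as "Start Date" would therefore be referenced in a pattern as {start_date}, under the assumptions above.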

Global values

Value Meaning
$row Object which allows access to row number and to bNodes corresponding to the row (see below)
$base If set this is the base URI for resources generated in the conversion, it is up to the calling environment to supply this if needed
$dataset If $base is set then this is an RDF resource corresponding to that base URI
$filename The name of the source file being converted (just local file name, no path component)
$filebasename The base name of the file being converted, this is normally the file name with the extension and any timestamp stripped. In a webapp setting this is typically the file supplied by the uploading browser
$root In a resource map template this is an RDF resource corresponding to the @id for the current row
$exectime A date-timestamp for the overall execution, remains constant throughout the conversion, useful for e.g. setting dct:modified values

Missing values

Often source data has empty values for certain columns.

The default processing is that a missing value simply results in values derived from it being omitted from the results, but the rest of the template still applies. For example, given data:

label notation description
entry1 1 A description for entry 1
entry2 2

Then a template:

{
    "name" : "simple-skos",
    "@id" : "<{$base}{notation}>",
    "<rdf:type>" : "<skos:Concept>",
    "<skos:notation>" : "{notation}",
    "<dct:description>" : "{description}",
    "<skos:prefLabel>" : "{label}"
}

will generate output like:

<http://example.com/1>  a  skos:Concept ;
    skos:notation "1" ;
    skos:prefLabel "entry1" ;
    dct:description  "A description for entry 1" .

<http://example.com/2>  a  skos:Concept ;
    skos:notation "2" ;
    skos:prefLabel "entry2" .

Whereas if the notation value were missing then there would be no output for that row because the resource URI could not be constructed. For safety, if no template successfully applies to a given row then the whole process is deemed to have failed.

Using the required field you can force the template to explicitly test that a value is present for the given row before applying the template. So if description were required then the template would not apply to the second row in this example and the processing would fail unless there were an alternative template which did match.
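The rules in this section can be modelled in a few lines. The following Python sketch is a toy model, not how dclib is implemented: expand and apply_template are hypothetical names, patterns are reduced to plain {name} substitution with no Jexl evaluation, and the template's name field is left out. It shows a missing value dropping one property, a missing notation suppressing the whole row, and a required field blocking the template.

```python
import re

PLACEHOLDER = re.compile(r"\{([^{}]+)\}")


def expand(pattern, values):
    """Expand {name} placeholders; return None if any referenced value is missing."""
    missing = False

    def substitute(match):
        nonlocal missing
        value = values.get(match.group(1).strip())
        if not value:
            missing = True
            return ""
        return value

    result = PLACEHOLDER.sub(substitute, pattern)
    return None if missing else result


def apply_template(template, values, required=()):
    """Return a list of (subject, property, value), or None if the template does not apply."""
    if any(not values.get(name) for name in required):
        return None  # a required value is absent
    subject = expand(template["@id"], values)
    if subject is None:
        return None  # the resource URI could not be constructed
    triples = []
    for prop, pattern in template.items():
        if prop == "@id":
            continue
        value = expand(pattern, values)
        if value is not None:  # properties with missing values are skipped
            triples.append((subject, prop, value))
    return triples


template = {
    "@id": "<{$base}{notation}>",
    "<rdf:type>": "<skos:Concept>",
    "<skos:notation>": "{notation}",
    "<dct:description>": "{description}",
    "<skos:prefLabel>": "{label}",
}
row2 = {"$base": "http://example.com/", "label": "entry2", "notation": "2", "description": ""}

print(apply_template(template, row2))
print(apply_template(template, row2, required=("description",)))  # None
```

In this model the caller would treat a row for which every template returns None as a failure of the whole conversion, mirroring the safety rule described above.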

Value methods

String bashing

name action
x.append(y) concatenate two strings
x.toLowerCase() map case of a string
x.toUpperCase() map case of a string
x.toSegment() convert string to safe URI segment (uses '_' in place of punctuation)
x.toSegment('x') convert string to safe URI segment, as toSegment() but then replaces '_' by the given string
x.toCleanSegment() variant on toSegment('-') which converts to lower case, strips ' characters, reduces remaining sequences of punctuation to a single '-' and strips any trailing "-"
x.format('%05d') format the underlying java value as string using the formatting commands from java.util.Formatter
x.split(',') split into an array of values wherever the given regex matches
x.lastSegment(p) last segment of a URI or URI-shaped string
x.substring(start) sub string which omits the first start characters
x.substring(start, end) sub string from start to end index
x.regex(regex) first capture group of a regex
x.matches(regex) true if the regular expression matches the lexical form of the value
x.replaceAll(regex, replacement) replace all occurrences of a regular expression in the lexical form
x.digest('alg',base64) compute a digest string for the string value of x
'alg': an optional string-valued parameter that carries the name of the requested digest algorithm, see java.security.MessageDigest. Default value: 'MD5'.
base64: an optional boolean-valued parameter. true requests a base64-encoded result, false requests a hexadecimal-encoded result. Default value: false.
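To show what the two output encodings of a digest look like, here is a small Python sketch that follows the defaults documented for x.digest (MD5, hexadecimal). It is an illustration only: the function digest below is a hypothetical stand-in, it assumes the string is encoded as UTF-8 before hashing (not stated above), and it uses Python's hashlib algorithm names (for example 'sha256') rather than the java.security.MessageDigest names.

```python
import base64
import hashlib


def digest(text, alg="md5", use_base64=False):
    """Hash the UTF-8 bytes of text and return a hex or base64 string."""
    raw = hashlib.new(alg, text.encode("utf-8")).digest()
    if use_base64:
        return base64.b64encode(raw).decode("ascii")
    return raw.hex()


print(digest("abc"))                   # 900150983cd24fb0d6963f7d28e17f72
print(digest("abc", use_base64=True))  # 24 characters, ending in "=="
print(len(digest("abc", "sha256")))    # 64
```

The hexadecimal form is longer but safe to embed directly in a URI segment, whereas base64 output can contain '/', '+' and '=' characters.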

Coercions

name action
x.asString() coerce to string
x.toString() coerce to a raw java string
x.asNumber() coerce to number (or report an error if this is not possible)
x.asDecimal() coerce to number (or report an error if this is not possible) and then coerce the number to be a decimal (i.e. converts both integers and floats to decimals)
x.asBoolean() coerce to (raw) boolean
x.asDate('xsd:type') coerce to a date or time of the given type - supports xsd:dateTime, xsd:date, xsd:time, xsd:gYearMonth, xsd:gYear
x.value return the underlying raw java value for a wrapped value
x.datatype('xsd:short') return x as an RDF typed literal whose type URI is the given string; prefix expansion is done and xsd: is always supported as a default prefix
x.lang('en') coerce to a language tagged literal (using trailing @en may be easier)

Date handling

name action
x.asDate('format', 'xsd:type') parse a value as a date or time, the range of xsd types supported is as above, the format string can be a series of format options separated by |, each format option follows the Joda Time conventions
x.format('pattern') renders a date time as a string according to the given pattern using the Joda Time conventions
x.year x.month x.day x.hour x.minute x.fullSecond x.second access the components of a date or date time. The fullSecond is the integer number of full seconds whereas the second value is a decimal including fractional seconds
x.plusYearDays(years,days) x.minusYearDays(years,days) add or subtract a number of years and days to or from a date time
x.plus(hours, minutes, seconds) x.minus(hours, minutes, seconds) add or subtract a number of hours, minutes and seconds to a date time
x.toWholeSeconds() return the date time with any fractional seconds removed
x.toLocalTime() Converts the date time to the timezone of the current locale and returns a local date time preserving the instant. For example, running a date time of 2014-09-24T14:20:30Z in a locale which is 1 hour ahead of UTC will give a date time 2014-09-24T15:20:30.
x.toLocalTime('zonename') Converts the date time to the named time zone and returns a local date time preserving the instant. The zone name must be a long-form time zone string such as Europe/London.
x.referenceTime() for a date or dateTime this injects a set of output triples describing the corresponding day, month and year using the reference time ontology and URI set; it returns the date value as a reference time resource (an instant for an xsd:dateTime, a day for an xsd:date or a year for an xsd:gYear)
x.referenceTimeWeek() for a date or dateTime this injects a set of output triples describing the corresponding day, month and year using the reference time ontology and URI set; it returns the week containing this date as a reference time resource
x.diffMilliSeconds('other') Returns the number of milliseconds between two date/dateTime values 'x' and 'other'.
x.diffWholeDays('other') Calculates the number of whole days between the start of two dates or dateTimes, 'x' and 'other'.
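The toLocalTime() example in the table can be reproduced with ordinary date arithmetic. The sketch below uses Python's standard datetime module and a fixed one-hour offset in place of a named zone. It illustrates the "preserve the instant, drop the zone" behaviour only and makes no claim about dclib's Joda Time based implementation. The function name to_local_time is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def to_local_time(instant, offset_hours):
    """Shift an aware datetime to a fixed-offset zone, then drop the zone information."""
    zone = timezone(timedelta(hours=offset_hours))
    return instant.astimezone(zone).replace(tzinfo=None)


utc_value = datetime(2014, 9, 24, 14, 20, 30, tzinfo=timezone.utc)
local_value = to_local_time(utc_value, 1)

print(local_value.isoformat())  # 2014-09-24T15:20:30
```

The result carries no zone designator, matching the local date time 2014-09-24T15:20:30 given in the table for a locale one hour ahead of UTC.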

Geo support

name action
fromLatLon(latvar, lonvar) Construct a geo point from a WGS84 lat/lon pair of input values
fromLatLonRaw(55.5, -1.5) Construct a geo point from a WGS84 lat/lon pair immediate values
fromEastingNorthing(Evar, Nvar) Construct a geo point from an OS easting/northing pair of input values
fromEastingNorthingRaw(651234, 512345) Construct a geo point from an OS easting/northing pair immediate values
fromGridRef(gridrefvar) Construct a geo point from a grid reference input string
fromGridRefRaw('NU 29157 23009') Construct a geo point from a grid reference immediate string
point.easting Return the Easting value of a geo point
point.northing Return the Northing value of a geo point
point.lat Return the WGS84 lat value of a geo point
point.long Return the WGS84 lon value of a geo point
point.gridRef Return the OS grid reference for a geo point

Row variables

name action
$row.number row number
$row.uuid uuid corresponding to row
$row.bnode blank node corresponding to row
$row.bnodeFor(p) blank node for this row identified by pattern. A pattern can be a simple string.

Mapping and reconciliation

name action
x.map('source') convert to a URI by looking up the best (first) fit value for the given key in the source map
x.map('source', true) convert to a URI by looking up the best (first) fit value for the given key in the source map; a valid result is required, so if no match can be found then that is an error
x.map('source', 'var', true/false) convert to a URI by looking up the best (first) fit value for the given key in the source map; the returned result will be the secondary value var (in the case of an RDFSparql source this is the name of the SPARQL variable whose binding is to be returned)
x.map(['source1', ... 'sourcen'], default) convert to a URI by looking up the best (first) fit value for the given key in a sequence of source maps; if it fails to match in any of them then the default value is returned (which can be nullValue() or abort())
x.mapToAll('source') convert to a set of URIs by getting all mapped values for the given key in the source map
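The fallback behaviour of the multi-source form can be pictured with plain dictionaries. This Python sketch is a simplified model under stated assumptions, not dclib's mapping machinery: each source is represented as a dict from key to a list of candidate URIs with the best fit first, and the names map_value, map_to_all and the sample sources are hypothetical.

```python
def map_value(key, sources, source_names, default=None):
    """Return the first candidate for key from the first source that has one."""
    for name in source_names:
        candidates = sources[name].get(key, [])
        if candidates:
            return candidates[0]
    return default  # no source matched


def map_to_all(key, sources, source_name):
    """Return every candidate for key in a single source."""
    return list(sources[source_name].get(key, []))


# Hypothetical sources, for illustration only.
sources = {
    "counties": {"Kent": ["http://example.com/county/kent"]},
    "districts": {
        "Ashford": [
            "http://example.com/district/ashford",
            "http://example.com/district/ashford-old",
        ]
    },
}

print(map_value("Ashford", sources, ["counties", "districts"]))
print(map_value("Unknown", sources, ["counties", "districts"], default="no match"))
print(map_to_all("Ashford", sources, "districts"))
```

In a real template the default would typically be nullValue(), so that the property is skipped, or abort(), so that the template does not apply.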

Global functions

name action
round(2.7) round a number to an integer
nullValue() The result is treated the same way that a missing value in a data cell is treated (see Missing values above)
value(expr) Wrap a plain value for further scripting
print(expr) Log the value of the expression to the execution trace and return it. Useful for template development to check whether a value is what you expect
abort() Abort the current pattern evaluation and template
asResource('dct:') Convert a string to an RDF Resource node, the string will have any prefixes expanded
bnodeFor(key) Create or return a bnode for a given row-independent key

If working with raw values (e.g. {'foo' + bar}) then the result will be converted to an RDF literal. For additional control over this there are two global functions:

Function Use
lang(value, 'en') Convert a raw result into a language-tagged literal (using trailing @en may be easier)
datatype(value, 'xsd:integer') Convert a raw result into a typed literal of the given xsd: type

RDF fetch and processing

name action
x.fetch() treats x as a URI, fetches that as RDF and injects any returned statements into the output graph
x.fetch('prop', ... 'prop') fetches the RDF from x as above but only injects the values of the given properties into the output graph

Sometimes we need to test or access the RDF data already generated, for example when data has been fetched from a remote resource. This is only possible if the converter is working in non-streaming mode.

To help in this situation the following methods are supported (note that it is no longer necessary to further wrap the values using asRDFNode() to access these functions). In each case where a property value is required this may be given by a URI string, a qname string (using the prefixes defined for the conversion) or another value that is an RDF node.

Method Returns
string.asRDFNode() Treat a string as a URI and convert it to an RDF Node value
r.connectedNodes(path) A list of nodes connected to this one via the given SPARQL path expression
r.getPropertyValue(prop) Get a single value for the property
r.listPropertyValues(prop) A list of all values of the property
r.listProperties() A list of property value pairs, each entry in the list is an object with a .prop and .values bean method
r.listInLinks(prop) A list of nodes which link to this one via the given property
r.listInLinks() A list of property value pairs giving the property and resource(s) which link to this one
r.isResource() Test if this node is an RDF resource
r.isLiteral() Test if this node is a literal
r.isList() Test if this node is an RDF list
r.asList() Return the values in the RDF list as a plain Java list of wrapped RDF nodes
r.datatype The datatype of the literal (if this is a literal)
r.language The language tag of the literal (if this is a lang-tagged literal)
r.lexicalForm The lexical form of the literal (if this is a literal)
r.name A name for the node. For a literal this will be its lexical form. For a resource this will be the best match out of a set of possible label/name properties.
r.uRI or r.getURI() The URI of the resource (if this is a resource)
r.hasResourceValue(prop,val) Test if the value of the given property is a resource with the given URI, the val can be a qname-style abbreviated URI
r.addPropertyValue(prop,value) Add the given property value to node r. prop may be a string or a variable and is subject to prefix expansion; value may be a string or a variable and is not subject to prefix expansion.
r.addObjectPropertyValue(prop,value) Add the given property value to node r. Both prop and value may be strings or variables. Both are subject to prefix expansion so that prefixed names may be used to specify RDF URI nodes for value.

As an example, here is a (YAML format) template which fetches OS district or county URIs and returns the corresponding County as an rdf:value annotation:

name              : RDFNode processing test case
required          : ["id" ]
"@id"             : <{id.fetch()}>
<rdf:value>       : |
    { {
        var x = $root.asRDFNode();
        if (x.hasResourceValue('rdf:type','http://data.ordnancesurvey.co.uk/ontology/admingeo/County')) {
           return x
        } else {
           return x.getPropertyValue('http://data.ordnancesurvey.co.uk/ontology/admingeo/inCounty')
        }
    } }

Functions

It is possible to create small, reusable single-argument functions to reduce repetition in templates.

This is done by binding a variable to a pattern of the form:

{= expr; expr; ...; expr}

where the expressions are Jexl statements forming a script. These can refer to a variable $$ which will be passed in when the function is applied. The value of the function application will be the value of the last expr or the value of any return statement.

The function can then be called within a later pattern using:

{ f.apply(x) }

where f is the variable to which the function was bound and x is the value to be passed to the function. The environment visible to the function will be that in place at the time the function was created plus the calling argument $$.

For example:

name : Test emission of namespace URIs
type : Composite
bind : 
   - "$base" : "<http://example.com/test>"
   - "$ns" : "{= asResource($$).replaceAll('[#/]$','')}"
templates :
   - "@id" : "{$base}"
     "<rdf:type>"  : "<void:Dataset>"
     "<rdf:value2>" : "{$ns.apply('dct:')}"

This creates a reusable function $ns which expands a namespace string to a URL for a vocabulary (with trailing # or / removed).
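The effect of that function body can be mimicked in Python to see what value it should produce. This is an analogy only, not dclib code: make_ns is a hypothetical helper, the closure plays the role of the environment captured when the function is created, the qname argument plays the role of $$, and the prefix table is supplied by hand.

```python
import re

# Prefix table assumed for illustration; dclib takes prefixes from the conversion setup.
PREFIXES = {
    "dct": "http://purl.org/dc/terms/",
    "skos": "http://www.w3.org/2004/02/skos/core#",
}


def make_ns(prefixes):
    """Build a one-argument function that captures the prefix table in its closure."""

    def ns(qname):
        prefix, _, local = qname.partition(":")
        expanded = prefixes[prefix] + local    # stands in for asResource($$)
        return re.sub(r"[#/]$", "", expanded)  # same regex as in the template

    return ns


ns = make_ns(PREFIXES)
print(ns("dct:"))   # http://purl.org/dc/terms
print(ns("skos:"))  # http://www.w3.org/2004/02/skos/core
```

The regular expression is anchored with $, so only a single trailing # or / is removed and any earlier slashes in the URL are left alone.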
