Tips and Tricks - epimorphics/dclib Wiki

Inserting Line-Feeds in generated literals

tl;dr

Inline '\n' and '\r' escapes work in YAML/JSON literals, but not inside DCLIB pattern expressions, i.e. "{ pattern-expression }".

Unicode literals '\u000a' and '\u000d' used in place of '\n' and '\r' respectively DO work as expected in all cases.

Detail

Inserting line breaks, i.e. '\n' characters, into statically created pattern values is straightforward:

bind : 
 - var1 : "line one\nline two"      # works with YAML/JSON escaping of `\n` 
   var2 : "line one\u000aline two"  # works with YAML/JSON escaping unicode equivalent of `\n` i.e. `\u000a`

However, inserting line-feeds (\n) into dynamically constructed content is less straightforward.

Firstly, it is not possible to create a variable from a simple literal pattern consisting only of a line-feed (or a carriage return, or indeed a CRLF), because pattern evaluation trims leading and trailing whitespace (which includes CRs and LFs):

bind :
  - cr     : "\r"
    cr_alt : "\u000d"
    lf     : "\n"
    lf_alt :  "\u000a"

All of these bindings result in the corresponding variables bearing an empty string value.

However, it is possible to bind a line-feed or carriage return character to a variable using JEXL evaluation:

bind : 
  - cr   : "{value('\\u000d')}"
    lf   : "{value('\\u000a')}"
    crlf : "{value('\\u000d\\u000a')}"

which yields variables that can be used in pattern based composition e.g.:

bind : 
  ...
  - composite : "{line1}{lf}{line2}"

This works because trimming occurs on the pattern expression before it is evaluated. Note that this approach also works without the value(...) wrapping of the literal strings, but it then yields plain JEXL variables that do not have the associated pattern library Value fields and methods, i.e.:

bind : 
  - cr   : "{'\\u000d'}"
    lf   : "{'\\u000a'}"
    crlf : "{'\\u000d\\u000a'}"
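
Putting the pieces together, the following is a minimal sketch of binding a line-feed once and reusing it in a composed value. The static values of line1 and line2 are hypothetical, and the sketch assumes, as the fragments above suggest, that a later entry in the bind list can see variables bound by an earlier entry.

```yaml
bind :
  - line1 : "line one"                   # hypothetical static values for illustration
    line2 : "line two"
    lf    : "{value('\\u000a')}"         # line-feed produced by JEXL, so it is not trimmed away
  - composite : "{line1}{lf}{line2}"     # expected to yield "line one", a line-feed, then "line two"
```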

Unicode escape sequences (i.e. \u000a and \u000d) can be used directly in JEXL scripted patterns. Note that the YAML surface syntax requires an additional escaping '\' in order to pass a single '\' character down to the JEXL script parser.

bind : 
   var2 : "{ line1.append('\\u000a').append(line2)}" # Succeeds inserting '\n' between line1 and line2
   var3 : "{ line1.value + \\u000a + line2.value }"  # Succeeds... the '.value' accessors on the DCLIB variables reach the underlying Java String and avoid a type coercion warning.

However, neither \n nor \r works as expected. If a double '\' sequence is used to feed a single '\r' or '\n' to the JEXL parser, a three character sequence '\\n' is rendered in the generated RDF Turtle output. An RDF parser will interpret this as two characters, a literal '\' followed by a literal 'n', rather than as a single '\n' line-feed character.

Attempting to remove the extraneous '\' from output by omitting the leading (escaping) '\' and using a single backslash results in a JEXL parser failure.

bind : 
   var1 : "{ line1.append('\\n').append(line2)}" # Fails by generating '\\n' in the output value rather than `\n`.
   var4 : "{ line1.append('\n').append(line2)}"  # Generates parser failure... "Caused by: org.apache.commons.jexl2.parser.TokenMgrError: Lexical error at line 1, column 18.  Encountered: "\n" (10), after : "\'""
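
The three cases can be set side by side to show what each spelling looks like once YAML has decoded the double-quoted string. This is only a summary sketch: the variable names good, literal and broken are hypothetical, line1 and line2 are assumed to be bound already, and the comments describe the behaviour reported above rather than anything checked against the DCLIB source. The last entry is expected to fail and is included for illustration only.

```yaml
bind :
  - good    : "{ line1.append('\\u000a').append(line2) }"
    # YAML decodes \\ to \, so JEXL receives '\u000a' and turns it into a line-feed
  - literal : "{ line1.append('\\n').append(line2) }"
    # JEXL receives '\n' (backslash, n) and apparently leaves it undecoded; '\\n' shows up in the output
  - broken  : "{ line1.append('\n').append(line2) }"
    # YAML decodes \n itself, so JEXL receives a raw line-feed inside the quotes and reports a lexical error
```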