Home - a-f-m/japath3 GitHub Wiki

Preface

This software is based on common formal methods known from the domains of compiler construction, formal languages, and functional programming. Using apath requires some knowledge of the last two domains. The implementation follows the principles of these domains, usually avoiding verbosity and prioritizing compactness. The software is in beta state but is well tested and used in several Java-based projects. Nevertheless, exhaustive analyses and validation of apath expressions will be added in the future to avoid unintended, counterintuitive, or erroneous results.

This project could also be an inspiration for other projects and languages pursuing the same objectives described in the motivation below.

To get started or to experiment directly, most of the examples below are available in the playground.

Motivation

apath is an abstract path language for hierarchical structures following the principles of xpath. It abstracts from

  • the concrete underlying structure (e. g. json, xml/html or even tree views over java object nets) by defining corresponding wrappers

Beyond that it aims at having one language for

  • selection
  • constraints & schema definition
  • construction & transformation
  • modification

Schema definitions can be

  • dependent on instance values or mixed up with arbitrary constraints
  • automatically derived from input structures.

Constructing structures easily allows for transformation definitions.

With respect to json it tries to overcome the restricted expressiveness of JSONPath and the shortcomings of mixing value and schema constraints in JSON Schema.

To some extent, apath can be seen as a Swiss Army knife for processing hierarchical structures: simple selections as well as complex, modular, and reusable multi-line expressions can be defined. Together with external functions (e.g., Java or JavaScript) the language is Turing-complete [1].

Syntax & Semantics

The concrete syntax is defined with Parsing Expression Grammars. The detailed grammar can be found here ([2]).

Semantics follow the usual, mature approach of

  • Philip Wadler (2000). A Formal Semantics of Patterns in XSLT and XPath. Markup Languages 2(2): 183-202

and will be published soon.

Simple Walk Through

Running example:

{
    "pers": {
        "name": "Miller",
        "age": 17,
        "#post-code": 12205,
        "driverLic": true
    },
    "favs": ["coen-brothers", "dylan"]
}

According to the usual step-based semantics (e. g. xpath) an apath expression is a (.-separated) list of steps. The evaluation of a step has a node (of the underlying structure) as input (called context node) and yields a solution, a sequence of nodes. Every node of the solution is passed to the subsequent step where it is the new context node. The result of the overall evaluation is the solution of the last step.
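The step-wise evaluation described above can be pictured with a small Java sketch. It is not the japath3 implementation: plain Map and List values stand in for wrapped nodes, and the names StepSketch, Step, property, all, and eval are hypothetical. Each step maps one context node to a solution, and every node of that solution becomes the context node of the next step.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: plain Map/List values stand in for wrapped nodes.
final class StepSketch {

    /** A step maps one context node to its solution, a sequence of nodes. */
    interface Step extends Function<Object, List<Object>> {}

    /** Property selection; a missing property yields the empty solution. */
    static Step property(String name) {
        return node -> {
            List<Object> solution = new ArrayList<>();
            if (node instanceof Map && ((Map<?, ?>) node).containsKey(name)) {
                solution.add(((Map<?, ?>) node).get(name));
            }
            return solution;
        };
    }

    /** Wildcard (*): all array items or all property values. */
    static Step all() {
        return node -> {
            if (node instanceof List) return new ArrayList<Object>((List<?>) node);
            if (node instanceof Map) return new ArrayList<Object>(((Map<?, ?>) node).values());
            return new ArrayList<Object>();
        };
    }

    /** Evaluates the steps left to right; the last solution is the result. */
    static List<Object> eval(Object root, List<Step> steps) {
        List<Object> solution = new ArrayList<>();
        solution.add(root);
        for (Step step : steps) {
            List<Object> next = new ArrayList<>();
            for (Object contextNode : solution) next.addAll(step.apply(contextNode));
            solution = next;
        }
        return solution;
    }

    /** The running example as nested maps and lists. */
    static Map<String, Object> runningExample() {
        Map<String, Object> pers = new LinkedHashMap<>();
        pers.put("name", "Miller");
        pers.put("age", 17);
        pers.put("#post-code", 12205);
        pers.put("driverLic", true);
        Map<String, Object> root = new LinkedHashMap<>();
        root.put("pers", pers);
        root.put("favs", List.of("coen-brothers", "dylan"));
        return root;
    }

    public static void main(String[] args) {
        Object root = runningExample();
        System.out.println(eval(root, List.of(property("pers"), property("name")))); // like pers.name
        System.out.println(eval(root, List.of(property("favs"), all())));            // like favs.*
    }
}
```

A selector that does not exist simply produces an empty solution, so later steps have nothing to evaluate.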

Note: A semi-formal semantics will be stated here soon in tabular form to help understanding expression evaluation (especially the passing of the context node) for readers who are not familiar with xpath semantics.

Basic

| group | keyword/symbol | description | apath example | result |
|---|---|---|---|---|
| access | . | property selection (selectors) | pers.name | "Miller" |
| | `...` | ... with non-identifier chars | pers.`#post-code` | 12205 |
| | regex('...') | ... with regex | pers.regex('.*a.*') | "Miller", 17 |
| | [i] | subscript | favs[1] | "dylan" |
| | * | array items | favs.* | "coen-brothers", "dylan" |
| | * | properties | pers.* | true, 12205, "Miller", 17 |
| reflective | & (or selector) | property names | pers.*.& | "driverLic", "#post-code", "name", "age" |
| | _ (or self) | current node | pers.name._ | "Miller" |
| merge | ** | all descendants including current node | _.** | {"pers":{"name":"Miller", ...}, ... }, ..., "dylan" |
| | **^ | ... in bottom up order | _.**^ | "Miller", 17, 12205, ... |
| | union | union of selections | union( pers.*, favs.* ) | true, 12205, "Miller", 17, "coen-brothers", "dylan" |
| comparison | eq | equality | pers.name.eq('Miller') | true |
| | lt, gt, le, ge | inequality (less than, ...) | pers.age.lt(17) | false |
| boolean operator | and | conjunction | pers.age.and( true, ge(16) ) | true |
| | or | disjunction | pers.age.or( gt(18), le(16) ) | false |
| | xor | exclusive disjunction (xor) | pers.xor( age.lt(18), driverLic ) | false |
| | not | negation | pers.age.not( le(18) ) | false |
| control | ? | filter operator | pers?( age.eq(17) ).name | "Miller" |
| | cond | conditional | pers.cond( driverLic, age, name ) | 17 |
| | opt | optional (e.g. weight property not required [4]) | pers?( opt(weight).lt(90) ).age | 17 |

Enhanced

| group | keyword/symbol | description | example | result |
|---|---|---|---|---|
| pattern | match | regex match on values | favs.*?( match('.*dy.*') ) | "dylan" |
| | match | regex (first) group selection on values | pers.name.match('M(.*)er').eq('ill') | true |
| access | [#i] | sequence subscript (node order i within the parent [5]) | favs.*[#1] | "dylan" |
| | [#i..j] | slice | favs.*[#0..1] | "coen-brothers", "dylan" |
| | [#i..] | open slice | favs.*[#1..] | "dylan" |
| boolean operator | type | type check (primitive types) | pers.age.type(Number) | true |
| | imply | implication | pers.imply( age.lt(18), not(driverLic) ) | false |
| | every | universal quantification | favs.every( *, type(String) ) | true |
| | some | existential quantification | favs.some( *, eq('dylan') ) | true |
| variables | <step>$<id> | variable binding ($_ stands for the selector [6] as the variable name) | pers.and( name $_, age $a ) | vars:[{"name":"Miller"},{"a":17}] |
| | $<id> | variable access (...eq($x)...) | pers.name$x.eq($x) | true |
| | $ | variable access 1 ($ w/o id stands for the root of input structure) | pers.union(name, $.favs[1]) | "Miller", "dylan" |
| control | {...} | sub-expression ([12], cf. struct creation below) | _{ pers.name $_ }.favs[1] | "dylan"; vars:[{"name":"Miller"}] |
| re-use | def | user-defined (parametric) expression [7], [8] with parameter numbers | def(either, and(or(#0, #1), not(and(#0, #1))) ) | |
| | def | ... with parameter names | def(either(first, second), and(or($first, $second), not(and($first, $second))) ) | |
| | | expression application | def(either,...).pers.either( age.lt(18), driverLic ) | false |
| construction | new | creation of a new node | new $x | |
| | : | assignment (<lhs> : <rhs>) | new $age : 99 | 99; vars:[{"age": 99}] |
| | {...} | struct creation ([9], [12]) | new $x : {age: 99} | {"age": 99}; vars: [{"x": {"age": 99}}] |
| | [...] | array creation | (...) {num: [99, 100]} | {"num": [99, 100]} |
| directives & functions | ::<directive>((...))? | directive | pers{::complete} | |
| | j::<namespace>::<func>(...) | predefined external java method | pers.name.eq( j::str::conc('Mi', 'ller') ) | true |
| | def-script | predefined javascript function (first parameter 'self' is the context node) | see the def-script listing directly after this table | "avoid it" |

def-script listing:

def-script("""
 function canDrive(self, tag) {
   return (self < 18 ? 'avoid ' : 'do ') + tag
 }
""").
 pers.age.js::canDrive('it')

Salient Mode

Selector names correspond to properties (in json) or element names (in xml/html). If an expression contains a selector that is not contained in the input object, an empty solution is retrieved, which allows for optional branches in expressions. However, a typo in a selector also yields an (unintended) empty solution. For test purposes the salient mode can therefore be enabled. For instance, if, in salient mode, the input is

{"x":{"y":1}}

and the expression is x.yy then the error

 salience: selectors [yy] used but not found (available selectors: [x,y])

will be returned.

Remark: Currently, salient mode does not check xml attributes.
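How japath3 implements the salience check is not shown here. The following Java sketch only illustrates the idea under the assumption that the selectors used in an expression are compared with all property names occurring in the input. The class SalienceSketch and its methods are hypothetical, and plain Map/List values replace wrapped nodes.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical sketch of a salience check, not the library's implementation.
final class SalienceSketch {

    /** Collects all property names occurring anywhere in the input. */
    static void collect(Object node, Set<String> names) {
        if (node instanceof Map) {
            for (Map.Entry<?, ?> e : ((Map<?, ?>) node).entrySet()) {
                names.add(String.valueOf(e.getKey()));
                collect(e.getValue(), names);
            }
        } else if (node instanceof List) {
            for (Object item : (List<?>) node) collect(item, names);
        }
    }

    /** Returns an error message if a used selector does not occur in the input. */
    static Optional<String> check(Object input, List<String> usedSelectors) {
        Set<String> available = new LinkedHashSet<>();
        collect(input, available);
        List<String> missing = new ArrayList<>();
        for (String s : usedSelectors) {
            if (!available.contains(s)) missing.add(s);
        }
        if (missing.isEmpty()) return Optional.empty();
        return Optional.of("salience: selectors [" + String.join(",", missing)
                + "] used but not found (available selectors: ["
                + String.join(",", available) + "])");
    }

    public static void main(String[] args) {
        Object input = Map.of("x", Map.of("y", 1));
        // the selectors of the expression x.yy
        System.out.println(check(input, List.of("x", "yy")).orElse("ok"));
    }
}
```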

Constraints & Schema

Constraints

Constraints are boolean expressions that have to evaluate to true. For instance, the expression

pers.assert(
    imply( age.lt(18), not(driverLic) ),
    `#post-code`.and( ge(13000), le(99999) )
)

evaluates to false and yields the following violation message in constraints mode [10] (for constraints mode see help in section Service below: curl --location --request POST http://localhost:8082/apath/help ):

// Constraints violated. Possible corrections ('← ...') below
{
  "pers": {    // ← `#post-code`.assert(ge(13000),le(99999))
    // ← imply(age.lt(18),not(driverLic))

    "name": "Miller",
    "age": 17,
    "#post-code": 12205,      // ← ge(13000)

    "driverLic": true
  },
  "favs": [
    "coen-brothers",
    "dylan"
  ]
}
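To see why both constraints are violated for the running example, the two rules can be spelled out in plain Java. This is only an illustration of the boolean logic (imply(a, b) read as "not a or b"), not of the apath evaluator. ConstraintSketch and its method names are hypothetical.

```java
// Hypothetical sketch: the two constraints of the example as plain Java booleans.
final class ConstraintSketch {

    /** Logical implication: false only if the premise holds and the conclusion does not. */
    static boolean imply(boolean premise, boolean conclusion) {
        return !premise || conclusion;
    }

    /** Mirrors imply( age.lt(18), not(driverLic) ). */
    static boolean licenseRule(int age, boolean driverLic) {
        return imply(age < 18, !driverLic);
    }

    /** Mirrors the range check ge(13000), le(99999) on the post code. */
    static boolean postCodeRule(int postCode) {
        return postCode >= 13000 && postCode <= 99999;
    }

    public static void main(String[] args) {
        // values of the running example: age 17, driverLic true, post code 12205
        System.out.println(licenseRule(17, true)); // false: under 18 but has a license
        System.out.println(postCodeRule(12205));   // false: below 13000
    }
}
```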

Text Message

To state user-friendly, customized violation messages, one has to use assertions with extra textual messages. These must have the form

assert(message(<text>), <expr1>, ... , <exprN>)

where <text> is the custom text and <expr1>, ... , <exprN> a list of constraints. Whenever one of the constraints is violated, the custom message is displayed. E.g., the constraint

pers.and(
    assert( message('aged <18 cannot have a driver lic'),
        imply( age.lt(18), not(driverLic) )),

    `#post-code`.assert( message('must be >=13000 and <=99999'), 
        ge(13000), le(99999))
)

that is semantically equivalent to the one above, yields the violation

// Constraints violated. Possible corrections ('← ...') below
{
  "pers": {    // ← aged <18 cannot have a driver lic

    "name": "Miller",
    "age": 17,
    "#post-code": 12205,      // ← must be >=13000 and <=99999

    "driverLic": true
  },
  ...
}

Schema

From a theoretical viewpoint, schemata are specialized constraints. The automatically generated schema for the running example is

assert(
	pers.assert(
		driverLic.type(Boolean),
		`#post-code`.type(Number),
		name.type(String),
		age.type(Number)),
	favs.every(*, type(String)))

Here, by default, all properties are stated as mandatory. If optionality is required the opt keyword has to be used, e. g. opt(driverLic).type(Boolean) (see keyword table above). Schema generation takes structure variance within arrays into account. Concretely, in case of "favs": ["coen-brothers", 42] the generated constraint would be

  favs.every(*, or( type(String), type(Number)))
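The derivation of such a schema from an input structure can be imagined roughly as in the following Java sketch. It is an assumption about the general idea, not the generator used by japath3: SchemaSketch is a hypothetical class, the identifier test for selectors is a guess, and only strings, numbers, booleans, objects, and non-empty arrays are handled.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: derives an apath-like schema text from Map/List data.
final class SchemaSketch {

    /** Primitive type name; everything that is not Boolean or Number is treated as String. */
    static String typeOf(Object value) {
        if (value instanceof Boolean) return "Boolean";
        if (value instanceof Number) return "Number";
        return "String";
    }

    /** Names with non-identifier chars are quoted with backticks (assumed identifier rule). */
    static String selector(String name) {
        return name.matches("[A-Za-z_][A-Za-z0-9_]*") ? name : "`" + name + "`";
    }

    static String constraint(Object node) {
        if (node instanceof Map) {
            List<String> parts = new ArrayList<>();
            for (Map.Entry<?, ?> e : ((Map<?, ?>) node).entrySet()) {
                parts.add(selector(String.valueOf(e.getKey())) + "." + constraint(e.getValue()));
            }
            return "assert(" + String.join(", ", parts) + ")";
        }
        if (node instanceof List) {
            // structure variance within an array: one alternative per distinct item schema
            Set<String> alternatives = new LinkedHashSet<>();
            for (Object item : (List<?>) node) alternatives.add(constraint(item));
            if (alternatives.isEmpty()) {
                throw new IllegalArgumentException("empty arrays are outside this sketch");
            }
            String inner = alternatives.size() == 1
                    ? alternatives.iterator().next()
                    : "or(" + String.join(", ", alternatives) + ")";
            return "every(*, " + inner + ")";
        }
        return "type(" + typeOf(node) + ")";
    }

    public static void main(String[] args) {
        System.out.println("favs." + constraint(List.of("coen-brothers", 42)));
    }
}
```

Because every property found in the input is emitted, all properties come out as mandatory, which matches the default described above.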

Schema modularization can be done via the def-keyword and helps to reuse sub schemata comparable to Object Oriented Modeling techniques (e. g. inheritance could be mimicked thereby). For instance, the following shows the extra handling of a Person-sub-schema:

def(PersonSchema, 
  assert(
    driverLic.type(Boolean),
    ... <see above> )).

assert(
  pers.PersonSchema(),
  favs.every(*, type(String)))

Mixing

One essential advantage of having one language, besides the fact that the learning effort is minimized, is the mixing of pure schema and value constraints:

def(personCheck, assert(
    imply( age.lt(#0), not(driverLic) ),
    `#post-code`.assert( ge(13000), le(99999) )
)).
assert(
  pers.and(
    personCheck(18),
    PersonSchema()),
  favs.every(*, type(String)))

Moreover, mixing allows pure schema constructs, e. g. type(..), to be enforced depending on input values. For instance, the following schema requires the presence of the license label of type string if the person has a driver license:

pers.imply(driverLic, assert(license.type(String)))

For our running example the message

{
  pers: {   ← license.type(String)
    driverLic: true,
    ...

will be stated.

Completeness

Completeness of an input structure means that properties not contained in the schema must not occur in the input structure. It would be possible to enforce completeness with the apath keywords introduced so far, but this would be less readable and too verbose. Therefore the directive ::complete is provided:

...
and(
  pers{::complete}.and(
    ... ),
  ... )

E. g., if the pers property of the running example had the property weight, the violation

pers: {   ← selectors [weight] not covered by schema

would occur.

Completeness can also be checked globally because this directive is handled as an ordinary step. E. g., to have the completeness constraint for the whole input one has to state **.::complete . Again, it could also depend on values of the input instance.
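The idea behind the completeness check can be sketched in a few lines of Java, assuming that the set of selectors covered by the schema is already known. CompletenessSketch and its check method are hypothetical and only mimic the violation text shown above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical sketch: reports properties that the schema does not cover.
final class CompletenessSketch {

    static Optional<String> check(Map<String, ?> node, Set<String> coveredBySchema) {
        List<String> uncovered = new ArrayList<>();
        for (String key : node.keySet()) {
            if (!coveredBySchema.contains(key)) uncovered.add(key);
        }
        if (uncovered.isEmpty()) return Optional.empty();
        return Optional.of("selectors [" + String.join(",", uncovered) + "] not covered by schema");
    }

    public static void main(String[] args) {
        Map<String, Object> pers = Map.of("name", "Miller", "age", 17, "weight", 90);
        System.out.println(check(pers, Set.of("name", "age")).orElse("complete"));
    }
}
```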

The expression in Appendix (a) covers all constraints introduced so far.

Construction & Transformation

Construction

[11]

The basis for transformations are features for constructing new structures. Construction is a combination of creation, struct-expressions ({...} [12]), and assignments. Creation is a step with keyword new (usually followed by a variable binding), and assignments are paths suffixed with : expr. To enable reuse of json structures we adopt the json syntax; in fact, json syntax is a subset of apath syntax. The following example binds a new structure to variable x.

new $x : {
    "pers": {
      "name": "smith, john",
      "age": 22
    },
    "favs": ["dylan", "dePalma", "tarantino"]
}

The flexibility of apath can be used when constructing, e. g. the following expression is equivalent to the above one.

new $x : {
  pers.name: 'smith, john', pers.age: 22,
  favs: ['dylan', 'dePalma'], favs[2]: 'tarantino'
}

Flatten

Often used in database context, when flat structures are preferred, the following example builds the flat name property with the running example as input

_{new $name} // var for the name
// the following structure is returned
.{
    // the property name is built from the selector path
    asProperty( $name.selectorPath() )
      : 
        // name is bound
        pers.name $name 
}

and yields the result {"pers.name": "Miller"}. To flatten a structure in general, the expression

_{new $flat} // var for the result
._{
    // iterate over all leaves
    **?(isLeaf()) $leaf
    // bind the selector path (*)
    .selectorPath() $path
    // extend $flat with the flat property name
    ._{$flat.asProperty($path): $leaf}
}
// return the flat structure
.$flat

can be used and has the result

{
   "pers.name": "Miller",
   "pers.age": 17,
   "pers.#post-code": 12205,
   "pers.driverLic": true,
   "favs.0": "coen-brothers",
   "favs.1": "dylan"
}

To restrict the flattening to properties name and age, one can use the expression .selectorPath()?(match('.*\.age|.*\.name')) $path instead of part (*) above.
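For comparison, the same flattening can be written directly in Java over nested Map and List values. This is only a sketch for readers who want to check the expected result by hand. FlattenSketch is a hypothetical class and does not use the japath3 API.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: flattens nested Map/List data into dotted selector paths.
final class FlattenSketch {

    static Map<String, Object> flatten(Object root) {
        Map<String, Object> flat = new LinkedHashMap<>();
        walk(root, "", flat);
        return flat;
    }

    private static void walk(Object node, String path, Map<String, Object> flat) {
        if (node instanceof Map) {
            for (Map.Entry<?, ?> e : ((Map<?, ?>) node).entrySet()) {
                walk(e.getValue(), join(path, String.valueOf(e.getKey())), flat);
            }
        } else if (node instanceof List) {
            List<?> items = (List<?>) node;
            for (int i = 0; i < items.size(); i++) {
                walk(items.get(i), join(path, String.valueOf(i)), flat);
            }
        } else {
            flat.put(path, node); // a leaf: store it under its selector path
        }
    }

    private static String join(String path, String selector) {
        return path.isEmpty() ? selector : path + "." + selector;
    }

    /** The running example as nested maps and lists. */
    static Map<String, Object> runningExample() {
        Map<String, Object> pers = new LinkedHashMap<>();
        pers.put("name", "Miller");
        pers.put("age", 17);
        pers.put("#post-code", 12205);
        pers.put("driverLic", true);
        Map<String, Object> root = new LinkedHashMap<>();
        root.put("pers", pers);
        root.put("favs", List.of("coen-brothers", "dylan"));
        return root;
    }

    public static void main(String[] args) {
        flatten(runningExample()).forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```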

Transformation

Essentially, transformation is construction using values selected from input nodes. Let us take the structure assigned to $x above as input. As known from usual mapping scenarios, 1:n and n:1 mappings should be supported. (Output-driven) transformations, as known from xslt, define the new structure and assign values from the input node. The following expression

new $y : {
    "surname": pers.name.match('(.*),.*'),
    "firstname": pers.name.match('.*,\s*(.*)'),
    "userName": j::str::conc($.favs[0], pers.age.text())
}

defines surname and first name by accessing the name property of the input (1:n mapping). The fourth line defines the user name by concatenating the top favorite and the age (n:1 mapping). It yields the output

{
   "firstname": "john",
   "surname": "smith",
   "userName": "dylan22"
}
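The two regular expressions and the concatenation can be checked independently of apath with java.util.regex. The sketch below assumes that match(...) requires the whole value to match and yields the first group, as suggested by the keyword table. TransformSketch, firstGroup, and transform are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: the 1:n and n:1 mapping of the example in plain Java.
final class TransformSketch {

    /** First capture group of a full match, or null if the value does not match. */
    static String firstGroup(String value, String regex) {
        Matcher m = Pattern.compile(regex).matcher(value);
        return m.matches() ? m.group(1) : null;
    }

    static Map<String, Object> transform(String name, int age, String topFav) {
        Map<String, Object> y = new LinkedHashMap<>();
        y.put("surname", firstGroup(name, "(.*),.*"));       // 1:n, first part of the name
        y.put("firstname", firstGroup(name, ".*,\\s*(.*)")); // 1:n, second part of the name
        y.put("userName", topFav + age);                     // n:1, favorite and age combined
        return y;
    }

    public static void main(String[] args) {
        System.out.println(transform("smith, john", 22, "dylan"));
    }
}
```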

Sequence Transformation

Because sequences naturally correspond to arrays in json context, one can easily transform them, e. g. by means of *. This is also called list comprehension in functional languages. For instance, in Haskell and Python, one would write [v | v <- ...] and [v for v in ...], respectively. The following example (again with input structure $x)

new $y : {
    "favorites": [ 
        favs.*. { "top": &, "fav" : _ }
    ]
}

iterates over favs and constructs the struct items with the input array item itself (_) and its selector (&).

Result:

{"favorites": [
   {
      "top": "0",
      "fav": "dylan"
   },
   {
      "top": "1",
      "fav": "dePalma"
   },
   {
      "top": "2",
      "fav": "tarantino"
   }
]}

The expression in Appendix (b) shows the whole transformation making use of modularization, comparable to templates in xslt.
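In Java, the counterpart of such a list comprehension is a stream pipeline. The following sketch rebuilds the favorites array from a plain list. It does not use japath3, and SequenceSketch is a hypothetical name. The index plays the role of the selector (&) and the list element the role of the item itself (_).

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical sketch: the sequence transformation as a Java stream.
final class SequenceSketch {

    static List<Map<String, Object>> favorites(List<String> favs) {
        return IntStream.range(0, favs.size())
                .mapToObj(i -> {
                    Map<String, Object> item = new LinkedHashMap<>();
                    item.put("top", String.valueOf(i)); // selector of the array item, as text
                    item.put("fav", favs.get(i));       // the array item itself
                    return item;
                })
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(favorites(List.of("dylan", "dePalma", "tarantino")));
    }
}
```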

Modification

In line with the declarative, functional character of languages such as xpath, modification of input structures as a side effect is not prioritized. One should prefer creating new structures. But in some cases, e.g. for efficiency reasons, modification can also be performed. To avoid unintended modification the directive ::modifiable must be used to explicitly mark the input structure root as modifiable [13].

Modifications are assignment expressions over the input structure where the properties and subscripts (selectors) within the lhs (left hand side) of the assignment are created on demand if they do not exist. The rhs (right hand side) can be arbitrary expressions. Let the input structure be the running example.

For instance, only modification (no extension) is done with the following sub-expression [12] (note that _ is a syntactic short form for self, in this case the input structure itself)

_{::modifiable, pers.age : 18, favs[0]: "tarantino"}

which redefines the age and the top favorite. The expression

_{::modifiable,
    pers.info : 
        pers.cond(and(age.lt(18), driverLic),
            "failure: driver lic not allowed", 
            "success: driver lic allowed")
}

extends the pers sub-structure of the input structure by the info property whose value depends on other properties (cf. section constraints), with result { ... "info": "failure: driver lic not allowed" ... }

Of course, assignments can be embedded within complex expressions. E.g.,

_{::modifiable,
    pers. 
        cond(and(age.lt(18), driverLic),
            failure: "driver lic not allowed", 
            success: "driver lic allowed")
}

The effect is nearly the same as above except that now there are two different properties (note that the context node of the cond-arguments is pers). Result: { ... "failure": "driver lic not allowed" ... }
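The on-demand creation of properties on the left-hand side of an assignment can be pictured with the following Java sketch. It is a simplification under stated assumptions: only dotted property paths are handled (no subscripts such as favs[0]), a non-object value on the path is replaced by a new object, and AssignSketch is a hypothetical class that does not reflect the japath3 implementation.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: assignment along a dotted path, creating objects on demand.
final class AssignSketch {

    @SuppressWarnings("unchecked")
    static void assign(Map<String, Object> root, String path, Object value) {
        String[] selectors = path.split("\\.");
        Map<String, Object> current = root;
        for (int i = 0; i < selectors.length - 1; i++) {
            Object child = current.get(selectors[i]);
            if (!(child instanceof Map)) {
                // property missing (or not an object): create it on demand
                child = new LinkedHashMap<String, Object>();
                current.put(selectors[i], child);
            }
            current = (Map<String, Object>) child;
        }
        current.put(selectors[selectors.length - 1], value);
    }

    public static void main(String[] args) {
        Map<String, Object> input = new LinkedHashMap<>();
        assign(input, "pers.age", 18);                                  // like pers.age : 18
        assign(input, "pers.info", "failure: driver lic not allowed");  // extends pers
        System.out.println(input);
    }
}
```

The map is changed in place, which is exactly the kind of side effect that the ::modifiable directive makes explicit.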

Playground

Download https://github.com/a-f-m/japath3/raw/main/japath3-playground.zip and unzip it to a directory (called D in the following). Go to directory D and start the playground with

java -Dfile.encoding=UTF-8 -cp target/japath3-playground.jar service.PlayMainApath

Remark: for now it only runs on port 8085. To start the UI, open http://localhost:8085/home/index-local.html in your browser.

For testing purposes, the console output of the playground can be used as request input for the service below.

Cli

For download see playground above. Go to directory D.

Besides the usage in java (programmatic), it is shipped with a command line interface. For instance, the command

 echo {"x":{"y":1}} | java -Dfile.encoding=UTF-8 -cp target/japath3-playground.jar japath3.cli.Commands --op select --stdin -apathExpr "x.y"

returns the result 1. Execute java -Dfile.encoding=UTF-8 -cp target/japath3-playground.jar japath3.cli.Commands --help for help. The Cli is recommended only for test purposes because starting the Java VM and loading the graal engine is time-consuming. Because of caching, it is recommended to use the Service below.

Service

For download see playground above. Go to directory D.

A simple http service (which is integrated in the playground for now) can be started with

java -Dfile.encoding=UTF-8 -cp target/japath3-playground.jar japath3.cli.Commands --service <port>

so that it can be called from other services. <port> must not be 8085 while the playground above is running. Then, for help, execute curl --location --request POST http://localhost:8082/apath/help (here with <port> set to 8082). The service follows the functionality of the Cli. For instance, the above selection (Cli) with result 1 can be performed equivalently with

curl --location --request POST 'http://localhost:8082/apath/eval' \
--header 'Content-Type: application/json' \
--data-raw '{
   "_op": "select",
   "type": "json",
   "_body": {"x":{"y":1}},
   "apathExpr": "x.y",
   "salient": true
}
'

Remark: if "type":"xml" is set, the value of property _body has to be a string containing the xml/html input. For now, xml support is experimental!
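From Java, the same request can be sent with the standard java.net.http client (Java 11 or later). The sketch below mirrors the curl call above and assumes the service was started on port 8082. ServiceCallSketch and evalRequest are hypothetical names, and the shape of the response body is not specified here, so it is simply printed.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical sketch: calling the eval endpoint of a locally running service.
final class ServiceCallSketch {

    /** Builds the POST request for the eval endpoint of the given base url. */
    static HttpRequest evalRequest(String baseUrl, String jsonBody) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/apath/eval"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(jsonBody))
                .build();
    }

    public static void main(String[] args) throws Exception {
        // same payload as in the curl example
        String body = "{\"_op\":\"select\",\"type\":\"json\",\"_body\":{\"x\":{\"y\":1}},"
                + "\"apathExpr\":\"x.y\",\"salient\":true}";
        HttpRequest request = evalRequest("http://localhost:8082", body); // assumed port
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode());
        System.out.println(response.body());
    }
}
```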

to be cont.

Programmatic Usage

All examples above can be tried out in the playground. To use it in a project, one way is to call the service (see above) via http. But if your host language is java, it is preferred to use apath in a programmatic way.

Use the jar file japath3-playground.jar (see section Playground).

In the following we reuse the examples introduced so far embedded in java code. Again, we use our running example.

Wrapping Up

Let

{
    "pers": {
        "name": "Miller",
        "age": 17,
        "#post-code": 12205,
        "driverLic": true
    },
    "favs": ["coen-brothers", "dylan"]
}

be the json string assigned to a java string variable jo. Wrapping up jo is done with

Node node = WJsonOrg.w_(jo);

Here we use the wrapper for org.json.*.

Evaluation

Single Selection

Selecting a single node or its value is done with

PathExpr expr = Language.e_("pers.name");

Node u = Japath.select(node, expr);
String v = u.val();

System.out.println(u);
System.out.println(v);

where node is the wrapped up json above. The first line builds an internal apath expression. The second line evaluates this expression over node and retrieves a single node whose string value is accessed in the third line. The output will be

`name`->Miller
Miller

More concisely, the above could be written as

String v = select( w_(jo), e_("pers.name") ).val();

making use of java import static ... for the corresponding methods. We selected a single primitive value so far. Retrieving a json (sub-) object is performed with

Node person = select(w_(jo), e_("pers"));
Node name = select(person, e_("name"));

System.out.println(name.val().toString());

JSONObject joPerson = person.val();

System.out.println(joPerson);

yielding output

Miller
{"name":"Miller","age":17,"#post-code":12205,"driverLic":true}

The first line in the java snippet selects the non-primitive person node which in turn is used as the input node for the name selection in the second line. The fourth line retrieves the underlying de-wrapped json object. As we see in this example every node contains the wrapped json object.

Rem.: The local wrapping-up during evaluation is performed on demand to save execution time.

Multiple Selection

The select method is only used for single selection. To retrieve multiple solutions the method walki has to be used.

Iterable<Node> nodes = Japath.walki( w_(jo), e_("union( pers.*, favs.* )") );

for (Node x : nodes) System.out.println(x.val().toString());

that yields output

Miller
17
12205
true
coen-brothers
dylan

apath offers the stream variant walks for making use of java stream methods. E.g., the snippet

walks( w_(jo), e_("union( pers.*, favs.* )") ) //
    .filter(x -> {
      return x.val().toString().matches(".*(ill|bro).*");
    })
    .forEach(x -> System.out.println(x.val().toString()));

yields output

Miller
coen-brothers

Of course, this example can be realized with an apath expression itself, e.g.

walki( w_(jo), e_( "union( pers.*, favs.* ) ? (match('.*(ill|bro).*') ) ") ) //

    .forEach(x -> System.out.println(x.val().toString()));

which yields the same output.

Rem.: For more complex multi-line apath expressions it is recommended to use a separate file. The method Language.e_(s) is then called after reading the file content into s. Another way, available since Java 15, is to use multi-line strings (text blocks) with """. To handle expressions in a clean manner, so-called modules are introduced below.

Modules

<tbd>

Appendix

(a)

Input:

{
   "pers": {
      "name": "Miller",
      "age": 17,
      "#post-code": 12205,
      "driverLic": true,
      "weight": 90
   },
   "favs": [
      "coen-brothers",
      "dylan"
   ]
}

apath expression:

def(PersonSchema, 
  assert(
    driverLic.type(Boolean),
    `#post-code`.type(Number),
    name.type(String),
    age.type(Number) )).

def(personCheck, assert(
    imply( age.lt(#0), not(driverLic) ),
    `#post-code`.assert( ge(13000), le(99999) )
)).

assert(
  pers{::complete}.assert(
    personCheck(18),
    PersonSchema()),
  favs.every(*, type(String))
)

Result:

// Constraints violated. Possible corrections ('← ...') below
{  // ← pers {::complete()}.assert(personCheck(18),PersonSchema())

  "pers": {    // ← selectors [weight] not covered by schema
    // ← personCheck(18)
    // ← `#post-code`.assert(ge(13000),le(99999))
    // ← imply(age.lt(#0),not(driverLic))

    "name": "Miller",
    "age": 17,
    "#post-code": 12205,      // ← ge(13000)

    "driverLic": true,
    "weight": 12
  },
  "favs": [
    "coen-brothers",
    "dylan"
  ]
}

(b)

Input:

{
    "pers": {
      "name": "smith, john",
      "age": 22
    },
    "favs": ["dylan", "dePalma", "tarantino"]
}

apath expression:

def(personal,
    {
      "surname": #0.match('(.*),.*'),
      "firstname": #0.match('.*,\s*(.*)'),
      "userName": j::str::conc(#1, pers.age.text())
    }
).
def(favorites,
    [ 
        #0.*. {"top": &, "fav" : _}
    ]
).

{ 
    "personal": personal($.pers.name, $.favs[0]),
    "favorites": favorites($.favs)
}

Result:

{
   "favorites": [
      {
         "top": "0",
         "fav": "dylan"
      },
      {
         "top": "1",
         "fav": "dePalma"
      },
      {
         "top": "2",
         "fav": "tarantino"
      }
   ],
   "personal": {
      "firstname": "john",
      "surname": "smith",
      "userName": "dylan22"
   }
}

Footnotes

[1]

Using it as a service (see corresponding section) can be seen as a fully programmable unit providing all functionality used for manipulating hierarchical structures like json.

[2]

For now basic parser functionality is supported. For instance, the expression a.type('hi') yields the error message "user-defined expression 'type' not found" because the string parameter 'hi' does not match a predefined type keyword and the parser proceeds to the grammar rule for user-defined expressions. Future versions will provide exhaustive analyses & validation.

[3]

Attention! if you want to match

[4]

If selection does not succeed, steps behind are not evaluated. In context of boolean evaluation it yields true.

[5]

Attention! does not denote the evaluation order, e. g. after filter-apply; mostly used for xml/html where no first-class array exists, or in json context directly after *.

[6]

Attention in case of non-property selectors of the context node.

[7]

Note that parameters within the definition are substituted by the parameter expressions of the call before evaluation. Parameter numbers (#i) are recommended for short expressions if the meaning of the parameter is obvious. Both ways are justified.

[8]

If expressions change often, e.g. by introducing another parameter, renumbering is error prone and named parameters should be used. Internally, def's with named parameter lists def( f(... pi ...), ... $pi ... ) are transformed to def( f, ... #i ... ).

[9]

If the evaluation of the rhs expression has no result then nothing is constructed. E.g., if the input is {a: 1} and the construction is {x: a, y: b} then the result is {x: 1}.

[10]

In constraint mode one can use keyword assert instead of and. Semantics are equivalent, apart from annotation handling for violation text output.

[11]

For now construction/transformation is not supported for xml/html.

[12]

Attention: {...} is overloaded with sub-expressions (grammatically comparable with the overloading of {...} in java for array initialization as well as blocks). A sub-expression follows a step immediately whereas a struct is itself a step (see grammar).

[13]

This has to be done carefully: because of the side effects, the results depend on the evaluation order. Semantically, a modifiable node is equivalent to a created one (see section construction).
