Home - a-f-m/japath3 GitHub Wiki
- Contents
- Preface
- Motivation
- Syntax & Semantics
- Simple Walk Through
- Constraints & Schema
- Construction & Transformation
- Modification
- Playground
- Cli
- Service
- Programmatic Usage
- Appendix
- Footnotes
This software is based on common formal methods known from the domains of compiler construction, formal languages, and functional programming. Using apath requires some knowledge of the last two domains. Implementation-wise, it follows the principles of these domains, usually avoiding verbosity and prioritizing compactness. The software is in beta state but well tested and used in several Java-based projects. Nevertheless, exhaustive analysis and validation of apath expressions will be added in the future to avoid unintended, counterintuitive, or erroneous results.
This project could also be an inspiration for other projects and languages pursuing the same objectives described in the motivation below.
To get started or experiment right away, most of the examples below are available in the playground.
apath is an abstract path language for hierarchical structures following the principles of xpath. It abstracts from
- the concrete underlying structure (e. g. json, xml/html or even tree views over java object nets) by defining corresponding wrappers
Beyond that it aims at having one language for
- selection
- constraints & schema definition
- construction & transformation
- modification
Schema definitions can be
- dependent on instance values or mixed up with arbitrary constraints
- automatically derived from input structures.
Constructing structures easily allows for transformation definitions.
With respect to json it tries to overcome the restricted expressiveness of JSONPath and the shortcomings of mixing value and schema constraints in JSON Schema.
To some extent apath can be seen as a Swiss Army knife for processing hierarchical structures. Simple selections as well as complex, modular, and reusable multi-line expressions can therefore be defined. Together with external functions (e.g., Java or JavaScript) the language is Turing-complete [1].
The concrete syntax is defined with Parsing Expression Grammars. The detailed grammar can be found here ([2]).
Semantics follow the usual, mature approach of
- Philip Wadler (2000). A Formal Semantics of Patterns in XSLT and XPath. Markup Languages 2(2): 183-202
and will be published soon.
Running example:
{
"pers": {
"name": "Miller",
"age": 17,
"#post-code": 12205,
"driverLic": true
},
"favs": ["coen-brothers", "dylan"]
}
According to the usual step-based semantics (e. g. xpath), an apath expression is a ('.'-separated) list of steps. The evaluation of a step has a node (of the underlying structure) as input (called the context node) and yields a solution, a sequence of nodes. Every node of the solution is passed to the subsequent step, where it is the new context node. The result of the overall evaluation is the solution of the last step.
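The evaluation scheme just described can be sketched in plain Java. This is a minimal illustration under stated assumptions, not the japath3 implementation: the names `StepSemanticsSketch`, `Step`, `property`, `all`, and `evaluate` are hypothetical, and json objects and arrays are modeled as `Map` and `List`.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical sketch of step-based evaluation; not the japath3 implementation. */
final class StepSemanticsSketch {

    /** A step maps one context node to its solution, a sequence of nodes. */
    interface Step extends Function<Object, List<Object>> {}

    /** Property selection: the property value, or an empty solution if absent. */
    static Step property(String name) {
        return node -> {
            if (node instanceof Map && ((Map<?, ?>) node).containsKey(name)) {
                return List.of(((Map<?, ?>) node).get(name));
            }
            return List.of();
        };
    }

    /** Wildcard step: all property values of an object or all items of an array. */
    static Step all() {
        return node -> {
            if (node instanceof Map) {
                return new ArrayList<Object>(((Map<?, ?>) node).values());
            }
            if (node instanceof List) {
                return new ArrayList<Object>((List<?>) node);
            }
            return List.of();
        };
    }

    /** Passes every node of a step's solution on as the context node of the next step. */
    static List<Object> evaluate(Object root, List<Step> steps) {
        List<Object> solution = List.of(root);
        for (Step step : steps) {
            List<Object> next = new ArrayList<>();
            for (Object contextNode : solution) {
                next.addAll(step.apply(contextNode));
            }
            solution = next;
        }
        return solution;
    }

    /** The running example, built by hand to stay independent of any json library. */
    static Map<String, Object> runningExample() {
        Map<String, Object> pers = new LinkedHashMap<>();
        pers.put("name", "Miller");
        pers.put("age", 17);
        pers.put("#post-code", 12205);
        pers.put("driverLic", true);
        Map<String, Object> root = new LinkedHashMap<>();
        root.put("pers", pers);
        root.put("favs", List.of("coen-brothers", "dylan"));
        return root;
    }

    public static void main(String[] args) {
        Object root = runningExample();
        System.out.println(evaluate(root, List.of(property("pers"), property("name")))); // pers.name
        System.out.println(evaluate(root, List.of(property("favs"), all())));            // favs.*
    }
}
```

In this sketch a selector that does not occur in the context node simply contributes nothing to the solution, so the remaining steps run over an empty sequence.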
!!! A semi-formal semantics will be stated here soon in tabular form to help with understanding expression evaluation (especially the passing of the context node) for people who are not familiar with xpath semantics.
keyword/symbol | description | apath example | result |
---|---|---|---|
access | | | |
. | property selection (selectors) | pers.name | "Miller" |
`...` | ... with non-identifier chars | pers.`#post-code` | 12205 |
regex('...') | ... with regex | pers.regex('.*a.*') | "Miller", 17 |
[i] | subscript | favs[1] | "dylan" |
* | array items | favs.* | "coen-brothers", "dylan" |
* | properties | pers.* | true, 12205, "Miller", 17 |
reflective | | | |
& (or selector) | property names | pers.*.& | "driverLic", "#post-code", "name", "age" |
_ (or self) | current node | pers.name._ | "Miller" |
merge | | | |
** | all descendants including current node | _.** | {"pers":{"name":"Miller", ...}, ... }, ..., "dylan" |
**^ | ... in bottom-up order | _.**^ | "Miller", 17, 12205, ... |
union | union of selections | union( pers.*, favs.* ) | true, 12205, "Miller", 17, "coen-brothers", "dylan" |
comparison | | | |
eq | equality | pers.name.eq('Miller') | true |
lt, gt, le, ge | inequality (less than, ...) | pers.age.lt(17) | false |
boolean operator | | | |
and | conjunction | pers.age.and( true, ge(16) ) | true |
or | disjunction | pers.age.or( gt(18), le(16) ) | false |
xor | exclusive disjunction (xor) | pers.xor( age.lt(18), driverLic ) | false |
not | negation | pers.age.not( le(18) ) | false |
control | | | |
? | filter operator | pers?( age.eq(17) ).name | "Miller" |
cond | conditional | pers.cond( driverLic, age, name ) | 17 |
opt | optional (e.g. weight property not required [4]) | pers?( opt(weight).lt(90) ).age | 17 |
keyword/symbol | description | example | result |
---|---|---|---|
pattern | | | |
match | regex match on values | favs.*?( match('.*dy.*') ) | "dylan" |
match | regex (first) group selection on values | pers.name.match('M(.*)er').eq('ill') | true |
access | | | |
[#i] | sequence subscript (node order i within the parent [5]) | favs.*[#1] | "dylan" |
[#i..j] | slice | favs.*[#0..1] | "coen-brothers", "dylan" |
[#i..] | open slice | favs.*[#1..] | "dylan" |
boolean operator | | | |
type | type check (primitive types) | pers.age.type(Number) | true |
imply | implication | pers.imply( age.lt(18), not(driverLic) ) | false |
every | universal quantification | favs.every( *, type(String) ) | true |
some | existential quantification | favs.some( *, eq('dylan') ) | true |
variables | | | |
<step>$ <id> | variable binding ($_ stands for the selector [6] as the variable name) | pers.and( name $_, age $a ) | vars:[{"name":"Miller"},{"a":17}] |
$ <id> | variable access (...eq($x)...) | pers.name$x.eq($x) | true |
$ | variable access 1 ($ w/o id stands for the root of input structure) | pers.union(name, $.favs[1]) | "Miller", "dylan" |
control | | | |
{...} | sub-expression ([12], cf. struct creation below) | _{ pers.name $_ }.favs[1] | "dylan" vars:[{"name":"Miller"}] |
re-use | | | |
def | user-defined (parametric) expression [7], [8] with parameter numbers | def(either, and(or(#0, #1), not(and(#0, #1))) ) | |
def | ... with parameter names | def(either(first, second), and(or($first, $second), not(and($first, $second))) ) | |
def | expression application | def(either,...) .pers.either( age.lt(18), driverLic ) | false |
construction | | | |
new | creation of a new node | new $x | |
: | assignment (<lhs> : <rhs>) | new $age : 99 | 99 vars:[{"age": 99}] |
{...} | struct creation ([9], [12]) | new $x : {age: 99} | {"age": 99} vars: [{"x": {"age": 99}}] |
[...] | array creation (...) | {num: [99, 100]} | {"num": [99, 100]} |
directives & functions | | | |
:: <directive>((...) )? | directive | pers{::complete} | |
j:: <namespace>:: <func>( ...) | predefined external java method | pers.name.eq( j::str::conc('Mi', 'ller') ) | true |
def-script | predefined javascript function (first parameter 'self' is the context node) | def-script(""" function canDrive(self, tag) { return (self < 18 ? 'avoid ' : 'do ') + tag } """). pers.age.js::canDrive('it') | "avoid it" |
Selector names correspond to properties (in json) or element names (in xml/html). If an expression contains a selector that does not occur in the input object, an empty solution is returned, which allows for optional branches in expressions. Sometimes, however, a typo in a selector yields an unintended empty solution. For test purposes the salient mode can be enabled. For instance, if, in salient mode, the input is
{"x":{"y":1}}
and the expression is x.yy
then the error
salience: selectors [yy] used but not found (available selectors: [x,y])
will be returned.
Remark: Currently, salient mode does not check xml attributes.
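The idea behind such a check can be sketched as follows. This is only an illustration under assumptions: it handles plain '.'-separated property paths, it compares against the set of all property names occurring anywhere in the input (suggested by the "available selectors" part of the message above), and the names `SalienceSketch`, `collectSelectors`, and `check` are hypothetical. How japath3 actually performs the check is not specified here.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

/** Hypothetical sketch of a salience check for plain dotted paths. */
final class SalienceSketch {

    /** Collects every property name occurring anywhere in the input. */
    static void collectSelectors(Object node, Set<String> out) {
        if (node instanceof Map) {
            for (Map.Entry<?, ?> e : ((Map<?, ?>) node).entrySet()) {
                out.add(String.valueOf(e.getKey()));
                collectSelectors(e.getValue(), out);
            }
        } else if (node instanceof List) {
            for (Object item : (List<?>) node) {
                collectSelectors(item, out);
            }
        }
    }

    /** Returns a message if the path uses selectors that the input does not contain. */
    static Optional<String> check(Object input, String dottedPath) {
        Set<String> available = new LinkedHashSet<>();
        collectSelectors(input, available);
        List<String> missing = new ArrayList<>();
        for (String selector : dottedPath.split("\\.")) {
            if (!available.contains(selector)) {
                missing.add(selector);
            }
        }
        if (missing.isEmpty()) {
            return Optional.empty();
        }
        return Optional.of("salience: selectors [" + String.join(",", missing)
                + "] used but not found (available selectors: ["
                + String.join(",", available) + "])");
    }

    public static void main(String[] args) {
        Object input = Map.of("x", Map.of("y", 1));
        System.out.println(check(input, "x.yy").orElse("no salience problem"));
    }
}
```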
Constraints are boolean expressions that have to be evaluated to true. For instance, the expression
pers.assert(
imply( age.lt(18), not(driverLic) ),
`#post-code`.and( ge(13000), le(99999) )
)
evaluates to false and yields the following violation message in constraints mode [10] (for constraints mode, see the help in section Service below: curl --location --request POST http://localhost:8082/apath/help):
// Constraints violated. Possible corrections ('← ...') below
{
"pers": { // ← `#post-code`.assert(ge(13000),le(99999))
// ← imply(age.lt(18),not(driverLic))
"name": "Miller",
"age": 17,
"#post-code": 12205, // ← ge(13000)
"driverLic": true
},
"favs": [
"coen-brothers",
"dylan"
]
}
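To see why both constraints fail for the running example, the two checks can be spelled out by hand in Java. This is a minimal sketch, not generated by or part of japath3; `ConstraintSketch`, `imply`, and `violations` are hypothetical names, and `imply(a, b)` is taken as the usual logical implication `!a || b`.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical hand-written equivalent of the two constraints on pers. */
final class ConstraintSketch {

    /** Logical implication: false only if the premise holds and the conclusion does not. */
    static boolean imply(boolean premise, boolean conclusion) {
        return !premise || conclusion;
    }

    /** Returns the apath text of every violated constraint. */
    static List<String> violations(Map<String, Object> pers) {
        int age = (Integer) pers.get("age");
        boolean driverLic = (Boolean) pers.get("driverLic");
        int postCode = (Integer) pers.get("#post-code");

        List<String> violated = new ArrayList<>();
        if (!imply(age < 18, !driverLic)) {
            violated.add("imply(age.lt(18),not(driverLic))");
        }
        if (!(postCode >= 13000 && postCode <= 99999)) {
            violated.add("`#post-code`.assert(ge(13000),le(99999))");
        }
        return violated;
    }

    public static void main(String[] args) {
        Map<String, Object> pers = new LinkedHashMap<>();
        pers.put("name", "Miller");
        pers.put("age", 17);
        pers.put("#post-code", 12205);
        pers.put("driverLic", true);
        // age 17 with a driver license violates the implication; 12205 is below 13000
        violations(pers).forEach(System.out::println);
    }
}
```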
To state user-friendly, customized violation messages, one has to use assertions with extra textual messages. Those must have the form
assert(message(<text>), <expr1>, ... , <exprN>)
where <text> is the custom text and <expr1>, ... , <exprN> is a list of constraints. Whenever one of the constraints is violated, the custom message is displayed. E.g., the constraint
pers.and(
assert( message('aged <18 cannot have a driver lic'),
imply( age.lt(18), not(driverLic) )),
`#post-code`.assert( message('must be >=13000 and <=99999'),
ge(13000), le(99999))
)
that is semantically equivalent to the one above, yields the violation
// Constraints violated. Possible corrections ('← ...') below
{
"pers": { // ← aged <18 cannot have a driver lic
"name": "Miller",
"age": 17,
"#post-code": 12205, // ← must be >=13000 and <=99999
"driverLic": true
},
...
}
From the theoretical viewpoint schemata are specialized constraints. The automatically generated schema for the running example is
assert(
pers.assert(
driverLic.type(Boolean),
`#post-code`.type(Number),
name.type(String),
age.type(Number)),
favs.every(*, type(String)))
Here, by default, all properties are stated as mandatory. If optionality is required, the opt keyword has to be used, e. g. opt(driverLic).type(Boolean) (see keyword table above). Schema generation takes structure variance within arrays into account. Concretely, in case of "favs": ["coen-brothers", 42] the generated constraint would be
favs.every(*, or( type(String), type(Number)))
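How such a constraint could be derived from an array can be sketched as below. This is an assumption-laden illustration, not the generator used by japath3: it only covers primitive items, and `SchemaDeriveSketch`, `typeOf`, and `arrayConstraint` are hypothetical names.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of deriving an every(...) constraint from array items. */
final class SchemaDeriveSketch {

    /** Maps a primitive Java value to the apath type keyword used in the schema. */
    static String typeOf(Object value) {
        if (value instanceof String) return "String";
        if (value instanceof Number) return "Number";
        if (value instanceof Boolean) return "Boolean";
        throw new IllegalArgumentException("only primitive items are covered by this sketch");
    }

    /** Builds the constraint text; distinct item types are combined with or(...). */
    static String arrayConstraint(String selector, List<?> items) {
        Set<String> types = new LinkedHashSet<>();
        for (Object item : items) {
            types.add("type(" + typeOf(item) + ")");
        }
        String inner = types.size() == 1
                ? types.iterator().next()
                : "or(" + String.join(", ", types) + ")";
        return selector + ".every(*, " + inner + ")";
    }

    public static void main(String[] args) {
        System.out.println(arrayConstraint("favs", List.of("coen-brothers", "dylan")));
        System.out.println(arrayConstraint("favs", List.of("coen-brothers", 42)));
    }
}
```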
Schema modularization can be done via the def keyword and helps to reuse sub-schemata, comparable to object-oriented modeling techniques (e. g. inheritance could be mimicked thereby). For instance, the following shows the extra handling of a Person sub-schema:
def(PersonSchema,
assert(
driverLic.type(Boolean),
... <see above> )).
assert(
pers.PersonSchema(),
favs.every(*, type(String)))
One essential advantage of having one language, besides minimizing the learning effort, is the ability to mix pure schema and value constraints:
def(personCheck, assert(
imply( age.lt(#0), not(driverLic) ),
`#post-code`.assert( ge(13000), le(99999) )
)).
assert(
pers.and(
personCheck(18),
PersonSchema()),
favs.every(*, type(String)))
Moreover, mixing allows pure schema constructs, e. g. type(..), to be enforced depending on input values. For instance, the following schema requires the presence of the license property of type string if the person has a driver license
pers.imply(driverLic, assert(license.type(String)))
For our running example the message
{
pers: { ← license.type(String)
driverLic: true,
...
will be stated.
Completeness of an input structure means that properties not contained in the schema must not occur in the input structure. Enforcing completeness with the apath keywords introduced so far would be possible, but less readable and too verbose. Therefore the directive ::complete is provided:
...
and(
pers{::complete}.and(
... ),
... )
For example, if the pers property of the running example had an additional property weight, then the violation
pers: { ← selectors [weight] not covered by schema
would occur.
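Conceptually, the completeness check is a set difference between the properties of a node and the selectors covered by the schema. A minimal sketch, with the hypothetical names `CompletenessSketch` and `uncovered`, and with the covered selectors passed in explicitly instead of being derived from an apath schema:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of the completeness idea as a set difference. */
final class CompletenessSketch {

    /** Returns the properties of the node that no schema selector covers. */
    static List<String> uncovered(Map<String, ?> node, Set<String> schemaSelectors) {
        List<String> result = new ArrayList<>();
        for (String property : node.keySet()) {
            if (!schemaSelectors.contains(property)) {
                result.add(property);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Object> pers = new LinkedHashMap<>();
        pers.put("name", "Miller");
        pers.put("age", 17);
        pers.put("#post-code", 12205);
        pers.put("driverLic", true);
        pers.put("weight", 90);
        Set<String> covered = Set.of("name", "age", "#post-code", "driverLic");
        System.out.println("selectors " + uncovered(pers, covered) + " not covered by schema");
    }
}
```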
Completeness can also be checked globally because this directive is handled as an ordinary step. E. g., to have the completeness constraint for the whole input one has to state **.::complete. Again, it could also depend on values of the input instance.
The expression in Appendix (a) covers all constraints introduced so far.
The basis for transformations are features for constructing new structures. Construction is a combination of creation, struct-expressions ({...} [12]), and assignments. Creation is a step with keyword new (usually followed by a variable binding) and assignments are paths suffixed with : expr. To enable reuse of json structures we adopt the json syntax; in fact, json syntax is a subset of apath syntax. The following example binds a new structure to variable x.
new $x : {
"pers": {
"name": "smith, john",
"age": 22
},
"favs": ["dylan", "dePalma", "tarantino"]
}
The flexibility of apath can be used when constructing, e. g. the following expression is equivalent to the above one.
new $x : {
pers.name: 'smith, john', pers.age: 22,
favs: ['dylan', 'dePalma'], favs[2]: 'tarantino'
}
Flat structures are often preferred in database contexts. The following example builds the flat name property with the running example as input
_{new $name} // var for the name
// the following structure is returned
.{
// the property name is built with the selector path
asProperty( $name.selectorPath() )
:
// name is bound
pers.name $name
}
and yields the result {"pers.name": "Miller"}. To flatten a structure in general, the expression
_{new $flat} // var for the result
._{
// iterate over all leaves
**?(isLeaf()) $leaf
// bind the selector path (*)
.selectorPath() $path
// extend $flat with the flat property name
._{$flat.asProperty($path): $leaf}
}
// return the flat structure
.$flat
can be used and has the result
{
"pers.name": "Miller",
"pers.age": 17,
"pers.#post-code": 12205,
"pers.driverLic": true,
"favs.0": "coen-brothers",
"favs.1": "dylan"
}
To restrict the flattening to properties name and age, one can use the expression .selectorPath()?(match('.*\.age|.*\.name')) $path
instead of part (*) above.
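For comparison, the same flattening written by hand in Java might look like the following sketch. It is not how japath3 evaluates the expression; `FlattenSketch` and its methods are hypothetical names, json objects and arrays are modeled as `Map` and `List`, and array indices are used as selectors as in the result above.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical hand-written counterpart of the flattening expression. */
final class FlattenSketch {

    /** Maps every leaf to its '.'-joined selector path. */
    static Map<String, Object> flatten(Object root) {
        Map<String, Object> flat = new LinkedHashMap<>();
        walk("", root, flat);
        return flat;
    }

    private static void walk(String path, Object node, Map<String, Object> flat) {
        if (node instanceof Map) {
            for (Map.Entry<?, ?> e : ((Map<?, ?>) node).entrySet()) {
                walk(join(path, String.valueOf(e.getKey())), e.getValue(), flat);
            }
        } else if (node instanceof List) {
            List<?> items = (List<?>) node;
            for (int i = 0; i < items.size(); i++) {
                walk(join(path, Integer.toString(i)), items.get(i), flat);
            }
        } else {
            flat.put(path, node); // a leaf: record it under its selector path
        }
    }

    private static String join(String prefix, String selector) {
        return prefix.isEmpty() ? selector : prefix + "." + selector;
    }

    /** The running example, built by hand. */
    static Map<String, Object> runningExample() {
        Map<String, Object> pers = new LinkedHashMap<>();
        pers.put("name", "Miller");
        pers.put("age", 17);
        pers.put("#post-code", 12205);
        pers.put("driverLic", true);
        Map<String, Object> root = new LinkedHashMap<>();
        root.put("pers", pers);
        root.put("favs", List.of("coen-brothers", "dylan"));
        return root;
    }

    public static void main(String[] args) {
        flatten(runningExample()).forEach((path, leaf) -> System.out.println(path + " = " + leaf));
    }
}
```

The apath version needs no explicit recursion because ** already enumerates all descendants.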
Essentially, transformation is construction using values selected from input nodes. Let us take the structure assigned to $x above as input. As known from usual mapping scenarios, 1:n and n:1 mappings should be supported. (Output-driven) transformations, as known from xslt, define the new structure and assign values of the input node to it. The following expression
new $y : {
"surname": pers.name.match('(.*),.*'),
"firstname": pers.name.match('.*,\s*(.*)'),
"userName": j::str::conc($.favs[0], pers.age.text())
}
defines surname and first name by accessing the name property of the input (1:n mapping). The fourth line defines the user name by concatenating the top favorite and the age (n:1 mapping). It yields the output
{
"firstname": "john",
"surname": "smith",
"userName": "dylan22"
}
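The two regex group selections and the concatenation can be reproduced with `java.util.regex`. This is a sketch under the assumption that match selects the first capture group of a full match; `NameMappingSketch`, `firstGroup`, and `transform` are hypothetical names, not japath3 API.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical Java counterpart of the 1:n and n:1 mapping above. */
final class NameMappingSketch {

    /** Returns the first capture group if the whole value matches, otherwise null. */
    static String firstGroup(String value, String regex) {
        Matcher m = Pattern.compile(regex).matcher(value);
        return m.matches() ? m.group(1) : null;
    }

    static Map<String, String> transform(String name, int age, String topFavorite) {
        Map<String, String> out = new LinkedHashMap<>();
        out.put("surname", firstGroup(name, "(.*),.*"));        // 1:n, first part of name
        out.put("firstname", firstGroup(name, ".*,\\s*(.*)"));  // 1:n, second part of name
        out.put("userName", topFavorite + age);                 // n:1, two inputs combined
        return out;
    }

    public static void main(String[] args) {
        System.out.println(transform("smith, john", 22, "dylan"));
    }
}
```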
Because sequences naturally correspond to arrays in json context, one can easily transform them, e. g. by means of *. This is also called list comprehension in functional languages. For instance, in Haskell and Python, one would write [v | v <- ...] and [v for v in ...], respectively. The following example (again with input structure $x)
new $y : {
"favorites": [
favs.*. { "top": &, "fav" : _ }
]
}
iterates over favs and constructs the struct items with the input array item itself (_) and its selector (&).
Result:
{"favorites": [
{
"top": "0",
"fav": "dylan"
},
{
"top": "1",
"fav": "dePalma"
},
{
"top": "2",
"fav": "tarantino"
}
]}
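In Java, the closest counterpart to this comprehension is a stream over the array indices. The following is a minimal sketch with the hypothetical names `FavoritesSketch` and `favorites`; the index plays the role of the selector (&) and the item the role of the current node (_).

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

/** Hypothetical stream-based counterpart of favs.*. { "top": &, "fav": _ }. */
final class FavoritesSketch {

    static List<Map<String, String>> favorites(List<String> favs) {
        return IntStream.range(0, favs.size())
                .mapToObj(i -> {
                    Map<String, String> item = new LinkedHashMap<>();
                    item.put("top", Integer.toString(i)); // the selector of the array item
                    item.put("fav", favs.get(i));          // the array item itself
                    return item;
                })
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(favorites(List.of("dylan", "dePalma", "tarantino")));
    }
}
```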
The expression in Appendix (b) shows the whole transformation making use of modularization, comparable to templates in xslt.
In line with the declarative, functional character of languages such as xpath, modification of input structures as a side effect is not prioritized. One should prefer creating new structures. But in some cases, e.g. for efficiency reasons, modification can also be performed. To avoid unintended modification the directive ::modifiable must be used to explicitly mark the input structure root as modifiable [13].
Modifications are assignment expressions over the input structure where the properties and subscripts (selectors) within the lhs (left hand side) of the assignment are created on demand if they do not exist. The rhs (right hand side) can be arbitrary expressions. Let the input structure be the running example.
For instance, only modification (no extension) is done with the following sub-expression [12] (note that _ is a syntactic short form for self, in this case the input structure itself)
_{::modifiable, pers.age : 18, favs[0]: "tarantino"}
which redefines the age and the top favorite. The expression
_{::modifiable,
pers.info :
pers.cond(and(age.lt(18), driverLic),
"failure: driver lic not allowed",
"success: driver lic allowed")
}
extends the pers sub-structure of the input structure by the info property whose value depends on other properties (cf. section constraints), with result { ... "info": "failure: driver lic not allowed" ... }
Of course, assignments can be embedded within complex expressions. E.g.,
_{::modifiable,
pers.
cond(and(age.lt(18), driverLic),
failure: "driver lic not allowed",
success: "driver lic allowed")
}
The effect is nearly the same as above except that now there are two different properties (note that the context node of the cond-arguments is pers). Result: { ... "failure": "driver lic not allowed" ... }
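The on-demand creation of selectors on the left-hand side can be pictured with the following sketch. It is an illustration only: it covers property selectors but not subscripts, `AssignSketch` and `assign` are hypothetical names, and refusing to descend into a non-struct value is a choice of this sketch, not documented japath3 behavior.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of an assignment that creates missing properties on demand. */
final class AssignSketch {

    /** Assigns value at the given non-empty property path, creating structs as needed. */
    @SuppressWarnings("unchecked")
    static void assign(Map<String, Object> root, List<String> path, Object value) {
        Map<String, Object> current = root;
        for (String selector : path.subList(0, path.size() - 1)) {
            Object child = current.get(selector);
            if (child == null) {
                child = new LinkedHashMap<String, Object>(); // created on demand
                current.put(selector, child);
            } else if (!(child instanceof Map)) {
                throw new IllegalStateException("not a struct: " + selector);
            }
            current = (Map<String, Object>) child;
        }
        current.put(path.get(path.size() - 1), value);
    }

    public static void main(String[] args) {
        Map<String, Object> pers = new LinkedHashMap<>();
        pers.put("age", 17);
        Map<String, Object> root = new LinkedHashMap<>();
        root.put("pers", pers);

        assign(root, List.of("pers", "age"), 18);                                  // modification
        assign(root, List.of("pers", "info"), "failure: driver lic not allowed"); // extension
        System.out.println(root);
    }
}
```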
Download https://github.com/a-f-m/japath3/raw/main/japath3-playground.zip and unzip it to a directory (called D in the following). Go to directory D and run
java -Dfile.encoding=UTF-8 -cp target/japath3-playground.jar service.PlayMainApath
Remark: for now it only runs on port 8085. To start the UI, open http://localhost:8085/home/index-local.html in your browser.
For testing purposes, the console output of the playground can be used as request input for the service below.
For download see playground above. Go to directory D.
Besides the usage in java (programmatic), it is shipped with a command line interface. For instance, the command
echo {"x":{"y":1}} | java -Dfile.encoding=UTF-8 -cp target/japath3-playground.jar japath3.cli.Commands --op select --stdin -apathExpr "x.y"
returns the result 1. Execute java -Dfile.encoding=UTF-8 -cp target/japath3-playground.jar japath3.cli.Commands --help for help. The Cli is only recommended for test purposes due to the time-consuming start of the java VM and loading of the graal engine. Because of caching it is recommended to use the Service below.
For download see playground above. Go to directory D.
A simple http service (which is integrated in the playground for now) can be started with
java -Dfile.encoding=UTF-8 -cp target/japath3-playground.jar japath3.cli.Commands --service <port>
so that it can be called from other services. <port> must not be 8085 if the playground above is running.
Then, for help, execute curl --location --request POST http://localhost:8082/apath/help (here and below with <port> = 8082). The service follows the functionality of the Cli. For instance, the above selection (Cli) with result 1 can be equivalently performed with
curl --location --request POST 'http://localhost:8082/apath/eval' \
--header 'Content-Type: application/json' \
--data-raw '{
"_op": "select",
"type": "json",
"_body": {"x":{"y":1}},
"apathExpr": "x.y",
"salient": true
}
'
Remark: if "type":"xml" is set then the value of property _body has to be a string containing the xml/html input. For now xml is in experimental state!
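The curl request above can also be issued from Java with the standard `java.net.http` client (Java 11 or later). The sketch below only mirrors the request shown above; the class name `ServiceCallSketch` is hypothetical, the port 8082 is the one used in the curl examples, and nothing is assumed about the format of the response beyond printing its body.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

/** Hypothetical client for the eval endpoint, mirroring the curl call. */
final class ServiceCallSketch {

    /** Same json payload as in the curl example. */
    static final String BODY =
            "{\"_op\":\"select\",\"type\":\"json\",\"_body\":{\"x\":{\"y\":1}},"
            + "\"apathExpr\":\"x.y\",\"salient\":true}";

    /** Builds the POST request without sending it. */
    static HttpRequest buildRequest(String baseUrl) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/apath/eval"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(BODY))
                .build();
    }

    public static void main(String[] args) throws Exception {
        // Requires a running service, e.g. started with --service 8082
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(buildRequest("http://localhost:8082"), HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body());
    }
}
```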
to be cont.
All examples above can be tried out in the playground. To use it in a project, one way is to call the service (see above) via http. But if your host language is java, it is preferred to use apath in a programmatic way.
Use the jar-file japath3-playground.jar
(see section playground)
In the following we reuse the examples introduced so far embedded in java code. Again, we use our running example.
Let
{
"pers": {
"name": "Miller",
"age": 17,
"#post-code": 12205,
"driverLic": true
},
"favs": ["coen-brothers", "dylan"]
}
be the json string assigned to a java string variable jo. Wrapping up jo is done with
Node node = WJsonOrg.w_(jo);
Here we use the wrapper for org.json.*.
Selecting a single node resp. its value is done with
PathExpr expr = Language.e_("pers.name");
Node u = Japath.select(node, expr);
String v = u.val();
System.out.println(u);
System.out.println(v);
where node is the wrapped-up json above. The first line builds an internal apath expression. The second line evaluates this expression over node and retrieves a single node whose string value is accessed in the third line. The output will be
`name`->Miller
Miller
More concisely, the above could be written as
String v = select( w_(jo), e_("pers.name") ).val();
making use of java import static ... for the corresponding methods. So far we selected a single primitive value. Retrieving a json (sub-) object is performed with
Node person = select(w_(jo), e_("pers"));
Node name = select(person, e_("name"));
System.out.println(name.val().toString());
JSONObject joPerson = person.val();
System.out.println(joPerson);
yielding output
Miller
{"name":"Miller","age":17,"#post-code":12205,"driverLic":true}
The first line in the java snippet selects the non-primitive person node, which in turn is used as the input node for the name selection in the second line. The fourth line retrieves the underlying de-wrapped json object. As we see in this example, every node contains the wrapped json object.
Rem.: The local wrapping-up during evaluation is performed on demand to save execution time.
The select method is only used for single selection. To retrieve multiple solutions the method walki has to be used.
Iterable<Node> nodes = Japath.walki( w_(jo), e_("union( pers.*, favs.* )") );
for (Node x : nodes) System.out.println(x.val().toString());
that yields output
Miller
17
12205
true
coen-brothers
dylan
apath offers the stream variant walks for making use of java stream methods. E.g., the snippet
walks( w_(jo), e_("union( pers.*, favs.* )") ) //
.filter(x -> {
return x.val().toString().matches(".*(ill|bro).*");
})
.forEach(x -> System.out.println(x.val().toString()));
yields output
Miller
coen-brothers
Of course, this example can be realized with apath expressions themselves, e.g.
walki( w_(jo), e_( "union( pers.*, favs.* ) ? (match('.*(ill|bro).*') ) ") ) //
.forEach(x -> System.out.println(x.val().toString()));
yields the same output.
Rem.: For more complex multi-line apath expressions it is recommended to use a separate file. Then the method Language.e_(s) is called after reading the file content into s. Another way, available since Java 15, is to use multi-line strings (text blocks) with """. To handle expressions in a clean manner so-called modules are introduced below.
<tbd>
Input:
{
"pers": {
"name": "Miller",
"age": 17,
"#post-code": 12205,
"driverLic": true,
"weight": 90
},
"favs": [
"coen-brothers",
"dylan"
]
}
apath expression:
def(PersonSchema,
assert(
driverLic.type(Boolean),
`#post-code`.type(Number),
name.type(String),
age.type(Number) )).
def(personCheck, assert(
imply( age.lt(#0), not(driverLic) ),
`#post-code`.assert( ge(13000), le(99999) )
)).
assert(
pers{::complete}.assert(
personCheck(18),
PersonSchema()),
favs.every(*, type(String))
)
Result:
// Constraints violated. Possible corrections ('← ...') below
{ // ← pers {::complete()}.assert(personCheck(18),PersonSchema())
"pers": { // ← selectors [weight] not covered by schema
// ← personCheck(18)
// ← `#post-code`.assert(ge(13000),le(99999))
// ← imply(age.lt(#0),not(driverLic))
"name": "Miller",
"age": 17,
"#post-code": 12205, // ← ge(13000)
"driverLic": true,
"weight": 90
},
"favs": [
"coen-brothers",
"dylan"
]
}
Input:
{
"pers": {
"name": "smith, john",
"age": 22
},
"favs": ["dylan", "dePalma", "tarantino"]
}
apath expression:
def(personal,
{
"surname": #0.match('(.*),.*'),
"firstname": #0.match('.*,\s*(.*)'),
"userName": j::str::conc(#1, pers.age.text())
}
).
def(favorites,
[
#0.*. {"top": &, "fav" : _}
]
).
{
"personal": personal($.pers.name, $.favs[0]),
"favorites": favorites($.favs)
}
Result:
{
"favorites": [
{
"top": "0",
"fav": "dylan"
},
{
"top": "1",
"fav": "dePalma"
},
{
"top": "2",
"fav": "tarantino"
}
],
"personal": {
"firstname": "john",
"surname": "smith",
"userName": "dylan22"
}
}
Using it as a service (see corresponding section) can be seen as a fully programmable unit providing all functionality used for manipulating hierarchical structures like json.
For now basic parser functionality is supported. For instance, the expression a.type('hi') yields the error message "user-defined expression 'type' not found" because the string parameter 'hi' does not match a predefined type keyword and the parser proceeds to the grammar rule for user-defined expressions. Future versions will provide exhaustive analysis & validation.
Attention! if you want to match
If selection does not succeed, subsequent steps are not evaluated. In the context of boolean evaluation it yields true.
Attention! It does not denote the evaluation order, e. g. after filter-apply; mostly used for xml/html where no first-class array exists, or in json context directly after *.
Attention in case of non-property selectors of the context node.
Note that parameters within the definition are substituted by the parameter expressions of the call before evaluation. Parameter numbers (#i) are recommended for short expressions if the meaning of the parameter is obvious. Both ways are justified.
If expressions change often, e.g. by introducing another parameter, renumbering is error-prone and named parameters should be used. Internally, defs with named parameter lists def( f(... pi ...), ... $pi ... ) are transformed to def( f, ... #i ... ).
If the evaluation of the rhs expression has no result then nothing is constructed. E.g., if the input is {a: 1} and the construction is {x: a, y: b} then the result is {x: 1}.
In constraint mode one can use keyword assert instead of and. Semantics are equivalent, apart from annotation handling for violation text output.
For now construction/transformation is not supported for xml/html.
Attention: {...} is overloaded with sub-expressions (grammatically comparable with the overloading of {...} in java for array initialization as well as blocks). A sub-expression follows a step immediately whereas a struct is itself a step (see grammar).
It has to be done carefully, because of the side effects the results depend on evaluation order. Semantically, a modifiable node is equivalent to a created one (see section construction).