How to: Avoid Pitfalls - troyp/jq GitHub Wiki

TOC

Keywords
nan, NaN, inf, Inf, infinite and null
foo.bar vs .foo.bar
Cartesian Products
Generator Expressions in Assignment Right-Hand Sides
Backtracking (empty) in Assignment RHS Expressions and Reductions
Multi-arity Functions and Comma/Semi-colon Confusability
index/1 is byte-oriented but match/1 is codepoint-oriented

Keywords

The fact that jq has keywords such as if and end has various implications, some of which may not be obvious. In particular:

keywords cannot be used in the abbreviated syntax for specifying key-value pairs, e.g. {foo} for {"foo": .foo}
keywords cannot be used to form $-variable names

The full list of reserved keywords is currently:

__loc__ and as break catch def elif else end foreach if import include label module or reduce then try

(The list of keywords for any particular version of jq can be derived from the lexer.l file, the “master” version of which is https://github.com/stedolan/jq/blob/master/src/lexer.l)
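
When a key or variable would collide with a keyword, one workaround is to spell the key out as a quoted string and to pick a different variable name. The following is a minimal sketch, not taken from the original page; the variable name $if_value is hypothetical and chosen only for illustration.

```shell
# Sketch: read and rebuild keys named after jq keywords ("if", "end").
# Quoted keys and .["..."] lookups avoid the {if} shorthand entirely.
result=$(jq -nc '{"if": 1, "end": 2} | {"if": .["if"], "end": .["end"]}')
echo "$result"   # {"if":1,"end":2}

# A keyword cannot name a $-variable, so bind the value under another name
# ($if_value is a hypothetical name chosen for this example).
bound=$(jq -nc '{"if": 1} | .["if"] as $if_value | $if_value + 1')
echo "$bound"    # 2
```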

`nan`, `NaN`, `inf`, `Inf`, `infinite` and `null`

nan is a jq value representing IEEE NaN, but it prints as null.

NaN is recognized in JSON text and is also understood to represent IEEE NaN.

Use isnan to test whether a jq value is identical to IEEE NaN.

Here are some illustrative examples:

$ echo NaN | jq .
null

$ echo nan | jq .
parse error: Invalid literal at line 2, column 0

$ echo NaN | jq isnan
true

$ jq -n 'nan | isnan'
true

Similar comments apply to the jq value infinite and to the literals inf and Inf, which are admissible in JSON input:

$ echo Inf | jq isinfinite
true

$ echo inf | jq isinfinite
true

$ jq -n 'infinite | isinfinite'
true
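
Because nan prints as null, a printed null does not tell you whether the underlying value was null or NaN. The sketch below (an illustration added here, not part of the original examples) puts the printed form, the isnan test, and a comparison with null side by side.

```shell
# Sketch: nan is displayed as null, but it is not equal to null.
# Array fields: the value as printed, the isnan test, the comparison with null.
result=$(jq -nc 'nan | [., isnan, (. == null)]')
echo "$result"   # [null,true,false]
```

In other words, testing `. == null` will not detect NaN; isnan will.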

`foo.bar` vs `.foo.bar`

foo.bar is short for foo | .bar and means: call foo and then get the value at the "bar" key of the output(s) of foo.

.foo.bar is short for .foo | .bar and means: get the value at the "foo" key of . and then get the value at the "bar" key of that.

One character, big difference.
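
The difference is easiest to see when a function and an input key share a name. In this sketch the function foo and the sample input are both invented for illustration.

```shell
# Sketch: foo.bar calls the function foo, while .foo.bar reads the key "foo" of the input.
# The definition of foo and the input object are made up for this example.
result=$(jq -nc 'def foo: {"bar": "function"};
                 {"foo": {"bar": "input"}} | [foo.bar, .foo.bar]')
echo "$result"   # ["function","input"]
```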

Cartesian Products

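jq is geared to produce Cartesian products at the drop of a hat. For example, the expression (1,2) | (3,4) produces four results, 3, 4, 3, 4, because the filter (3,4) runs once for each output of (1,2).

To see which input each result comes from: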

$ jq -n '(1,2) as $i | (3,4) | "\($i),\(.)"'
"1,3"
"1,4"
"2,3"
"2,4"

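The same multiplication happens whenever several generators appear in one expression, for instance in an object constructor or on both sides of an arithmetic operator. The following sketch counts the results rather than relying on their order; the sums are sorted for the same reason.

```shell
# Sketch: two generators with two outputs each yield 2 x 2 = 4 results.
count=$(jq -n '[{a: (1,2), b: (3,4)}] | length')
echo "$count"   # 4

# Both operands of + are generators, so every pairing is produced.
sums=$(jq -nc '[(1,2) + (10,20)] | sort')
echo "$sums"    # [11,12,21,22]
```
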
Generator Expressions in Assignment Right-Hand Sides

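Generator expressions on the right-hand side of an assignment are likely to surprise users. Compare (.a,.b) = (1,2) to (.a,.b) |= (.+1,.*2): the first emits one result for each value generated on the right-hand side, whereas the second emits a single result.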
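
A minimal sketch of the contrast, using an invented input object. With =, each right-hand-side value produces its own complete result, with that value assigned to every path on the left. With |=, only one result comes out; which of the generated values is used has, to the best of my knowledge, differed between jq releases, so the sketch only counts the outputs.

```shell
# Sketch: a generator on the RHS of = yields one whole result per RHS value.
assign=$(jq -nc '{"a": 1, "b": 2} | [(.a,.b) = (10,20)]')
echo "$assign"        # [{"a":10,"b":10},{"a":20,"b":20}]

# With |=, a generator on the RHS still yields a single result.
update_count=$(jq -n '{"a": 1, "b": 2} | [(.a,.b) |= (.+1, .*2)] | length')
echo "$update_count"  # 1
```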

Backtracking (`empty`) in Assignment RHS Expressions and Reductions

.a=empty and .a|=empty behave differently:

null | .a = empty     #=> the empty stream 
null | .a |= empty    #=> null
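
The first case can be checked by counting outputs. The sketch below uses an invented input object; it also shows del(.a), on the assumption that the intent behind assigning empty was to remove the key.

```shell
# Sketch: with =, an empty RHS means no value to assign, hence no output at all.
assign_count=$(jq -n '[{"a": 1} | .a = empty] | length')
echo "$assign_count"   # 0

# If the goal is to drop the key, del/1 states that directly.
deleted=$(jq -nc '{"a": 1, "b": 2} | del(.a)')
echo "$deleted"        # {"b":2}
```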

In reductions, care should be exercised when including empty in the body. For example, one might reasonably expect that:

reduce 1 as $x (2; empty)

would produce 2, but in fact it produces null in most versions of jq, including jq 1.5 and earlier, as well as the current “master” version as of 2018.

WARNING: Expressions of the form A | .[] |= E where A is an array and E can evaluate to empty should in general be avoided. Their behavior is inconsistent between versions of jq, and jq version 1.6 will often evaluate them incorrectly. For example, using jq 1.6:

jq -n '[0,1,2] | .[] |= if . == 0 then empty else . end'

yields:

[1,2,null]
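
If the aim is simply to drop array elements, one formulation that avoids `|= empty` altogether is map with select. This is a sketch of an alternative, not the original author's recommendation.

```shell
# Sketch: filter the array directly instead of updating elements to empty.
result=$(jq -nc '[0,1,2] | map(select(. != 0))')
echo "$result"   # [1,2]
```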

Multi-arity Functions and Comma/Semi-colon Confusability

foo(a,b) is NOT the same as foo(a;b). If foo/1 and foo/2 are both defined and you write foo(a,b) intending to call the two-argument function, you'll silently get the wrong behavior.

For example, foo(1,2) is a call to foo/1 with a single argument consisting of the expression 1,2, while foo(1;2) is a call to foo/2 with two arguments: the expressions 1 and 2.

One character, big difference.
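
The following sketch defines both arities so that the two calls can be told apart; the bodies of foo/1 and foo/2 are invented for illustration.

```shell
# Sketch: foo/1 counts the values its single argument generates;
# foo/2 just reports that the two-argument version was called.
result=$(jq -nc 'def foo(f): [f] | length;
                 def foo(f; g): "two arguments";
                 [foo(1,2), foo(1;2)]')
echo "$result"   # [2,"two arguments"]
```

No error or warning is raised for foo(1,2): jq resolves it to foo/1 and passes the generator 1,2 as its only argument.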

`index/1` is byte-oriented but `match/1` is codepoint-oriented

Given strings as input, the index family of filters (index, rindex, indices) return byte-oriented offsets. For codepoint-oriented offsets, either use the array-oriented versions of these filters, or use match/1 or match/2.

For example:

$ jq -cn '"aéb" | [., index("b")]'
["aéb",3]
$ jq -cn '"aéb" | [., (explode|index("b"|explode))]'
["aéb",2]
$ jq -cn '"a\u00e9b" | [., index("b")]'
["aéb",3]
$ jq -cn '"a\u00e9b" | match("b").offset'
2
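
To collect the codepoint offsets of every occurrence rather than just the first, match/2 can be given the "g" flag. This sketch assumes a jq build with regular expression support (Oniguruma); the input string is invented for illustration.

```shell
# Sketch: codepoint offsets of all occurrences of "b" in "aébéb".
# \u00e9 is the jq string escape for é, which keeps the script itself ASCII-only.
offsets=$(jq -nc '"a\u00e9b\u00e9b" | [match("b"; "g").offset]')
echo "$offsets"   # [2,4]
```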