Experimental Benchmarks - troyp/jq GitHub Wiki

Experimental Benchmarks Page

This page describes a set of benchmarks and gives representative timings ("user" and "sys" CPU time) and "maxrss" (maximum resident set size) statistics. The maxrss figures in the Mac OS X group are in bytes.

Each "test" combines a task, typically a jq program, with some input data (possibly null). The first test, however, runs the md5 program instead, both to show the MD5 checksum of a particular JSON file (so that readers can verify their copy) and to give a reference point for comparison.

Each combination of task and input data is assigned a number, given in the form (N); for example, the first "test" is:

(1) md5 jeopardy.json

This page is organized as follows:

  • the SOURCES section has one subsection for DATA and one for PROGRAMS;

  • the RESULTS section is organized into GROUPS so that the timings within each group are roughly comparable. Groups are identified by a string such as "Mac OS X (High Sierra) 3GHz 16GB RAM".

In the RESULTS section, the version of jq should be identified by its release tag, e.g. jq-1.5 or jq-1.6rc1.

Unless otherwise noted, the version of gojq used is 0.12.6.

SOURCES

SOURCES: DATA

"jeopardy.json" (aka JEOPARDY_QUESTIONS1.json) [54MB]

Description: https://www.reddit.com/r/datasets/comments/1uyd0t/200000_jeopardy_questions_in_a_json_file

"citylots.json" [181MB]

Description: https://github.com/zemirco/sf-city-lots-json

SOURCES: PROGRAMS

"schema.jq"

See [*1] for alternative ways to include this module.

"testzip.jq"

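# zip(headers): pair each element of the input array with the
# corresponding element of the headers array, yielding one object.
def zip(headers):
  headers as $headers
  | [$headers, .] | transpose | map({(.[0]): .[1]}) | add ;

# testzip(n): zip the array [0, ..., n-1] with its own string forms,
# producing an object with n keys.
def testzip(n):
  [range(0;n)] as $row
  | $row | zip( $row|map(tostring) ) ;

testzip(1000000) | length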
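Before running the million-element version, it can help to see what zip builds on a small row. The sketch below is a hypothetical shell wrapper (the name small_zip is illustrative, not part of the benchmark) that applies the zip definition from testzip.jq to a 3-element row and prints the result compactly; it assumes a jq binary on the PATH.

```shell
#!/usr/bin/env bash
# small_zip (hypothetical helper): apply the zip definition from testzip.jq
# to the row [0,1,2], keyed by the strings "0","1","2".
small_zip() {
  jq -nc '
    def zip(headers):
      headers as $headers
      | [$headers, .] | transpose | map({(.[0]): .[1]}) | add ;
    [range(0;3)] as $row
    | $row | zip( $row|map(tostring) )
  '
}

small_zip   # prints {"0":0,"1":1,"2":2}
```

With n = 1000000 the same construction yields an object with one million keys, which is why length reports 1000000 in test (4).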

RESULTS

GROUP: "Mac OS X (High Sierra) 3GHz 16GB RAM"

(1) md5 jeopardy.json

MD5 (jeopardy.json) = 2075398fa049b1c00223b2279ca5281d
user	0m0.126s
sys	0m0.025s
maxrss  11341824

(2) length jeopardy.json

jq-1.5 length jeopardy.json
216930
user	0m1.144s
sys	0m0.112s
maxrss  223440896

(2 rq) length jeopardy.json

rq 'map(s)=>{s.length}' < jeopardy.json
216930
user	4.76s
sys	0.27s
maxrss  372486144 

(2 gojq) length jeopardy.json

gojq length jeopardy.json
216930
real   0.90s
user   0.89s
sys    0.12s
maxrss 234708992

(2 dasel) length jeopardy.json

dasel --length -f jeopardy.json
216930
user   1.04s
sys    0.17s
maxrss 317427712

(3) schema.jq jeopardy.json

jq-1.5 -L . --arg nullable true 'include "schema"; schema' jeopardy.json > jeopardy.schema.json
user 7.10s
sys  0.13s
maxrss 223383552

jq-1.6 -L . --arg nullable true 'include "schema"; schema' jeopardy.json > jeopardy.schema.json
user   8.94s
sys    0.16s
maxrss 223395840

gojq -L . --arg nullable true 'include "schema"; schema' jeopardy.json > jeopardy.schema.json
user   13.98s
sys    0.57s
maxrss 1193697280

(4) null testzip.jq

jq-1.5 -n -f testzip.jq
1000000
user 6.11s
sys  0.35s
maxrss 711286784

(5) . jeopardy.json

jq-1.5 . jeopardy.json | wc -l
1952372
user        4.69s
sys         0.12s
maxrss 223350784

(5 rq) . jeopardy.json

rq --format readable id < jeopardy.json | wc -l
1952372

user   21.38s
sys     2.13s
maxrss 381214720

(6) 'select(length==2)' jeopardy.json # --stream

jq-1.5 --stream 'select(length==2)' jeopardy.json | wc -l
10629570
user	0m8.901s
sys	0m0.087s
maxrss 1359872
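With --stream, rather than parsing each complete value into memory, jq emits a sequence of events as it parses. Leaf events have the form [path, value] (length 2), while closing events, which mark the end of an array or object, have the form [path] (length 1); select(length==2) therefore keeps only the leaves. The sketch below shows both kinds of event on a tiny made-up input (the helper names are illustrative). Note that without -c each event is pretty-printed over several lines, so the wc -l figure in test (6) counts output lines, not events.

```shell
#!/usr/bin/env bash
# Hypothetical helpers illustrating jq's --stream event format on a tiny input.
doc='{"a":[1,2]}'

# All events: two leaf events, then two closing events.
all_events()  { printf '%s\n' "$doc" | jq -c --stream '.'; }

# Only the leaf events, as selected in test (6).
leaf_events() { printf '%s\n' "$doc" | jq -c --stream 'select(length==2)'; }

all_events
# [["a",0],1]
# [["a",1],2]
# [["a",1]]
# [["a"]]

leaf_events
# [["a",0],1]
# [["a",1],2]
```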

(7) null 0

jq-1.5 -n 0
user   0.002924s
sys    0.001339s
maxrss 1187840

jq-1.6rc1 -n 0
user   0.030609s
sys    0.001838s
maxrss 2076672

For both versions, times are per invocation, based on 1000 iterations using a bash loop, after adjusting for the time taken by the looping itself.
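The exact loop used is not given. As a sketch, assuming that the adjustment means subtracting the cost of an empty loop of the same length, per-invocation user and sys times could be estimated with bash's time keyword as follows; per_run_times is a hypothetical name.

```shell
#!/usr/bin/env bash
# per_run_times N CMD [ARGS...]  (hypothetical helper)
# Estimates per-invocation user/sys CPU time of CMD by timing N runs in a
# bash loop and subtracting the time of an empty loop of the same length.
per_run_times() {
  local n=$1; shift
  local TIMEFORMAT='%U %S'   # make bash's `time` print "user sys" in seconds
  local i cmd_t loop_t

  # N runs of the command; its output is discarded.
  cmd_t=$( { time for ((i = 0; i < n; i++)); do "$@" >/dev/null 2>&1; done; } 2>&1 )

  # An empty loop of the same length, to measure the looping overhead.
  loop_t=$( { time for ((i = 0; i < n; i++)); do :; done; } 2>&1 )

  # Subtract the overhead and divide by N.
  awk -v c="$cmd_t" -v l="$loop_t" -v n="$n" 'BEGIN {
    split(c, a, " "); split(l, b, " ")
    printf "user %.6fs\nsys  %.6fs\n", (a[1] - b[1]) / n, (a[2] - b[2]) / n
  }'
}

# Example: per_run_times 1000 jq -n 0
```

For a command as quick as jq -n 0, run-to-run noise can be a noticeable fraction of the result, so repeating the measurement is advisable.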

(8) md5 citylots.json

md5 citylots.json
MD5 (citylots.json) = 158346af5a90253d8b4390bd671eb5c5
user 0.43s
sys  0.06s 
maxrss  11333632

(9) length citylots.json

jq-1.5 length citylots.json
2
user	0m6.887s
sys	0m0.772s
maxrss 1375858688

(10) '.features|length' citylots.json

jq-1.5 '.features|length' citylots.json
206560
user 6.23s
sys  0.78s 
maxrss 1375899648
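For an object, length counts its keys, so test (9) reports 2 (the top level of citylots.json has the keys "type" and "features", per the schema in Appendix 1), whereas test (10) counts the elements of the features array. The sketch below shows the difference on a small made-up document; the helper name and the data are illustrative only.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare `length` of an object with `length` of an array.
lengths() {
  printf '%s\n' '{"type":"FeatureCollection","features":[{},{},{}]}' |
    jq -c '[length, (.features | length)]'
}

lengths   # prints [2,3]
```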

(11) schema.jq citylots.json

jq-1.6 -L . --argjson nullable true 'include "schema"; schema' citylots.json > citylots.schema.json
user   58.47s
sys    0.97s
maxrss 1376256000
maxrss 1375961088

(12) .features[10000].properties.LOT_NUM citylots.json

jq-1.5 '.features[10000].properties.LOT_NUM' citylots.json
"091"
user   6.44s
sys    0.97s
maxrss 1371561984

jq-1.6rc1 '.features[10000].properties.LOT_NUM' citylots.json
"091"
user   5.46s
sys    0.73s
maxrss 1375936512

jq-1.5 -n --stream 'first(inputs | select(.[0] == ["features",10000,"properties","LOT_NUM"])) | .[1]' citylots.json
"091"
user   0.60s
sys    0.00s
maxrss 2084864 
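The streaming command avoids parsing the whole 181MB file into memory: with -n and --stream, inputs yields the stream events one at a time, select keeps the event whose path matches, .[1] extracts its value, and first stops at the first match, so jq need not read the rest of the file. This is consistent with the much lower time and maxrss above. The sketch below replays the same filter on a tiny made-up document (lot_num and the data are illustrative only).

```shell
#!/usr/bin/env bash
# Hypothetical helper: the streaming lookup from test (12), applied to a tiny
# made-up document with two features instead of citylots.json.
lot_num() {
  printf '%s\n' '{"features":[{"properties":{"LOT_NUM":"001"}},{"properties":{"LOT_NUM":"002"}}]}' |
    jq -n --stream 'first(inputs | select(.[0] == ["features",1,"properties","LOT_NUM"])) | .[1]'
}

lot_num   # prints "002"
```

In this tiny example the closing event for the properties object carries the same path as the LOT_NUM leaf, but it comes after the leaf event, so first still returns the value.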

APPENDIX 1: Output

"jeopardy.schema.json"

{
  "air_date": "string",
  "answer": "string",
  "category": "string",
  "question": "string",
  "round": "string",
  "show_number": "string",
  "value": "string"
}

"citylots.schema.json"

{
  "type": "string",
  "features": [
    {
      "geometry": {
        "coordinates": [
          [
            [
              "JSON"
            ]
          ]
        ],
        "type": "string"
      },
      "properties": {
        "BLKLOT": "string",
        "BLOCK_NUM": "string",
        "FROM_ST": "string",
        "LOT_NUM": "string",
        "MAPBLKLOT": "string",
        "ODD_EVEN": "string",
        "STREET": "string",
        "ST_TYPE": "string",
        "TO_ST": "string"
      },
      "type": "string"
    }
  ]
}

FOOTNOTES

[*1]

These examples use jq's -L . option to tell jq that the module file schema.jq is in the present working directory (pwd). If the module file is elsewhere, its directory can be given to -L instead. Alternatively, omit the -L option and specify the directory in the include directive itself, for example:

jq --arg nullable true 'include "schema" {search: "."}; schema' jeopardy.json

For further details about using include, see the jq documentation.
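The two ways of locating the module can be compared using a throwaway stand-in for schema.jq. In the sketch below, the module body def schema: keys; is purely hypothetical (it is not the real schema.jq) and demo_include is an illustrative name; both invocations should print the same result.

```shell
#!/usr/bin/env bash
# demo_include (hypothetical): compare -L with an include search path,
# using a stand-in module in a temporary directory.
demo_include() {
  (
    dir=$(mktemp -d) || exit 1
    trap 'rm -rf "$dir"' EXIT
    cd "$dir" || exit 1
    # Stand-in module; the real schema.jq defines a different `schema`.
    printf 'def schema: keys;\n' > schema.jq

    # 1. Module directory given on the command line with -L.
    echo '{"b":1,"a":2}' | jq -c -L . 'include "schema"; schema'

    # 2. Module directory given in the include directive's metadata.
    echo '{"b":1,"a":2}' | jq -c 'include "schema" {search: "."}; schema'
  )
}

demo_include   # prints ["a","b"] twice
```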

Another way to use schema.jq is to uncomment its very last line (so that the line reads simply: schema) and then invoke jq or gojq with the -f option, e.g.:

jq --arg nullable true -f schema.jq INPUTFILE

where INPUTFILE is the input file.
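The -f variant can be sketched the same way: the module file's last line is the expression schema itself, so jq runs the file directly as the program. As before, the def schema: keys; body and the run_with_f name are hypothetical stand-ins, not the real schema.jq.

```shell
#!/usr/bin/env bash
# run_with_f (hypothetical): run a stand-in schema.jq with -f, where the
# file's final line invokes the filter.
run_with_f() {
  (
    dir=$(mktemp -d) || exit 1
    trap 'rm -rf "$dir"' EXIT
    # Stand-in module with its last line "uncommented", i.e. reading: schema
    printf 'def schema: keys;\nschema\n' > "$dir/schema.jq"
    echo '{"b":1,"a":2}' | jq -c -f "$dir/schema.jq"
  )
}

run_with_f   # prints ["a","b"]
```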