Parser - uhop/stream-json GitHub Wiki
The core of the package. parser() is a factory that returns a function for use in a chain() pipeline. It consumes text and produces a stream of {name, value} tokens. It is always the first component in a pipeline, fed with text from a file, socket, or any other source.
parser.asStream() wraps the parser as a Duplex stream: the writable side accepts text (Buffer/string), the readable side emits token objects in object mode.
Parser assumes well-formed input. For error diagnostics use Verifier.
For JSONL where individual items fit in memory, jsonl/Parser is faster.
Introduction
Using parser() with chain():
```js
const {chain} = require('stream-chain');
const {parser} = require('stream-json/parser.js');
const fs = require('fs');
const pipeline = chain([fs.createReadStream('sample.json'), parser()]);
let objectCounter = 0;
pipeline.on('data', data => data.name === 'startObject' && ++objectCounter);
pipeline.on('end', () => console.log(`Found ${objectCounter} objects.`));
```
Using parser.asStream() with .pipe():
```js
const {parser} = require('stream-json/parser.js');
const fs = require('fs');
const pipeline = fs.createReadStream('sample.json').pipe(parser.asStream());
let objectCounter = 0;
pipeline.on('data', data => data.name === 'startObject' && ++objectCounter);
pipeline.on('end', () => console.log(`Found ${objectCounter} objects.`));
```
API
The module exports a parser factory function. It produces a rigid token stream whose order is strictly defined — it is impossible to get a token out of sequence. All data items (strings, numbers, even object keys) are streamed in chunks and can be of any size.
When individual data items fit in memory, the parser can pack them into single tokens for easier inspection.
parser(options)
options is an optional object. The following custom flags are recognized (all truthy/falsy):
- jsonStreaming controls the parsing algorithm. If truthy, a stream of JSON values is parsed as described in JSON Streaming as "Concatenated JSON". Technically, it also recognizes "Line delimited JSON", AKA "JSON Lines", AKA JSONL. Otherwise, it follows the JSON standard and assumes a single value. The default: false.
  - It allows streaming any number of values one after another.
  - It handles empty streams, producing no values.
  - (Since 1.6.0) If you deal with JSONL, you may want to use jsonl/Parser to improve performance.
- Packing options control packing values. They have no default values.
  - packValues serves as the initial value for packKeys, packStrings, and packNumbers.
  - packKeys specifies whether keys are packed and sent as a keyValue item.
  - packStrings specifies whether strings are packed and sent as a stringValue item.
  - packNumbers specifies whether numbers are packed and sent as a numberValue item.
  - More details in the section below.
- Streaming options control sending unpacked values. They have no default values.
  - streamValues serves as the initial value for streamKeys, streamStrings, and streamNumbers.
  - streamKeys specifies whether to send the items of unpacked keys (startKey, stringChunk, endKey).
  - streamStrings specifies whether to send the items of unpacked strings (startString, stringChunk, endString).
  - streamNumbers specifies whether to send the items of unpacked numbers (startNumber, numberChunk, endNumber).
  - More details in the section below.
By default, Parser follows the strict JSON format, streams all keys, strings, and numbers in chunks, and also sends each of them as an individual (packed) value.
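With jsonStreaming enabled, values follow one another with no enclosing array, so downstream code may need to know where each top-level value ends. The sketch below groups a token array by top-level value using a nesting-depth counter. splitTopLevel is a hypothetical helper, not part of stream-json, and it assumes keys, strings, and numbers arrive only as packed items (for example, a parser created with {jsonStreaming: true, streamValues: false}); the token array is written by hand to follow the format documented on this page.

```javascript
// Hypothetical helper (not part of stream-json): groups a flat token array
// into one sub-array per top-level value. Assumes keys, strings, and numbers
// arrive only as packed items, e.g. {jsonStreaming: true, streamValues: false}.
const SCALARS = new Set(['stringValue', 'numberValue', 'nullValue', 'trueValue', 'falseValue']);

function splitTopLevel(tokens) {
  const values = [];
  let current = [];
  let depth = 0; // nesting level of open objects/arrays
  for (const token of tokens) {
    current.push(token);
    const closes = token.name === 'endObject' || token.name === 'endArray';
    if (token.name === 'startObject' || token.name === 'startArray') ++depth;
    else if (closes) --depth;
    // A top-level value is complete when we are back at depth 0, either after
    // closing a container or after a standalone scalar item.
    if (depth === 0 && (closes || SCALARS.has(token.name))) {
      values.push(current);
      current = [];
    }
  }
  return values;
}

// Hand-written tokens for the concatenated input: {"a":1} 2 "x"
const tokens = [
  {name: 'startObject'},
  {name: 'keyValue', value: 'a'},
  {name: 'numberValue', value: '1'},
  {name: 'endObject'},
  {name: 'numberValue', value: '2'},
  {name: 'stringValue', value: 'x'}
];
console.log(splitTopLevel(tokens).length); // 3
```

In a live pipeline, the same depth counter can run inside a 'data' handler instead of looping over an array.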
Stream of tokens
This is the list of data objects produced by Parser in the correct order:
```js
// a sequence can have 0 or more items
// a value is one of: object, array, string, number, null, true, false
// a parser produces a sequence of values
// object
{name: 'startObject'};
// sequence of object properties: key, then value
{name: 'endObject'};
// array
{name: 'startArray'};
// sequence of values
{name: 'endArray'};
// key
{name: 'startKey'};
// sequence of string chunks:
{name: 'stringChunk', value: 'string value chunk'};
{name: 'endKey'};
// when packing:
{name: 'keyValue', value: 'key value'};
// string
{name: 'startString'};
// sequence of string chunks:
{name: 'stringChunk', value: 'string value chunk'};
{name: 'endString'};
// when packing:
{name: 'stringValue', value: 'string value'};
// number
{name: 'startNumber'};
// sequence of number chunks (as strings):
{name: 'numberChunk', value: 'string value chunk'};
{name: 'endNumber'};
// when packing:
{name: 'numberValue', value: 'string value'};
// null, true, false
{name: 'nullValue', value: null};
{name: 'trueValue', value: true};
{name: 'falseValue', value: false};
```
All value chunks (stringChunk and numberChunk) should be concatenated in order to produce a final value. Empty string values may have no chunks. String chunks may have empty values.
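To make the concatenation rule concrete, here is a minimal sketch that rebuilds JavaScript values from a finite array of tokens. assemble is a hypothetical helper written for illustration, not stream-json's own API. It accepts streamed items, packed items, or both; when both are present, the packed item that immediately follows endXXX repeats the same value, so it is skipped.

```javascript
// Hypothetical helper (not stream-json's API): rebuilds JavaScript values from
// a finite token array. Handles streamed items (startXXX, chunks, endXXX),
// packed items (xxxValue), or both; a packed item right after endXXX is a
// duplicate of the value just assembled and is skipped.
function assemble(tokens) {
  const results = []; // completed top-level values
  const stack = [];   // open containers: {value, key}
  let buffer = '';    // chunks of the current key, string, or number
  let prev = null;    // name of the previous token

  // Attach a finished value to the enclosing container, or to results.
  const emit = value => {
    const top = stack[stack.length - 1];
    if (!top) results.push(value);
    else if (Array.isArray(top.value)) top.value.push(value);
    else top.value[top.key] = value;
  };

  for (const {name, value} of tokens) {
    switch (name) {
      case 'startObject': stack.push({value: {}, key: null}); break;
      case 'startArray': stack.push({value: [], key: null}); break;
      case 'endObject':
      case 'endArray': emit(stack.pop().value); break;
      case 'startKey':
      case 'startString':
      case 'startNumber': buffer = ''; break; // empty strings may have no chunks
      case 'stringChunk':
      case 'numberChunk': buffer += value; break; // concatenate chunks in order
      case 'endKey': stack[stack.length - 1].key = buffer; break;
      case 'endString': emit(buffer); break;
      case 'endNumber': emit(Number(buffer)); break;
      case 'keyValue': if (prev !== 'endKey') stack[stack.length - 1].key = value; break;
      case 'stringValue': if (prev !== 'endString') emit(value); break;
      case 'numberValue': if (prev !== 'endNumber') emit(Number(value)); break;
      case 'nullValue':
      case 'trueValue':
      case 'falseValue': emit(value); break;
    }
    prev = name;
  }
  return results;
}

// Default-mode tokens (streamed and packed) for {"a":[1,"x",null]}
const tokens = [
  {name: 'startObject'},
  {name: 'startKey'}, {name: 'stringChunk', value: 'a'}, {name: 'endKey'}, {name: 'keyValue', value: 'a'},
  {name: 'startArray'},
  {name: 'startNumber'}, {name: 'numberChunk', value: '1'}, {name: 'endNumber'}, {name: 'numberValue', value: '1'},
  {name: 'startString'}, {name: 'stringChunk', value: 'x'}, {name: 'endString'}, {name: 'stringValue', value: 'x'},
  {name: 'nullValue', value: null},
  {name: 'endArray'},
  {name: 'endObject'}
];
console.log(JSON.stringify(assemble(tokens))); // [{"a":[1,"x",null]}]
```

The sketch converts numbers with Number(), which loses precision for very large integers; the note on number values below explains why the parser leaves that choice to you.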
Important: values of numberChunk and numberValue are strings, not numbers. It is up to downstream code to convert them to numbers, using parseInt(x), parseFloat(x), or simply x => +x.
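Keeping numbers as strings lets downstream code choose a conversion that suits its data. As one possible choice, the sketch below falls back to BigInt for integers outside Number's safe range (±(2^53 - 1)), where +x would silently round. toJsNumber is a hypothetical helper, and the BigInt policy is an assumption for illustration, not something the parser does.

```javascript
// Hypothetical helper: converts the string of a numberValue item (or the
// concatenation of numberChunk values) into a JavaScript value. Integers
// outside Number's safe range become BigInt to avoid silent precision loss;
// all other JSON numbers go through Number().
function toJsNumber(text) {
  if (/^-?\d+$/.test(text)) {
    const n = Number(text);
    return Number.isSafeInteger(n) ? n : BigInt(text);
  }
  return Number(text);
}

console.log(toJsNumber('42'));                   // 42
console.log(toJsNumber('-1.5e3'));               // -1500
console.log(toJsNumber(['123', '45'].join(''))); // 12345 (numberChunk values joined in order)
console.log(toJsNumber('12345678901234567890')); // 12345678901234567890n
```

Mixing number and BigInt results means callers must handle both types; if that is inconvenient, returning the original string for unsafe integers is an equally valid policy.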
All items follow in the correct order. If something goes wrong, the parser produces an error event instead. In particular:
- All startXXX are balanced with endXXX.
- Between startKey and endKey there can be zero or more stringChunk items. No other items can appear.
- After startObject, optional key-value pairs are emitted in a strict pattern: a key-related item, then a value; this cycle continues until all key-value pairs are streamed.
  - It is not possible for a key to be missing a value.
- All endObject are balanced with the corresponding startObject. endObject cannot close startArray.
- Between startString and endString there can be zero or more stringChunk items, but no other items.
- endKey can be optionally followed by keyValue; then a new value is started, never an endObject.
In short, the item sequence is always correctly formed. No need to do unnecessary checks.
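Because of these guarantees, a consumer can track its position with plain stacks and no validation. The sketch below yields a [path, value] pair for every scalar, where the path lists object keys and array indices. scalarsWithPaths is a hypothetical helper; it assumes keys, strings, and numbers are packed (the default) and simply ignores the streamed startXXX, chunk, and endXXX items.

```javascript
// Hypothetical helper: yields [path, value] for every scalar, where path lists
// object keys and array indices. No checks are performed; it relies on the
// parser's ordering guarantees. Assumes keys, strings, and numbers are packed
// (the default); streamed startXXX/chunk/endXXX items are ignored.
function* scalarsWithPaths(tokens) {
  const path = [];  // current key or index for each open container
  const kinds = []; // 'object' or 'array' for each open container

  // Once a value inside an array is finished, move to the next index.
  const nextIndex = () => {
    if (kinds[kinds.length - 1] === 'array') ++path[path.length - 1];
  };

  for (const {name, value} of tokens) {
    switch (name) {
      case 'startObject': kinds.push('object'); path.push(null); break;
      case 'startArray': kinds.push('array'); path.push(0); break;
      case 'endObject':
      case 'endArray': kinds.pop(); path.pop(); nextIndex(); break;
      case 'keyValue': path[path.length - 1] = value; break;
      case 'stringValue':
      case 'numberValue': // the value is still a string here
      case 'nullValue':
      case 'trueValue':
      case 'falseValue':
        yield [path.slice(), value];
        nextIndex();
        break;
    }
  }
}

// Packed tokens for {"a":[true,{"b":"x"}]}
const tokens = [
  {name: 'startObject'}, {name: 'keyValue', value: 'a'},
  {name: 'startArray'}, {name: 'trueValue', value: true},
  {name: 'startObject'}, {name: 'keyValue', value: 'b'}, {name: 'stringValue', value: 'x'}, {name: 'endObject'},
  {name: 'endArray'}, {name: 'endObject'}
];
for (const [path, value] of scalarsWithPaths(tokens)) console.log(path.join('.'), value);
// a.0 true
// a.1.b x
```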
Packing options
Parser packs keys, strings, and numbers separately. A frequent case is when keys and numbers are known to fit in memory, but strings are not.
Internally each type of value is controlled by a flag:
- By default, this flag is true.
- If packValues is set, it is assigned to each flag.
- If an individual option is set, it is assigned to the flag.
Examples:
| Supplied options | packKeys | packStrings | packNumbers |
|---|---|---|---|
| {} | true | true | true |
| {packValues: false} | false | false | false |
| {packValues: false, packKeys: true} | true | false | false |
| {packKeys: true, packValues: false} | true | false | false |
| {packStrings: false} | true | false | true |
| {packKeys: true, packStrings: false, packNumbers: true} | true | false | true |
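For the frequent case above, {packStrings: false}, keys and numbers arrive packed while string values arrive only as startString, stringChunk items, and endString. The sketch below reports each string's length next to the most recent key without ever holding a whole string. stringLengths is a hypothetical helper, and the chunk boundaries in the sample tokens are arbitrary: real boundaries depend on the input buffers.

```javascript
// Hypothetical helper for the {packStrings: false} case: keys and numbers are
// packed (keyValue, numberValue); string values arrive only as startString,
// stringChunk..., endString. Reports each string's length (in UTF-16 code
// units) next to the most recent key, without holding a whole string.
function stringLengths(tokens) {
  const report = [];
  let lastKey = null;
  let inString = false; // stringChunk is also used inside keys; count only string values
  let length = 0;
  for (const {name, value} of tokens) {
    switch (name) {
      case 'keyValue': lastKey = value; break;
      case 'startString': inString = true; length = 0; break;
      case 'stringChunk': if (inString) length += value.length; break;
      case 'endString': inString = false; report.push({key: lastKey, length}); break;
    }
  }
  return report;
}

// Tokens for {"id":7,"blob":"abcdef"} with {packStrings: false}
const tokens = [
  {name: 'startObject'},
  {name: 'startKey'}, {name: 'stringChunk', value: 'id'}, {name: 'endKey'}, {name: 'keyValue', value: 'id'},
  {name: 'startNumber'}, {name: 'numberChunk', value: '7'}, {name: 'endNumber'}, {name: 'numberValue', value: '7'},
  {name: 'startKey'}, {name: 'stringChunk', value: 'blob'}, {name: 'endKey'}, {name: 'keyValue', value: 'blob'},
  {name: 'startString'}, {name: 'stringChunk', value: 'abc'}, {name: 'stringChunk', value: 'def'}, {name: 'endString'},
  {name: 'endObject'}
];
console.log(JSON.stringify(stringLengths(tokens))); // [{"key":"blob","length":6}]
```

The same pattern works for writing chunks to a file or a hash function instead of counting them.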
Streaming options
Parser can optionally skip streaming keys, strings, and/or numbers for optimization purposes, if the corresponding packing option is enabled. This means that only three configurations are supported for each of keys, strings, and numbers:
- The default: startXXX, 0 or more stringChunk (numberChunk for numbers), endXXX, xxxValue.
- packXXX is false: startXXX, 0 or more stringChunk (numberChunk for numbers), endXXX.
- packXXX is true, streamXXX is false: xxxValue.
Internally each type of value is controlled by a flag:
- By default, this flag is true.
- If streamValues is set, it is assigned to each flag.
- If an individual option is set, it is assigned to the flag.
- If the corresponding packing option is false, the flag is set to true.
Examples:
| Supplied options | streamKeys | streamStrings | streamNumbers |
|---|---|---|---|
| {} | true | true | true |
| {packValues: true, streamValues: false} | false | false | false |
| {packKeys: true, streamKeys: false} | false | true | true |
| {packKeys: false, streamKeys: false} | true | true | true |
| {packValues: true, streamValues: false, streamKeys: true} | true | false | false |
| {streamStrings: false} | true | false | true |
| {packKeys: true, streamKeys: false, streamStrings: false, streamNumbers: true} | false | false | true |
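The precedence rules for packing and streaming flags fit in a few lines of code, which can help predict which items a given options object produces. resolveFlags is a hypothetical helper written from the rules listed on this page; it is not exported by stream-json and is meant only as a reading aid.

```javascript
// Hypothetical helper (not exported by stream-json) following the documented
// precedence: default true -> packValues/streamValues -> individual option ->
// a disabled packing option forces the matching streaming flag on.
function resolveFlags(options = {}) {
  const flags = {};
  for (const kind of ['Keys', 'Strings', 'Numbers']) {
    let pack = true;
    if ('packValues' in options) pack = options.packValues;
    if (('pack' + kind) in options) pack = options['pack' + kind];
    let stream = true;
    if ('streamValues' in options) stream = options.streamValues;
    if (('stream' + kind) in options) stream = options['stream' + kind];
    if (!pack) stream = true; // an unpacked value must be streamed
    flags['pack' + kind] = !!pack; // options are truthy/falsy
    flags['stream' + kind] = !!stream;
  }
  return flags;
}

console.log(resolveFlags({streamStrings: false}).streamStrings); // false
console.log(resolveFlags({packKeys: false, streamKeys: false}).streamKeys); // true
```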
Static methods and properties
parser.asStream(options)
Wraps the parser as a Duplex stream. Useful with .pipe() when not using chain():
```js
const {parser} = require('stream-json/parser.js');
const fs = require('fs');
const pipeline = fs.createReadStream('sample.json').pipe(parser.asStream());
let objectCounter = 0;
pipeline.on('data', data => data.name === 'startObject' && ++objectCounter);
pipeline.on('end', () => console.log(`Found ${objectCounter} objects.`));
```