jsonl Parser - uhop/stream-json GitHub Wiki
(Since 1.6.0) This is a convenience component for parsing large JSONL (AKA NDJSON) files. It consumes text and produces a stream of JavaScript objects. It is always the first in a pipe chain being directly fed with text from a file, a socket, the standard input, or any other text stream.
Tip: If you don't need errorIndicator or checkErrors, consider using stream-chain/jsonl/parser directly. It produces the same {key, value} output with a simpler API (reviver and ignoreErrors options).
Functionally, jsonl/Parser replaces a combination of Parser with jsonStreaming set to true immediately followed by StreamValues. The only reason for its existence is improved performance.
StreamValues assumes that a token stream represents subsequent values and streams them out one by one. Just like StreamValues, jsonl/Parser produces a stream of objects like this:
// From JSONL:
// 1
// "a"
// []
// {}
// true
// It produces:
{key: 0, value: 1}
{key: 1, value: 'a'}
{key: 2, value: []}
{key: 3, value: {}}
{key: 4, value: true}
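The mapping above can be pictured with a few lines of plain JavaScript. This is a minimal sketch, not the library's implementation: jsonlToRecords is a hypothetical helper, it reads the whole text at once instead of streaming, and skipping blank lines is an assumption made for the illustration.

```javascript
// Illustrative stand-in, NOT stream-json code: maps JSONL text to
// {key, value} records using only JSON.parse().
function jsonlToRecords(text) {
  const records = [];
  let key = 0; // running index of emitted values
  for (const line of text.split('\n')) {
    if (line.trim() === '') continue; // assumption: blank lines are skipped
    records.push({key: key++, value: JSON.parse(line)});
  }
  return records;
}

const records = jsonlToRecords('1\n"a"\n[]\n{}\ntrue\n');
for (const record of records) console.log(record);
```

Each record pairs the zero-based position of a value with the parsed value itself, which is the same shape shown in the listing above.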
Introduction
The simple example (streaming from a file):
const jsonlParser = require('stream-json/jsonl/parser.js');
const fs = require('fs');
const pipeline = fs.createReadStream('sample.jsonl').pipe(jsonlParser.asStream());
let objectCounter = 0;
pipeline.on('data', () => ++objectCounter);
pipeline.on('end', () => console.log(`Found ${objectCounter} objects.`));
The alternative example:
const jsonlParser = require('stream-json/jsonl/parser.js');
const fs = require('fs');
const pipeline = fs.createReadStream('sample.jsonl').pipe(jsonlParser.asStream());
let objectCounter = 0;
pipeline.on('data', data => ++objectCounter);
pipeline.on('end', () => console.log(`Found ${objectCounter} objects.`));
Functionally equivalent to:
const {parser} = require('stream-json/parser.js');
const {streamValues} = require('stream-json/streamers/stream-values.js');
const {chain} = require('stream-chain');
const fs = require('fs');
const pipeline = chain([fs.createReadStream('sample.jsonl'), parser({jsonStreaming: true}), streamValues()]);
let objectCounter = 0;
pipeline.on('data', () => ++objectCounter);
pipeline.on('end', () => console.log(`Found ${objectCounter} objects.`));
API
The module returns a factory function. jsonlParser() returns a composable function for use in chain(). jsonlParser.asStream() wraps it as a Duplex stream.
In many real cases, while files are huge, individual data items can fit in memory. It is better to work with them as a whole, so they can be inspected. jsonl/Parser leverages JSONL format and returns a stream of JavaScript objects exactly like StreamValues.
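One way to picture how a line-oriented format can be leveraged: text is buffered until a newline arrives, and every complete line is handed to JSON.parse() as a whole. The sketch below only illustrates that idea and is not how the library is implemented. createLineParser, write() and end() are hypothetical names, chunks are assumed to be already-decoded strings, and skipping blank lines is an assumption.

```javascript
// Hypothetical helper, NOT part of stream-json: reassembles lines that are
// split across chunks and parses each complete line with JSON.parse().
function createLineParser() {
  let tail = ''; // text received after the last newline seen so far
  let key = 0;   // running index of emitted values

  const parseLines = lines => {
    const out = [];
    for (const line of lines) {
      if (line.trim() === '') continue; // assumption: blank lines are skipped
      out.push({key: key++, value: JSON.parse(line)});
    }
    return out;
  };

  return {
    // Accepts a string chunk, returns the records completed by it.
    write(chunk) {
      const lines = (tail + chunk).split('\n');
      tail = lines.pop(); // the last piece may be an incomplete line
      return parseLines(lines);
    },
    // Flushes whatever is left after the final chunk.
    end() {
      const rest = tail;
      tail = '';
      return parseLines([rest]);
    }
  };
}

// Chunk boundaries fall in the middle of values on purpose.
const lineParser = createLineParser();
const parsed = [
  ...lineParser.write('{"id": 1, "ok": tr'),
  ...lineParser.write('ue}\n{"id": 2, "ok": false}\n{"id"'),
  ...lineParser.write(': 3, "ok": true}'),
  ...lineParser.end()
];

// Whole objects are available, so they can be inspected directly.
const okIds = parsed.filter(({value}) => value.ok).map(({value}) => value.id);
console.log(okIds);
```

Because every item arrives as a complete JavaScript object, downstream code can filter or transform it with ordinary property access rather than by tracking tokens.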
constructor(options)
options is an optional object described in detail in Node.js' Stream documentation. Additionally, the following custom flags are recognized:
- reviver is an optional function, which takes two arguments and returns a value. See JSON.parse() for more details.
- (Since 1.7.2) checkErrors is an optional boolean value. If it is truthy, every call to JSON.parse() is checked for an exception, which is passed to a callback. Otherwise, JSON.parse() errors are ignored for performance reasons. Default: false.
- (Since 1.8.0) errorIndicator is an optional value. If it is specified, it supersedes checkErrors. When it is present, every call to JSON.parse() is checked for an exception and processed as follows:
  - If errorIndicator is undefined, the error is completely suppressed. No value is produced and the global key is not advanced.
  - If errorIndicator is a function, it is called with an error object. Its result is used this way:
    - If it is undefined ⇒ skip as above.
    - Any other value is returned as a value.
  - Any other value of errorIndicator is returned as a value.
  Default: none.
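The errorIndicator rules can be modeled in plain JavaScript to make the three cases concrete. This is a sketch of the behavior as described in the list, not the library's code: applyErrorIndicator is a hypothetical function, it takes an array of lines instead of a stream, and it always treats errorIndicator as present.

```javascript
// Illustrative model of the documented errorIndicator rules.
// NOT stream-json code; names and signature are hypothetical.
function applyErrorIndicator(lines, errorIndicator, reviver) {
  const results = [];
  let key = 0; // the "global key" from the description
  for (const line of lines) {
    let value;
    try {
      value = JSON.parse(line, reviver);
    } catch (error) {
      // A function is called with the error; any other value is used as is.
      value = typeof errorIndicator === 'function' ? errorIndicator(error) : errorIndicator;
      // undefined means: suppress the error, emit nothing, keep the key.
      if (value === undefined) continue;
    }
    results.push({key: key++, value});
  }
  return results;
}

const lines = ['{"a": 1}', '{broken', '2'];

// errorIndicator is undefined: the bad line is skipped silently.
const skipped = applyErrorIndicator(lines, undefined);

// errorIndicator is a plain value: it replaces the bad line's value.
const marked = applyErrorIndicator(lines, null);

// errorIndicator is a function: its result replaces the bad line's value.
const described = applyErrorIndicator(lines, error => ({error: error.name}));

// reviver is forwarded to JSON.parse(); here it doubles every number.
const doubled = applyErrorIndicator(lines, undefined, (k, v) => (typeof v === 'number' ? v * 2 : v));

console.log(skipped, marked, described, doubled);
```

Note how the key stays contiguous when a line is skipped: in the first call the value 2 gets key 1, while in the second and third calls it gets key 2 because the failed line still produced a value.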
Static methods and properties
jsonlParser.parser(options)
Alias of the factory function.
jsonlParser.asStream(options)
Returns a Duplex stream suitable for .pipe() usage:
const {chain} = require('stream-chain');
const jsonlParser = require('stream-json/jsonl/parser.js');
const fs = require('fs');
const pipeline = chain([fs.createReadStream('sample.jsonl'), jsonlParser.asStream()]);
let objectCounter = 0;
pipeline.on('data', () => ++objectCounter);
pipeline.on('end', () => console.log(`Found ${objectCounter} objects.`));