Functions:documentParser - bettyblocks/cli GitHub Wiki

Document Parser

Description

The documentParser function is asynchronous. It takes a DocumentParserArguments object and returns a Promise that resolves to a DocumentParserResult. Use it to parse a document and extract its text content.

Parameters

interface DocumentParserArguments {
  document: string; // document URL
  parserOptions?: {
    forceImage?: boolean;
    density?: number;
  };
}

interface DocumentParserResult {
  result: string;
}

Parser options

parserOptions accepts two optional settings:

- density sets the image resolution. A higher density gives better-quality output, but processing is slower.
- forceImage forces the document to be scanned as an image. For some documents this gives better output.
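The sketch below illustrates how the two settings combine. It builds a parserOptions object that includes only the settings the caller actually provides. Several things here are assumptions, not documented behavior. The helper buildParserOptions is hypothetical and not part of the documented API. It assumes that an omitted option falls back to the parser's default behavior. The density value 300 is arbitrary, because this page does not document the unit or valid range of density.

```typescript
// Shape of parserOptions as documented above.
interface ParserOptions {
  forceImage?: boolean;
  density?: number;
}

// Hypothetical helper (not part of the documented API): returns a
// parserOptions object containing only the options that were provided,
// assuming omitted options fall back to the parser's defaults.
function buildParserOptions(
  density?: number,
  forceImage?: boolean,
): ParserOptions {
  const options: ParserOptions = {};
  if (density !== undefined) {
    options.density = density;
  }
  if (forceImage !== undefined) {
    options.forceImage = forceImage;
  }
  return options;
}

// Favor quality over speed and treat the document as a scan.
// 300 is an arbitrary illustrative value; the density unit is not documented.
const scanned = buildParserOptions(300, true);

// Send no options at all.
const defaults = buildParserOptions();

console.log(scanned, defaults);
```

The trade-off from the text shows up only in the values passed. Raise density for quality and lower it for speed. Set forceImage when reading the document directly gives poor text.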

Returns

A Promise that resolves to a DocumentParserResult: an object whose result property contains the text extracted from the parsed document.
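The Parameters and Returns sections together describe one contract, which the sketch below writes out as a single TypeScript function type. It also adds a stub with the same shape, which can stand in for documentParser when you exercise calling code locally. The names DocumentParser and makeStubParser are assumptions made for illustration. So is the stub's behavior: it returns canned text per URL and rejects unknown URLs. The stub does not parse documents.

```typescript
interface DocumentParserArguments {
  document: string; // document URL
  parserOptions?: {
    forceImage?: boolean;
    density?: number;
  };
}

interface DocumentParserResult {
  result: string;
}

// The documented contract: one argument object in, a Promise of { result } out.
type DocumentParser = (
  args: DocumentParserArguments,
) => Promise<DocumentParserResult>;

// Hypothetical stand-in for local experiments. It ignores parserOptions and
// only looks up canned text by URL; it does not parse anything.
const makeStubParser =
  (texts: Record<string, string>): DocumentParser =>
  async ({ document }) => {
    const text = texts[document];
    if (text === undefined) {
      throw new Error(`No stub text for ${document}`);
    }
    return { result: text };
  };

const parser = makeStubParser({
  'https://example.com/invoice.pdf': 'Invoice 42',
});

// Usage: await the Promise and destructure result, as with documentParser.
(async () => {
  const { result } = await parser({
    document: 'https://example.com/invoice.pdf',
    parserOptions: { density: 150 },
  });
  console.log(result); // Invoice 42
})();
```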

Example

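// Treat any object with a url key as a file property value.
const isFileProperty = (value) =>
  value && typeof value === 'object' && 'url' in value;

const parseDocument = async ({ document, density, forceImage }) => {
  // Accept either a file property or a plain URL string.
  const url = isFileProperty(document) ? document.url : document;

  // documentParser is used without an import; it is assumed to be
  // available globally in the functions runtime.
  const { result } = await documentParser({
    document: url,
    parserOptions: { density, forceImage },
  });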

  return {
    result,
  };
};

export default parseDocument;
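The sketch below is a typed variant of the example. It keeps the same flow but receives the parser as an argument instead of using the global documentParser, so it can run outside the Betty Blocks runtime. The names makeParseDocument, FileProperty, ParseDocumentInput, and fakeParser are hypothetical and introduced here for illustration. FileProperty models only the url field that the example relies on.

```typescript
interface DocumentParserArguments {
  document: string; // document URL
  parserOptions?: {
    forceImage?: boolean;
    density?: number;
  };
}

interface DocumentParserResult {
  result: string;
}

type DocumentParser = (
  args: DocumentParserArguments,
) => Promise<DocumentParserResult>;

// Hypothetical model of a file property value: only the url field the
// example reads is described here.
interface FileProperty {
  url: string;
}

interface ParseDocumentInput {
  document: string | FileProperty;
  density?: number;
  forceImage?: boolean;
}

// Typed counterpart of isFileProperty from the example.
const isFileProperty = (value: unknown): value is FileProperty =>
  typeof value === 'object' && value !== null && 'url' in value;

// Same flow as parseDocument, but the parser is injected (hypothetical
// makeParseDocument) instead of taken from the runtime.
const makeParseDocument =
  (parse: DocumentParser) =>
  async ({ document, density, forceImage }: ParseDocumentInput) => {
    const url = isFileProperty(document) ? document.url : document;

    const { result } = await parse({
      document: url,
      parserOptions: { density, forceImage },
    });

    return { result };
  };

// Usage with a hypothetical fake parser that echoes the URL it received.
const fakeParser: DocumentParser = async ({ document }) => ({
  result: `text of ${document}`,
});
const parseDocument = makeParseDocument(fakeParser);

(async () => {
  const fromFile = await parseDocument({
    document: { url: 'https://example.com/a.pdf' },
    density: 200,
  });
  console.log(fromFile.result); // text of https://example.com/a.pdf
})();
```

Injecting the parser leaves the action logic unchanged. It also lets a test check which arguments reached the parser, for example that density and forceImage arrive under parserOptions.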