Processors module - AdrianC2000/InvoiceScannerApp GitHub Wiki

The application consists of 3 main processors:

  1. InvoiceInfoProcessor - general invoice processing
  2. TableDataProcessor - table processing
  3. KeyDataProcessor - key data processing

InvoiceInfoProcessor

Facade for processing a single invoice. It is responsible for:

  • Preparing the static configuration available across the whole project
  • Preparing the invoice using InvoiceStraightener, which corrects the angle skew of the scanned invoice file
  • Calling TableDataProcessor
  • Calling KeyDataProcessor

It then handles the output of the previously called methods (e.g. exceptions).
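The facade's control flow can be sketched in Python as follows. This is a minimal illustration, not the project's code: `process_invoice`, `InvoiceResult` and the three callables (standing in for InvoiceStraightener, TableDataProcessor and KeyDataProcessor) are hypothetical, the static configuration step is omitted, and the sketch assumes that the key data step receives the invoice with the table removed, which the ParsedTable return value suggests but this page does not state.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional, Tuple


@dataclass
class InvoiceResult:
    """Hypothetical outcome of processing a single invoice."""
    products: Optional[Any] = None  # stands in for the table rows
    key_data: Optional[Any] = None  # stands in for KeyData
    error: Optional[str] = None     # set when any step raised an exception


def process_invoice(
    image: Any,
    straighten: Callable[[Any], Any],
    process_table: Callable[[Any], Tuple[Any, Any]],
    process_key_data: Callable[[Any], Any],
) -> InvoiceResult:
    """Run the steps in order and turn any exception into an error result.

    The three callables are hypothetical stand-ins for InvoiceStraightener,
    TableDataProcessor and KeyDataProcessor; `process_table` is assumed to
    return (products, image_without_table).
    """
    try:
        straightened = straighten(image)                       # fix the angle skew
        products, without_table = process_table(straightened)  # table step
        key_data = process_key_data(without_table)             # key data step
    except Exception as exc:  # e.g. a table rejected by validation
        return InvoiceResult(error=f"{type(exc).__name__}: {exc}")
    return InvoiceResult(products=products, key_data=key_data)


if __name__ == "__main__":
    # Strings stand in for image arrays in this demo.
    result = process_invoice(
        "scan",
        straighten=lambda img: img + "-straight",
        process_table=lambda img: (["row 1", "row 2"], img + "-no-table"),
        process_key_data=lambda img: {"source": img},
    )
    print(result)
```

Catching exceptions in one place keeps the individual processors free to raise on faulty input while the facade decides how a failed invoice is reported.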

TableDataProcessor

Facade for table processing. Its functionalities:

  • Extracting the table using TableExtractor
  • Separating cells and columns using ColumnsSeperator
  • Calculating the positions of the text in the table using TextReader
  • Assigning the read text to cells using CellsCreator
  • Merging the content of the aligned cells into phrases using WordsConverter
  • Classifying headers using HeadersClassifier
  • Removing the table from the invoice using TableRemover
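The text-to-cell assignment and phrase merging steps can be illustrated with a small sketch. It assumes that the OCR step yields words with axis-aligned bounding boxes and that cells are rectangles; a word is assigned to the cell containing the center of its box, and a cell's words are joined line by line (top to bottom), left to right within a line. The `Box`, `Word`, `assign_words_to_cells` and `merge_into_phrase` names and the 5-pixel line tolerance are hypothetical, not taken from CellsCreator or WordsConverter.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle in pixel coordinates (hypothetical type)."""
    left: int
    top: int
    width: int
    height: int

    @property
    def center(self) -> Tuple[float, float]:
        return (self.left + self.width / 2, self.top + self.height / 2)

    def contains(self, point: Tuple[float, float]) -> bool:
        # Half-open intervals so a point on a shared border belongs to one cell.
        x, y = point
        return (self.left <= x < self.left + self.width
                and self.top <= y < self.top + self.height)


@dataclass(frozen=True)
class Word:
    text: str
    box: Box


def assign_words_to_cells(words: List[Word], cells: List[Box]) -> Dict[int, List[Word]]:
    """Map each cell index to the words whose center lies inside that cell."""
    assigned: Dict[int, List[Word]] = {i: [] for i in range(len(cells))}
    for word in words:
        for i, cell in enumerate(cells):
            if cell.contains(word.box.center):
                assigned[i].append(word)
                break  # words outside every cell are dropped
    return assigned


def merge_into_phrase(words: List[Word], line_tolerance: float = 5.0) -> str:
    """Join a cell's words in reading order: lines top to bottom, words left to right."""
    lines: List[List[Word]] = []
    for word in sorted(words, key=lambda w: w.box.center[1]):
        # Append to the current line if vertically close, otherwise start a new line.
        if lines and abs(word.box.center[1] - lines[-1][0].box.center[1]) <= line_tolerance:
            lines[-1].append(word)
        else:
            lines.append([word])
    ordered = [w for line in lines for w in sorted(line, key=lambda w: w.box.left)]
    return " ".join(w.text for w in ordered)


if __name__ == "__main__":
    cells = [Box(0, 0, 100, 40), Box(100, 0, 100, 40)]
    words = [Word("Pen", Box(5, 5, 30, 10)), Word("blue", Box(40, 6, 30, 10)),
             Word("12.50", Box(110, 5, 40, 10)), Word("ink", Box(5, 22, 25, 10))]
    by_cell = assign_words_to_cells(words, cells)
    print([merge_into_phrase(by_cell[i]) for i in range(len(cells))])
    # ['Pen blue ink', '12.50']
```

The line tolerance absorbs small vertical offsets between words printed on the same line; a real implementation would likely derive it from the text height rather than hardcode it.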

Additionally, TableDataProcessor performs some basic validation: a minimum column count check (hardcoded to 4; an invoice with fewer than 4 columns is classified as faulty) and a minimum header confidence check (an invoice whose calculated confidence is below 0.3 for all headers is classified as faulty).
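The two validation rules can be expressed as a short check. The thresholds (4 columns, 0.3 confidence) come from this page; the `validate_table` function, the `FaultyInvoiceError` exception and the dict-of-confidences input are hypothetical.

```python
from typing import Dict

MIN_COLUMNS = 4              # minimum column count hardcoded in the project
MIN_HEADER_CONFIDENCE = 0.3  # minimum header confidence


class FaultyInvoiceError(ValueError):
    """Hypothetical exception for invoices rejected by the basic validation."""


def validate_table(column_count: int, header_confidences: Dict[str, float]) -> None:
    """Raise FaultyInvoiceError if the table fails either basic check."""
    if column_count < MIN_COLUMNS:
        raise FaultyInvoiceError(
            f"expected at least {MIN_COLUMNS} columns, found {column_count}"
        )
    # Faulty only when every header is below the threshold (or none was found).
    if all(conf < MIN_HEADER_CONFIDENCE for conf in header_confidences.values()):
        raise FaultyInvoiceError("no header reached the minimum confidence")


if __name__ == "__main__":
    validate_table(5, {"name": 0.9, "price": 0.1})  # passes: one header is confident
    try:
        validate_table(5, {"name": 0.2, "price": 0.1})
    except FaultyInvoiceError as exc:
        print(f"faulty: {exc}")
```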

It returns a ParsedTable object containing a list of TableProduct objects (one per row of the invoice table) and the invoice ndarray with the table removed.
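A rough picture of the returned structure as Python dataclasses. Only the two parts named here (the list of TableProduct rows and the invoice image with the table removed) come from this page; the `values` mapping inside TableProduct and the field names are assumptions, and the image is typed as `Any` to keep the sketch free of a NumPy dependency.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class TableProduct:
    """One row of the invoice table (hypothetical field layout)."""
    values: Dict[str, str] = field(default_factory=dict)  # header class -> cell phrase


@dataclass
class ParsedTable:
    """Result of the table processing step."""
    products: List[TableProduct]
    invoice_without_table: Any  # a NumPy ndarray in the project; Any keeps this sketch dependency-free


if __name__ == "__main__":
    parsed = ParsedTable(
        products=[TableProduct({"name": "Pen", "net_price": "12.50"})],
        invoice_without_table=None,  # placeholder for the image array
    )
    for row in parsed.products:
        print(row.values["name"], row.values["net_price"])
```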

KeyDataProcessor

Facade for key data processing. Its functionalities:

  • Extracting blocks using BlocksExtractor
  • Extracting the blocks that contain key words using BlockClassifier
  • Separating the blocks containing buyer and seller key words from the rest (required because they are processed differently)
  • Preliminary key data extraction (finding key values located in the same block as the corresponding key word)
  • Final key data extraction (performed only when the preliminary extraction did not find a value for every key word)
  • Merging the preliminary and final results
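The key data flow (separating seller/buyer blocks, a preliminary same-block search, a fallback pass only for missing keys, and a merge that prefers preliminary values) can be sketched as follows. Everything concrete here is an assumption: blocks are plain strings, the regular expressions and key names are illustrative English examples, and the final extraction is passed in as a hypothetical callable because this page does not describe how it works.

```python
import re
from typing import Callable, Dict, List, Optional, Tuple

# Illustrative key words and value patterns (assumptions, not the project's own).
KEY_PATTERNS = {
    "invoice_number": re.compile(r"invoice\s+(?:no\.?|number)\s*:?\s*(\S+)", re.I),
    "currency": re.compile(r"currency\s*:?\s*([A-Z]{3})", re.I),
    "listing_date": re.compile(r"date of issue\s*:?\s*(\d{4}-\d{2}-\d{2})", re.I),
}
PARTY_KEY_WORDS = ("seller", "buyer")


def split_party_blocks(blocks: List[str]) -> Tuple[List[str], List[str]]:
    """Separate blocks mentioning the seller or buyer from all other blocks."""
    party: List[str] = []
    other: List[str] = []
    for block in blocks:
        lowered = block.lower()
        target = party if any(word in lowered for word in PARTY_KEY_WORDS) else other
        target.append(block)
    return party, other


def preliminary_extraction(blocks: List[str]) -> Dict[str, str]:
    """Find values located in the same block as their key word."""
    found: Dict[str, str] = {}
    for block in blocks:
        for key, pattern in KEY_PATTERNS.items():
            match = pattern.search(block)
            if match and key not in found:  # keep the first hit per key
                found[key] = match.group(1)
    return found


def extract_key_data(
    blocks: List[str],
    final_extraction: Callable[[List[str], List[str]], Dict[str, str]],
) -> Dict[str, Optional[str]]:
    """Preliminary pass, fallback pass for missing keys only, then merge."""
    _party, other = split_party_blocks(blocks)  # party blocks need separate handling (not sketched)
    preliminary = preliminary_extraction(other)
    missing = [key for key in KEY_PATTERNS if key not in preliminary]
    final = final_extraction(other, missing) if missing else {}
    # Preliminary values win; final results only fill the gaps.
    return {key: preliminary.get(key, final.get(key)) for key in KEY_PATTERNS}


if __name__ == "__main__":
    def next_block_lookup(blocks: List[str], missing: List[str]) -> Dict[str, str]:
        """Hypothetical fallback: take the block following the date key word."""
        result: Dict[str, str] = {}
        if "listing_date" in missing:
            for i, block in enumerate(blocks[:-1]):
                if "date of issue" in block.lower():
                    result["listing_date"] = blocks[i + 1].strip()
        return result

    blocks = ["Invoice No: FV/1/2023", "Currency: PLN",
              "Seller: ACME Sp. z o.o.", "Date of issue", "2023-05-01"]
    print(extract_key_data(blocks, next_block_lookup))
    # {'invoice_number': 'FV/1/2023', 'currency': 'PLN', 'listing_date': '2023-05-01'}
```

Running the fallback only for keys the preliminary pass missed matches the ordering described above and avoids overriding values that were found next to their key words.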

It returns a KeyData object containing every key value found:

  • seller and buyer data:
    • name
    • address
    • NIP (Polish tax identification number)
  • invoice number
  • currency
  • listing date
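The KeyData contents listed above map naturally onto nested dataclasses. The class layout, field names, the use of `datetime.date` for the listing date and the `missing_fields` helper (which mirrors the rule that final extraction runs only when some key word has no value) are assumptions for illustration.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class PartyData:
    """Seller or buyer details; None means the value was not found."""
    name: Optional[str] = None
    address: Optional[str] = None
    nip: Optional[str] = None


@dataclass
class KeyData:
    """Hypothetical layout of the key data result."""
    seller: PartyData = field(default_factory=PartyData)
    buyer: PartyData = field(default_factory=PartyData)
    invoice_number: Optional[str] = None
    currency: Optional[str] = None
    listing_date: Optional[date] = None

    def missing_fields(self) -> List[str]:
        """List the fields that still have no value, party fields first."""
        missing = [
            f"{role}.{attr}"
            for role in ("seller", "buyer")
            for attr in ("name", "address", "nip")
            if getattr(getattr(self, role), attr) is None
        ]
        missing += [
            attr
            for attr in ("invoice_number", "currency", "listing_date")
            if getattr(self, attr) is None
        ]
        return missing


if __name__ == "__main__":
    data = KeyData(seller=PartyData(name="ACME Sp. z o.o."), currency="PLN")
    print(data.missing_fields())
    # ['seller.address', 'seller.nip', 'buyer.name', 'buyer.address', 'buyer.nip', 'invoice_number', 'listing_date']
```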
