Processors module - AdrianC2000/InvoiceScannerApp GitHub Wiki
The application consists of 3 main processors:
- InvoiceInfoProcessor - general invoice processing
- TableDataProcessor - table processing
- KeyDataProcessor - key data processing
InvoiceInfoProcessor
Facade for processing a single invoice. It is responsible for:
- Preparing the static configuration available in the whole project
- Preparing the invoice using InvoiceStraightener, which fixes the angle skew of the scanned invoice file
- Calling TableDataProcessor
- Calling KeyDataProcessor
It then handles the output of the previously called methods (i.e. exceptions).
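The facade flow above can be sketched roughly as follows. This is only an illustration, not the project's code: `InvoiceResult`, `process_invoice` and the callable parameters are hypothetical names, and passing the table-free invoice on to the key data step is an assumption based on what TableDataProcessor returns.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional, Tuple


@dataclass
class InvoiceResult:
    """Hypothetical container for the outcome of processing one invoice."""
    table: Optional[Any] = None
    key_data: Optional[Any] = None
    error: Optional[str] = None


def process_invoice(
    image: Any,
    straighten: Callable[[Any], Any],
    process_table: Callable[[Any], Tuple[Any, Any]],
    process_key_data: Callable[[Any], Any],
) -> InvoiceResult:
    """Run the steps in order and turn any exception into an error result."""
    try:
        # Step 1: fix the skew of the scanned invoice.
        straightened = straighten(image)
        # Step 2: the table step yields the parsed rows and the invoice
        # with the table removed (assumed to feed the key data step).
        table, invoice_without_table = process_table(straightened)
        # Step 3: extract key data from what is left of the invoice.
        key_data = process_key_data(invoice_without_table)
        return InvoiceResult(table=table, key_data=key_data)
    except Exception as exc:
        # The facade is the single place where failures are handled.
        return InvoiceResult(error=f"{type(exc).__name__}: {exc}")
```

Keeping the error handling in one place means the individual processors can simply raise when an invoice is faulty.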
TableDataProcessor
Facade for table processing. Its responsibilities:
- Extracting the table using TableExtractor
- Separating cells and columns using ColumnsSeperator
- Calculating the positions of text in the table using TextReader
- Aligning the read text to the cells using CellsCreator
- Merging the aligned cell contents into phrases using WordsConverter
- Classifying headers using HeadersClassifier
- Removing the table from the invoice using TableRemover
Additionally, TableDataProcessor performs some basic validation: a minimum column count (hardcoded to 4; an invoice with fewer than 4 columns is classified as faulty) and a minimum calculated header confidence (an invoice whose headers all have a confidence below 0.3 is classified as faulty).
Returns a ParsedTable object: the list of TableProduct objects (one per row of the invoice table) and the invoice ndarray with the table removed.
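A minimal sketch of the two validations described above, using the thresholds from the text. The names `FaultyInvoiceError` and `validate_table` are hypothetical, and treating an invoice as faulty only when every header is below the threshold is one reading of the description, not a confirmed detail of the implementation.

```python
from typing import Sequence

MIN_COLUMNS = 4              # minimum column count stated in the text
MIN_HEADER_CONFIDENCE = 0.3  # minimum header confidence stated in the text


class FaultyInvoiceError(Exception):
    """Hypothetical exception for invoices that fail validation."""


def validate_table(column_count: int, header_confidences: Sequence[float]) -> None:
    """Raise FaultyInvoiceError if the table fails either basic check."""
    if column_count < MIN_COLUMNS:
        raise FaultyInvoiceError(
            f"table has {column_count} columns, expected at least {MIN_COLUMNS}"
        )
    # Assumed rule: faulty only when every header is below the threshold.
    # Note that an empty list of confidences also counts as faulty here.
    if all(c < MIN_HEADER_CONFIDENCE for c in header_confidences):
        raise FaultyInvoiceError(
            f"no header reached a confidence of {MIN_HEADER_CONFIDENCE}"
        )
```

For example, `validate_table(4, [0.1, 0.5, 0.2, 0.9])` passes because at least one header reaches the threshold, while `validate_table(3, [0.9, 0.9, 0.9])` raises because of the column count.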
KeyDataProcessor
Facade for key data processing. Its responsibilities:
- Extracting blocks using BlocksExtractor
- Extracting the blocks that contain key words using BlockClassifier
- Separating blocks with buyer and seller key words from the rest (required because they are processed differently)
- Preliminary key data extraction (finding the key values that are in the same blocks as the corresponding key words):
  - Extracting preliminary person values using PersonValuesExtractor
  - Extracting the remaining preliminary key values using KeyValuesExtractor
- Final key data extraction (only when preliminary extraction did not deliver data for every key word):
  - Extracting final person values using PersonValuesExtractor
  - Extracting the remaining final key values using KeyValuesExtractor
- Merging preliminary and final results
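The two-pass extraction and merge above can be illustrated with a small sketch. It is an assumption-laden simplification: `extract_key_data` and the `Extractor` type are hypothetical, and each extractor is modelled as a function from a key word to a value or `None`, which is not how the real extractor classes are necessarily shaped.

```python
from typing import Callable, Dict, Iterable, Optional

# Hypothetical extractor type: maps a key word to its value, or None if not found.
Extractor = Callable[[str], Optional[str]]


def extract_key_data(
    key_words: Iterable[str],
    preliminary: Extractor,
    final: Extractor,
) -> Dict[str, Optional[str]]:
    """Run the preliminary pass, then the final pass only for missing key words."""
    # Preliminary pass: values found in the same block as their key word.
    results = {key_word: preliminary(key_word) for key_word in key_words}
    # Final pass: only for key words the preliminary pass could not resolve.
    missing = [key_word for key_word, value in results.items() if value is None]
    for key_word in missing:
        results[key_word] = final(key_word)
    # The merged result keeps preliminary values and fills gaps with final ones.
    return results
```

A key word that neither pass resolves stays `None` in the merged result, so the caller can tell which values were not found.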
Returns a KeyData object containing every key value found:
- seller and buyer data:
  - name
  - address
  - nip
- invoice number
- currency
- listing date
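The shape of the returned data could be modelled along these lines. This is a sketch only: the field names follow the list above, but `PersonData`, the attribute names and the use of optional strings are assumptions, not the project's actual KeyData definition. NIP is the Polish tax identification number.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PersonData:
    """Hypothetical holder for seller or buyer details."""
    name: Optional[str] = None
    address: Optional[str] = None
    nip: Optional[str] = None  # Polish tax identification number


@dataclass
class KeyData:
    """Hypothetical model of the key values listed above; None means not found."""
    seller: PersonData = field(default_factory=PersonData)
    buyer: PersonData = field(default_factory=PersonData)
    invoice_number: Optional[str] = None
    currency: Optional[str] = None
    listing_date: Optional[str] = None
```

Using `default_factory` gives each KeyData instance its own seller and buyer objects, so filling in one invoice never affects another.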