Home - Noba1anc3/MFCNN GitHub Wiki

Experiments on Information Extraction in OCR

Based on Rules

A set of general rules for extracting information from forms was designed and implemented independently.
On more than 100 customs declaration invoices, key-value pair extraction reaches 98% accuracy.
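
The wiki does not list the actual rules, so the following is only an illustrative sketch of what one rule-based key-value extractor and its accuracy measure could look like. The rule ("a short label, a colon, then the value"), the function names `extract_key_values` and `pair_accuracy`, and the sample lines are all hypothetical and are not taken from the project.

```python
import re

# Hypothetical rule: a short label, then a colon, then the value.
# This is an assumption for illustration, not one of the project's rules.
KEY_VALUE_RULE = re.compile(
    r"^\s*(?P<key>[A-Za-z][A-Za-z .]{0,30}?)\s*[:：]\s*(?P<value>\S.*?)\s*$"
)


def extract_key_values(lines):
    """Return a dict of key-value pairs found in OCR text lines."""
    pairs = {}
    for line in lines:
        match = KEY_VALUE_RULE.match(line)
        if match:
            pairs[match.group("key")] = match.group("value")
    return pairs


def pair_accuracy(predicted, expected):
    """Fraction of expected pairs whose value was extracted exactly."""
    if not expected:
        return 0.0
    correct = sum(1 for key, value in expected.items() if predicted.get(key) == value)
    return correct / len(expected)


# Made-up OCR lines, not real invoice data.
ocr_lines = ["Invoice No: 12345", "Consignee: ACME Ltd.", "random text"]
predicted = extract_key_values(ocr_lines)
expected = {"Invoice No": "12345", "Consignee": "ACME Ltd.", "Total": "100"}
accuracy = pair_accuracy(predicted, expected)
```

Here a pair counts as correct only when both the key and the value match exactly, which is one possible reading of "accuracy of key-value pairs"; the project may score pairs differently.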

Based on Neural Network

A BERT-based MFCNN is reproduced. A series of experiments covers training set requirements,
training cost, model generalization, fine-tuning for downstream tasks, and parameter tuning.

Based on Machine Learning

Feature engineering experiments with LightGBM raise the F1 score of one-shot learning above 0.9.
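
For reference, the F1 score is the harmonic mean of precision and recall. The sketch below computes it for a binary labeling in plain Python. The helper name `f1_score` and the labels are hypothetical and are unrelated to the project's data or its LightGBM setup.

```python
def f1_score(y_true, y_pred, positive=1):
    """Binary F1: harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision or recall is zero (or undefined), so F1 is 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up labels for illustration only.
y_true = [1, 1, 1, 0, 0]
y_pred = [1, 1, 0, 1, 0]
score = f1_score(y_true, y_pred)  # precision = recall = 2/3, so F1 = 2/3
```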