Home - hcts-hra/ecpo-fulltext-experiments GitHub Wiki

This wiki documents the process of extracting full text from the 1919–1940 issues of the Republican Chinese entertainment newspaper 晶報 Jīngbào.

Page Segmentation:

1. Rule-based Approaches

1.1 Morphological Opening to Connect Text Blocks

1.2 Finding and Connecting Separators

2. ML-driven Approaches

2.1 Fine-tuning eynollah

Character Segmentation Using HRCenterNet

The MTHv2 Dataset

Building an OCR Classifier: