Home - north-china-herald/XML-files GitHub Wiki
North China Herald - How To
Digitization
Stage 1 - Microfilm Instructions
Stage 2 - Image Stitching
Stage 3 - Storage
OCR Processing
Stage 1 - Setting up and Running OCR
Stage 2 - Verifying OCR
XML Creation
Stage 1 - Template
https://github.com/north-china-herald/XML-files/blob/master/Template
Stage 2 - Page 1
- Element Types: Advertisement, Item, Article
- Copy and paste repeating advertisements as they appear on Page 1.
- When you encounter a new advertisement or other element, insert a comment to mark it, then either continue copying and pasting repeats or create a new element.
- When you encounter a table, check whether an earlier element can be copied and pasted; if none exists, create a new table in "Text" or "Author" mode.
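The copy-and-comment workflow for Page 1 can also be sketched in a script. This is only an illustration using Python's standard `xml.etree.ElementTree`: the element names (`page`, `advertisement`), the `id` attribute, and the helper `reuse_repeats` are hypothetical and are not taken from the project Template, which remains the authority on the real structure.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical sample of an earlier issue's Page 1. The tag and attribute
# names here are assumptions; use the names from the project Template.
PREVIOUS_ISSUE = """<page n="1">
  <advertisement id="ad-001">Placeholder advertisement text</advertisement>
</page>"""


def reuse_repeats(previous_page, repeat_ids, new_note):
    """Copy repeating advertisements into a new page and mark new material."""
    new_page = ET.Element("page", {"n": previous_page.get("n", "1")})
    by_id = {el.get("id"): el for el in previous_page.iter("advertisement")}
    for ad_id in repeat_ids:
        # deepcopy so that later edits do not change the earlier issue
        new_page.append(copy.deepcopy(by_id[ad_id]))
    # An XML comment flags the spot where a new element still has to be created
    new_page.append(ET.Comment(new_note))
    return new_page


previous_page = ET.fromstring(PREVIOUS_ISSUE)
new_page = reuse_repeats(
    previous_page,
    ["ad-001"],
    " NEW: advertisement not seen before; create a new element ",
)
print(ET.tostring(new_page, encoding="unicode"))
```

The same steps can be done by hand in an XML editor; the script only makes the order of operations explicit.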
Stage 3 - Pages 2 & 3
- Element Types: Item, Article
- Create new elements from the verified OCR results.
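As a rough sketch of creating elements from verified OCR text, the snippet below builds one element per text block and rejects element types that the stages above do not list for a given page. The lowercase tag names, the `ALLOWED` table layout, and the helper `make_element` are assumptions for illustration only; the real element structure is defined by the project Template.

```python
import xml.etree.ElementTree as ET

# Element types per page, as listed in the stages above.
# The lowercase tag spellings are an assumption, not the Template's.
ALLOWED = {
    1: {"advertisement", "item", "article"},
    2: {"item", "article"},
    3: {"item", "article"},
}


def make_element(page_number, kind, ocr_text):
    """Build one element from a block of verified OCR text."""
    if kind not in ALLOWED[page_number]:
        raise ValueError(f"{kind!r} is not an element type for page {page_number}")
    element = ET.Element(kind)
    # Collapse line breaks and repeated spaces left over from OCR output
    element.text = " ".join(ocr_text.split())
    return element


article = make_element(2, "article", "Verified  OCR\ntext for one article")
print(ET.tostring(article, encoding="unicode"))
```

Checking the element type up front catches a misplaced advertisement on Pages 2 and 3 before it reaches the XML file.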