Home - north-china-herald/XML-files GitHub Wiki

North China Herald - How To

Digitization

Stage 1 - Microfilm Instructions

Stage 2 - Image Stitching

Stage 3 - Storage

OCR Processing

Stage 1 - Setting up and Running OCR

Stage 2 - Verifying OCR

XML Creation

Stage 1 - Template

https://github.com/north-china-herald/XML-files/blob/master/Template

Stage 2 - Page 1

  • Element Types: Advertisement, Item, Article
  • Copy and paste repeating advertisements as they appear on Page 1.
  • When you encounter a new advertisement or other element, insert a comment, then either continue copying and pasting repeats or create a new element.
  • When you encounter a table, check whether an earlier element can be copied and pasted; if none exists, create a new table in "Text" or "Author" mode.
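
The copy-and-comment workflow above is normally done by hand in the XML editor, but it can be illustrated with a short Python sketch. The tag names `page` and `advertisement` and the `number` attribute are assumptions made for illustration only; the actual names should be taken from the project Template.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical structure: the real tag and attribute names come from the Template.
page = ET.Element("page", {"number": "1"})

# First occurrence of an advertisement (placeholder text, not real content).
ad = ET.SubElement(page, "advertisement")
ad.text = "Placeholder advertisement text"

# A repeat of the same advertisement: deep-copy the earlier element
# instead of retyping it.
page.append(copy.deepcopy(ad))

# A new element that has not been transcribed yet: leave a comment as a marker.
page.append(ET.Comment(" NEW ADVERTISEMENT: transcribe from verified OCR "))

xml_text = ET.tostring(page, encoding="unicode")
print(xml_text)
```

The deep copy keeps the repeat independent of the original, so later corrections to one occurrence do not silently change the other.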

Stage 3 - Pages 2 & 3

  • Element Types: Item, Article
  • Create new elements from the verified OCR results.
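
As a rough sketch of creating elements from verified OCR text, the following Python example builds a page from a list of typed entries. The function name `build_page`, the tag names `page`, `item`, and `article`, and the `number` attribute are all hypothetical; the Template defines the real structure.

```python
import xml.etree.ElementTree as ET

# Element types allowed on pages 2 and 3, per the list above.
ALLOWED_TYPES = ("item", "article")

def build_page(number, entries):
    """Build a page element from (element_type, text) pairs.

    Tag and attribute names here are illustrative assumptions,
    not the project's actual schema.
    """
    page = ET.Element("page", {"number": str(number)})
    for element_type, text in entries:
        if element_type not in ALLOWED_TYPES:
            raise ValueError(f"unexpected element type: {element_type}")
        element = ET.SubElement(page, element_type)
        # Trim stray whitespace left over from the OCR output.
        element.text = text.strip()
    return page

# Placeholder strings stand in for verified OCR results.
page2 = build_page(2, [
    ("article", "  Placeholder article text.  "),
    ("item", "Placeholder item text."),
])
page2_xml = ET.tostring(page2, encoding="unicode")
print(page2_xml)
```

Rejecting unknown element types early helps catch an advertisement that was mistakenly entered on a page where only items and articles are expected.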