ReleaseNotes - UB-Mannheim/AustrianNewspapers GitHub Wiki
2.0.0
What's new?
Release 2.0.0 revises the ground truth to conform to Level 2 of the OCR-D GT Guidelines. Changes were made to the textual content, baselines, polygonal features, region tags and IDs of the PAGE-XML files, as well as to the README and the repo folder structure.
Changes to the project structure
- Align folder structure and README with the OCRD-GT-Repo-Template
- Keep the validation and training sets
- Delete the GT line pairs
Enhancements to PAGE-XML
- Standardisation of glyphs
  - Double oblique hyphen (⸗)
  - Em dash (—) instead of en dash (–)
  - Different asterisk variants unified to the asterisk (*)
- Enhancements and standardisation according to OCR-D Ground Truth Guidelines Level 2
  - Long s (ſ)
  - R rotunda (ꝛ)
  - Fractions (¼ ½ ¾ ⅐ ⅑ ⅒ ⅓ ⅔ ⅕ ⅖ ⅗ ⅘ ⅙ ⅚ ⅛ ⅜ ⅝ ⅞)
  - Fraction slash (⁄) (U+2044), if the fraction
    - can't be transcribed by a Unicode fraction character
    - has its numerator and denominator on different baseline heights
- Labeling of text regions
  - header
  - headings
  - paragraphs
  - footer
  - reference
- Correcting reading order
- Unique IDs based on the new reading order
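The glyph standardisation listed above can be pictured as a simple character mapping. The following is a minimal sketch, not the tooling used for this release: the function name `normalise_glyphs` and the mapping table are hypothetical, and the specific asterisk variants are assumptions, since the release notes do not say which ones occurred in the data.

```python
# Hypothetical normalisation table. Only the en dash -> em dash rule and the
# "unify asterisks to *" rule come from the release notes; the concrete
# asterisk code points below are assumed for illustration.
GLYPH_MAP = {
    "\u2013": "\u2014",  # en dash -> em dash
    "\u2217": "*",       # asterisk operator
    "\u204E": "*",       # low asterisk
    "\u2731": "*",       # heavy asterisk
    "\uFE61": "*",       # small asterisk
    "\uFF0A": "*",       # fullwidth asterisk
}

# str.maketrans accepts a dict that maps single characters to strings.
_TABLE = str.maketrans(GLYPH_MAP)


def normalise_glyphs(text: str) -> str:
    """Replace every mapped code point with its standardised form."""
    return text.translate(_TABLE)


if __name__ == "__main__":
    # Long s (U+017F) and the double oblique hyphen (U+2E17) are not in the
    # table, so they pass through unchanged.
    print(normalise_glyphs("Wien \u2013 Prag \u2217 Wa\u017F\u017Fer"))
```

A lookup table keeps each rule visible on its own line, which makes it easy to compare against the guidelines and to extend if further variants turn up.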
Changes compared to release 1.1.0
Austrian Newspapers 2.0.0 provides revised transcriptions according to the OCR-D GT Guidelines Level 2.
Among other things, these revisions and enhancements include:
| | Austrian Newspapers 1.1.0 | Austrian Newspapers 2.0.0 |
|---|---|---|
| Number of "long s" [ſ] | 57,599 | 58,629 |
| Increase in % | - | 1.8 |
| Number of "double oblique hyphen" [⸗] | 11,745 | 11,857 |
| Increase in % | - | 0.9 |
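The percentages in the table follow from the glyph counts as a relative increase. The sketch below shows one way such figures could be derived, taking the table's counts as whole numbers; the helper names `count_glyph` and `relative_increase` are hypothetical and are not part of the repository.

```python
from typing import Iterable


def count_glyph(lines: Iterable[str], glyph: str) -> int:
    """Count all occurrences of a glyph across transcription lines."""
    return sum(line.count(glyph) for line in lines)


def relative_increase(old: int, new: int) -> float:
    """Return the increase from old to new as a percentage of old."""
    return (new - old) / old * 100


if __name__ == "__main__":
    # Counts taken from the table above (release 1.1.0 vs. 2.0.0).
    print(f"long s: {relative_increase(57599, 58629):.2f} %")
    print(f"double oblique hyphen: {relative_increase(11745, 11857):.2f} %")
```

With the table's counts this gives roughly 1.79 % for the long s and roughly 0.95 % for the double oblique hyphen, close to the 1.8 and 0.9 reported above.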