publaynet
tags: #model-based, #layoutAnalysis
inst: IBM Research
paper: https://arxiv.org/abs/1908.07836
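Large-scale dataset and baseline models for document layout analysis. The dataset was built by automatically matching the XML and PDF versions of over 1 million articles from PubMed Central, and contains over 360,000 page images annotated with five layout categories: text, title, list, table, and figure. It is widely used for training document understanding models.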
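
As a rough illustration of working with layout annotations, the sketch below counts annotated regions per category. It assumes the annotations follow a COCO-style JSON layout (`images`, `categories`, `annotations` with `category_id` and `bbox`); the `sample` dictionary and the helper name `count_regions_per_category` are hypothetical and hand-made for this example, not real PubLayNet data or an official API.

```python
from collections import Counter


def count_regions_per_category(coco):
    """Count annotated layout regions per category name in a COCO-style dict."""
    # Map numeric category ids to their human-readable names.
    id_to_name = {c["id"]: c["name"] for c in coco["categories"]}
    # Tally one entry per annotation, keyed by category name.
    counts = Counter(id_to_name[a["category_id"]] for a in coco["annotations"])
    return dict(counts)


# Tiny hand-made sample (hypothetical values, assumed COCO-style structure).
sample = {
    "images": [
        {"id": 1, "file_name": "page_0001.jpg", "width": 612, "height": 792},
    ],
    "categories": [
        {"id": 1, "name": "text"},
        {"id": 2, "name": "title"},
        {"id": 4, "name": "table"},
    ],
    "annotations": [
        # bbox is [x, y, width, height] in pixels, as in the COCO convention.
        {"id": 10, "image_id": 1, "category_id": 2, "bbox": [50, 40, 500, 30]},
        {"id": 11, "image_id": 1, "category_id": 1, "bbox": [50, 90, 500, 200]},
        {"id": 12, "image_id": 1, "category_id": 1, "bbox": [50, 300, 500, 150]},
        {"id": 13, "image_id": 1, "category_id": 4, "bbox": [50, 470, 500, 250]},
    ],
}

print(count_regions_per_category(sample))
```

With a real annotation file, the same function could be applied to the result of `json.load` on that file, provided the file matches the structure assumed here.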