publaynet

tags: #model-based, #layoutAnalysis
inst: IBM Research
paper: https://arxiv.org/abs/1908.07836

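Large-scale dataset and models for document layout analysis. PubLayNet was built by automatically matching the XML and PDF versions of over 1 million PubMed Central articles, and contains over 360,000 page images annotated with five layout categories: text, title, list, table, and figure. Widely used for training document understanding models.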
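
To give a feel for what the layout annotations look like, here is a minimal sketch that counts annotated regions per category. It assumes the annotations follow the COCO JSON layout (`images`, `categories`, `annotations` with `bbox` as `[x, y, width, height]`). The `sample` dictionary, its file name, and the helper `count_regions_by_category` are made up for illustration and are not taken from the dataset or its tooling.

```python
from collections import Counter


def count_regions_by_category(coco: dict) -> dict:
    """Count annotated layout regions per category name in a COCO-style dict."""
    # Map numeric category ids to their human-readable names.
    id_to_name = {c["id"]: c["name"] for c in coco["categories"]}
    counts = Counter(id_to_name[a["category_id"]] for a in coco["annotations"])
    return dict(counts)


# Hypothetical one-page sample (assumption: COCO-style structure, invented values).
sample = {
    "images": [
        {"id": 1, "file_name": "page_0001.jpg", "width": 612, "height": 792},
    ],
    "categories": [
        {"id": 1, "name": "text"},
        {"id": 2, "name": "title"},
        {"id": 4, "name": "table"},
    ],
    "annotations": [
        {"id": 10, "image_id": 1, "category_id": 2, "bbox": [72, 60, 468, 24]},
        {"id": 11, "image_id": 1, "category_id": 1, "bbox": [72, 100, 468, 200]},
        {"id": 12, "image_id": 1, "category_id": 1, "bbox": [72, 320, 468, 150]},
        {"id": 13, "image_id": 1, "category_id": 4, "bbox": [72, 490, 468, 180]},
    ],
}

counts = count_regions_by_category(sample)
print(counts)
```

For a real annotation file, the same function should work on the result of `json.load` applied to that file, provided the file uses the structure assumed above.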