s2orc-doc2json

tags: #pdf2markdown, #layoutAnalysis
inst: AllenAI
deps: grobid

Converts academic papers to structured JSON. Part of the S2ORC dataset pipeline: it combines GROBID parsing with additional processing to extract a clean document structure.
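
As a rough illustration of working with the structured output, the sketch below groups body paragraphs by section. The field names (`pdf_parse`, `body_text`, `section`, `text`) and the sample record are assumptions made for this example, not a confirmed schema; check them against a real output file before relying on them.

```python
import json
from collections import OrderedDict

# Hypothetical sample shaped like the JSON this tool is assumed to emit.
# The field names below are assumptions for illustration only.
SAMPLE_JSON = """
{
  "paper_id": "example-0001",
  "title": "An Example Paper",
  "pdf_parse": {
    "body_text": [
      {"section": "Introduction", "text": "First paragraph."},
      {"section": "Introduction", "text": "Second paragraph."},
      {"section": "Methods", "text": "How it was done."}
    ]
  }
}
"""


def group_paragraphs_by_section(doc):
    """Return an ordered mapping of section title -> list of paragraph texts."""
    sections = OrderedDict()
    for para in doc.get("pdf_parse", {}).get("body_text", []):
        # Paragraphs without a section title are collected under a placeholder.
        title = para.get("section") or "(untitled)"
        sections.setdefault(title, []).append(para.get("text", ""))
    return sections


doc = json.loads(SAMPLE_JSON)
sections = group_paragraphs_by_section(doc)
for title, paragraphs in sections.items():
    print(f"{title}: {len(paragraphs)} paragraph(s)")
```

To use this on a real file, replace `json.loads(SAMPLE_JSON)` with `json.load(open(path))`, where `path` points at one converted paper.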