s2orc-doc2json
tags: #pdf2markdown, #layoutAnalysis
inst: AllenAI
deps: grobid
Converts academic papers into structured JSON. Part of the S2ORC dataset pipeline: GROBID parses each PDF into TEI XML, and additional processing turns that output into a clean document structure (title, abstract, body sections).
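To illustrate the "GROBID output → JSON" step, here is a minimal sketch that flattens a GROBID TEI XML document into a small title/abstract/sections dict using only the standard library. It assumes you already have the TEI string (for example from GROBID's `processFulltextDocument` endpoint). The function name `tei_to_json` and the output schema are hypothetical; this is not doc2json's own code and not the S2ORC JSON schema, which carries more fields (authors, bibliography entries, figure/table references, etc.).

```python
import json
import xml.etree.ElementTree as ET

# Namespace used by GROBID's TEI output
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}


def tei_to_json(tei_xml: str) -> dict:
    """Flatten a GROBID TEI document into a simple dict.

    Hypothetical, simplified schema for illustration only;
    it is NOT the S2ORC JSON format produced by doc2json.
    """
    root = ET.fromstring(tei_xml)

    def text_of(elem):
        # Join all nested text (e.g. inline <ref> tags) and collapse whitespace.
        # Compare to None explicitly: childless Elements are falsy.
        if elem is None:
            return ""
        return " ".join("".join(elem.itertext()).split())

    title = text_of(root.find(".//tei:titleStmt/tei:title", TEI_NS))
    abstract = [text_of(p) for p in root.findall(".//tei:abstract//tei:p", TEI_NS)]

    sections = []
    for div in root.findall(".//tei:body/tei:div", TEI_NS):
        sections.append({
            "heading": text_of(div.find("tei:head", TEI_NS)),
            "paragraphs": [text_of(p) for p in div.findall("tei:p", TEI_NS)],
        })
    return {"title": title, "abstract": abstract, "body": sections}


# Usage on a tiny hand-written TEI fragment (toy data, not real GROBID output)
sample = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc><titleStmt><title>Toy Paper</title></titleStmt></fileDesc>
    <profileDesc><abstract><div><p>We study   toys.</p></div></abstract></profileDesc>
  </teiHeader>
  <text><body>
    <div><head>Introduction</head><p>Toys are <ref>fun</ref>.</p></div>
  </body></text>
</TEI>"""

doc = tei_to_json(sample)
print(json.dumps(doc, indent=2))
```

Working from the TEI tree rather than raw PDF text is what makes the structure recoverable: GROBID has already segmented headers, paragraphs, and inline references, so the post-processing step mostly selects and normalizes elements.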