Other resources - Vermont-Complex-Systems/pdf-zoo GitHub Wiki

[tesseract's tips to improve output quality](https://tesseract-ocr.github.io/tessdoc/ImproveQuality.html)
pdfminer.six's converting PDF to text
pypdf's why extraction is hard
OCR vs text extraction