Other resources - Vermont-Complex-Systems/pdf-zoo GitHub Wiki
- [tesseract's tips to improve output quality](https://tesseract-ocr.github.io/tessdoc/ImproveQuality.html)
- pdfminer.six's guide to converting a PDF to text
- pypdf's explanation of why text extraction is hard
- OCR vs text extraction