PDF imaging - lmmx/devnotes GitHub Wiki
Background: see {{spin-systems/flow : : Literature-imaging}}
:hatching_chick: eidetic 1
- Rendering PDF to image is easy (scraping can wait: imaging first and foremost!)
Solutions
- from the first thread I read: considering 'sejda console', which has a `pdftojpeg` function (but needs Java)
- imagemagick can do this too
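
A minimal sketch of the imagemagick route, wrapped in Python so it can be scripted later. It assumes ImageMagick's `convert` is on the PATH (with Ghostscript available as its PDF delegate; newer ImageMagick installs may expose `magick` instead). The function names, the default `density`/`quality` values and the `page-%03d.jpg` naming are my own placeholders, not taken from the threads.

```python
import shutil
import subprocess
from pathlib import Path


def convert_command(pdf_path, out_pattern, density=150, quality=90):
    """Build an ImageMagick `convert` command that rasterises a PDF to JPEGs.

    Hypothetical helper: the defaults are illustrative, not recommendations.
    """
    return [
        "convert",
        "-density", str(density),  # rasterisation resolution (DPI); must come before the input
        str(pdf_path),
        "-quality", str(quality),  # JPEG compression quality
        str(out_pattern),          # e.g. "page-%03d.jpg" writes one numbered file per page
    ]


def pdf_to_jpegs(pdf_path, out_dir):
    """Run the conversion and return the page images found afterwards."""
    if shutil.which("convert") is None:
        raise RuntimeError("ImageMagick `convert` not found on PATH")
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    subprocess.run(convert_command(pdf_path, out_dir / "page-%03d.jpg"), check=True)
    return sorted(out_dir.glob("page-*.jpg"))
```

Keeping the command builder separate from the `subprocess` call makes it easy to print the command and try it by hand first.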
imagemagick readings
- 1 `gs` handles the processing of PDFs for imagemagick
ImageMagick loads the entire PDF into memory before processing it. Ghostscript, on the other hand, can process one page at a time, which reduces the load on hardware by a lot. Here is a great article about that, which I used as a reference in this testing.
  - NB: the article includes source code to try it out; `gs` benchmarks [on 8 GB RAM] ~3-4x faster than `convert`
- 2 Stack Overflow: Convert PDF to image with high resolution
However, it would be harder to do your trimming and sharpening using gs, so, …, YMMV [Your Mileage May Vary]!
- 3 complaints on conversion quality
- 4 CLI example
- 5 more tips via S.O.
- 6 more tips if needed via AskUbuntu
- 7 + note on batch
- 8 nice visual explanation of flags etc.
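
Pulling the readings together, one possible pipeline (my own combination, not something any single thread prescribes) is: render pages with `gs`, then do the trimming and sharpening on the resulting images with `convert`, since that part is harder in `gs`. The sketch below only builds the command lines. The helper names, the `png16m` device, the 300 DPI default and the `0x1.0` sharpen radius are assumptions for illustration.

```python
from pathlib import Path


def gs_command(pdf_path, out_pattern, dpi=300, first_page=None, last_page=None):
    """Build a Ghostscript command that renders PDF pages to PNG files."""
    cmd = ["gs", "-dSAFER", "-dBATCH", "-dNOPAUSE", "-sDEVICE=png16m", f"-r{dpi}"]
    if first_page is not None:
        cmd.append(f"-dFirstPage={first_page}")  # optionally restrict the page range
    if last_page is not None:
        cmd.append(f"-dLastPage={last_page}")
    # %03d in the output name makes gs write one numbered file per page
    cmd += [f"-sOutputFile={out_pattern}", str(pdf_path)]
    return cmd


def tidy_command(image_path, out_path):
    """Build a `convert` command that trims margins and sharpens one page image."""
    return ["convert", str(image_path), "-trim", "+repage", "-sharpen", "0x1.0", str(out_path)]


def batch_commands(pdf_dir, out_dir, dpi=300):
    """Build one gs command per PDF in a directory (the 'batch' case)."""
    out_dir = Path(out_dir)
    commands = []
    for pdf in sorted(Path(pdf_dir).glob("*.pdf")):
        pattern = out_dir / f"{pdf.stem}-%03d.png"
        commands.append(gs_command(pdf, pattern, dpi=dpi))
    return commands
```

Each command can then be run with `subprocess.run(cmd, check=True)`, provided `gs` and `convert` are installed and the output directory already exists. A PDF whose filename contains `%` would clash with the page-number pattern, so this sketch assumes plain filenames.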
"Best current tools for working with PDF files in Python?"
- Furthermore...
- top answer is some Windows/Java tool...
- advised not to use ReportLab just to read PDFs