PDF imaging - lmmx/devnotes GitHub Wiki

Background: see {{spin-systems/flow : : Literature-imaging}}

:hatching_chick: eidetic 1

  • Rendering PDF to image is easy (scraping can wait: imaging first and foremost!)

Solutions

  • from the first thread I read: considering 'sejda console', which has a pdftojpeg function (but needs Java)

imagemagick readings

  • 1 gs (Ghostscript) handles the processing of PDFs for ImageMagick

    ImageMagick loads the entire pdf into the memory before process it. On the other hand ghostscript has the capability of processing 1 page at time which reduces the load on hardware by a lot. Here is a great article about that i used as a reference in this testing.

    • NB: the article includes source code to try it out; gs benchmarks [on 8GB RAM] at ~3-4x faster than convert
  • 2 Stack Overflow: Convert PDF to image with high resolution

    However, it would be harder to do your trimming and sharpening using gs, so, …, YMMV [Your Mileage May Vary]!

  • 3 complaints on conversion quality

  • 4 CLI example

  • 5 more tips via S.O

  • 6 more tips if needed via AskUbuntu

  • 7 + note on batch

  • 8 nice visual explanation of flags etc
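
A minimal sketch of the gs-versus-convert point from the readings above, assuming `gs` (and `convert` for comparison) are on the PATH. The function names, the 300 dpi default, and the output filename pattern are my own placeholders, not taken from any of the linked threads; the gs flags are standard Ghostscript switches. On ImageMagick 7 the binary may be `magick` rather than `convert`.

```python
import shutil
import subprocess
from pathlib import Path


def gs_page_command(pdf, page, out_png, dpi=300):
    """Build a Ghostscript command that rasterises one page of a PDF to PNG."""
    return [
        "gs",
        "-q",                       # quiet: suppress the startup banner
        "-dSAFER",                  # restrict file access while interpreting
        "-dBATCH",                  # exit when the input has been processed
        "-dNOPAUSE",                # do not prompt between pages
        "-sDEVICE=png16m",          # 24-bit RGB PNG output device
        f"-r{dpi}",                 # render resolution in dpi
        f"-dFirstPage={page}",      # first and last page equal,
        f"-dLastPage={page}",       # so only one page is processed
        f"-sOutputFile={out_png}",
        str(pdf),
    ]


def convert_command(pdf, out_png, dpi=300):
    """Build the equivalent ImageMagick command (whole PDF in one go)."""
    # -density must come before the input file to affect rasterisation.
    return ["convert", "-density", str(dpi), str(pdf), str(out_png)]


def render_page(pdf, page, out_dir=".", dpi=300):
    """Render a single page with gs and return the path of the PNG written."""
    if shutil.which("gs") is None:
        raise RuntimeError("Ghostscript (gs) not found on PATH")
    # Hypothetical naming scheme: <pdf stem>-p<3-digit page>.png
    out_png = Path(out_dir) / f"{Path(pdf).stem}-p{page:03d}.png"
    subprocess.run(gs_page_command(pdf, page, out_png, dpi), check=True)
    return out_png
```

Calling `render_page("paper.pdf", 1)` in a loop over page numbers keeps only one page in flight at a time, which is the behaviour the first reading credits for the lower load; `paper.pdf` here is just a placeholder filename.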

"Best current tools for working with PDF files in Python?"

2015 thread

  • Furthermore...
    • the top answer is some Windows/Java tool...
    • advised not to use ReportLab just to read PDFs