folderToImg.py - haltosan/RA-python-tools GitHub Wiki

Folder To Image

This script scans a directory and converts every PDF in it to JPG images. Each PDF gets its own output folder, named after the file with -images appended (for example, 1.pdf-images/).

Usage

python folderToImg.py [input path] [output path]

  • input path : the folder containing the PDFs; every file in this folder with the pdf extension will be read in
  • output path : the folder to save the images in; it ends up holding one image folder per PDF (outputPath/1.pdf-images/, outputPath/2.pdf-images/, ...)
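The mapping from input files to output folders described above can be sketched as follows. This is only an illustration of the naming scheme, not the script's actual code: the function name plan_outputs is hypothetical, and the case-insensitive extension check is an assumption.

```python
import os


def plan_outputs(input_path, output_path):
    """Pair each PDF in input_path with its '<file name>-images' folder."""
    pairs = []
    for name in os.listdir(input_path):
        # Assumption: the extension check ignores case; the real script may not.
        if name.lower().endswith(".pdf"):
            pdf_file = os.path.join(input_path, name)
            image_dir = os.path.join(output_path, name + "-images")
            pairs.append((pdf_file, image_dir))
    return pairs
```

Calling plan_outputs("scans", "out") on a folder holding 1.pdf and notes.txt would yield a single pair pointing 1.pdf at out/1.pdf-images, and skip the text file.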

A note about page range: the for loop inside pdfToImages only covers the range [1, 200), that is, pages 1 through 199. The loop stops automatically as soon as it thinks it has run out of pages, so shorter PDFs are handled. Any PDF that needs a different range (for example, one with 200 or more pages) will fail, so adjust the range if needed.
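One way to lift the fixed upper bound is to loop until the renderer reports that there is no such page. The sketch below shows only that loop shape. Both save_all_pages and render_page are hypothetical names: render_page stands in for whatever call inside pdfToImages renders and saves a single page, and it is assumed to return None once the page number is past the end of the document.

```python
def save_all_pages(render_page, max_pages=None):
    """Call render_page(n) for n = 1, 2, ... until it returns None.

    render_page is a hypothetical callable standing in for the per-page
    work done inside pdfToImages. max_pages is an optional safety cap;
    leave it as None to keep going until the pages run out.
    Returns the number of pages rendered.
    """
    page_number = 1
    while max_pages is None or page_number <= max_pages:
        if render_page(page_number) is None:
            break  # ran out of pages
        page_number += 1
    return page_number - 1
```

With max_pages=199 this behaves like the current [1, 200) loop; with no cap it keeps going for longer documents.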

Modification

The poppler_path variable needs to point at a working Poppler installation.
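A quick sanity check for poppler_path might look like the sketch below. It relies on the fact that pdf2image shells out to Poppler's command-line tools, and assumes that finding pdftoppm and pdfinfo in the given folder is a good enough sign that the path is right. The function name is hypothetical.

```python
import shutil


def poppler_looks_usable(poppler_path):
    """Return True if pdftoppm and pdfinfo are both found in poppler_path."""
    return all(
        shutil.which(tool, path=poppler_path) is not None
        for tool in ("pdftoppm", "pdfinfo")
    )
```

If this returns False, the path is probably pointing at the wrong folder (on many installs the executables live in a bin subfolder).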

IMAGE_QUALITY is a pdf2image argument. It is safe to keep at 300 for most imaging, but it can be turned up to 500 for better quality. Values of 700 and above can make Python complain (something along the lines of a decompression bomb warning), so be careful.
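To get a feel for why large values cause trouble, the sketch below estimates how many pixels one page produces. It assumes IMAGE_QUALITY is passed to pdf2image as the dpi value, which this page does not confirm. The constant is Pillow's default Image.MAX_IMAGE_PIXELS; Pillow emits a DecompressionBombWarning above that size and raises an error above twice that size, which may be the complaint described above.

```python
# Pillow's default Image.MAX_IMAGE_PIXELS, hard-coded so the sketch
# needs no third-party imports.
PILLOW_DEFAULT_MAX_PIXELS = 89_478_485


def pixels_at(dpi, width_in=8.5, height_in=11.0):
    """Pixel count of one page rendered at dpi (default: US Letter)."""
    return round(width_in * dpi) * round(height_in * dpi)


def exceeds_pillow_limit(dpi, width_in=8.5, height_in=11.0):
    """True if a page of this size at this dpi passes Pillow's warning limit."""
    return pixels_at(dpi, width_in, height_in) > PILLOW_DEFAULT_MAX_PIXELS
```

Under these assumptions a Letter page at 300 dpi is about 8.4 million pixels, while an 11 x 17 inch page at 700 dpi is about 91.6 million, which is past the limit. Whether a given PDF trips the warning therefore depends on its page size as well as the setting.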

IMAGE_TYPE should be kept the same. I'm not sure whether page.save (called on the pages pdf2image returns) handles anything else in this script, so that would be on you to discover.

The files may be processed in any order, anywhere from sorted to seemingly random. This is because os.listdir() returns entries in arbitrary order, so if ordering matters, sort the list before the loop.
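A plain sorted() call is enough for a stable order, but it compares names as text, so 10.pdf lands before 2.pdf. If the files are numbered, a numeric-aware key like the one sketched here may be closer to what is wanted. The name natural_key is hypothetical and not part of the script.

```python
import re


def natural_key(name):
    """Split a file name into text and integer chunks for sorting.

    Digit runs are compared as numbers, so '2.pdf' sorts before '10.pdf'.
    """
    return [
        int(part) if part.isdigit() else part.lower()
        for part in re.split(r"(\d+)", name)
    ]
```

It would be used as sorted(os.listdir(input_path), key=natural_key) in place of the bare os.listdir() call.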
