Output - Speedymr01/The-PDF-inator GitHub Wiki

Output Directory

This directory stores the output files generated by the PDF Consolidator tool.

Structure

The contents of the output directory depend on which processing tasks you run. The layout and the types of files you can expect are described below.

Output Organization

  • For each processed PDF, a subdirectory named after the original PDF (without its extension) is created in output/.
  • All files produced from that PDF (split pages, the merged, page-deleted, and page-duplicated PDFs, and the OCR text) are stored in this subdirectory.
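
The naming rule above can be sketched in a few lines of Python. This is a minimal illustration, not the tool's actual code: the function names `output_dir_for` and `ensure_output_dir` and the `output_root` parameter are hypothetical, and the sketch assumes the subdirectory name is simply the file name with its final extension removed.

```python
from pathlib import Path


def output_dir_for(pdf_path, output_root="output"):
    """Return output_root/<PDF file name without its final extension>.

    Hypothetical helper: mirrors the rule described above, not the tool's code.
    """
    return Path(output_root) / Path(pdf_path).stem


def ensure_output_dir(pdf_path, output_root="output"):
    """Create the per-PDF subdirectory if it does not exist and return it."""
    target = output_dir_for(pdf_path, output_root)
    target.mkdir(parents=True, exist_ok=True)
    return target
```

Under this assumption, `output_dir_for("input/document.pdf")` resolves to `output/document`. Note that `Path.stem` drops only the last suffix, so a file named `report.v2.pdf` would map to `output/report.v2`.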

Example

If your input PDF file is named document.pdf, the output directory will contain a subdirectory named document:

output/
└── document/
    ├── document_page_1.pdf
    ├── document_page_2.pdf
    ├── document_merged.pdf
    ├── document_deleted.pdf
    ├── document_duplicated.pdf
    └── document_ocr.txt

  • document_page_1.pdf, document_page_2.pdf, ...: Individual pages from splitting.
  • document_merged.pdf: Result of merging with another PDF.
  • document_deleted.pdf: PDF after deleting a page.
  • document_duplicated.pdf: PDF after duplicating a page.
  • document_ocr.txt: Text file containing extracted text from the original PDF using OCR (PyMuPDF).
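
As a quick reference, the file names listed above follow a `<name>_<suffix>` pattern. The sketch below reproduces that pattern for a given PDF name and page count. The helper `expected_outputs` is hypothetical and only restates the names shown in the example; which files actually appear depends on the tasks you ran.

```python
def expected_outputs(stem, page_count):
    """Build the output file names from the example for one PDF.

    Hypothetical helper: `stem` is the PDF name without its extension,
    and `page_count` is the number of pages produced by splitting.
    """
    return {
        # One file per page, numbered from 1.
        "pages": [f"{stem}_page_{n}.pdf" for n in range(1, page_count + 1)],
        "merged": f"{stem}_merged.pdf",
        "deleted": f"{stem}_deleted.pdf",
        "duplicated": f"{stem}_duplicated.pdf",
        "ocr": f"{stem}_ocr.txt",
    }


if __name__ == "__main__":
    for kind, names in expected_outputs("document", 2).items():
        print(kind, names)
```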

Notes

  • Ensure that input PDF files are not password-protected, as the tool may not be able to process them.
  • Large PDF files may take longer to process.
  • OCR is performed using PyMuPDF (fitz) and outputs a .txt file with extracted text for the whole PDF.
  • Check the log files in the logs directory for detailed information about processing tasks and any errors encountered.