Output - Speedymr01/The-PDF-inator GitHub Wiki
Output Directory
This directory is designated for storing the output files generated by the PDF Consolidator tool.
Structure
The output directory will contain subdirectories and files based on the processing tasks performed. Below is an explanation of the structure and the types of files you can expect:
Output Organization
- For each processed PDF, a subdirectory named after the original PDF (without extension) will be created in `output/`.
- All processed files related to that PDF (split pages, merged, deleted, duplicated, or OCR results) will be stored in this subdirectory.
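The naming rule above can be sketched in a few lines of Python. This is a minimal illustration, not the tool's actual code: the function name `output_dir_for` and the `output_root` parameter are hypothetical, and the sketch assumes "without extension" means dropping only the final suffix.

```python
from pathlib import Path


def output_dir_for(pdf_path, output_root="output"):
    """Return <output_root>/<PDF name without its final extension>.

    Hypothetical helper: the real tool may derive the name differently.
    """
    stem = Path(pdf_path).stem  # "document.pdf" -> "document"
    return Path(output_root) / stem


print(output_dir_for("input/document.pdf").as_posix())  # output/document
```

Only the last suffix is removed, so a file such as `report.v2.pdf` would map to `output/report.v2` under this assumption.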
Example
If your input PDF file is named `document.pdf`, the output directory will contain a subdirectory named `document`:
    output/
    └── document/
        ├── document_page_1.pdf
        ├── document_page_2.pdf
        ├── document_merged.pdf
        ├── document_deleted.pdf
        ├── document_duplicated.pdf
        └── document_ocr.txt
- `document_page_1.pdf`, `document_page_2.pdf`, ...: Individual pages from splitting.
- `document_merged.pdf`: Result of merging with another PDF.
- `document_deleted.pdf`: PDF after deleting a page.
- `document_duplicated.pdf`: PDF after duplicating a page.
- `document_ocr.txt`: Text file containing extracted text from the original PDF using OCR (PyMuPDF).
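If you post-process the output yourself, the file names listed above are enough to tell which operation produced each file. The sketch below is one possible way to do that; `classify_output` and its labels are hypothetical and only mirror the suffixes shown in the example, so adjust them if your output differs.

```python
import re


def classify_output(filename, stem):
    """Map an output file name to (kind, page_number).

    Hypothetical helper based only on the suffixes in the example listing.
    page_number is None for everything except split pages.
    """
    page = re.fullmatch(rf"{re.escape(stem)}_page_(\d+)\.pdf", filename)
    if page:
        return ("page", int(page.group(1)))

    kinds = {
        f"{stem}_merged.pdf": "merged",
        f"{stem}_deleted.pdf": "deleted",
        f"{stem}_duplicated.pdf": "duplicated",
        f"{stem}_ocr.txt": "ocr",
    }
    return (kinds.get(filename, "unknown"), None)


print(classify_output("document_page_2.pdf", "document"))
print(classify_output("document_ocr.txt", "document"))
```

Files that match none of the known patterns are reported as `"unknown"` rather than raising, which keeps the helper safe to run over a directory that contains unrelated files.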
Notes
- Ensure that input PDF files are not password-protected, as the tool may not be able to process them.
- Large PDF files may take longer to process.
- OCR is performed using PyMuPDF (`fitz`) and outputs a `.txt` file with extracted text for the whole PDF.
- Check the log files in the `logs` directory for detailed information about processing tasks and any errors encountered.
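As a rough illustration of the password and text-extraction notes above, the following sketch uses PyMuPDF to reject encrypted files and write the text of every page to `<name>_ocr.txt`. It is an assumption about how such a step might look, not the tool's actual code: the function names and the `output_root` parameter are hypothetical, and `page.get_text()` reads the PDF's embedded text layer, which may differ from what the tool does internally.

```python
from pathlib import Path


def ocr_output_path(pdf_path, output_root="output"):
    """Return output/<name>/<name>_ocr.txt for a given input PDF (hypothetical helper)."""
    stem = Path(pdf_path).stem
    return Path(output_root) / stem / f"{stem}_ocr.txt"


def extract_text(pdf_path, output_root="output"):
    """Write the text of all pages to the _ocr.txt file and return its path."""
    import fitz  # PyMuPDF; imported here so the path helper works without it installed

    with fitz.open(pdf_path) as doc:
        if doc.needs_pass:
            # Password-protected PDFs cannot be read without authentication.
            raise ValueError(f"{pdf_path} is password-protected")
        text = "\n".join(page.get_text() for page in doc)

    target = ocr_output_path(pdf_path, output_root)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(text, encoding="utf-8")
    return target


print(ocr_output_path("input/document.pdf").as_posix())  # output/document/document_ocr.txt
```

Calling `extract_text("input/document.pdf")` requires PyMuPDF to be installed (`pip install pymupdf`) and an existing, unencrypted PDF at that path.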