Using Incremental Saves - pymupdf/PyMuPDF GitHub Wiki
Since version 1.9.1, incremental saves are possible for PDF documents.
Saving incrementally has the following advantages (see also section 3.4.5 of the Adobe PDF Reference):
- It is normally much faster, because changes are appended to the file instead of rewriting it as a whole.
- It avoids handling intermediate files when all you want is to update a specific document.
The call pattern of incremental saves is as follows:
doc.save(doc.name, incremental=True, ...)
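To make "changes are appended to the file" concrete, here is a stdlib-only sketch of what an incremental update section looks like at byte level: a new object body, a cross-reference section listing only that object, and a trailer whose /Prev entry points back to the previous cross-reference section. This follows the structure described in the PDF Reference's section on incremental updates; it is not how PyMuPDF/MuPDF implements saving. The function names are hypothetical, and the sketch assumes an unencrypted file whose last cross-reference section is a classic xref table (not an xref stream).

```python
import re


def append_incremental_update(pdf: bytes, obj_num: int, obj_body: bytes,
                              trailer_extra: bytes = b"") -> bytes:
    """Append one new or replaced object to `pdf` as an incremental update.

    Sketch only: assumes the last cross-reference section is a classic
    xref table (not an xref stream) and that the file is not encrypted.
    """
    # Byte offset of the previous cross-reference section (last 'startxref').
    prev_xref = int(re.findall(rb"startxref\s+(\d+)", pdf)[-1])
    # Copy /Root and /Size from the most recent trailer.
    last_trailer = pdf[pdf.rindex(b"trailer"):]
    root = re.search(rb"/Root\s+(\d+\s+\d+\s+R)", last_trailer).group(1)
    size = int(re.search(rb"/Size\s+(\d+)", last_trailer).group(1))

    out = bytearray(pdf)  # the original bytes are kept unchanged
    if not out.endswith(b"\n"):
        out += b"\n"
    obj_offset = len(out)
    out += b"%d 0 obj\n" % obj_num + obj_body + b"\nendobj\n"
    xref_offset = len(out)
    # One subsection listing only the changed object (entries are 20 bytes each).
    out += b"xref\n%d 1\n%010d 00000 n \n" % (obj_num, obj_offset)
    out += b"trailer\n<< /Size %d /Root %s /Prev %d%s >>\n" % (
        max(size, obj_num + 1), root, prev_xref, trailer_extra)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_offset
    return bytes(out)


def minimal_pdf() -> bytes:
    """Build a tiny single-revision PDF (catalog + empty page tree) for the demo."""
    bodies = [b"<< /Type /Catalog /Pages 2 0 R >>",
              b"<< /Type /Pages /Kids [] /Count 0 >>"]
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for num, body in enumerate(bodies, start=1):
        offsets.append(len(out))
        out += b"%d 0 obj\n" % num + body + b"\nendobj\n"
    xref_offset = len(out)
    out += b"xref\n0 %d\n0000000000 65535 f \n" % (len(bodies) + 1)
    for off in offsets:
        out += b"%010d 00000 n \n" % off
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(bodies) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_offset
    return bytes(out)


base = minimal_pdf()
# "Update the metadata": add an Info dictionary (object 3) and reference it
# from the new trailer. Nothing before the end of `base` is touched.
updated = append_incremental_update(base, 3, b"<< /Title (Updated) >>",
                                    trailer_extra=b" /Info 3 0 R")
```

The point of the demo is that updated.startswith(base) holds: the earlier revision survives byte for byte, and a reader follows the newest startxref first, then /Prev for everything the update did not change.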
Prerequisites
There are a number of prerequisites that must be met to use this facility:
- Incremental saves are not possible for encrypted files, even after they have been successfully decrypted.
- The internal structure of the document must be intact, i.e. it must not have been opened in repair mode. If errors occur while opening the PDF, a flag is set that prevents a later incremental save. A normal save is still possible.
- Option incremental=True excludes the options garbage and linear.
- The file to save to must obviously be the original one. Therefore, documents opened from a memory area cannot be saved incrementally.
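The prerequisites above can be collected into one decision function. In this sketch, DocState and plan_save are hypothetical names, and how the three DocState fields would be filled from a real PyMuPDF document is left open (an assumption, not shown). When any prerequisite fails, the plan falls back to a full save under a different file name, mirroring the try/except pattern used further down this page.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class DocState:
    """Hypothetical summary of the facts that decide whether an incremental save is allowed."""
    name: str               # file the document was opened from; "" if opened from memory
    was_encrypted: bool     # stays True even after a successful decryption
    repaired_on_open: bool  # True if errors occurred while opening (repair mode)


def plan_save(state: DocState, fallback: str, garbage: int = 0,
              linear: bool = False) -> tuple[str, dict]:
    """Return (file name, keyword arguments) for a save call.

    Incremental mode is chosen only if every prerequisite holds; otherwise
    the document is written as a whole to `fallback`.
    """
    incremental_ok = (
        bool(state.name)                # not opened from a memory area
        and not state.was_encrypted     # encrypted files never qualify
        and not state.repaired_on_open  # internal structure must be intact
        and garbage == 0                # incremental=True excludes garbage ...
        and not linear                  # ... and linear
    )
    if incremental_ok:
        return state.name, {"incremental": True}
    return fallback, {"garbage": garbage, "linear": linear}


state = DocState(name="report.pdf", was_encrypted=False, repaired_on_open=False)
print(plan_save(state, fallback="new.pdf"))             # ('report.pdf', {'incremental': True})
print(plan_save(state, fallback="new.pdf", garbage=3))  # ('new.pdf', {'garbage': 3, 'linear': False})
```

With PyMuPDF, the result would then be passed on as doc.save(name, **kwargs). Requesting garbage or linear is treated here as a reason to give up on incremental mode; raising an error instead would be an equally valid design.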
Typical Uses
The most typical uses are small changes to the document, such as adding or deleting a few pages or updating bookmarks or metadata. As changes become more extensive, there is a break-even point beyond which saving to a new file performs better (an incremental save only appends, so obsolete objects stay in the file, and garbage collection is not available).
The following code snippet deletes pages without text from a text-oriented PDF (like the Adobe manual ...):
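import fitz  # PyMuPDF; this page uses its older camelCase method names
             # (newer releases use e.g. page_count and get_page_text)

doc = fitz.open("input.pdf")  # hypothetical input file
# Keep only pages that contain some text. Build a new list instead of
# calling lst.remove() while iterating over lst, which skips pages.
keep = [i for i in range(doc.pageCount) if doc.getPageText(i).strip()]
if len(keep) < doc.pageCount:  # any pages without text?
    doc.select(keep)  # delete all other pages
    try:  # try incremental save first
        doc.save(doc.name, incremental=True)
    except Exception:  # e.g. opened in repair mode: save to a new file
        doc.save("new.pdf", garbage=3)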
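The page-selection step can be tried without a PDF at hand. In this sketch a plain list of strings stands in for the text of each page (made-up data), and both function names are hypothetical. The second function keeps the pattern of calling list.remove() inside a for loop over the same list, to show why that pattern gives wrong results.

```python
def pages_to_keep(page_texts):
    """Indices of pages whose extracted text is not blank."""
    return [i for i, text in enumerate(page_texts) if text.strip()]


def pages_to_keep_buggy(page_texts):
    """Same intent, but removes items from the list it is iterating over."""
    lst = list(range(len(page_texts)))
    for i in lst:
        if not page_texts[i].strip():
            lst.remove(i)  # shifts later items left, so the loop skips the next one
    return lst


texts = ["Intro", "", "", "Body", "  \n"]  # made-up page texts
print(pages_to_keep(texts))        # [0, 3]
print(pages_to_keep_buggy(texts))  # [0, 2, 3]: page 2 is blank but kept
```

Building a new list with a comprehension leaves the list being iterated untouched, which is why it returns the right indices.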