Modify ePub - kiwidude68/calibre_plugins GitHub Wiki


Introduction

The primary purpose of this plugin is to allow you to make repetitive cleanup or modifications to your ePub files to improve their appearance, file size or specification quality without the need to perform a calibre ePub -> ePub conversion.

Performing a calibre conversion forces changes to your file that you have no control over, some of which are undesirable. For instance, the CSS file is completely rewritten, margins are changed, files are split based on images being detected, etc. Often there is no visible harm, but in other cases it can and does cause issues.

This plugin can be seen as a companion to the Quality Check plugin, which provides a number of menu options for detecting ePubs that show symptoms of issues this plugin can resolve for you.

Configuration Dialog

Configuration

Access the configuration dialog via:

  • Preferences -> Plugins -> User interface action -> Modify ePub -> Customize plugin


Option Description
Prompt to save ePubs Check this option to be prompted before overwriting your ePub book file.
Uncheck to silently update in the background without a prompt.

Known Artifacts Modifications

Option What it does Quality Check
Remove iTunes files When you view an ePub in iTunes, iTunes inserts a plist (property list) file inside the ePub and sometimes an artwork file for displaying the cover. These files can be considered 'cruft', particularly if you do not intend to use iTunes to view your ePubs in future.

Use this option to remove iTunes plist and artwork files from within the ePub.
Check iTunes files
Remove calibre bookmark files When you view an ePub in the calibre viewer, it inserts a bookmarks file, much as iTunes does above, that stores your last reading position and any bookmarks you added. You can disable this feature in the viewer preferences.

Use this option to remove calibre bookmark txt files from within the ePub.
Check calibre bookmark files
Remove OS artifacts When you extract the contents of an ePub to your hard drive and then rezip them, your operating system may have added files that are unrelated to the ePub. For instance, on Windows an images folder may have a Thumbs.db file generated for the preview view of images in that folder, and macOS may add .DS_Store files.

Use this option to remove OS files from within the ePub.
Check OS artifacts
Remove unused images Looks for orphaned .jpg, .png and .gif files that are not referenced from the HTML content pages and can therefore be removed from the ePub. This can happen if, for instance, you delete a page in Sigil without removing the associated image(s).
WARNING: This does not inspect CSS files, so an image referenced only from a stylesheet is treated as unused.

Use this option to remove unused image files from within the ePub to reduce space required.
Check unused image files
Unpretty De-indents HTML code and makes sure paragraphs/headers have their own lines.
Strip spans Removes empty formatting elements (e.g. <i/> or <b></b>) and <span> elements that have no attributes.
This also converts empty container elements into self-closing elements (e.g. <br></br> becomes <br/>).
Strip Kobo DRM remnants Removes elements related to Kobo's DRM.
This also removes the kobo.js and rights.xml files from the book, any references to them, and the Kobo CSS definition in the documents.
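
The iTunes, calibre bookmark and OS artifact options above all amount to dropping known file names from the ePub's zip container. The following is a minimal sketch of that idea using only the Python standard library; it is not the plugin's own code. Apart from Thumbs.db and .DS_Store, the file names in ARTIFACT_NAMES are assumptions, and find_artifacts and strip_artifacts are hypothetical helper names.

```python
import io
import posixpath
import zipfile

# Only Thumbs.db and .DS_Store are named in the text above;
# the other entries are assumed file names used for illustration.
ARTIFACT_NAMES = {
    "iTunesMetadata.plist",   # assumption: iTunes plist file
    "iTunesArtwork",          # assumption: iTunes cover artwork
    "calibre_bookmarks.txt",  # assumption: calibre viewer bookmarks
    "Thumbs.db",              # Windows thumbnail cache
    ".DS_Store",              # macOS folder metadata
}


def find_artifacts(epub_bytes):
    """Return the sorted archive paths whose base name is a known artifact."""
    with zipfile.ZipFile(io.BytesIO(epub_bytes)) as zf:
        return sorted(
            name for name in zf.namelist()
            if posixpath.basename(name) in ARTIFACT_NAMES
        )


def strip_artifacts(epub_bytes):
    """Return a new archive (as bytes) with the known artifacts left out."""
    unwanted = set(find_artifacts(epub_bytes))
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(epub_bytes)) as src, \
            zipfile.ZipFile(out, "w") as dst:
        for info in src.infolist():
            if info.filename not in unwanted:
                # Passing the ZipInfo keeps each entry's compression type,
                # so an uncompressed "mimetype" entry stays uncompressed.
                dst.writestr(info, src.read(info.filename))
    return out.getvalue()
```

Working on bytes rather than file paths keeps the sketch free of side effects; to try it on a real book, read the file with open(path, "rb") and write the result to a copy.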

Manifest (.opf File) Modifications

Option What it does Quality Check
Remove missing file entries from manifest If the ePub has been manually tweaked, it is possible that someone deleted a file from the directory but did not remove the entry from the .opf manifest xml file. Most tools will ignore missing files when viewing or editing that ePub, however it cannot be guaranteed that is always the case. A missing file can also result from a typing error if the manifest was edited manually, or from the file being renamed afterwards.

Use this option to clean up the manifest by removing references to any missing files.
Check manifest files missing
Add unmanifested files to manifest The ePub may contain files that are not listed in the .opf manifest. These could be files whose names do not match their manifest entries, orphaned files that should be deleted, or 'cruft' left inside the ePub file by third party tools. Note that iTunes files, calibre bookmarks and OS artifacts are explicitly ignored by this check.

Use this option to add all files not currently listed in the manifest into it.
Check unmanifested files
Remove unmanifested files from ePub See above for the causes.

Use this option to delete all of the orphaned files in the ePub that are not listed in the manifest file.
Check unmanifested files
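
Both directions of the manifest checks above reduce to comparing two sets: the paths listed in the .opf manifest and the paths actually present in the archive. A rough sketch of that comparison follows. It is an illustration, not the plugin's implementation; compare_manifest is a hypothetical name, and unlike the plugin it only skips the mimetype file, the .opf itself and the META-INF folder, not iTunes files or OS artifacts.

```python
import posixpath
import xml.etree.ElementTree as ET
from urllib.parse import unquote

OPF_NS = "{http://www.idpf.org/2007/opf}"


def compare_manifest(opf_path, opf_xml, archive_names):
    """Return (missing, unmanifested) as two sorted lists of archive paths.

    opf_path:      path of the .opf file inside the archive
    opf_xml:       contents of the .opf file as a string
    archive_names: every file path in the archive
    """
    root = ET.fromstring(opf_xml)
    base = posixpath.dirname(opf_path)
    listed = set()
    for item in root.iter(OPF_NS + "item"):
        href = unquote(item.get("href", ""))
        # Manifest hrefs are relative to the folder holding the .opf file.
        listed.add(posixpath.normpath(posixpath.join(base, href)))

    # Container-level files are not expected in the manifest.
    ignored = {"mimetype", opf_path}
    present = {
        name for name in archive_names
        if name not in ignored
        and not name.startswith("META-INF/")
        and not name.endswith("/")  # skip directory entries
    }
    return sorted(listed - present), sorted(present - listed)
```

The first list corresponds to entries that "Remove missing file entries from manifest" would drop; the second to files that would either be added to the manifest or deleted from the ePub.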

Adobe Modifications

Option What it does Quality Check
Remove margins from Adobe .xpgt files ePubs that have been created using Adobe tools will contain a .xpgt file that enforces margins. These are in conflict with the traditional css styles and can cause wasted space when viewing on devices. Recent versions of calibre when converting will zero any margins in such a file.

Use this option to zero the margins in the .xpgt file without needing to perform a conversion.
This option is redundant and ignored if you tick Remove Adobe .xpgt files and links.
Check Adobe .xpgt margins
Remove Adobe .xpgt files and links As a more extreme version of the remove margins option above, users may choose to obliterate the .xpgt file completely along with any links to it from the xhtml files.

Use this option to remove Adobe margins and all associated xpgt cruft without needing to perform a conversion.
Check Adobe inline .xpgt links
Remove Adobe resource DRM meta tags Books that have had DRM protection removed will still contain an Adobe tag with a urn identifier in the xhtml files.

Use this option to remove Adobe DRM cruft.
Important: Using this can break obfuscated fonts in the book.
Check Adobe DRM meta tag
Remove page maps Adobe has defined a proprietary extension to the ePub standard which identifies where pages break in the print version of a book.
This does not affect pagelists found in NCX files, which are part of the ePub standard.

Use this option to remove this file and, for Google Play books, the related anchors in the HTML code.
Remove only Google Play page maps The Google Play bookstore adds an Adobe page map file which does not correspond to any print version of the book.

Use this option to remove only Google Play page map files, as opposed to those from other sources.
This option is redundant if you tick Remove page maps.

Table of Contents Modifications

Option What it does Quality Check
Flatten TOC hierarchy in NCX file Some devices do not work well with hierarchical TOC (table of contents) navPoint entries in the ncx file. This option will flatten such entries to all be at the same top level.

Use this option to make a flat TOC to work better with some devices.
Check TOC hierarchical
Remove broken TOC entries in NCX file An NCX file containing links to missing html pages will cause errors when viewed. Broken links are most frequently caused by calibre conversions (orphaned cover page links) or by manual editing in Tweak ePub or Sigil without updating the NCX.

Use this option to ensure your TOC does not contain any entries which will cause errors due to missing content.
Check TOC with broken links
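
To make the flattening option concrete, here is a small sketch that lifts every nested navPoint up to the top level of navMap while keeping document order. It assumes a standard NCX namespace and uses a hypothetical function name, flatten_ncx; the plugin's real code may differ, for example in how it treats playOrder or the dtb:depth meta entry, which this sketch leaves untouched.

```python
import xml.etree.ElementTree as ET

NCX_NS = "http://www.daisy.org/z3986/2005/ncx/"
ET.register_namespace("", NCX_NS)  # serialize the NCX namespace without a prefix


def flatten_ncx(ncx_xml):
    """Return NCX markup in which every navPoint is a direct child of navMap."""
    root = ET.fromstring(ncx_xml)
    nav_map = root.find("{%s}navMap" % NCX_NS)
    tag = "{%s}navPoint" % NCX_NS

    # iter() walks in document order, so the reading order is preserved.
    points = list(nav_map.iter(tag))

    # Detach every navPoint from its current parent...
    for parent in [nav_map] + points:
        for child in parent.findall(tag):
            parent.remove(child)

    # ...then re-attach them all at the top level.
    for point in points:
        nav_map.append(point)
    return ET.tostring(root, encoding="unicode")
```

Each navPoint keeps its own navLabel and content children, so only the nesting changes.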

Metadata Jacket Modifications

Option What it does Quality Check
Remove all metadata jackets Removes any calibre generated jackets listing book metadata such as title, authors, comments and rating. This covers both jackets from the latest versions of calibre and 'legacy' jackets generated by versions of calibre prior to 0.6.50. The 'newer' jackets can be identified by a metadata tag in the xhtml.

Use this option if you do not want jackets in your books.
Check having any jacket
Remove legacy metadata jackets Removes calibre generated jackets that were created using versions of calibre prior to 0.6.50. These jackets cause a problem when the file is reconverted, as the calibre code does not detect them and will duplicate and potentially split them.

Use this option if you do not want the legacy jackets, or intend to reconvert in future and wish to avoid issues.
This option is redundant and ignored if you tick Remove all metadata jackets.
Check having legacy jacket
Add/replace metadata jacket Creates a metadata jacket page in the ePub if it does not exist, or replaces any existing one if it is found.

Use this option if you want to add a metadata jacket without performing a conversion.
Jacket at the end of the book If a jacket is added/replaced, it is placed at the end of the book instead of the beginning.

HTML & Style Modifications

Option What it does Quality Check
Encode HTML in UTF-8 Some ePubs have an invalid encoding in their HTML pages, which means they render incorrectly in readers like the calibre ebook viewer, Sigil or a web browser. Most often this is seen as strange characters appearing instead of quotes etc. These will however render correctly in ADE. Rather than doing an ePub->ePub conversion, we can instead strip the invalid <meta> tag from the html pages and insert an xml declaration indicating utf-8 instead which most often will be sufficient to resolve the issue.

Use this option to fix invalid encoding declarations on html pages that do not render correctly.
Remove embedded fonts Some ePubs carry embedded fonts as .ttf or .otf files, to ensure that their content is rendered with a font representing all the characters they contain. Some devices may not support embedded fonts, and these do significantly increase the ePub size so some users prefer to remove them. This also removes any @font-face declarations from css or html files.

Use this option to remove embedded font files.
Check embedded fonts
Modify @page and body margin styles An ePub that has not been converted by yourself in calibre may have body or @page styles with margins set to values that differ from your desired defaults. You can set your calibre conversion defaults using Preferences -> Conversion -> Common Options -> Page Setup. If you set negative values then this option will remove the margin attributes from the ePub, and if a CSS file is now empty then it will be removed from the ePub completely. Otherwise it will write whatever default value you have specified into a new @page style in each CSS file. Note it does not currently support changing named body styles.

Use this option to remove @page and body margin values and, if your calibre defaults are non-negative, rewrite them into an @page style.
Check CSS book margins and Check inline @page margins
Append extra CSS Appends any CSS style information from Preferences -> Conversion -> Common Options -> Look & Feel -> Extra CSS to each .css file in the ePub.

Use this option to append extra CSS styles to your ePub without a conversion.
Smarten punctuation Processes any html files in the ePub to ensure quotes and apostrophes are converted to smart quotes. In addition, double hyphens (--) are converted to an em dash (&mdash;).

Use this option to prettify your ePub without a conversion.
Check smarten punctuation
Remove inline javascript and files Looks for any .js files forming part of the ePub and any inline <script type="text/javascript"> blocks. Javascript is usually a leftover from an original conversion from html and is unnecessary in an ePub.

Use this option to remove javascript cruft from your ePubs.
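
The Encode HTML in UTF-8 option above describes a two-step fix: strip the charset <meta> tag and insert an XML declaration that names utf-8. A minimal regex-based sketch of those two steps is shown below. The function name declare_utf8 and the patterns are illustrative assumptions rather than the plugin's code, and the sketch works on already-decoded text that you would then write back out as UTF-8.

```python
import re

# Matches <meta charset="..."> and
# <meta http-equiv="Content-Type" content="...; charset=..."> tags.
META_CHARSET = re.compile(r"<meta\b[^>]*\bcharset\s*=[^>]*>\s*", re.IGNORECASE)

# Matches an existing XML declaration at the very top of the page.
XML_DECL = re.compile(r"^\s*<\?xml[^>]*\?>\s*")


def declare_utf8(html):
    """Drop charset <meta> tags and put a UTF-8 XML declaration on top."""
    html = META_CHARSET.sub("", html)        # step 1: remove the meta tag
    html = XML_DECL.sub("", html, count=1)   # avoid two XML declarations
    return '<?xml version="1.0" encoding="utf-8"?>\n' + html  # step 2
```

Other <meta> tags, such as a description or viewport entry, are left alone because the pattern only matches tags that mention a charset.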

Cover Modifications

Option What it does Quality Check
Remove broken image pages Looks for html pages that contain nothing but an <img> or <image> tag that links to a non-existent image file. If that html page body contains no other text content, then the html page will be completely removed from the ePub.

Use this option to remove orphaned cover pages that result from some calibre ePub conversions due to the way it replaces some cover pages.
Check broken image links
View the log for details.
Remove existing cover Examines the ePub to see if it has an existing cover identified by guide and/or meta entries in the opf manifest. If such a cover can be found, the relevant entries that indicate it is a cover page are removed, along with the cover page itself if it has no other images/text on it.

Use this option to remove cover pages from an ePub if you do not want them, for example to reduce size or to remove calibre generated default covers.
This option is redundant and ignored if you tick Insert or replace cover.
Insert or replace cover Performs the same steps as Remove existing cover above to identify and remove any existing cover. A new cover page will be generated using your default ePub output options and inserted as the first page in your ePub using the image associated with that book in your library. This option is far more reliable than using the Update metadata option below to replace your covers, as it handles far more scenarios of identifying an existing cover.

Use this option to insert a new cover (replacing the existing one if detectable) to your ePub without requiring a conversion.
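
The test described under Remove broken image pages (a page whose body holds nothing but an image tag pointing at a missing file) can be sketched as follows. This is a simplified illustration under stated assumptions: is_broken_image_page is a hypothetical helper, the regular expressions are a rough stand-in for real HTML parsing, and the plugin's actual detection may be stricter.

```python
import posixpath
import re
from urllib.parse import unquote

# Captures the target of <img src="..."> or <image xlink:href="...">.
IMG_TARGET = re.compile(
    r"<(?:img|image)\b[^>]*?\b(?:src|xlink:href|href)\s*=\s*[\"']([^\"']+)[\"']",
    re.IGNORECASE,
)
BODY = re.compile(r"<body\b[^>]*>(.*?)</body>", re.IGNORECASE | re.DOTALL)
ANY_TAG = re.compile(r"<[^>]+>")


def is_broken_image_page(page_path, html, archive_names):
    """True if the body has no text and every image it links to is missing."""
    match = BODY.search(html)
    if not match:
        return False
    body = match.group(1)
    if ANY_TAG.sub("", body).strip():
        return False  # the page has real text content, so keep it
    targets = IMG_TARGET.findall(body)
    if not targets:
        return False  # no image at all, so not a broken image page
    base = posixpath.dirname(page_path)
    names = set(archive_names)
    # Image links are relative to the folder holding the html page.
    return all(
        posixpath.normpath(posixpath.join(base, unquote(target))) not in names
        for target in targets
    )
```

Resolving the link relative to the page's own folder matters here, since cover pages often live in a different folder from the images they reference.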

Metadata Modifications

Option What it does Quality Check
Update metadata Updates the ePub metadata in the manifest (such as title, description, authors etc). In some limited circumstances it will also update the cover, but for a more reliable cover replacement you should also check the Insert or replace cover option.

Use this option to update the title/author/description internal metadata for your ePub to get it "up to date" for use in the calibre ebook viewer.
Remove non dc: metadata elements Applications like calibre and Sigil will insert metadata elements of their own in the manifest opf file that have no relevance to the ePub and are either informational or only for use by that tool. The Update metadata option above will insert such elements. These elements do no harm, but if you are publishing your ePub via other websites, you may want such evidence of editing removed first.

Use this option to remove any elements from the manifest xml that do not start with the dc: namespace for a clean ePub.
Check non dc: metadata
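
As an illustration of the Remove non dc: metadata elements option, the sketch below prunes every child of the <metadata> element that is not in the Dublin Core namespace. The function name strip_non_dc is hypothetical and this is not the plugin's code; note in particular that such a blanket rule also drops entries like a cover <meta>, which the real plugin may or may not handle differently.

```python
import xml.etree.ElementTree as ET

OPF_NS = "http://www.idpf.org/2007/opf"
DC_NS = "http://purl.org/dc/elements/1.1/"


def strip_non_dc(root):
    """Remove non-dc children of <metadata> in place; return how many went."""
    metadata = root.find("{%s}metadata" % OPF_NS)
    removed = 0
    # Iterate over a copy, since we remove children while looping.
    for child in list(metadata):
        if not child.tag.startswith("{%s}" % DC_NS):
            metadata.remove(child)
            removed += 1
    return removed
```

The function takes an already parsed tree (for example from ET.fromstring) so that the caller decides how the result is serialized back into the .opf file.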

Running from Command Line

It is possible to run most of the functions of this plugin from the command line using a python script that is bundled inside the zip file. This allows users to use a feature such as smarten punctuation against an ePub without having to add it to the calibre GUI. Note that the command line script still requires that calibre and the Modify ePub plugin are installed as per usual; it just avoids the need to add the books to calibre and click through the GUI interactively.

To make use of the command line script, open the plugin zip file, go to the commandline subfolder, extract the me.py file along with the README.md, and follow the instructions within.

Donations

If you enjoy my calibre plugins or extensions, please feel free to show your appreciation!


paypal.me/kiwicalibre
