Scrubbing HTML from epub files - Paperight/website GitHub Wiki
To use the automated HTML-to-PDF converter on paperight.com/server, you first need to extract the HTML from epubs supplied by publishers or downloaded from Project Gutenberg.
- Open the .epub file in Sigil and delete the wrapper file. Merge the separate HTML files into one using the “merge with previous” function, which you can find by right-clicking the second document in the HTML list. Save the changes and close Sigil.
- Right-click the file name to rename it, and change the file extension from .epub to .zip.
- Use “Extract all” on the zipped folder.
- In the extracted folder, find the HTML file in the Text folder. Rename it according to the file naming conventions (isbn_title_date) and move it from the extracted folder to your working folder.
- Remove the licensing information from the beginning and end of the HTML.
- Remove `<meta content>` tags at the beginning of the text that refer to Gutenberg or to the Gutenberg epub CSS.
- Remove `<style>` tags that may affect the Paperight CSS file (these should be above the `<body>` tag), unless they provide semantic meaning.
- Make sure that you've uploaded the cover image to the jacket_images folder on the media.paperight.com FTP (the public FTP).
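The extraction and head-scrubbing steps above can also be scripted. The following is a minimal Python sketch, not part of the Paperight tooling. It assumes the wrapper file has already been deleted and the HTML merged in Sigil, reads the .epub directly as a ZIP archive (so the rename to .zip is not needed), and uses regular expressions rather than a full HTML parser. The names `extract_html` and `scrub_head` are hypothetical. `scrub_head` only catches `<meta>` or `<link>` tags whose attributes contain the word "gutenberg", and it removes every `<style>` block unless `keep_styles=True`, so styles that carry semantic meaning, differently named stylesheet links, and the licensing text still need checking by hand.

```python
import re
import zipfile
from pathlib import Path

# Hypothetical helper: copies every (X)HTML file out of an .epub.
# An .epub is a ZIP archive, so zipfile opens it without renaming.
def extract_html(epub_path: str, out_dir: str) -> list[Path]:
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    with zipfile.ZipFile(epub_path) as zf:
        for name in zf.namelist():
            if name.lower().endswith((".html", ".xhtml", ".htm")):
                target = out / Path(name).name  # flatten the folder structure
                target.write_bytes(zf.read(name))
                written.append(target)
    return written

# <meta> or <link> tags whose attributes mention "gutenberg" (case-insensitive).
GUTENBERG_TAG = re.compile(r"<(?:meta|link)\b[^>]*gutenberg[^>]*>\s*", re.IGNORECASE)
# Entire <style>...</style> blocks, including multi-line ones.
STYLE_BLOCK = re.compile(r"<style\b[^>]*>.*?</style>\s*", re.IGNORECASE | re.DOTALL)

# Hypothetical helper: strips Gutenberg tags and, optionally, <style> blocks.
def scrub_head(doc: str, keep_styles: bool = False) -> str:
    doc = GUTENBERG_TAG.sub("", doc)
    if not keep_styles:
        doc = STYLE_BLOCK.sub("", doc)
    return doc
```

For example, each path returned by `extract_html("book.epub", "work")` could be read with `read_text(encoding="utf-8")`, passed through `scrub_head`, and written back before the file is renamed to the isbn_title_date convention.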
If your HTML includes a `<front>` element, change it to `<div class="front">` and move it inside the `<body>` element.
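If many files need this change, it can be sketched in Python as below. This is a hypothetical helper that assumes a single `<front>` element with no nested `<front>` tags; it uses regular expressions rather than an HTML parser and places the new `<div class="front">` immediately after the opening `<body>` tag, closing it with the `<!--.front-->` comment used in the cover example.

```python
import re

# A single <front>...</front> element (contents captured), plus trailing whitespace.
FRONT = re.compile(r"<front\b[^>]*>(.*?)</front>\s*", re.IGNORECASE | re.DOTALL)
BODY_OPEN = re.compile(r"<body\b[^>]*>", re.IGNORECASE)

def move_front_into_body(doc: str) -> str:
    """Hypothetical helper: turn <front> into <div class="front"> inside <body>."""
    match = FRONT.search(doc)
    if match is None:
        return doc  # no <front> element: nothing to change
    inner = match.group(1)
    doc = doc[:match.start()] + doc[match.end():]  # cut the original element
    body = BODY_OPEN.search(doc)
    if body is None:
        raise ValueError("no <body> tag found")
    block = '\n<div class="front">' + inner + "</div><!--.front-->"
    return doc[:body.end()] + block + doc[body.end():]
```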
Inside `<div class="front">`, which should be the first element in `<body>`, include a link to the cover image: place it in a `<div>` and give it `class="cover"`.
e.g.
```html
<div class="front">
<div class="cover">
<img alt="The Tragedy of Mariam cover" src="http://media.paperight.com/content/jacket_images/the-tragedy-of-mariam_cary_cover_20140107.jpg"/>
</div>
</div><!--.front-->
```
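A small Python sketch for generating the cover markup, assuming the image has already been uploaded to the jacket_images folder and its full URL is known. `cover_div` and `add_cover` are hypothetical names; `add_cover` inserts the cover `<div>` directly after the first occurrence of the exact string `<div class="front">`.

```python
from html import escape

def cover_div(title: str, image_url: str) -> str:
    """Build the <div class="cover"> block for a book's jacket image."""
    alt = escape(f"{title} cover", quote=True)  # escape quotes/ampersands in the title
    src = escape(image_url, quote=True)
    return f'<div class="cover">\n<img alt="{alt}" src="{src}"/>\n</div>'

def add_cover(doc: str, title: str, image_url: str) -> str:
    """Insert the cover as the first element inside <div class="front">."""
    marker = '<div class="front">'
    pos = doc.find(marker)
    if pos == -1:
        raise ValueError('no <div class="front"> found')
    at = pos + len(marker)
    return doc[:at] + "\n" + cover_div(title, image_url) + doc[at:]
```

With the title "The Tragedy of Mariam" and the jacket URL from the example, `cover_div` produces the same `<div class="cover">` block shown above.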
- In `<body>`, directly after the opening `<body>` tag and the `<div class="front">` element, wrap all the content in a `<div>` with the class `paperight-ed-content` (the CSS uses this for page numbering). NB: Don't forget to close the div at the end of the document (see below).
e.g.
```html
<body>
<div class="paperight-ed-content">
..........
</div><!--.paperight-ed-content-->
</body>
</html>
```
- Edit tags:
  - Title: `<h1 class="title">Jacob's Room</h1>`
  - Author: `<p class="author">Virginia Woolf</p>`
  - Chapter number: `<h2 class="chapter-number">Chapter 1</h2>`
  - Chapter title: `<p class="chapter-title"></p>`
  - Bodytext-first: `<p class="bodytext-first"></p>`
  - Bodytext: `<p></p>`
- Remove `<hr />` elements.
- Run the HTML through the W3C validator: http://validator.w3.org/check
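The wrapping and `<hr />` removal steps above are mechanical enough to script. Below is a minimal Python sketch with hypothetical helper names. It assumes the front matter ends with the `</div><!--.front-->` comment used in the cover example (falling back to the opening `<body>` tag if that comment is absent) and wraps everything up to the last `</body>` tag. The tag edits and the W3C validation still need to be done by hand.

```python
import re

BODY_OPEN = re.compile(r"<body\b[^>]*>", re.IGNORECASE)
BODY_CLOSE = re.compile(r"</body\s*>", re.IGNORECASE)
HR = re.compile(r"<hr\b[^>]*>\s*", re.IGNORECASE)  # <hr>, <hr/>, <hr /> with any attributes
FRONT_END = "</div><!--.front-->"  # assumed end-of-front-matter marker

def remove_hr(doc: str) -> str:
    """Delete every <hr> element (and any whitespace after it)."""
    return HR.sub("", doc)

def wrap_content(doc: str) -> str:
    """Wrap everything after the front matter in div.paperight-ed-content."""
    start = doc.find(FRONT_END)
    if start != -1:
        start += len(FRONT_END)
    else:
        body = BODY_OPEN.search(doc)
        if body is None:
            raise ValueError("no <body> tag found")
        start = body.end()
    closes = list(BODY_CLOSE.finditer(doc))
    if not closes:
        raise ValueError("no </body> tag found")
    end = closes[-1].start()  # wrap up to the last </body>
    return (doc[:start]
            + '\n<div class="paperight-ed-content">'
            + doc[start:end]
            + "</div><!--.paperight-ed-content-->\n"
            + doc[end:])
```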
To create a footnote, use a `<span>` with the class `fn`. Place the footnote text inline, directly after the point in the text where the footnote marker should appear.
e.g.
```html
<p>
Footnotes<span class="fn">A footnote is a note placed at
the bottom of a page of a book or manuscript that comments on or
cites a reference for a designated part of the text.</span> are
essential in printed documents and Prince knows how to generate
them. Most readers will read the footnotes before they read the text
from where the footnotes are anchored<span class="fn">Often,
the most interesting information is found in the footnotes.</span>.
</p>
```
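Source files sometimes mark footnotes as links to notes collected elsewhere in the file rather than inline. The Python sketch below converts that pattern into the inline `<span class="fn">` form. It is a hypothetical helper that assumes references written as `<a href="#fn1">[1]</a>` and note bodies written as `<p id="fn1">...</p>`; real markup varies, so the two patterns would need adjusting to match the file in hand.

```python
import re

# Assumed markup (hypothetical): reference <a href="#fn1">[1]</a>,
# note body <p id="fn1">Note text</p>.
NOTE_BODY = re.compile(r'<p id="(fn\d+)">(.*?)</p>\s*', re.DOTALL)
NOTE_REF = re.compile(r'<a href="#(fn\d+)">\[\d+\]</a>')

def inline_footnotes(doc: str) -> str:
    """Move each note's text to its reference point as <span class="fn">."""
    notes = {}

    def take(m: re.Match) -> str:
        notes[m.group(1)] = m.group(2).strip()
        return ""  # delete the note from its original position

    doc = NOTE_BODY.sub(take, doc)

    def replace(m: re.Match) -> str:
        text = notes.get(m.group(1))
        # Leave the reference untouched if no matching note was found.
        return m.group(0) if text is None else f'<span class="fn">{text}</span>'

    return NOTE_REF.sub(replace, doc)
```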