Integration Guide - danfickle/openhtmltopdf GitHub Wiki
New releases of Open HTML to PDF will be distributed through Maven. Search maven for com.openhtmltopdf.
The current Maven release is 1.0.10. If you would like to be notified of new releases, please subscribe to the Maven issue.
You can ask for a new release, if needed.
Add these to your maven dependencies section as needed:
<properties>
    <!-- Define the version of OPEN HTML TO PDF in the properties section of your POM. -->
    <openhtml.version>1.0.10</openhtml.version>
</properties>
<dependency>
    <!-- ALWAYS required, usually included transitively. -->
    <groupId>com.openhtmltopdf</groupId>
    <artifactId>openhtmltopdf-core</artifactId>
    <version>${openhtml.version}</version>
</dependency>
<dependency>
    <!-- Required for PDF output. -->
    <groupId>com.openhtmltopdf</groupId>
    <artifactId>openhtmltopdf-pdfbox</artifactId>
    <version>${openhtml.version}</version>
</dependency>
<dependency>
    <!-- Required for image output only. -->
    <groupId>com.openhtmltopdf</groupId>
    <artifactId>openhtmltopdf-java2d</artifactId>
    <version>${openhtml.version}</version>
</dependency>
<dependency>
    <!-- Optional, leave out if you do not need right-to-left or bi-directional text support. -->
    <groupId>com.openhtmltopdf</groupId>
    <artifactId>openhtmltopdf-rtl-support</artifactId>
    <version>${openhtml.version}</version>
</dependency>
<dependency>
    <!-- Optional, leave out if you do not need logging via slf4j. -->
    <groupId>com.openhtmltopdf</groupId>
    <artifactId>openhtmltopdf-slf4j</artifactId>
    <version>${openhtml.version}</version>
</dependency>
<dependency>
    <!-- Optional, leave out if you do not need SVG support. -->
    <groupId>com.openhtmltopdf</groupId>
    <artifactId>openhtmltopdf-svg-support</artifactId>
    <version>${openhtml.version}</version>
</dependency>
<dependency>
    <!-- Optional, leave out if you do not need MathML support. -->
    <!-- Introduced in RC-13. -->
    <groupId>com.openhtmltopdf</groupId>
    <artifactId>openhtmltopdf-mathml-support</artifactId>
    <version>${openhtml.version}</version>
</dependency>
Most of the options available for PDF output can be set on the PdfRendererBuilder builder class. The following shows the minimal configuration needed to output a PDF from an XHTML document.
import java.io.FileOutputStream;
import java.io.OutputStream;
import com.openhtmltopdf.pdfboxout.PdfRendererBuilder;

public class SimpleUsage
{
    public static void main(String[] args) throws Exception {
        try (OutputStream os = new FileOutputStream("/Users/me/Documents/pdf/out.pdf")) {
            PdfRendererBuilder builder = new PdfRendererBuilder();
            builder.useFastMode();
            builder.withUri("file:///Users/me/Documents/pdf/in.htm");
            builder.toStream(os);
            builder.run();
        }
    }
}
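The builder's withUri method takes a URI string such as file:///Users/me/Documents/pdf/in.htm rather than a plain file-system path. As a minimal sketch, assuming you start from a local path, the JDK alone can produce such a string. The names FileUris and toFileUri are hypothetical and are not part of Open HTML to PDF.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of Open HTML to PDF.
final class FileUris {
    private FileUris() {}

    /** Converts a (possibly relative) local path to an absolute file: URI string. */
    static String toFileUri(String path) {
        // Resolve against the working directory and collapse "." and ".." segments.
        Path absolute = Paths.get(path).toAbsolutePath().normalize();
        // Path.toUri() percent-encodes characters such as spaces.
        return absolute.toUri().toString();
    }

    public static void main(String[] args) {
        // The printed value could be passed to builder.withUri(...).
        System.out.println(toFileUri("in.htm"));
    }
}
```

Going through Path.toUri() avoids hand-building the file: prefix, which is easy to get wrong for paths that contain spaces.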
// For right-to-left or bi-directional text, add these imports (and remember the rtl-support maven module).
import com.openhtmltopdf.bidi.support.ICUBidiReorderer;
import com.openhtmltopdf.bidi.support.ICUBidiSplitter;
import com.openhtmltopdf.outputdevice.helper.BaseRendererBuilder.TextDirection;
// Then call on the builder.
builder.useUnicodeBidiSplitter(new ICUBidiSplitter.ICUBidiSplitterFactory());
builder.useUnicodeBidiReorderer(new ICUBidiReorderer());
builder.defaultTextDirection(TextDirection.LTR); // OR RTL
While Open HTML to PDF works with a standard w3c DOM, the Jsoup project provides a converter from the Document produced by its HTML5 parser to a w3c DOM Document. This allows you to parse and use HTML5, rather than the strict XML the project requires by default.
After adding Jsoup to your dependencies, you can use one of the Jsoup.parse methods to parse HTML5 and W3CDom::fromJsoup to convert the Jsoup document to a w3c DOM one.
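import java.io.File;
import java.io.IOException;
import java.net.URL;
import org.jsoup.Jsoup;
import org.jsoup.helper.W3CDom;

public org.w3c.dom.Document html5ParseDocument(String urlStr, int timeoutMs) throws IOException
{
    URL url = new URL(urlStr);
    org.jsoup.nodes.Document doc;
    if (url.getProtocol().equalsIgnoreCase("file")) {
        // Local files are read directly from disk.
        doc = Jsoup.parse(new File(url.getPath()), "UTF-8");
    }
    else {
        // Remote documents are fetched with the given timeout.
        doc = Jsoup.parse(url, timeoutMs);
    }
    // Should reuse W3CDom instance if converting multiple documents.
    return new W3CDom().fromJsoup(doc);
}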
Then you can set the renderer document with builder.withW3cDocument(doc, url) in place of builder.withUri(url).
NOTE: This project previously provided a JSoup DOM converter module to do the same thing. This module is now removed (as of 1.0.1). Please migrate now.
Open HTML to PDF makes it simple to plug in an external client for HTTP and HTTPS requests. In fact, this is recommended if you are using HTTP/HTTPS resources, as the built-in Java client is showing its age. For example, using the excellent OkHttp library is as simple as adding the following code:
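// Imports assume OkHttp 3 or later (the okhttp3 package).
import java.io.IOException;
import java.io.InputStream;
import java.io.Reader;
import okhttp3.OkHttpClient;
import okhttp3.Request;
import okhttp3.Response;
import com.openhtmltopdf.extend.FSStream;
import com.openhtmltopdf.extend.FSStreamFactory;

public static class OkHttpStreamFactory implements FSStreamFactory {
    // A single client is shared across requests.
    private final OkHttpClient client = new OkHttpClient();

    @Override
    public FSStream getUrl(String url) {
        Request request = new Request.Builder()
            .url(url)
            .build();
        try {
            final Response response = client.newCall(request).execute();
            return new FSStream() {
                @Override
                public InputStream getStream() {
                    return response.body().byteStream();
                }
                @Override
                public Reader getReader() {
                    return response.body().charStream();
                }
            };
        }
        catch (IOException e) {
            e.printStackTrace();
        }
        // The request failed, so no stream is available.
        return null;
    }
}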
Then use builder.useHttpStreamImplementation(new OkHttpStreamFactory()).
The library should close the reader or stream when it is finished with it.
NOTE: You can also use the useProtocolsStreamImplementation method to add stream support for other protocols including app-specific ones:
// Here the buildFSStreamForS3 method is provided by your app to open an AWS S3
// readable stream to a file in your images bucket.
builder.useProtocolsStreamImplementation(uri -> buildFSStreamForS3("images", uri), "s3image");
// Then you can use your custom protocol.
<img src="s3image:test.jpg" alt="..."/>
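How a handler such as buildFSStreamForS3 turns the custom URI into a storage key is up to your app. The following is a minimal sketch, assuming the handler receives the full URI string (for example s3image:test.jpg), that splits it into scheme and remainder using only java.net.URI. The names CustomSchemeUris and keyFor are hypothetical, and the actual S3 access is left out.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper, not part of Open HTML to PDF.
final class CustomSchemeUris {
    private CustomSchemeUris() {}

    /**
     * Returns the part after "scheme:" when the URI uses the expected scheme,
     * or null when the URI is malformed or uses a different scheme.
     */
    static String keyFor(String uri, String expectedScheme) {
        try {
            URI parsed = new URI(uri);
            // getScheme() is null for relative URIs; comparing from the
            // non-null argument keeps this null-safe.
            if (!expectedScheme.equalsIgnoreCase(parsed.getScheme())) {
                return null;
            }
            return parsed.getSchemeSpecificPart();
        } catch (URISyntaxException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(keyFor("s3image:test.jpg", "s3image"));
    }
}
```

Returning null for anything unexpected lets the caller decide whether to fail the render or skip the resource.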
IMPORTANT (1): This cache system should now be considered deprecated as it is not thread safe. It will be replaced with a simple byte array cache system in the future.
IMPORTANT (2): The cache system is totally broken, please see discussion in 204 for replacement options.
IMPORTANT (3): The cache system, except for the font-metrics cache has been removed. Please check the Fonts wiki page for more information on the font-metrics cache, which should be used when outputting multiple documents with fallback fonts.
By default, the code attempts to resolve relative URIs by using the document URI or CSS stylesheet URI as a base URI.
Absolute URIs are returned unchanged. If you wish to plug in your own resolver, you can.
A custom resolver can not only resolve relative URIs but also resolve URIs in a private address space or even reject a URI. To use an external resolver, implement FSUriResolver and use it with builder.useUriResolver(new MyResolver()). The following example requires resources to be loaded through SSL.
final NaiveUserAgent.DefaultUriResolver defaultUriResolver = new NaiveUserAgent.DefaultUriResolver();
builder.useUriResolver(new FSUriResolver() {
    @Override
    public String resolveURI(String baseUri, String uri) {
        // First get an absolute version.
        String supResolved = defaultUriResolver.resolveURI(baseUri, uri);
        if (supResolved == null || supResolved.isEmpty())
            return null;
        try {
            URI uriObj = new URI(supResolved);
            // Only let through resources that are loaded through ssl.
            // Comparing from the literal avoids a NullPointerException when the URI has no scheme.
            if ("https".equalsIgnoreCase(uriObj.getScheme()))
                return uriObj.toString();
        } catch (URISyntaxException e) {
            e.printStackTrace();
        }
        // Reject everything else.
        return null;
    }
});
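The resolve-then-filter logic can be tried without the library. The following sketch assumes that default resolution behaves like java.net.URI.resolve; the real DefaultUriResolver may differ in its details. HttpsOnlyResolver is a hypothetical name.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical stand-in for experimenting with resolver rules; it does not
// implement FSUriResolver and is not part of Open HTML to PDF.
final class HttpsOnlyResolver {
    private HttpsOnlyResolver() {}

    /** Resolves uri against baseUri and returns it only if the result is https. */
    static String resolve(String baseUri, String uri) {
        if (uri == null || uri.isEmpty()) {
            return null;
        }
        try {
            URI candidate = new URI(uri);
            // Absolute URIs are used unchanged; relative ones need a base.
            URI absolute = (candidate.isAbsolute() || baseUri == null)
                    ? candidate
                    : new URI(baseUri).resolve(candidate);
            // Null-safe scheme check: rejects http, file and scheme-less URIs.
            return "https".equalsIgnoreCase(absolute.getScheme())
                    ? absolute.toString()
                    : null;
        } catch (URISyntaxException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(resolve("https://example.com/docs/page.html", "img/a.png"));
    }
}
```

Keeping the rule in a small static method like this makes it easy to unit test before wiring it into an FSUriResolver.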
NOTE: There is now a logging wiki page with more information.
Open HTML to PDF provides two logging options. The default is to use java.util.logging. If you prefer to output using slf4j, an adaptor is provided. Add the appropriate maven module, then at the start of your code, before calling any Open HTML to PDF methods, use this code:
XRLog.setLoggingEnabled(true);
// For slf4j:
XRLog.setLoggerImpl(new Slf4jLogger());
NOTE: The Log4j 1.x adapter has been removed as of version 1.0.2 due to an unpatched CVE in Log4J 1.x.
By default, XML allows the use of five character entities: &amp;, &quot;, &apos;, &lt; and &gt;.
If you'd like to use other character entities derived from XHTML, such as &yen;, then you can use the special project doctype:
<!DOCTYPE html PUBLIC
"-//OPENHTMLTOPDF//DOC XHTML Character Entities Only 1.0//EN" "">
<html>
<body>
&yen;
</body>
</html>
If you are using the MathML plugin, you can use a doctype containing both XHTML and MathML character entities:
<!DOCTYPE html PUBLIC
"-//OPENHTMLTOPDF//MATH XHTML Character Entities With MathML 1.0//EN"
"">
<html>
<body>
⁢ ⇒
</body>
</html>
All other doctypes will be ignored.
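To see why a special doctype is needed for entities beyond the five predefined ones, the following sketch feeds small documents to the JDK's own XML parser. This is not Open HTML to PDF's loading code, only an illustration of plain XML behaviour, and EntityCheck is a hypothetical name.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class using only the JDK.
final class EntityCheck {
    private EntityCheck() {}

    /** Returns true if the JDK XML parser accepts the markup as well-formed. */
    static boolean parses(String xml) {
        try {
            DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing them to stderr.
            db.setErrorHandler(new DefaultHandler());
            db.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (Exception e) {
            // An undeclared entity reference ends up here as a parse error.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(parses("<p>&amp; &quot; &apos; &lt; &gt;</p>"));
        System.out.println(parses("<p>&nbsp;</p>"));
    }
}
```

The first document uses only predefined entities and parses. The second references an entity that no DTD has declared, so a plain XML parser rejects it.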
Add the appropriate maven module and include this line in your builder code for SVG support.
builder.useSVGDrawer(new BatikSVGDrawer());
For MathML support:
builder.useMathMLDrawer(new MathMLDrawer());
For an example of outputting to images see Java2D image output.
Thanks for using openhtmltopdf and please feel free to file any issues you are having trouble with.