The iText® XMLWorker is a package for transforming XML files to PDF. Although parsing XML was already possible with iText, a new version has been created. Many developers use the XML-to-PDF capabilities to convert simple HTML/XHTML snippets to PDF, but the support for CSS was somewhat limited. The new XMLWorker has better support for CSS. Initially this targets XHTML tags with CSS2, such as the markup produced by WYSIWYG editors (e.g. TinyMCE or CKEditor). It does not end there: with the XMLWorker it is possible to parse all kinds of XML and use CSS in them, although this requires a specific implementation of the XML tags and/or CSS styles.
The XMLWorker was initially created to parse snippets, and absolute positioning is not yet supported. As a result, it is currently not possible to surround everything with borders, for instance. For the current CSS limitations, see CSS Support.
Parsing XHTML snippets can be done with the default implementation for parsing HTML to PDF. See code examples for usage tips.
Tag | Comment | Supported Attributes |
---|---|---|
xml | if available used for parsing charset / must be tested more deeply | encoding |
html | ignored | |
head | ignored | |
title | if a document is available, the title is set with document.addTitle(title) | |
meta | parses http-equiv="Content-Type" and the charset | http-equiv, content |
script | ignored | |
style | parsed and added to css processing | |
link | css is parsed and added to global styles | type, href |
body | direct content in body is added | |
a | supported | href, name |
br | supported | |
div | direct content in div is added | |
h1 to h6 | supported | |
p | supported | |
span | supported | |
img | supported | src, width, height |
hr | supported | |
ul, ol, li | supported | |
dfn, dl, dt | supported | |
table | supported. Nested tables can work, but we advise against using them | width, border, cellspacing, cellpadding |
tr | supported | |
td, th | supported | width, rowspan, colspan |
thead, tfoot, tbody | supported | |
caption | caption element of a table is supported | |
sub | supported | |
sup | supported | |
small, big | supported | |
b, strong | supported | |
u, ins | supported | |
i, cite, em, var, dfn, address | supported | |
pre, tt, code, kbd, samp | supported | |
s, strike, del | supported | |
The implementation is not fully finished. A couple of areas still need to be fixed or improved and are being worked on.
Not all CSS may behave as expected: there is a lot of it, and not every possible combination has been fully tested and implemented.
JavaScript is ignored entirely at the moment.
The character encoding of the provided snippet is taken into account, but this has not been tested yet.
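
Because encoding handling is described above as untested, one cautious option is to decode the snippet bytes yourself and pass the parser a `Reader`, as the quick example further down does with an `InputStreamReader`. The sketch below uses only the JDK; `SnippetReaders` and its methods are hypothetical names for illustration, not part of XMLWorker.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: not part of XMLWorker.
final class SnippetReaders {

    /** Wraps raw snippet bytes in a Reader that decodes them with an explicit charset. */
    static Reader open(InputStream in, Charset charset) {
        return new InputStreamReader(in, charset);
    }

    /** Drains a Reader into a String; used here only to inspect the decoded text. */
    static String readAll(Reader reader) {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[1024];
        try {
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // A snippet containing a non-ASCII character, stored as UTF-8 bytes.
        byte[] utf8 = "<p>caf\u00e9</p>".getBytes(StandardCharsets.UTF_8);
        Reader reader = open(new ByteArrayInputStream(utf8), StandardCharsets.UTF_8);
        System.out.println(readAll(reader));
    }
}
```

Naming the charset explicitly avoids depending on the platform default, which is what `new InputStreamReader(stream)` falls back to when no charset is given.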
n = not supported, f = fully supported, s = partially supported
Property (CSS2/3) | Text | Tables (table, td, tr) | Lists (ul, ol, li) | Images (img) |
---|---|---|---|---|
background | ||||
background-attachment | n | n | n | |
background-color | f | f | n | |
background-image | n | n | n | |
background-position | n | n | n | |
background-repeat | n | n | n | |
border | n | f | n | n |
border-bottom | n | f | n | n |
border-bottom-color | n | f | n | n |
border-bottom-style | n | s | n | n |
border-bottom-width | n | f | n | n |
border-color | n | f | n | n |
border-collapse | n - always collapsed | |||
border-left | n | f | n | n |
border-left-color | n | f | n | n |
border-left-style | n | s | n | n |
border-left-width | n | f | n | n |
border-right | n | f | n | n |
border-right-color | n | f | n | n |
border-right-style | n | s | n | n |
border-right-width | n | f | n | n |
border-spacing | n | |||
border-style | n | s | n | n |
border-top | n | f | n | n |
border-top-color | n | f | n | n |
border-top-style | n | s | n | n |
border-top-width | n | f | n | n |
border-width | n | f | n | n |
bottom | n | n | n | n |
caption-side | f | |||
clear | n | n | n | n |
clip | n | n | n | n |
color | f | |||
content | n | n | n | n |
counter-increment | n | n | n | |
counter-reset | n | n | n | |
cursor | n | n | n | |
direction | n | n | n | |
display | n | n | n | n |
empty-cells | f | |||
float | n | n | n | n |
font | f | |||
font-family | f | |||
font-size | f | |||
font-style | f | |||
font-variant | n | |||
font-weight | f | |||
height | n | f | n | |
left | n | n | n | |
letter-spacing | f | |||
line-height | f | |||
list-style | f | |||
list-style-image | f | |||
list-style-position | f | |||
list-style-type | f | |||
margin | f | f | s (not on li) | n |
margin-bottom | f | f | f | n |
margin-left | f | f | f (not on li) | n |
margin-right | f | f | s (not on li) | n |
margin-top | f | f | f | |
max-height | n | n | n | |
max-width | n | n | n | |
min-height | n | n | n | |
min-width | n | n | n | |
orphans | n | n | n | |
outline | n | n | n | |
outline-color | n | n | n | |
outline-style | n | n | n | |
outline-width | n | n | n | |
overflow | n | n | n | |
padding | f | f | s | n |
padding-bottom | f | f | f | |
padding-left | f | f | f (not on li) | |
padding-right | f | f | f (not on li) | |
padding-top | f | f | f | |
page-break-after | s - only value always | s - only value always | s - only value always | s - only value always |
page-break-before | s - only value always | s - only value always | s - only value always | s - only value always |
page-break-inside | n | n | n | |
position | n | n | n | n |
quotes | n | n | n | |
right | n | n | n | n |
table-layout | s | |||
text-align | f | |||
text-decoration | f | |||
text-indent | f | |||
text-shadow | n | |||
text-transform | n | |||
top | n | n | n | n |
unicode-bidi | n | n | n | |
vertical-align | f | f | n | |
visibility | n | n | n | n |
white-space | n | n | n | |
widows | n | n | n | |
width | n | f | n | |
word-spacing | n | n | n | |
z-index | n | n | n |
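
Since unsupported properties can lead to unexpected output, it may help to scan inline styles before parsing. The following is a minimal sketch, not part of XMLWorker: `CssSupportCheck` is a hypothetical helper, and its property set is copied from the rows marked `n` in all four columns of the table above. It splits naively on `;`, so values that themselves contain semicolons (for example data URIs) are not handled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical pre-flight check: not part of XMLWorker.
final class CssSupportCheck {

    // Properties listed as "n" in all four columns of the support table.
    private static final Set<String> UNSUPPORTED = new HashSet<>(Arrays.asList(
            "bottom", "clear", "clip", "content", "display",
            "float", "position", "right", "top", "visibility"));

    /** Returns the unsupported property names found in an inline style value, in order. */
    static List<String> unsupported(String style) {
        List<String> found = new ArrayList<>();
        for (String declaration : style.split(";")) {
            int colon = declaration.indexOf(':');
            if (colon < 0) {
                continue; // skip empty or malformed declarations
            }
            String property = declaration.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            if (UNSUPPORTED.contains(property)) {
                found.add(property);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // prints [float, position]
        System.out.println(unsupported("color: red; float: left; position: absolute"));
    }
}
```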
When page-break-before and page-break-after are used inside tags that map to a single element in the PDF (such as lists or tables), the outcome of adding a new page is unpredictable.
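
Given that only the value `always` is supported and that breaks inside lists or tables are unpredictable, one way to stay on safe ground is to put the page break on a top-level wrapper around each section. This sketch only builds the XHTML string; `PageBreakSnippets` is a hypothetical helper, and applying the style to a wrapping `div` is an assumption rather than documented behaviour.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: builds XHTML only, does not call XMLWorker.
final class PageBreakSnippets {

    /** Wraps each section in a top-level div; every div after the first requests a new page. */
    static String join(List<String> sections) {
        StringBuilder html = new StringBuilder();
        for (int i = 0; i < sections.size(); i++) {
            if (i == 0) {
                html.append("<div>");
            } else {
                // "always" is the only value the support table lists as handled
                html.append("<div style=\"page-break-before:always\">");
            }
            html.append(sections.get(i)).append("</div>");
        }
        return html.toString();
    }

    public static void main(String[] args) {
        System.out.println(join(Arrays.asList("<p>Chapter 1</p>", "<p>Chapter 2</p>")));
    }
}
```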
A table can use two XMLWorker-specific styles: repeat-header:yes and repeat-footer:yes.
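
As an illustration of how these two styles might be written, the sketch below builds a table snippet with both set. It assumes the styles go in the table's inline `style` attribute and that the repeated rows come from `thead` and `tfoot`; the text above does not confirm either detail. `RepeatingTableSnippet` is a hypothetical name.

```java
// Hypothetical helper: builds XHTML only, does not call XMLWorker.
final class RepeatingTableSnippet {

    /** Builds a table whose header and footer are marked to repeat (assumed inline-style usage). */
    static String table(String headerRow, String footerRow, String bodyRows) {
        return "<table style=\"repeat-header:yes;repeat-footer:yes;\">"
                + "<thead>" + headerRow + "</thead>"
                + "<tfoot>" + footerRow + "</tfoot>"
                + "<tbody>" + bodyRows + "</tbody>"
                + "</table>";
    }

    public static void main(String[] args) {
        System.out.println(table(
                "<tr><th>Item</th></tr>",
                "<tr><td>continued</td></tr>",
                "<tr><td>first</td></tr><tr><td>second</td></tr>"));
    }
}
```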
The quick way

    // create a document to write to
    final Document doc = new Document();
    PdfWriter.getInstance(doc, new FileOutputStream("out.pdf"));
    // make sure it's open
    doc.open();
    // read the html from somewhere
    BufferedInputStream bis = new BufferedInputStream(new FileInputStream("snippet.html"));
    // assumption: helper is an XMLWorkerHelper, constructed as in the extended setup
    XMLWorkerHelper helper = new XMLWorkerHelper();
    // parse and listen for elements to add to the document
    helper.parseXHtml(new ElementHandler() {
        public void addAll(final List<Element> currentContent) throws DocumentException {
            for (Element e : currentContent) {
                doc.add(e);
            }
        }

        public void add(final Element e) throws DocumentException {
            doc.add(e);
        }
    }, new InputStreamReader(bis));
    doc.close();

The extended setup

    // Create a TagProcessor
    DefaultTagProcessorFactory htmlTagProcessorFactory =
            (DefaultTagProcessorFactory) new Tags().getHtmlTagProcessorFactory();
    // if needed override tag that you don't want to parse to DummyTagProcessor
    htmlTagProcessorFactory.addProcessor("img", new DummyTagProcessor());
    htmlTagProcessorFactory.addProcessor("link", new DummyTagProcessor());
    // Create a fresh configuration and set needed configuration objects.
    XMLWorkerConfigurationImpl config = new XMLWorkerConfigurationImpl();
    // Attach the default CSS to a new CSSResolver
    CssFile defaultCSS = new XMLWorkerHelper().getDefaultCSS();
    StyleAttrCSSResolver cssResolver = new StyleAttrCSSResolver();
    cssResolver.addCssFile(defaultCSS);
    // attach more CSS files if needed
    cssResolver.addCssFile(otherCssFile);
    // set the TagProcessorFactory
    config.tagProcessorFactory(htmlTagProcessorFactory).cssResolver(cssResolver)
            .acceptUnknown(true);
    // create a document
    final Document doc = new Document();
    doc.setPageSize(PageSize.A4);
    // create writer
    PdfWriter writer = PdfWriter.getInstance(doc, outputStream);
    writer.setPageEvent(new WatermarkEvent());
    // set margins for first page
    float margin = CssUtils.getInstance().parsePxInCmMmPcToPt("8px");
    doc.setMargins(margin, margin, margin, margin);
    // OPEN the document !
    doc.open();
    config.document(doc).pdfWriter(writer);
    // create the worker
    final XMLWorker worker = new XMLWorkerImpl(config);
    // attach an ElementHandler
    worker.setDocumentListener(new ElementHandler() {
        public void addAll(final List<Element> arg0) throws DocumentException {
            for (Element e : arg0) {
                doc.add(e);
            }
        }

        public void add(final Element e) throws DocumentException {
            doc.add(e);
        }
    });
    // Set the worker in the parser and start parsing
    XMLParser p = new XMLParser(worker);
    p.parse(new StringReader(content));
    writer.close();

There is a demo available where input is entered through a TinyMCE editor and a PDF is created from it.
com.itextpdf.text.Element objects to the document (e.g. a ColumnText).
Links
iText, online demo, XML Worker Project