Introduction

The iText® XMLWorker is a package for transforming XML files to PDF. Although iText could already parse XML, XMLWorker is a new implementation. Many developers use the XML-to-PDF capabilities to convert simple HTML/XHTML snippets to PDF, but CSS support was limited. The new XMLWorker supports CSS better. Initially it targets XHTML with CSS2 of the kind produced by WYSIWYG editors such as TinyMCE or CKEditor. It does not end there: XMLWorker can parse other kinds of XML and apply CSS to them, although this requires a specific implementation of the XML tags and/or CSS styles.

Current Limitations

XMLWorker was initially created to parse snippets, and absolute positioning is not yet supported. As a result, it is currently not possible to, for instance, surround all content with a border. For the current CSS limitations, see CSS Support.

Parsing XHTML/CSS snippets

XHTML snippets can be parsed with the default implementation for parsing HTML to PDF. See Examples for usage.

Supported tags

Tag                             Comment                                                                     Supported attributes
------------------------------  --------------------------------------------------------------------------  --------------------
xml                             if available, used for parsing the charset (needs more testing)             encoding
html                            ignored
head                            ignored
title                           if a document is available, the title is set with document.addTitle(title)
meta                            parses http-equiv="Content-Type" and the charset                            http-equiv, content
script                          ignored
style                           parsed and added to CSS processing
link                            CSS is parsed and added to global styles                                    type, href
body                            direct content in body is added
a                               supported                                                                   href, name
br                              supported
div                             direct content in div is added
h1 to h6                        supported
p                               supported
span                            supported
img                             supported                                                                   src, width, height
hr                              supported
ul, ol, li                      supported
dfn, dl, dt                     supported
table                           supported; nested tables can work, but we advise against using them         width, border, cellspacing, cellpadding
tr                              supported
td, th                          supported                                                                   width, rowspan, colspan
thead, tfoot, tbody             supported
caption                         caption element of a table is supported
sub                             supported
sup                             supported
small, big                      supported
b, strong                       supported
u, ins                          supported
i, cite, em, var, dfn, address  supported
pre, tt, code, kbd, samp        supported
s, strike, del                  supported

Known issues

The implementation is not finished. A few areas still need to be fixed or improved and are being worked on.
Not all CSS may behave as expected: CSS is extensive, and not every possible combination has been tested and implemented.
JavaScript is ignored entirely at the moment.
The character encoding of the provided snippet is taken into account, but this has not been tested yet.
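
Because encoding handling is untested, one cautious approach is to decode the snippet yourself and hand the parser a Reader with an explicit charset, instead of relying on the platform default that a bare InputStreamReader uses. The sketch below uses only the JDK. The class and method names (SnippetReaderSketch, openSnippet, readAll) are made up for illustration, and UTF-8 is an assumption about the input.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of XMLWorker.
class SnippetReaderSketch {

    /** Wraps raw snippet bytes in a Reader that decodes with the given charset. */
    static Reader openSnippet(InputStream in, Charset charset) {
        return new InputStreamReader(in, charset);
    }

    /** Reads a Reader fully into a String (used here only to show the decoded text). */
    static String readAll(Reader reader) {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[1024];
        try {
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Assumption: the snippet is stored as UTF-8.
        byte[] bytes = "<p>caf\u00e9</p>".getBytes(StandardCharsets.UTF_8);
        Reader reader = openSnippet(new ByteArrayInputStream(bytes), StandardCharsets.UTF_8);
        System.out.println(readAll(reader));
    }
}
```

A Reader created this way can be passed wherever the examples use new InputStreamReader(bis).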

CSS Support

n = not supported, f = fully supported, s = partially supported

The table has the following columns:
Property: the CSS property (CSS2/3)
Text: CSS properties applicable to text
Tables: CSS properties applicable to tables (table, td, tr)
List: CSS properties applicable to lists (ul, ol, li)
Image: CSS properties applicable to images (img)

background
background-attachment n n n
background-color f f n
background-image n n n
background-position n n n
background-repeat n n n
border n f n n
border-bottom n f n n
border-bottom-color n f n n
border-bottom-style n s n n
border-bottom-width n f n n
border-color n f n n
border-collapse n - always collapsed
border-left n f n n
border-left-color n f n n
border-left-style n s n n
border-left-width n f n n
border-right n f n n
border-right-color n f n n
border-right-style n s n n
border-right-width n f n n
border-spacing n
border-style n s n n
border-top n f n n
border-top-color n f n n
border-top-style n s n n
border-top-width n f n n
border-width n f n n
bottom n n n n
caption-side f
clear n n n n
clip n n n n
color f
content n n n n
counter-increment n n n
counter-reset n n n
cursor n n n
direction n n n
display n n n n
empty-cells f
float n n n n
font f
font-family f
font-size f
font-style f
font-variant n
font-weight f
height n f n
left n n n
letter-spacing f
line-height f
list-style f
list-style-image f
list-style-position f
list-style-type f
margin f f s (not on li) n
margin-bottom f f f n
margin-left f f f (not on li) n
margin-right f f s (not on li) n
margin-top f f f
max-height n n n
max-width n n n
min-height n n n
min-width n n n
orphans n n n
outline n n n
outline-color n n n
outline-style n n n
outline-width n n n
overflow n n n
padding f f s n
padding-bottom f f f
padding-left f f f (not on li)
padding-right f f f (not on li)
padding-top f f f
page-break-after s (only value always) s (only value always) s (only value always) s (only value always)
page-break-before s (only value always) s (only value always) s (only value always) s (only value always)
page-break-inside n n n
position n n n n
quotes n n n
right n n n n
table-layout s
text-align f
text-decoration f
text-indent f
text-shadow n
text-transform n
top n n n n
unicode-bidi n n n
vertical-align f f n
visibility n n n n
white-space n n n
widows n n n
width n f n
word-spacing n n n
z-index n n n
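
When preparing snippets, it can help to flag declarations that the table marks as unsupported before parsing. The following is a minimal sketch using only the JDK; it is not part of XMLWorker. The class name CssSupportCheck and its method findUnsupported are hypothetical, and the set holds only the properties listed as n in all four columns above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of XMLWorker.
class CssSupportCheck {

    // Subset of the table: properties listed as "n" in all four columns.
    private static final Set<String> UNSUPPORTED = new HashSet<String>(Arrays.asList(
            "bottom", "clear", "clip", "content", "display",
            "float", "position", "right", "top", "visibility"));

    /** Returns the property names in an inline style declaration that are in the unsupported set. */
    static List<String> findUnsupported(String styleDeclaration) {
        List<String> hits = new ArrayList<String>();
        // Naive split: assumes no ';' appears inside a value.
        for (String declaration : styleDeclaration.split(";")) {
            int colon = declaration.indexOf(':');
            if (colon < 0) {
                continue; // not a "property: value" pair
            }
            String property = declaration.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            if (UNSUPPORTED.contains(property)) {
                hits.add(property);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // prints [float, position]
        System.out.println(findUnsupported("color: red; float: left; position: absolute"));
    }
}
```

The check only looks at property names; it does not cover properties that are supported for some element types but not others.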

Notes

Examples

Default XHTML/CSS processing (Java):

The quick way

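	// imports (the XML Worker package name com.itextpdf.tool.xml is an assumption; adjust to your version)
	import java.io.*;
	import java.util.List;
	import com.itextpdf.text.*;
	import com.itextpdf.text.pdf.PdfWriter;
	import com.itextpdf.tool.xml.*;

	// create a document to write to
	final Document doc = new Document();
	PdfWriter.getInstance(doc, new FileOutputStream("out.pdf"));
	// make sure it's open
	doc.open();
	// read the html from somewhere
	BufferedInputStream bis = new BufferedInputStream(new FileInputStream("snippet.html"));
	// the helper that provides the default HTML-to-PDF setup
	XMLWorkerHelper helper = new XMLWorkerHelper();
	// parse and listen for elements to add to the document
	helper.parseXHtml(new ElementHandler() {

		// called with a batch of parsed elements
		public void addAll(final List<Element> currentContent) throws DocumentException {
			for (Element e : currentContent) {
				doc.add(e);
			}
		}

		// called with a single parsed element
		public void add(final Element e) throws DocumentException {
			doc.add(e);
		}
	}, new InputStreamReader(bis));
	doc.close();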
	

The extended setup

	// Create a TagProcessor
	DefaultTagProcessorFactory htmlTagProcessorFactory = (DefaultTagProcessorFactory) new Tags().getHtmlTagProcessorFactory();
	// If needed, map tags you don't want parsed to a DummyTagProcessor
	htmlTagProcessorFactory.addProcessor("img", new DummyTagProcessor());
	htmlTagProcessorFactory.addProcessor("link", new DummyTagProcessor());
	// Create a fresh configuration and set needed configuration objects.
	XMLWorkerConfigurationImpl config = new XMLWorkerConfigurationImpl();
	// Attach the default CSS to a new CSSResolver
	CssFile defaultCSS = new XMLWorkerHelper().getDefaultCSS();
	StyleAttrCSSResolver cssResolver = new StyleAttrCSSResolver();
	cssResolver.addCssFile(defaultCSS);
	// attach more CSS files if needed (otherCssFile is a CssFile you supply)
	cssResolver.addCssFile(otherCssFile);
	// set the TagProcessorFactory
	config.tagProcessorFactory(htmlTagProcessorFactory).cssResolver(cssResolver)
			.acceptUnknown(true);
	// create a document
	final Document doc = new Document();
	doc.setPageSize(PageSize.A4);
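	// create writer (outputStream is the OutputStream the PDF is written to)
	PdfWriter writer = PdfWriter.getInstance(doc, outputStream);
	// optional page event; WatermarkEvent is assumed here to be your own PdfPageEvent implementation
	writer.setPageEvent(new WatermarkEvent());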
	// set margins for first page
	float margin = CssUtils.getInstance().parsePxInCmMmPcToPt("8px");
	doc.setMargins(margin, margin, margin, margin);
	// OPEN the document !
	doc.open();
	config.document(doc).pdfWriter(writer);
	// create the worker
	final XMLWorker worker = new XMLWorkerImpl(config);
	// attach an ElementHandler
	worker.setDocumentListener(new ElementHandler() {

		public void addAll(final List<Element> arg0) throws DocumentException {
			for (Element e : arg0) {
				doc.add(e);
			}

		}

		public void add(final Element e) throws DocumentException {
			doc.add(e);
		}
	});
	// Set the worker in the parser and start parsing (content is the XHTML snippet as a String)
	XMLParser p = new XMLParser(worker);
	p.parse(new StringReader(content));
	// close the document, which also closes the writer
	doc.close();
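
The margin in the extended setup is obtained by converting a CSS length ("8px") to points, the unit used for iText page margins. As a rough illustration of what such a conversion involves, the sketch below applies the standard CSS ratios (1in = 72pt = 96px = 2.54cm, 1pc = 12pt). It is an independent, JDK-only sketch and not the CssUtils implementation; the class CssLengthSketch and its method toPoints are hypothetical, and CssUtils may handle defaults or rounding differently.

```java
import java.util.Locale;

// Hypothetical helper, not part of XMLWorker.
class CssLengthSketch {

    /** Converts a CSS length such as "8px" or "2.54cm" to points. */
    static float toPoints(String length) {
        String s = length.trim().toLowerCase(Locale.ROOT);
        if (s.length() < 3) {
            throw new IllegalArgumentException("Expected <number><unit>, got: " + length);
        }
        // The last two characters are taken as the unit, the rest as the number.
        String unit = s.substring(s.length() - 2);
        float value = Float.parseFloat(s.substring(0, s.length() - 2).trim());
        switch (unit) {
            case "pt": return value;
            case "px": return value * 0.75f;        // 96px = 72pt
            case "in": return value * 72f;          // 1in = 72pt
            case "cm": return value / 2.54f * 72f;  // 2.54cm = 1in
            case "mm": return value / 25.4f * 72f;  // 25.4mm = 1in
            case "pc": return value * 12f;          // 1pc = 12pt
            default:
                throw new IllegalArgumentException("Unsupported unit: " + unit);
        }
    }

    public static void main(String[] args) {
        // prints 6.0
        System.out.println(toPoints("8px"));
    }
}
```

Under these ratios, the 8px margin in the example corresponds to 6pt on each side of the page.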
	

Demo

A demo is available in which you enter content in a TinyMCE editor and a PDF is generated from that input.

Plans for the future

Links
iText, online demo, XML Worker Project