How to extend the HtmlPipeline class
We've already configured an HtmlPipeline by changing the HtmlPipelineContext.
We've defined an ImageProvider and a LinkProvider and applied them using the setImageProvider() and setLinkProvider() methods, but there's more.
Each time a new XMLWorker/XMLParser is started with the same HtmlPipeline, the context is cloned using some defaults. You can change these defaults with the following methods:
charSet(): changes the character set.
setPageSize(): changes the default page size (which is A4).
autoBookmark(): enables or disables the automatic creation of bookmarks. The default is enabled (true).
setAcceptUnknown(): determines whether XML Worker accepts unknown tags. The default value is true.
setRootTags(): sets the root tags. By default, body and div are set as root tags. This affects the margin calculations.
setCssAppliers(): allows you to set a custom CssAppliers class, which in turn allows you to create custom CSS appliers.
In previous examples, we've also used the setTagFactory() method.
We can completely change the way HtmlPipeline interprets tags by creating a custom TagProcessorFactory.
XMLWorker creates Tag objects that contain attributes, styles and a hierarchy (one parent, zero or more children).
HtmlPipeline transforms these Tags into com.itextpdf.text.Element objects with the help of TagProcessors.
You can find a series of ready-made TagProcessor implementations in the com.itextpdf.tool.xml.html package.
The default TagProcessorFactory can be obtained from the Tags class, using the getHtmlTagProcessorFactory() method.
Not all tags are enabled by default. Some tags are linked to the DummyTagProcessor (a processor that doesn't do anything); other tags result in a TagProcessor with a very specific implementation.
You can extend the HtmlPipeline by adding your own TagProcessor implementations to the TagProcessorFactory with the addProcessor() method. This will either replace the default functionality of already supported tags, or add functionality for new tags.
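The replace-or-add behavior described above can be illustrated with a minimal, library-free sketch. The types below (SketchProcessor, SketchProcessorRegistry, RegistryDemo) are hypothetical stand-ins written for this example, not the iText classes, and the no-op fallback for unregistered tags is a simplification chosen for the sketch rather than a description of how XML Worker handles unknown tags.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical stand-in for a tag processor: turns a tag's content into output items. */
interface SketchProcessor {
    List<String> process(String tagName, String content);
}

/** Hypothetical stand-in for a processor factory: maps tag names to processors. */
class SketchProcessorRegistry {
    // No-op processor, similar in spirit to a processor that doesn't do anything.
    static final SketchProcessor NO_OP = (tagName, content) -> Collections.emptyList();

    private final Map<String, SketchProcessor> processors = new HashMap<>();

    /** Registers a processor for one or more tags, replacing any earlier mapping. */
    void addProcessor(SketchProcessor processor, String... tags) {
        for (String tag : tags) {
            processors.put(tag.toLowerCase(Locale.ROOT), processor);
        }
    }

    /** Looks up a processor; unregistered tags get the no-op (an assumption of this sketch). */
    SketchProcessor getProcessor(String tag) {
        return processors.getOrDefault(tag.toLowerCase(Locale.ROOT), NO_OP);
    }
}

class RegistryDemo {
    static SketchProcessorRegistry build() {
        SketchProcessorRegistry registry = new SketchProcessorRegistry();
        // A "default" mapping for an already supported tag.
        registry.addProcessor((tag, content) -> Collections.singletonList("paragraph:" + content), "p");
        // Registering the same tag again replaces the default behavior ...
        registry.addProcessor((tag, content) -> Collections.singletonList("custom:" + content), "p");
        // ... while registering a new tag name adds functionality.
        registry.addProcessor((tag, content) -> Collections.singletonList("userdata:" + content), "userdata");
        return registry;
    }

    public static void main(String[] args) {
        SketchProcessorRegistry registry = build();
        System.out.println(registry.getProcessor("p").process("p", "hello"));            // [custom:hello]
        System.out.println(registry.getProcessor("userdata").process("userdata", "42")); // [userdata:42]
        System.out.println(registry.getProcessor("blink").process("blink", "x"));        // []
    }
}
```

The point of the sketch is only the lookup semantics: the last processor registered for a tag name wins, and tag names that were never registered before simply gain a new mapping.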
Suppose that you have HTML code in which you've used a custom tag that should trigger a call to a database, for example a <userdata> tag.
XMLWorker will detect this tag and pass it to the HtmlPipeline.
As a result, HtmlPipeline looks for the appropriate TagProcessor in its HtmlPipelineContext.
You can implement the TagProcessor interface or extend the AbstractTagProcessor class in such a way that it performs a database query, adding its ResultSet to the Document in the form of a (list of) Element object(s). You should prefer extending AbstractTagProcessor, as this class comes with ready-made page-break-before, page-break-after, and fontsize handling.
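As a rough illustration of why extending a base class is convenient, here is a library-free sketch of the <userdata> idea. Everything in it is hypothetical: SketchAbstractProcessor only imitates the notion of shared page-break handling, the "group" attribute is invented for the example, and the "database" is an in-memory map standing in for a real query. It does not use or reproduce the iText API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical base class: shared handling lives here, subclasses only produce content. */
abstract class SketchAbstractProcessor {
    /** Wraps the subclass output with page-break markers taken from the tag's styles. */
    final List<String> end(Map<String, String> attributes, Map<String, String> styles) {
        List<String> out = new ArrayList<>();
        if ("always".equals(styles.get("page-break-before"))) {
            out.add("[new page]");
        }
        out.addAll(produce(attributes));
        if ("always".equals(styles.get("page-break-after"))) {
            out.add("[new page]");
        }
        return out;
    }

    /** Subclasses turn the tag's attributes into a list of output items. */
    abstract List<String> produce(Map<String, String> attributes);
}

/** Hypothetical processor for a <userdata> tag; the map stands in for a database. */
class UserDataProcessor extends SketchAbstractProcessor {
    private final Map<String, List<String>> fakeDatabase;

    UserDataProcessor(Map<String, List<String>> fakeDatabase) {
        this.fakeDatabase = fakeDatabase;
    }

    @Override
    List<String> produce(Map<String, String> attributes) {
        // "group" is an invented attribute that selects which rows to fetch.
        String group = attributes.get("group");
        List<String> rows = fakeDatabase.getOrDefault(group, Collections.emptyList());
        List<String> items = new ArrayList<>();
        for (String row : rows) {
            items.add("item:" + row); // one output item per row of the result
        }
        return items;
    }
}

class UserDataDemo {
    static List<String> run() {
        Map<String, List<String>> db = new HashMap<>();
        db.put("admins", Arrays.asList("alice", "bob"));

        Map<String, String> attributes = new HashMap<>();
        attributes.put("group", "admins");
        Map<String, String> styles = new HashMap<>();
        styles.put("page-break-before", "always");

        return new UserDataProcessor(db).end(attributes, styles);
    }

    public static void main(String[] args) {
        System.out.println(run()); // [[new page], item:alice, item:bob]
    }
}
```

The subclass contains only the lookup logic; the page-break handling comes from the base class, which mirrors the reason given above for preferring a base class over implementing the interface from scratch.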
Note that your TagProcessor can use CSS if you introduce a CssResolverPipeline before each pipeline that wants to apply styles.
The CssResolverPipeline is responsible for setting the right CSS properties on each tag.
This pipeline requires a CSSResolver that contains your CSS file.
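The ordering requirement can be made concrete with a small, library-free sketch of chained stages. The names SketchTag, SketchStage, CssStage and RenderStage are hypothetical and written for this example; they model only the idea that a later stage sees the CSS properties an earlier stage has set, not the actual pipeline classes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical tag: a name plus the CSS properties resolved for it so far. */
class SketchTag {
    final String name;
    final Map<String, String> css = new HashMap<>();

    SketchTag(String name) {
        this.name = name;
    }
}

/** Hypothetical pipeline stage that receives a tag and may pass it on. */
interface SketchStage {
    void open(SketchTag tag, List<String> output);
}

/** Stand-in for a CSS-resolving stage: copies matching rules onto the tag, then delegates. */
class CssStage implements SketchStage {
    private final Map<String, Map<String, String>> rules;
    private final SketchStage next;

    CssStage(Map<String, Map<String, String>> rules, SketchStage next) {
        this.rules = rules;
        this.next = next;
    }

    @Override
    public void open(SketchTag tag, List<String> output) {
        Map<String, String> matched = rules.get(tag.name);
        if (matched != null) {
            tag.css.putAll(matched);
        }
        next.open(tag, output);
    }
}

/** Stand-in for a stage that applies styles: it only sees what earlier stages have set. */
class RenderStage implements SketchStage {
    @Override
    public void open(SketchTag tag, List<String> output) {
        output.add(tag.name + " color=" + tag.css.getOrDefault("color", "default"));
    }
}

class PipelineOrderDemo {
    static List<String> render(boolean withCssStage) {
        Map<String, String> h1Rule = new HashMap<>();
        h1Rule.put("color", "red");
        Map<String, Map<String, String>> rules = new HashMap<>();
        rules.put("h1", h1Rule);

        SketchStage render = new RenderStage();
        // Only when the CSS stage comes first does the render stage see any styles.
        SketchStage first = withCssStage ? new CssStage(rules, render) : render;

        List<String> output = new ArrayList<>();
        first.open(new SketchTag("h1"), output);
        return output;
    }

    public static void main(String[] args) {
        System.out.println(render(true));  // [h1 color=red]
        System.out.println(render(false)); // [h1 color=default]
    }
}
```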
Let's take a look at the StyleAttrCssResolver that is shipped with XML Worker.