XInclude

1. XSLT and XInclude
2. XInclude processing

1.

XSLT and XInclude

Bob Stayton

Bob Stayton, as part of his very thorough examination of XSLT in DocBook, has documented using XInclude with XSLT. (Ed. note: the rest of the book is very good too.)

2.

XInclude processing

Eliot Kimber




> I want to convert Docbook to fo and then to pdf.
> The docbook uses xi:include XInclude.
> I tried to convert with java org.apache.xalan.xslt.Process and the 
> xslt from http://docbook.sourceforge.net/ ( 
> docbook-xsl-1.65.1/fo/docbook.xsl ) But the includes are not 
> dereferenced and the document is empty, because xalan can not handle 
> XIncludes.
> I have to use java and not C ( http://xmlsoft.org/XSLT/ xsltproc ) 
> which works with XIncludes.
> So far as I know, xalan 2.4 is built into Java 1.4.
> So can I update to a newer xalan, or has it nothing to do with xalan?
> (Or do I have to update xerces, or can I use saxon?)

If you go to Dave Pawson's FAQ and search on "xinclude" (or just Google XInclude) you'll find a pure-XSLT/EXSLT XInclude implementation that I use in production. It implements a two-pass process where all the XIncludes are resolved, all IDs and references to them are rewritten to ensure correctness and uniqueness, and then the resulting result tree is processed normally. It requires a small amount of customization to implement the ID/IDREF rewriting for a particular DTD, but it's not hard to do, just a few lines of code. I haven't done it for DocBook myself.
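The two-pass idea can be sketched in Java on a DOM tree. This is not the pure-XSLT implementation mentioned above, only an illustration of the same steps under stated assumptions: the class `TwoPassXInclude`, its `loader` callback, and the `inc1.`-style prefix are hypothetical, `id` and `linkend` are assumed to be the only ID and IDREF attributes (as for a DocBook 4 cross-reference), every reference in an included document is assumed to point inside that document, and includes nested inside included documents are not followed.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class TwoPassXInclude {
    static final String XI_NS = "http://www.w3.org/2001/XInclude";

    /** Namespace-aware parse of an in-memory XML string. */
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    /**
     * Pass 1: replace every xi:include in doc with a copy of the target's root
     * element, giving each copy its own ID prefix. Includes nested inside an
     * included document are not followed in this sketch.
     */
    static void resolveIncludes(Document doc, Function<String, Document> loader) {
        NodeList live = doc.getElementsByTagNameNS(XI_NS, "include");
        List<Element> includes = new ArrayList<>();
        for (int i = 0; i < live.getLength(); i++) {
            includes.add((Element) live.item(i)); // snapshot: the NodeList is live
        }
        int counter = 0;
        for (Element include : includes) {
            Document target = loader.apply(include.getAttribute("href"));
            Element copy = (Element) doc.importNode(target.getDocumentElement(), true);
            prefixIds(copy, "inc" + (++counter) + ".");
            include.getParentNode().replaceChild(copy, include);
        }
    }

    /** Prefixes id and linkend values throughout one included subtree. */
    static void prefixIds(Element element, String prefix) {
        for (String name : new String[] {"id", "linkend"}) { // assumed attribute names
            if (element.hasAttribute(name)) {
                element.setAttribute(name, prefix + element.getAttribute(name));
            }
        }
        for (Node child = element.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child instanceof Element) {
                prefixIds((Element) child, prefix);
            }
        }
    }

    public static void main(String[] args) {
        Document fragment = parse("<section id=\"s1\"><xref linkend=\"s1\"/></section>");
        Document book = parse("<book xmlns:xi=\"" + XI_NS + "\">"
                + "<xi:include href=\"frag.xml\"/><xi:include href=\"frag.xml\"/></book>");
        resolveIncludes(book, href -> fragment); // hypothetical loader: same fragment for any href
        NodeList sections = book.getElementsByTagName("section");
        for (int i = 0; i < sections.getLength(); i++) {
            System.out.println(((Element) sections.item(i)).getAttribute("id"));
        }
        // Pass 2 would hand `book` to an ordinary XSLT transform via a DOMSource.
    }
}
```

Including the same fragment twice gives the two copies the IDs `inc1.s1` and `inc2.s1`, and each copy's `linkend` still points at its own section. The per-DTD customization mentioned above corresponds here to the list of attribute names in `prefixIds`.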

I haven't tried the XInclude support in Xerces yet, as I don't feel that simply resolving XIncludes without also doing ID and reference rewriting is useful. (I discuss this in some detail in the paper I presented at XML 2004. Unfortunately, that paper isn't yet on the Web as far as I know; I'll see if I can get it on the Innodata site before too long.) I'm also in the process of implementing a demonstration toy XInclude-aware content management system, and I'm implementing my own XInclude processing for import because you need to be aware of the XIncludes but not actually do the transclusion in this context. I have submitted a paper on this system to an upcoming conference. As part of this effort I will have implemented a generic XInclude support library that can be used outside of a parsing context (and in particular, within an XSLT processing context by way of extension functions). I've already got some basic test cases working (resolution of plain XPaths within xpointer() scheme parts without namespace awareness).

That is, I feel very strongly that XInclude processing *should not* be done as part of the base parsing process but should be done as part of the processing of the parsed documents involved, just as you would process any other hyperlinks (XIncludes being just a specialized form of semantic hyperlink). In other words, thinking of XIncludes as just a direct replacement for external parsed entities is missing both the point of why external parsed entities are bad and why semantic-level processing of use-by-reference is good.

This is not to disparage the work done in tools like Xerces to implement and support XInclude. There are a number of use cases where that functionality will be useful, and it may be that it can be used as I think it should be. And of course, not everyone agrees with me on how XInclude should be used.

However, at least in the current Xerces implementation, it can only resolve includes of entire documents, not individual elements, and since most of my interesting use cases are the inclusion of individual elements from larger documents, I can't use that functionality in any case. However, the work I'm doing on my toy content management system will involve creation of a fairly complete XPointer implementation that should make it easy to add full XInclude support to any Java system that wants it.
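For the whole-document case, parser-level resolution in Java can be sketched with the standard JAXP switch `DocumentBuilderFactory.setXIncludeAware`, available from Java 5. The class name `XIncludeAwareParse` and the file names `book.xml` and `chapter.xml` are made up for the demonstration, and an identity transform stands in for the real stylesheet. Note that this does none of the ID and reference rewriting argued for above; it only gets the includes resolved before the XSLT processor sees the document.

```java
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;

class XIncludeAwareParse {
    /** Parses a file with parser-level XInclude resolution switched on. */
    static Document parse(Path file) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // xi:include is recognized by its namespace
        factory.setXIncludeAware(true);  // off by default
        return factory.newDocumentBuilder().parse(file.toFile());
    }

    /** Writes a two-file demo (hypothetical names) and returns the resolved document. */
    static Document demo() {
        try {
            Path dir = Files.createTempDirectory("xinclude-demo");
            Files.write(dir.resolve("chapter.xml"),
                    "<chapter><title>Included</title></chapter>"
                            .getBytes(StandardCharsets.UTF_8));
            Files.write(dir.resolve("book.xml"),
                    ("<book xmlns:xi=\"http://www.w3.org/2001/XInclude\">"
                            + "<xi:include href=\"chapter.xml\"/></book>")
                            .getBytes(StandardCharsets.UTF_8));
            return parse(dir.resolve("book.xml"));
        } catch (Exception e) {
            throw new IllegalStateException("XInclude demo failed", e);
        }
    }

    public static void main(String[] args) throws Exception {
        // Identity transform; a real run would pass a StreamSource for the
        // stylesheet (for example fo/docbook.xsl) to newTransformer(...).
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(demo()), new StreamResult(out));
        System.out.println(out);
    }
}
```

Because the includes are resolved while the DOM is built, the transformer receives an ordinary tree and needs no XInclude support of its own.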

I asked Eliot,

> I wonder why duplicate IDs after transclusion haven't been addressed 
> by XSLT 2.0?

I think it is addressed in the current XInclude WD, but the wording is somewhat ambiguous. Here's what the Nov 10 spec says:

4.5.3 references property fixup

During inclusion, an attribute information item whose attribute type property is IDREF or IDREFS has a references property with zero or more element values from the source or included infosets. These values must be adjusted to correspond to element values that occur in the result infoset. ---

That is, the references in the transcluded document's infoset must point to the copies of the elements they originally pointed to before transclusion. At the infoset level this isn't really ID rewriting because you're just moving a pointer--the actual ID string value doesn't have to change.

But in a practical implementation that is not operating at the infoset level, this can only mean that IDs must be rewritten so that the infoset resulting from that document would satisfy this requirement.

But since the spec is defined only in terms of the infoset, and not in terms of transformation from instance to instance, it really leaves open the issue of what implementations actually do.

But my main point is that, in a re-use scenario, requiring IDs to be unique across all documents involved is burdensome, counterproductive, impossible to ensure, and unnecessary in any case (because rewriting is simple).
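The collision that makes rewriting necessary is easy to reproduce. As a minimal sketch, the hypothetical `DuplicateIdCheck` class below scans a merged document for `id` values carried by more than one element. It assumes the ID attribute is literally named `id`; a DTD-aware check would look at the declared attribute type instead.

```java
import java.io.StringReader;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DuplicateIdCheck {
    /** Returns every id value carried by more than one element, in first-seen order. */
    static Set<String> duplicateIds(Document doc) {
        Set<String> seen = new HashSet<>();
        Set<String> duplicates = new LinkedHashSet<>();
        NodeList all = doc.getElementsByTagName("*"); // "*" matches every element
        for (int i = 0; i < all.getLength(); i++) {
            Element element = (Element) all.item(i);
            // Set.add returns false when the value was already present
            if (element.hasAttribute("id") && !seen.add(element.getAttribute("id"))) {
                duplicates.add(element.getAttribute("id"));
            }
        }
        return duplicates;
    }

    /** Parses an in-memory XML string. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // What transcluding the same section into two chapters yields without rewriting.
        Document merged = parse("<book><chapter id=\"c1\"><section id=\"intro\"/></chapter>"
                + "<chapter id=\"c2\"><section id=\"intro\"/></chapter></book>");
        System.out.println(duplicateIds(merged)); // prints [intro]
    }
}
```

Each source document here could be perfectly valid on its own; the duplicate only appears once the same section is used by reference in two places, which is why a uniqueness rule across all documents cannot be enforced by the authors of any one of them.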