XSLT and XInclude

1. XInclude
2. XSLT 1.1 and XInclude security issues
3. XInclude functionality

1.

XInclude

Paul Grosso


> Do you have to strip the XML Declaration and the DocType Declaration
> from an XML file before letting it be included? Or can you leave them
> on? What happens if the file you include is not syntactically valid wrt
> the DTD of the file you're including it in?
	

No, XInclude takes care of that for you.

Right. The W3C XInclude specification should explain this.

>what happens when you include a TEI Lite file 
>inside a DocBook file? The two have incompatible markup.

You're still thinking in terms of parsing a document with external entities against a DTD. XML--along with its concept of well-formedness and its companion specs such as Infoset, XInclude, XSLT, XML Schema etc.--has, for better or worse, taken us beyond that.

XInclude is defined in terms of merging infosets. Even well-formed but not valid documents have infosets. So, inclusion is really orthogonal to validation.

Put another way, you can think of inclusion as:

1. parse the parent document to create an infoset.

2. run over the infoset with an XInclude-aware processor and, for each xinclude element, fetch the referenced document and parse it to create its own infoset.

3. insert the to-be-included infoset in place of the xinclude element, dropping the declarations at the top of the included document.

4. recurse as necessary.

Now you've created a new infoset. If you want, you can run some validation pass on that infoset if that makes sense given your application, but you needn't. You still have a well-defined infoset, regardless of whether you can validate it against any of the DTDs against which the original bits were parsed.
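As a rough illustration of the steps above, here is a Python sketch that uses the standard library's xml.etree.ElementInclude module as the XInclude-aware processor. The document names, the markup and the in-memory loader are assumptions made for the example. ElementTree is a simplified tree model rather than a full infoset, and nothing is validated at any point.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

XI = "http://www.w3.org/2001/XInclude"

# Hypothetical in-memory documents standing in for files on disk.
DOCUMENTS = {
    "parent.xml": (
        '<book xmlns:xi="%s">'
        "<title>DocBook-style parent</title>"
        '<xi:include href="child.xml"/>'
        "</book>" % XI
    ),
    # The child keeps its XML declaration and DOCTYPE declaration.
    "child.xml": (
        '<?xml version="1.0"?>\n'
        '<!DOCTYPE TEI.2 SYSTEM "teixlite.dtd">\n'
        "<TEI.2><p>TEI-style child</p></TEI.2>"
    ),
}


def load(href, parse, encoding=None):
    """Step 2: fetch the referenced document and parse it into its own tree."""
    if parse != "xml":
        return DOCUMENTS[href]
    return ET.fromstring(DOCUMENTS[href])


def merge(name):
    root = ET.fromstring(DOCUMENTS[name])      # step 1: parse the parent
    ElementInclude.include(root, loader=load)  # steps 2 and 3: resolve includes
    return root


merged = merge("parent.xml")
print(ET.tostring(merged, encoding="unicode"))
```

The merged tree mixes two vocabularies and could not be validated against either DTD, yet it is still a perfectly usable tree, which is the point made above about inclusion being orthogonal to validation.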

The XInclude-aware processor must, of necessity, include an XML parser, and what I'm saying is that you might prefer that parser to be a validating one, because that way you would catch any erroneously tweaked entities right at the point of attempted inclusion.

Jeni T queries:

But thinking about it, isn't there an interesting issue with XIncludes in XSLT? If an XSLT processor is presented with:

<xsl:template match="/">
   <foo>
      <xinclude:include href="foo.xml" />
   </foo>
</xsl:template>

How does the processor know whether the xinclude:include should be activated during the parse of the XSLT stylesheet or whether the include element should be copied to the result tree?

Uche Ogbuji answers:

This is defined by the processor itself.

The XInclude spec defines a way to express inclusions, but makes no prescriptions of the semantics of such inclusions. An XSLT processor can choose to expand the include or pass it on, and in both cases be conformant to XInclude and XSLT.

Yes, this could be problematic for interoperability.

From the sounds of XInclude, I guess that processing the include elements is a pretty low-level feature of an XML processor, and that the stylesheet's DOM will include the external file.

Uche Ogbuji answers:

Not necessarily. The only note that is made is that it is at a higher level than parsing. But other than that, it could be at a level below DOM, between DOM and XSLT, or above XSLT.

Which means that to *create* an include element in the result, you'd have to use the xsl:element instruction *or* use a namespace alias. In other words, elements and attributes that are dealt with by low-level XML processors have to be generated with the same techniques as XSLT elements and attributes. As XLink becomes supported as well, presumably we'll run into the same issues there.

Uche Ogbuji answers:

I think you pretty much bring up an undefined crease between the specifications. There are many others.

This is what we get until the XML core specs settle down. (Feb 2001)

Daniel Veillard adds:

Well, it's not that simple. Sometimes you just don't want to define rigidly how processing has to be done, especially when all of these are really new parts of a toolbox. It sometimes makes sense to process first A then B, and in other cases B then A, with A and B drawn from this pool:

- XLink
- Schemas validation
- XSLT
- XInclude
- XML Canonicalization
- etc...

Even when an existing spec defines the processing rigidly, as with DTD validation, the toolkit sometimes needs more flexibility (people do want to validate a parsed and modified document without necessarily going back to serialization).
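To make the ordering point concrete, here is a small Python sketch, offered only as an illustration. XInclude expansion is done with the standard library's xml.etree.ElementInclude; the second step, which drops every draft element, is a made-up stand-in for an XSLT transformation. The element names and the loader are assumptions for the example.

```python
import copy
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

XI = "http://www.w3.org/2001/XInclude"

PARENT = (
    '<doc xmlns:xi="%s"><draft>parent note</draft>'
    '<xi:include href="part.xml"/></doc>' % XI
)
PART = "<part><draft>included note</draft><p>text</p></part>"


def loader(href, parse, encoding=None):
    # Hypothetical loader: every href resolves to the one in-memory part.
    return ET.fromstring(PART)


def expand_includes(root):
    """Step A: resolve xi:include elements on a copy of the tree."""
    root = copy.deepcopy(root)
    ElementInclude.include(root, loader=loader)
    return root


def strip_drafts(root):
    """Step B (stand-in for a transformation): drop every draft element."""
    root = copy.deepcopy(root)
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag == "draft":
                parent.remove(child)
    return root


source = ET.fromstring(PARENT)
include_first = strip_drafts(expand_includes(source))    # A then B
transform_first = expand_includes(strip_drafts(source))  # B then A

print(len(include_first.findall(".//draft")))
print(len(transform_first.findall(".//draft")))
```

With inclusion first, the transformation also sees the included draft element and removes it. With the transformation first, the included draft element arrives afterwards and survives. Neither order is wrong; which one you want depends on the application.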

2.

XSLT 1.1 and XInclude security issues

Steve Muench

 > Just imagine an XSL stylesheet like this one:
 > 
 > http://www.guninski.com/ora.xsl
 > 
 > which illustrates using Java extension functions in
 > the Oracle XSLT processor. If your users can pass in
 > arbitrary URL's to an XSLT stylesheet and get your
 > web server to execute the stylesheet, then effectively
 > any Java function in your server's classpath (at a minimum
 > all of the JDK classes) can be exploited by a malicious
 > stylesheet to create files on the server or do other
 > damage.
	

Our Oracle XML/XSLT-publishing servlet now implements several checks to prevent stuff like this.

We cannot ascertain whether a particular stylesheet is trusted, but we can make sure that the engine will only process stylesheets that come from the local machine's virtual path (meaning the webmaster, or someone trusted by the webmaster, had to put them there), or from a URL whose host is in a trusted-hosts list defined in the servlet's configuration file.
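The kind of check described here might look something like the following Python sketch. It is not the Oracle servlet's code: the function name, the host name in the trusted list and the exact rules (for example, rejecting ".." path segments) are assumptions made for illustration.

```python
from urllib.parse import urlsplit

# Hypothetical configuration; a real servlet would read this from its config file.
TRUSTED_HOSTS = frozenset({"styles.example.com"})


def stylesheet_allowed(location, trusted_hosts=TRUSTED_HOSTS):
    """Accept local virtual paths and URLs on trusted hosts; reject the rest."""
    parts = urlsplit(location)
    if not parts.scheme and not parts.netloc:
        # No scheme and no host: treat it as a path under the server's own
        # virtual root, but refuse anything that tries to climb out of it.
        return bool(parts.path) and ".." not in parts.path.split("/")
    if parts.scheme not in ("http", "https"):
        return False
    return parts.hostname in trusted_hosts


for candidate in ("/styles/report.xsl",
                  "http://styles.example.com/report.xsl",
                  "http://www.guninski.com/ora.xsl"):
    print(candidate, stylesheet_allowed(candidate))
```

A check like this only restricts where a stylesheet may come from. It says nothing about what a stylesheet from a permitted location is allowed to do once it runs.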

3.

XInclude functionality

Eliot Kimber

This is a basic implementation of XInclude that includes support for DTD-specific ID and reference rewriting so that references to elements in an included document will be correctly resolved in the transcluded result. This is for addition to the FAQ.

The package contents are:

o xinclude.xsl -- Implements the base XInclude semantics. Produces an in-memory transcluded result that can then be processed by normal templates (as demonstrated in the xinclude-test.xsl style sheet).
o xpointer-functions.xsl -- Implements XPointer (should be the same as what's already in the FAQ, but I haven't verified that)
o xinclude-test.xsl -- Demonstrates how to use xinclude.xsl to create an intermediate, in-memory transcluded result and how to write a DTD-specific version of the xinclude-copy-attributes template.
o xinclude-test-01.xml -- Top-level document used as a test case. Includes subdoc-01.xml and has two links, one within itself and one to subdoc-01.xml
o subdoc-01.xml -- Subdocument included by xinclude-test-01.xml

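The style sheet requires an XSLT processor that implements the EXSLT functions and common modules (e.g., Saxon). As listed, xpointer-functions.xsl also calls saxon:evaluate(), so it needs Saxon itself or a processor with an equivalent dynamic-evaluation extension.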

This works as follows:

The xinclude.xsl style sheet defines a mode, "xinclude", which is basically an identity transform except that any xi:include elements are resolved to their targets, which are then copied to the output. For attribute processing, the named template "xinclude-copy-attributes" copies each element's attributes.

By default xinclude-copy-attributes is just an identity transform. However, style sheets that include xinclude.xsl can implement their own version of xinclude-copy-attributes to do ID and reference rewriting, so that references in the documents as authored continue to work in the transcluded result document. The xinclude-test.xsl style sheet demonstrates how this can be done. This makes it easy for a DTD-specific style sheet to accommodate the specific referential mechanism used in that document type.
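The same shape, an identity copy with a replaceable attribute hook, can be sketched outside XSLT. The Python below is only an analogy for how the "xinclude" mode and xinclude-copy-attributes fit together; the function names, the sample documents and the ID-prefixing rule are assumptions and are not taken from the package.

```python
import xml.etree.ElementTree as ET

XI_INCLUDE = "{http://www.w3.org/2001/XInclude}include"


def copy_attributes(elem, source_doc):
    """Default hook: copy the attributes unchanged (the identity case)."""
    return dict(elem.attrib)


def transclude(elem, load, source_doc, attribute_hook=copy_attributes):
    """Copy elem recursively, replacing xi:include elements by their targets."""
    if elem.tag == XI_INCLUDE:
        href = elem.get("href")
        target = transclude(load(href), load, href, attribute_hook)
        target.tail = elem.tail
        return target
    result = ET.Element(elem.tag, attribute_hook(elem, source_doc))
    result.text, result.tail = elem.text, elem.tail
    for child in elem:
        result.append(transclude(child, load, source_doc, attribute_hook))
    return result


# Hypothetical documents, for illustration only.
DOCS = {
    "main.xml": '<doc xmlns:xi="http://www.w3.org/2001/XInclude">'
                '<item id="a"/><xi:include href="sub.xml"/></doc>',
    "sub.xml": '<sub><item id="a"/></sub>',
}


def load(href):
    return ET.fromstring(DOCS[href])


def prefix_ids(elem, source_doc):
    """Document-type-specific hook: qualify each id with its source document."""
    attrs = dict(elem.attrib)
    if "id" in attrs:
        attrs["id"] = source_doc.rsplit(".", 1)[0] + "." + attrs["id"]
    return attrs


plain = transclude(load("main.xml"), load, "main.xml")
rewritten = transclude(load("main.xml"), load, "main.xml", prefix_ids)
print([e.get("id") for e in plain.iter("item")])
print([e.get("id") for e in rewritten.iter("item")])
```

Swapping the hook changes how attributes are carried over without touching the copying logic, which is the role the overridable named template plays in the style sheet.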

Note that this implementation may or may not be a fully conforming XInclude processor. The XInclude spec seems to be somewhat ambiguous about whether or not ID rewriting is technically allowed, although we read it as allowing it. However, we find it impossible to use XInclude in practice without doing this rewriting for the following reasons:

- Without ID rewriting, a given XML element that is the target of a reference could only be included once in a given compound document, otherwise the target of the resulting reference would be ambiguous (and the transcluded result would not be DTD-valid because the IDs would be repeated, assuming the attributes are declared to be of type ID).

- It would be necessary to manage a global ID name space across all documents.

This implementation approach solves both problems.
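A short Python sketch may help show the collision and one way around it. It uses simplified copies of the two test documents (the xi:include element is replaced by a plain append) and a made-up rewriting scheme that prefixes each ID with a per-document key. The scheme actually used in xinclude-test.xsl is not shown here, so treat the details as assumptions.

```python
import xml.etree.ElementTree as ET
from collections import Counter

PARENT = """<xinclude-test>
  <element id="id-01">This is a normal element</element>
  <element id="id-02"><CrossRef refsub="id-01"/></element>
  <element id="id-03"><CrossRef refsub="id-02" refdoc="subdoc-01.xml"/></element>
</xinclude-test>"""

SUBDOC = """<subdoc>
  <element id="id-02" att1="foo">This is in subdoc-01</element>
  <element id="id-03" att1="bar">This is also in subdoc-01</element>
</subdoc>"""


def qualify(doc_key, local_id):
    # Hypothetical rewriting scheme: prefix the ID with a per-document key.
    return f"{doc_key}.{local_id}"


def rewrite(root, doc_key):
    """Rewrite id and CrossRef attributes so they are unique per document."""
    for el in root.iter():
        if "id" in el.attrib:
            el.set("id", qualify(doc_key, el.get("id")))
        if el.tag == "CrossRef":
            # refdoc names the target document; without it the target is local.
            target = el.attrib.pop("refdoc", None)
            target_key = target.rsplit(".", 1)[0] if target else doc_key
            el.set("refsub", qualify(target_key, el.get("refsub")))
    return root


def duplicate_ids(root):
    counts = Counter(el.get("id") for el in root.iter() if el.get("id"))
    return sorted(i for i, n in counts.items() if n > 1)


# Naive transclusion: IDs from both documents end up in one tree.
naive = ET.fromstring(PARENT)
naive.append(ET.fromstring(SUBDOC))
print(duplicate_ids(naive))

# Transclusion with rewriting: each document's IDs are qualified first.
rewritten = rewrite(ET.fromstring(PARENT), "xinclude-test-01")
rewritten.append(rewrite(ET.fromstring(SUBDOC), "subdoc-01"))
print(duplicate_ids(rewritten))
```

In the naive result id-02 and id-03 each occur twice, so a reference to id-02 is ambiguous. After rewriting, the cross-document reference points at subdoc-01.id-02 and no ID is repeated, without anyone having had to coordinate IDs across the source documents.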

Note that the ID rewriting implemented in xinclude-test.xsl only works for references to documents processed as part of the same XSLT process (that is, components of the same compound document). References to components of other compound documents could be supported with a more involved addressing scheme, but we have not yet worked out the best syntax for it (although we know what its requirements are and how to implement its resolution).

File: subdoc-01.xml
<?xml version="1.0"?>
<subdoc>
 <element id="id-02" att1="foo">This is in subdoc-01</element>
 <element id="id-03" att1="bar">This is also in subdoc-01</element>
</subdoc>
File: xinclude-test-01.xml
<?xml version="1.0"?>
<xinclude-test xmlns:xi="http://www.w3.org/2001/XInclude">
  <element id="id-01">This is a normal element</element>
  <element id="id-02">This is a crossref to element id-01: 
  <CrossRef refsub="id-01"/></element>
  <element id="id-03">This is a crossref to element id-02 in subdoc-01: 
    <CrossRef refsub="id-02" refdoc="subdoc-01.xml"/></element>
  <xi:include href="subdoc-01.xml"/>
</xinclude-test>
File: xinclude.xsl
<?xml version="1.0" encoding="iso-8859-1"?>
<xsl:stylesheet 
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform" 
  xmlns:xlink="http://www.w3.org/1999/xlink"
  xmlns:xi="http://www.w3.org/2001/XInclude"
  xmlns:xptr="http://www.w3.org/2001/05/XPointer"
  xmlns:xptrf="http://www.isogen.com/functions/xpointer"
  xmlns:func="http://exslt.org/functions"
  xmlns:fcommon="http://exslt.org/common"
  extension-element-prefixes="func xptr"
  version="1.0">
  <xsl:import href="xpointer-functions.xsl"/>

  <xsl:param name="doctype-system"></xsl:param>
  <xsl:param name="doctype-public"></xsl:param>

  <xsl:template name="xinclude-copy-attributes">
    <!-- Implement this template in the main template to 
         specialize to a particular document type.
      -->
    <xsl:for-each select="./@*">
      <xsl:copy-of select="."/>
    </xsl:for-each>
  </xsl:template>

  <xsl:output 
    method="xml" 
    indent="yes"
    doctype-system="foo"
    omit-xml-declaration="no" 
    encoding="UTF-8"/>
  
<!-- ==============================================================
     XInclude implementation

     Implements XInclude by processing the entire doc
     to produce a single result tree with all the includes resolved
     and then applies the normal template processing to that document.
     ==============================================================-->
<xsl:template match="/">
 <xsl:choose>
   <xsl:when test="//xi:include">
     <xsl:variable name="resolved-doc">
       <xsl:apply-templates  mode="xinclude"/>
     </xsl:variable>
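     <!-- $resolved-doc is a result tree fragment; XSLT 1.0 requires an
          explicit conversion (here EXSLT node-set) before it can be selected -->
     <xsl:apply-templates select="fcommon:node-set($resolved-doc)" mode="normal"/>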
   </xsl:when>
   <xsl:otherwise>
     <xsl:apply-templates/>
   </xsl:otherwise>
 </xsl:choose>
</xsl:template>

<xsl:template match="/" mode="normal">
  <xsl:apply-templates/>
</xsl:template>

<xsl:template match="node()" mode="xinclude">
  <xsl:copy
    ><xsl:call-template name="xinclude-copy-attributes"
    /><xsl:apply-templates select="node()" mode="xinclude" 
  /></xsl:copy>
</xsl:template>

<xsl:template match="xi:include" mode="xinclude">
  <xsl:variable name="xpath" select="@href"/>
  <xsl:choose>
    <xsl:when test="$xpath != ''">
      <xsl:message>Including <xsl:value-of 
	  select="$xpath"/></xsl:message>
      <xsl:apply-templates 
	  select="xptrf:resolve-xpointer-url(.)" mode="xinclude"/>
    </xsl:when>
    <xsl:otherwise>
      <xsl:message>Xinclude: Failed to get a value for the href= attribute 
      of xi:include element.</xsl:message>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>
  
</xsl:stylesheet>
	

File: xpointer-functions.xsl

<?xml version='1.0'?>
<!-- Copyright (c) 2002 ISOGEN International 

     This style sheet fragment implements the resolution of xpointers.
     It requires implementation of the exslt common and functions 
     modules.

     Author: W. Eliot Kimber, eliot@isogen.com

     $Revision: 1.1 $
  -->
<!-- Status:

  -->
<xsl:stylesheet version="1.0" 
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:xlink="http://www.w3.org/1999/xlink"
  xmlns:xptr="http://www.w3.org/2001/05/XPointer"
  xmlns:xptrf="http://www.isogen.com/functions/xpointer"
  xmlns:saxon="http://icl.com/saxon"
  xmlns:func="http://exslt.org/functions"
  xmlns:fcommon="http://exslt.org/common"
  extension-element-prefixes="func xptr"
>

<func:function name="xptrf:resolve-xpointer-url">
  <!-- Given an element that exhibits an href attribute,
       attempts to resolve the URL and XPointer (if present)
       into a result node list.

       If there is no fragment identifier, acts as though
       the fragment identifier "#/" had been specified,
       returning the document root.
    -->
  <xsl:param name="pointer-node"/>
  <!-- The Element node that exhibits the XPointer to be resolved -->
  <xsl:variable name="href" select="$pointer-node/@href"/>
  <xsl:choose>
    <xsl:when test="starts-with($href,'#')">
      <xsl:variable name="fragid">
        <xsl:value-of select="substring($href, 2)"/>
      </xsl:variable>
      <xsl:variable name="xpointer" select="xptrf:fragid2xpointer($fragid)"/>
      <!-- NOTE: error checking and reporting is done by resolve-xpointer -->
      <xsl:variable name="rns" 
      select="xptrf:resolve-xpointer($pointer-node, $xpointer)"/>
      <xsl:choose>
        <xsl:when test="string($rns) = ''">
          <func:result select="/.."/>
        </xsl:when>
        <xsl:when test="fcommon:object-type($rns) != 'node-set'">
          <func:result select="/.."/>
        </xsl:when>
        <xsl:otherwise>
          <func:result select="$rns"/>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:when>
    <xsl:otherwise>
      <xsl:variable name="url">
        <xsl:variable name="cand-url" 
        select="substring-before($pointer-node/@href, '#')"/>
        <xsl:choose>
          <xsl:when test="$cand-url = ''">
            <xsl:value-of select="$href"/>
          </xsl:when>
          <xsl:otherwise>
            <xsl:value-of select="$cand-url"/>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:variable>
      <xsl:variable name="cand-xpointer">
        <xsl:value-of select="substring-after($href, '#')"/>
      </xsl:variable>
      <xsl:variable name="xpointer">
        <xsl:choose>
          <xsl:when test="$cand-xpointer = ''">
            <xsl:value-of select="string('/')"/>
            <!-- Return the document element of the target document -->        
          </xsl:when>
          <xsl:otherwise>
            <xsl:value-of select="xptrf:fragid2xpointer($cand-xpointer)"/>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:variable>
      <xsl:variable name="location-source-node" 
       select="document($url, $pointer-node)"/>
      <xsl:variable name="rns" 
       select="xptrf:resolve-xpointer($location-source-node, 
                                           $xpointer)"/>
      <xsl:choose>
        <xsl:when test="string($rns) = ''">
          <func:result select="/.."/>
        </xsl:when>
        <xsl:when test="fcommon:object-type($rns) != 'node-set'">
          <func:result select="/.."/>
        </xsl:when>
        <xsl:otherwise>
          <func:result select="$rns"/>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:otherwise>
  </xsl:choose>

</func:function>

<func:function name="xptrf:resolve-xpointer">
  <!-- Resolves an xpointer in the context of some location source node.

       The location source is either the pointer, if the URL was just
       an XPointer, or it's the document element of the document addressed by the 
       resource part of the URL.
    -->
  <xsl:param name="location-source-node"/>
  <xsl:param name="xpointer"/>
  <xsl:for-each select="$location-source-node">
    <!-- Setting the context to the pointer node so that relative URLs are resolved
         relative to the pointer node by saxon:evaluate() -->
    <xsl:choose>
      <xsl:when test="$xpointer != ''">     
        <xsl:variable name="direct-result-set" select="saxon:evaluate($xpointer)"/>
        <xsl:choose>
          <xsl:when test="string($direct-result-set) = ''">
            <xsl:message>XIndirect warning: XPointer "<xsl:value-of 
            select="$xpointer"/>" did not address any nodes.</xsl:message>
          </xsl:when>
          <xsl:when test="fcommon:object-type($direct-result-set) != 'node-set'">
            <xsl:message>XIndirect warning: XPointer "<xsl:value-of 
             select="$xpointer"/>" did not address any nodes.</xsl:message>
            <func:result select="/.."/>
          </xsl:when>
          <xsl:otherwise>
            <func:result select="$direct-result-set"/>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:when>
      <xsl:otherwise>
        <xsl:message>XPointer error: $xpointer value is '' in resolve-xpointer.
        </xsl:message>
        <func:result select="/.."/>
      </xsl:otherwise>
    </xsl:choose>  
  </xsl:for-each>

</func:function>

<func:function name="xptrf:fragid2xpointer">
  <xsl:param name="fragid"/>
  <!-- Given a fragment identifier string, attempts to interpret it as an XPointer. -->
  <!-- NOTE: does not:
       - Handle multi-part XPointers: "#xpointer(foo)xpointer(bar)"
       - Skip non-xpointer schemes

       Doing this would require more sophisticated string processing than I can
       reasonably do in XSLT.
    -->
  <xsl:choose>
    <xsl:when test="starts-with($fragid, 'xpointer(')">
      <xsl:variable name="first-part" select="substring-after($fragid, 'xpointer(')"/>
      <xsl:variable name="len" select="(string-length($first-part) - 1)"/>
      <xsl:variable name="xpointer" select="substring($first-part,1,$len)"/>
      <func:result select="$xpointer"/>
    </xsl:when>
    <xsl:when test="not(contains($fragid, '/')) and
                    not(contains($fragid, '[')) and
                    not(contains($fragid, '*')) and
                    not(contains($fragid, '@'))">
      <!-- Probably a bare name. Quote it so that id() receives a string
           rather than a location path. -->
      <func:result select="concat(&quot;id('&quot;, $fragid, &quot;')&quot;)"/>
    </xsl:when>
    <xsl:when test="contains($fragid, '/') and
                    not(contains($fragid, '[')) and
                    not(contains($fragid, '*')) and
                    not(contains($fragid, '@'))">
      <!-- Probably a child sequence -->
      <xsl:variable name="barename" select="substring-before($fragid, '/')"/>
      <xsl:choose>
        <xsl:when test="$barename = '' and
                        contains(translate($fragid, 
'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ',
'^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^'),
                                 '^')">
          <func:result select="xptrf:xpointer-error($fragid)"/>
        </xsl:when>
        <xsl:when test="$barename = ''">
          <xsl:message>fragid='<xsl:value-of select="$fragid"/>'</xsl:message>
          <xsl:variable name="childseq" 
              select="xptrf:build-child-sequence($fragid)"/>
          <func:result select="$childseq"/>
        </xsl:when>
        <xsl:otherwise>
          <xsl:variable name="idref" select="concat('id(', $barename, ')')"/>
          <xsl:variable name="xpointer-childseq" 
              select="substring($fragid, (string-length($barename) + 1))"/>
          <xsl:choose>
            <xsl:when test="contains(
translate($xpointer-childseq, 
'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ',
'^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^'),
                                     '^')">
             <func:result select="xptrf:xpointer-error($fragid)"/>
            </xsl:when>
            <xsl:otherwise>
              <func:result select="concat(&quot;id('&quot;, $barename, &quot;')&quot;, 
                         xptrf:build-child-sequence($xpointer-childseq))"/>
            </xsl:otherwise>
          </xsl:choose>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:when>
    <xsl:otherwise>
      <func:result select="xptrf:xpointer-error($fragid)"/>
    </xsl:otherwise>
  </xsl:choose>
</func:function>

<func:function name="xptrf:build-child-sequence">
  <xsl:param name="xptr-childseq"/>
  <xsl:choose>
    <xsl:when test="not(starts-with($xptr-childseq, '/'))">
      <func:result select="xptrf:xpointer-error($xptr-childseq)"/>
    </xsl:when>
    <xsl:otherwise>
      <xsl:variable name="temp" select="substring($xptr-childseq, 2)"/>
      <!-- strip leading "/" -->
      <func:result select="xptrf:construct-child-sequence($temp)"/>
    </xsl:otherwise>
  </xsl:choose>
</func:function>

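<func:function name="xptrf:construct-child-sequence">
  <!-- Converts an XPointer child sequence with the leading "/" already
       stripped, such as "1/2", into the XPath steps "/*[1]/*[2]".
       $xpath-child-seq accumulates the steps built so far. -->
  <xsl:param name="xptr-child-seq"/>
  <xsl:param name="xpath-child-seq"/>
  <xsl:variable name="child-num">
    <xsl:choose>
      <xsl:when test="contains($xptr-child-seq, '/')">
        <xsl:value-of select="concat('/*[', 
        substring-before($xptr-child-seq, '/'), ']')"/>
      </xsl:when>
      <xsl:when test="$xptr-child-seq = ''">
        <!-- Nothing left to convert -->
      </xsl:when>
      <xsl:otherwise>
        <!-- Last step: wrap it the same way as the earlier steps -->
        <xsl:value-of select="concat('/*[', $xptr-child-seq, ']')"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:variable>
  <xsl:variable name="rest" select="substring-after($xptr-child-seq, '/')"/>
  <xsl:choose>
    <xsl:when test="$rest = ''">
      <!-- Include the final step in the result -->
      <func:result select="concat($xpath-child-seq, $child-num)"/>
    </xsl:when>
    <xsl:otherwise>
      <func:result select="xptrf:construct-child-sequence(
      $rest, concat($xpath-child-seq, $child-num))"/>
    </xsl:otherwise>
  </xsl:choose>
</func:function>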

<func:function name="xptrf:xpointer-error">
  <!-- Reports an XPointer error and returns "/.." -->
  <xsl:param name="fragid"/>
  <xsl:message
>XPointer error: fragment identifier "<xsl:value-of 
select="$fragid"/>" is not a valid XPointer.
                Returning "/.." as XPath to resolve (empty node set)</xsl:message>
  <func:result select="concat('/', '..')"/>
</func:function>
</xsl:stylesheet>
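
For readers who find the string handling in xptrf:fragid2xpointer hard to follow, here is a Python restatement of the mapping it appears to aim for: an xpointer(...) part is unwrapped, a bare name becomes an id() call, and a child sequence becomes a series of positional steps. The function names are made up for this sketch, and like the XSLT it does not handle multi-part XPointers or other schemes.

```python
import re

# A child sequence is an optional bare name followed by one or more "/n" steps.
CHILD_SEQUENCE = re.compile(r"^(?P<name>[^/\[\]*@]*)(?P<steps>(?:/\d+)+)$")


def child_steps(steps):
    """Turn "/1/2" into "/*[1]/*[2]"."""
    return "".join("/*[%s]" % n for n in steps.strip("/").split("/"))


def fragid_to_xpath(fragid):
    """Return an XPath string for a fragment identifier, or None if unsupported."""
    if fragid.startswith("xpointer(") and fragid.endswith(")"):
        # Unwrap the scheme: keep only what is between the parentheses.
        return fragid[len("xpointer("):-1]
    if any(ch in fragid for ch in "[*@"):
        return None
    if "/" not in fragid:
        # Probably a bare name.
        return "id('%s')" % fragid if fragid else None
    match = CHILD_SEQUENCE.match(fragid)
    if match is None:
        return None
    name = match.group("name")
    prefix = "id('%s')" % name if name else ""
    return prefix + child_steps(match.group("steps"))


for example in ("xpointer(//p[1])", "id-02", "/1/2", "id-02/3", "/a/1"):
    print(example, "->", fragid_to_xpath(example))
```

Where the style sheet reports an error and returns the empty node set "/..", this sketch simply returns None.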