XSLT push and pull models

Push vs Pull

1.

Push vs Pull

Wendell Piez



>I have learned on this list that matching is almost always better than 
>selecting.

Well, actually I'd say the two go together, and each needs to be used in light of what you're doing with the other. (For example, remember that apply-templates instructions can also select nodes from the tree, and this is actually quite useful and important.)
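As a minimal sketch of that point (the chapter, title, para and note element names are invented for illustration and are not from the thread), a template can select which nodes get pushed while matching still decides what happens to each one:

```xml
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical input: a <chapter> with <title>, <para> and <note> children. -->
  <xsl:template match="chapter">
    <section>
      <!-- select chooses which children are processed and in what grouping:
           titles first, then paragraphs. <note> children are never selected. -->
      <xsl:apply-templates select="title"/>
      <xsl:apply-templates select="para"/>
    </section>
  </xsl:template>

  <!-- match decides what is done with each node once it has been selected. -->
  <xsl:template match="title">
    <h1><xsl:apply-templates/></h1>
  </xsl:template>

  <xsl:template match="para">
    <p><xsl:apply-templates/></p>
  </xsl:template>

</xsl:stylesheet>
```

Neither half works alone here: without the select, the notes would come through via the default rules, and without the matching templates, the selected nodes would just be copied out as bare text.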

Getting the hang of how this happens between templates and how, therefore, templates work to "steer" the input tree into the output, is basic to mastering XSLT.

As so often, there are good reasons for doing either, but those reasons often don't apply when newbies do one thing or another for no particular reason at all, other than that they're rattling the code until they happen to get something to work. This is certainly fair, as far as it goes (I too learn interfaces by poking at them) -- but when you want to go further ... it helps to know "why".

>So if you find that your script cares about some text() elements and 
>not others, then you probably do not want to use the ignore text() 
>elements template because it then forces your script to use a select to 
>get the desired text() element.

Pretty much, yes, subject to certain refinements. I might use an analogy and suggest that it's like setting your spam filter to throw everything away except what you tell it to (whitelisting), when you could more easily tell it just what to throw away (blacklisting). Sometimes whitelisting is in fact a better approach (and this is like XSLT "pulling" of values from the source: nothing gets in but what you ask for). But in most cases (at least in XSLT) it's simpler and easier just to let things through except for just those things you don't want. Then you're not caught by surprise because something you wanted, but neglected to ask for (for whatever reason), fails to appear.

Since text nodes are by definition "leaf" nodes in the data model (they have no children), the practical differences in this particular case only emerge when things get complex -- but typically, at least on loosely-structured data such as most "documentary" data, that happens pretty soon; and because the complexity can be in the source data, the stylesheet itself doesn't have to get very complex for things to go awry. But managing exactly this kind of complexity is what the XSLT processing model is really good at, so there's rarely a good reason to work against it.

Sometimes XSLT newbies try using <xsl:template match="*"/> (suppress all elements by default) in a similar way to solve such "problems" as are introduced by an over-quick reliance on xsl:value-of and such constructs, instead of on the default processing. This can really cause havoc.
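To make the havoc concrete, here is a small sketch, assuming an input shaped like the doc/element example quoted further down in this post:

```xml
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Suppress every element that has no more specific template. -->
  <xsl:template match="*"/>

  <!-- Intended to emit the wanted text, but this template never fires:
       the <doc> parent is caught by match="*" above, which does nothing,
       so processing never descends as far as <element>. -->
  <xsl:template match="element">
    <xsl:value-of select="."/>
  </xsl:template>

</xsl:stylesheet>
```

The stylesheet author then typically "fixes" this by adding a template for doc that pulls the value out by hand, and then another for the next level up, and so on, rebuilding piece by piece the traversal that the default rules would have given for free.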

>Is there a big difference between select="./text()" and select="." in 
>the examples below? How does this impact performance and scalability?
>
>Example:
><doc>
>         unwanted text
>         <element>desired text</element>
></doc>
>
><xsl:template match="text()"/>
><xsl:template match="element">
>         <xsl:value-of select="./text()"/>
></xsl:template>
>
>vs.
>
><xsl:template match="element">
>         <xsl:apply-templates />
></xsl:template>
><xsl:template match="element/text()">
>         <xsl:value-of select="."/>
></xsl:template>

As far as efficiency and performance of processing, I doubt there's much significant difference between these. But I don't much care, either, since method #2 is clearly, to my eye, preferable and will scale better. Using method #2 I don't have to write explicit instructions for every other kind of text node I want, whereas in method #1, every new text node I want gives me work to do, to override my override.

But the really interesting thing here is that the templates you've offered in method #2 are, in fact, perfect echoes of the templates that would apply to those same nodes if you provided none at all. Because the built-in templates

<xsl:template match="*">
   <xsl:apply-templates />
</xsl:template>

<xsl:template match="text()">
   <xsl:value-of select="."/>
</xsl:template>

will do the same thing with the element element and its child text node as the templates above, in #2 ... this means you could leave those templates out and get exactly the same result.

In other words, you don't need to do #2 because it's what the processor will already do without asking.

Doing nothing at all is both reasonably efficient (just let the processor do its thing) and really easy to maintain.
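For the example input quoted above, a sketch of the "tell it just what to throw away" approach might look like the following. The match pattern doc/text() and the text output method are my choices for illustration, not something from the original question:

```xml
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- The only explicit rule: discard text sitting directly inside <doc>. -->
  <xsl:template match="doc/text()"/>

  <!-- No template for <element> or its text node is needed:
       the built-in rules walk down to it and copy its text out. -->

</xsl:stylesheet>
```

Run against the sample document, this lets "desired text" through and drops the text directly under doc. Any other wanted text added to the source later comes through without touching the stylesheet.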

Finally, a minor nit:

select="./text()" is short for
select="self::node()/child::text()"

this amounts to exactly the same thing as

select="child::text()", which is long for select="text()".

So you can say

select="text()"

(leaving off the first step in the path), and things will be fine.

2.

Push vs Pull

Clark C. Evans

Following is a collection of my observations regarding XSLT's template mechanism. I hope it may be interesting.

XSLT's Template Dispatch

by Clark C. Evans December 1, 2000

XSLT is a language for transforming XML texts. The language includes familiar constructs such as for-each iteration, conditional statements, and callable functions. XSLT also includes an additional control structure, the template dispatch, which occurs through the interaction of apply-templates/select and template/match.

This paper presents two stylesheets, one of which uses the template dispatch, and a second, functionally equivalent to the first, which is constructed without the help of this control structure. This paper then concludes with a comparison which may help to elucidate the power and elegance of XSLT's template dispatch.

Consider the XML input

<bookcase>
  <book>
    <title>The C Programming Language</title>
    <author>Brian W. Kernighan</author>
    <author>Dennis M. Ritchie</author>
  </book>
  <book>
    <title>Compilers: Principles, Techniques, and Tools</title>
    <author>Alfred V. Aho</author>
    <author>Ravi Sethi</author>
    <author>Jeffrey D. Ullman</author>
  </book>
</bookcase>

processed by the XSLT stylesheet

<stylesheet
  xmlns="http://www.w3.org/1999/XSL/Transform"
  version="1.0">
  <template match="author">
        Author: <apply-templates/>
  </template>
  <template match="book">
      Book: <apply-templates select="title|author" />
  </template>
</stylesheet>

to produce the following output:

    Book: The C Programming Language
      Author: Brian W. Kernighan
      Author: Dennis M. Ritchie

    Book: Compilers: Principles, Techniques, and Tools
      Author: Alfred V. Aho
      Author: Ravi Sethi
      Author: Jeffrey D. Ullman

Given any input text, the following stylesheet will produce exactly the same output as the stylesheet above, only it will do so without the aid of apply-templates/select and template/match. As a consequence, much of its code necessarily emulates functionality required by the XSLT specification and built into a compliant XSLT processor.

<stylesheet
  xmlns="http://www.w3.org/1999/XSL/Transform"
  version="1.0">
  <template match="/">
    <call-template name="dispatch"/>
  </template>
  <template name="dispatch">
    <variable name="id" select="generate-id(.)" />
    <choose>
      <when test="//author[generate-id(.) = $id]">
        Author: <call-template name="apply" />
      </when>
      <when test="//book[generate-id(.) = $id]">
      Book: <call-template name="apply" >
              <with-param name="select" select="title|author" />
            </call-template>
      </when>
      <when test="self::text()">
        <value-of select="." />
      </when>
      <otherwise>
        <call-template name="apply" />
      </otherwise>
    </choose>
  </template>  
  <template name="apply">
    <param name="select" select="node()" />
    <for-each select="$select">
      <call-template name="dispatch" />
    </for-each>
  </template>
</stylesheet>

The entry point for a procedural stylesheet is marked by <template match="/">, a special case similar to a "C" style main() function. When this template is executed, the current node for the process is initialized to the root node, and then the body, <call-template name="dispatch" />, transfers control to the template named dispatch.

The dispatch template begins by creating a variable, $id, which is used to hold a unique string identifier generated by the XSLT processor for the current node. Following is a conditional switch statement having three when clauses and a single otherwise. For each when clause, a path expression is evaluated and converted into a boolean value. If true, then the corresponding body is executed and control resumes immediately after the choose construct ends. If all of the when clauses fail to fire, then the body of the otherwise is executed.

The first case, <when test="//author[generate-id(.)=$id]">, has a path expression returning the node-set consisting of any author element having an identifier equal to the current node's identifier. Therefore, if the current node happens to be an author, the node-set returned will be non-empty, which converts to a true boolean value. The body then performs two operations: the non-empty text node "\n Author: " is printed, and control is transferred to the apply template as specified by <call-template name="apply"/>.

The second case, <when test="//book[generate-id(.)=$id]">, is very similar. Only here, the apply template is called with a parameter named select. The select parameter is the node-set containing all element children of the current node with a name of either title or author.

The third case, <when test="self::text()">, only fires when the current node is a text node. Here <value-of select="." /> instructs the processor to print the text value of the current node. Finally, by default, the apply template is called when none of the previous when clauses has fired.

The last template in the procedural stylesheet, apply, has an optional parameter select, which is passed as a node-set. If this parameter is missing, then every child of the current node is selected. The remainder of this template then iterates through the selected node-set, calling the dispatch template for each node.

A comparison of the two stylesheets reveals that the first when clause described above corresponds directly to the first template of the original stylesheet. In a similar manner, the second when clause corresponds to the second template of the original stylesheet. The third when and the default otherwise clause are needed to emulate the built-in rules required by the XSLT specification. A fully compliant emulation would also order the tests according to XSLT's priority rules.
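To illustrate why the ordering of tests matters, here is a small sketch against the bookcase input. The second match pattern and the "Lead author" label are hypothetical additions, not part of the original stylesheets:

```xml
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <!-- Drop whitespace-only text nodes so only the labelled lines appear. -->
  <xsl:strip-space elements="*"/>

  <!-- A bare element name has a default priority of 0. -->
  <xsl:template match="author">
    <xsl:text>  Author: </xsl:text>
    <xsl:apply-templates/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

  <!-- A multi-step pattern with a predicate has a default priority of 0.5,
       so it wins for the first author of each book, whatever its position
       in the stylesheet. -->
  <xsl:template match="book/author[1]">
    <xsl:text>  Lead author: </xsl:text>
    <xsl:apply-templates/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

  <!-- Titles are suppressed to keep the sketch short. -->
  <xsl:template match="title"/>

</xsl:stylesheet>
```

In the choose-based emulation, the test for book/author[1] would have to be written before the test for author, because choose takes the first when that succeeds. With real templates the processor works this out from the priorities, so source order does not matter here.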

As you can see with this emulation, a good amount of code is built into the XSLT processor. Specifically, functionality similar to the apply template, a default entry point, and a mechanism similar to the dispatch template are included. Furthermore, the complexity of this dispatch mechanism increases when the import statement and mode attribute are considered. The administration of these and other details is handled by the XSLT processor, allowing succinct stylesheets like the first one presented.
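As a rough sketch of what the mode attribute adds, the stylesheet below visits the same book nodes twice and dispatches them to different templates each time. The mode name "toc" and the output layout are invented for illustration:

```xml
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="bookcase">
    <!-- First pass: titles only, dispatched in the hypothetical "toc" mode. -->
    <xsl:apply-templates select="book" mode="toc"/>
    <!-- Second pass: the same nodes, dispatched in the default (unnamed) mode. -->
    <xsl:apply-templates select="book"/>
  </xsl:template>

  <!-- Chosen only when the book is processed in "toc" mode. -->
  <xsl:template match="book" mode="toc">
    <xsl:value-of select="title"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

  <!-- Chosen when the book is processed in the default mode. -->
  <xsl:template match="book">
    <xsl:text>Book: </xsl:text>
    <xsl:value-of select="title"/>
    <xsl:text> (</xsl:text>
    <xsl:value-of select="count(author)"/>
    <xsl:text> authors)&#10;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```

To emulate this, the dispatch template would need an extra mode parameter threaded through every call and checked in every when test, which gives some idea of how quickly the hand-written version grows.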

Examination of this procedural stylesheet also clarifies the complementary roles played by select and match expressions. The select expression chooses which nodes to visit, and, for each node, the match expressions designate which template to execute. This allows an ordered set to be selected as a whole, yet each node in the set to be treated individually. This decoupling of roles is elegantly managed by the XSLT processor and invoked by apply-templates/select and template/match.

The decoupling of select and match also allows a processor to pre-compute the match expressions up-front. For example, given an in-memory node-based implementation, an additional pointer could be added to each node. After the tree is loaded and before processing commences, the processor could visit each node in the tree, filling in a pointer to the template which best matches the node according to the priority rules. Then, while iterating through a selected node-set, the template to dispatch is immediately available without further computation.

XSLT's template mechanism may not be the best solution to every transformation requirement; however, when the inputs have varying structure such that relative order among nodes with different matching criteria is important, its template dispatch approach is a clear winner.