Differences, 1.0 to 2.0

1. XSLT 2.0 examples
2. Forward and backwards compatibility
3. 1.0 and 2.0 differences
4. Wildcards in namespace
5. Boolean tests in 2.0
6. xsl expressions return type
7. Is XSLT 2.0 Turing complete
8. Copy, without namespaces
9. Start processing at a named template
10. Wrong version

1.

XSLT 2.0 examples

Abel Braaksma

During the ongoing discussion about XSLT 2 today and considering/not considering the switch, I thought: why isn't there a nice all-encompassing example that shows the merits of XSLT 2 over XSLT 1? It'll come as no surprise that there's no trivial answer to that, so I figured: why not ask the masses (some people claim that the masses are always right, though that is debatable).

The target audience is: XSLT 1 users that would like to know more / understand more of XSLT 2 and are considering the switch.

Before we start discussing the template itself, let me throw in some rules:

1. The template must be 100% compliant with the REC
2. No extensions of any kind may be used
3. The template must be rather simple, easy, explainable.
4. It must be run on itself (i.e., no hassle with input files etc)
5. It must show some key concepts, without showing too much detail
6. Does not use functions that call or write resources (xsl:result-document, unparsed-text etc)
7. (optional) it should be runnable on the available xslt 2 processors

Apart from (6) and (7), I think it is fairly trivial. I put in (7) so that people who want to see other tools (Saxon, Gestalt, Altova) can see them in action (it follows that this means no schema-aware (SA) behavior, sorry). I put in (6) to keep the example rather trivial. Would users of the tools mentioned, or of any I have missed, be so kind as to reply with a one-liner that calls the template below from the command line?

About the details that should be in the example. I thought of a nice example that can possibly be made infinitely better. Consider it a first draft, and I invite everyone to shoot at it (you may even blast it away ;)

This is what I have put in so far:

a) badly designed xml as input (in a variable, see rule 4)
b) a micro pipeline
c) for-each-group based on value to show groups without Muenchian
d) xsl:function for camelcasing a string
e) ranges, like "2 to 3" and comparison
f) tokenize() and matches()
g) for ... in .... return
h) next-match
i) the use of the 'as' attribute on basic processors

The example is rather trivial (it should be, I believe). It takes a list of users of XSLT products, and groups them per product:

 <james-johnsson>Saxon, c, xslt 2</james-johnsson>
 <super-troopers>xsltproc, nc, xslt 1</super-troopers>

Here: the node name is the user. Then follows a CSV string. The first part is the processor, the second says Compliant or NonCompliant, the third says the language (I am pretty sure the input is not correct, sorry about my lack of knowledge of the compliancy level).

The output groups per processor as follows, where some processing is done on the strings and the users are comma-concatenated under <users>:

<processor name="Xsltproc">
  <level>processor is non-compliant</level>
  <language>XSLT 1</language>
  <users>George Williams Geraldson, Super Troopers</users>
</processor>

I understand this is a rather superficial example. If anybody can make it better, clearer, input is welcome. Keep in mind that it should be kept easy as well as showing the power of XSLT 2 (so it can be used as a showcase).

Here's the XSLT so far, any ideas, complaints, suggestions, rewrites, opinions etc are welcome:

<xsl:stylesheet
   xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
   xmlns:xs="http://www.w3.org/2001/XMLSchema"
   xmlns:my="urn:my"
   version="2.0"
   exclude-result-prefixes="#all">

   <xsl:output indent="yes" />

   <!--
       the input, just in a variable
       ready to use, easy for testing, no need
       for exslt:node-set()
   -->
   <xsl:variable name="preferences">
       <james-johnsson>Saxon, c, xslt 2</james-johnsson>
       <george-williams-geraldson>xsltproc, nc, xslt 1</george-williams-geraldson>
       <super-troopers>xsltproc, nc, xslt 1</super-troopers>
       <merry-mirriams>libxslt, nc, xslt 1</merry-mirriams>
       <john-ronald-reuel-tolkien>saxon, c, xslt 2</john-ronald-reuel-tolkien>
       <sir-tomald-richards>gestAlt, nc, XSLT 2</sir-tomald-richards>
       <agatha-kirsten>saxon, c, xslt 2</agatha-kirsten>
       <mollie-jollie>saxon, c, xslt 2</mollie-jollie>
   </xsl:variable>

   <xsl:template match="/" name="main">
       <xsl:variable name="micro-pipeline">
           <xsl:apply-templates select="$preferences/*" />
       </xsl:variable>

       <!-- group by processor -->
       <xsl:for-each-group
           select="$micro-pipeline/processor"
           group-by="token[1]/upper-case(text())">

           <processor name="{my:camel-case(token[1])}" >
               <xsl:apply-templates select="token[position() = 2 to 3]" />
               <users>
                   <!--
                       join the users in one string
                       and camel case their names
                   -->
                   <xsl:value-of select="
                       string-join(
                       my:camel-case(current-group()/user)
                       , ', ')" />
               </users>
           </processor>
       </xsl:for-each-group>

   </xsl:template>

   <!--
       matches for $preferences nodes
   -->
   <xsl:template match="*" priority="0">
       <processor>
           <xsl:next-match />
       </processor>
   </xsl:template>

   <xsl:template match="*">
       <user><xsl:value-of select="local-name(.)" /></user>
       <xsl:next-match />
   </xsl:template>


   <xsl:template match="text()">
       <xsl:for-each select="tokenize(., ',')">
           <token><xsl:value-of select="normalize-space(.)" /></token>
       </xsl:for-each>
   </xsl:template>


   <!--
       what follows: matches for micro pipeline
       all matches are case-insensitive, with no
       need for translate() and trouble with more complex
       characters
   -->
   <xsl:template match="token[matches(., '^c$', 'i')]">
       <level>processor is compliant</level>
   </xsl:template>

   <xsl:template match="token[matches(., '^nc$', 'i')]">
       <level>processor is non-compliant</level>
   </xsl:template>

   <xsl:template match="token[matches(., '^xslt', 'i')]">
       <language><xsl:value-of select="upper-case(.)" /></language>
   </xsl:template>

   <!--
       put the nasty bit aside in a function
       it camel-cases a dashed or space delimited string
   -->
   <xsl:function name="my:camel-case" as="xs:string*">
       <xsl:param name="string" as="xs:string*"/>
       <xsl:sequence select="for $s in $string
           return string-join(
               for $word in tokenize($s, '-| ')
               return
                   concat(
                       upper-case(substring($word, 1, 1)),
                       substring($word, 2))
               , ' ')" />
   </xsl:function>
</xsl:stylesheet>

2.

Forward and backwards compatibility

Michael Kay



> from the xslt 2 WD:
> <quote>An element enables forwards-compatible behavior for 
> itself, its attributes, its descendants and their attributes 
> if it has an [xsl:]version attribute (see 3.3 Standard 
> Attributes) whose value is greater than 2.0.</quote>

> I'm having a problem with this. 
> If I write a stylesheet with version="15.0" as an attribute 
> of the root element, just what am I saying?

This text is essentially unchanged from the 1.0 spec. It's actually a brilliant bit of future-proofing.

Suppose that XSLT 3.0 has just been published, and it includes a new <xsl:perform-magic> instruction, which is implemented in Saxon version 19.2, but not yet in MSXML6. You want to invoke this instruction when your stylesheet is running under Saxon, but when running under MSXML6, you just want to leave out that part of the output. So you write:

<xsl:template match="thing" version="3.0">
  <xsl:perform-magic select="magic-dust">
    <xsl:fallback>Sorry, Microsoft don't do magic</xsl:fallback>
  </xsl:perform-magic>
</xsl:template>

Specifying version="3.0" means that the Microsoft processor (or any XSLT 1.0 or 2.0 processor) is obliged to execute the xsl:fallback instruction. If you had said version="1.0" or version="2.0", then the processor would instead have thrown a static error saying that there is no such instruction as xsl:perform-magic.
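
An alternative to xsl:fallback is to test for the instruction explicitly with element-available(). The following is only a minimal sketch, not taken from the original post: xsl:perform-magic is the same imaginary instruction as above, and the version value 15.0 is borrowed from the question so that any existing processor treats the stylesheet as forwards-compatible.

<xsl:stylesheet version="15.0"
   xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

   <xsl:template match="thing">
      <xsl:choose>
         <!-- true only on a processor that implements the instruction -->
         <xsl:when test="element-available('xsl:perform-magic')">
            <!-- imaginary instruction, never instantiated elsewhere -->
            <xsl:perform-magic select="magic-dust"/>
         </xsl:when>
         <xsl:otherwise>
            <!-- processors without the instruction take this branch -->
            <xsl:text>Sorry, no magic available</xsl:text>
         </xsl:otherwise>
      </xsl:choose>
   </xsl:template>

</xsl:stylesheet>

In forwards-compatible mode an unknown instruction is only an error if it is actually instantiated without a fallback, so the xsl:when branch is harmless on a processor that never enters it.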

3.

1.0 and 2.0 differences

Mike Kay



> Given an XPath expression (for example //foo/foo2) that works 
> for a given xml, I would like to use the substring-before 
> function, to keep only the characters in //foo/foo2 before 
> '-'. The XPath would then become substring-before(//foo/foo2, '-')

> Saxon error is "A sequence of more than one item is not 
> allowed here".

The expression substring-before(//foo/foo2, '-') is a valid XPath 1.0 expression.

In XPath 2.0 it is valid only if //foo/foo2 returns at most one node. The Saxon error message implies that it is selecting more than one node. If you want to select only the first node, use substring-before((//foo/foo2)[1], '-'). If you want to process all the nodes (in XPath 2.0) use

for $x in //foo/foo2 return substring-before($x, '-')
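
The two options can be put side by side in a small stylesheet. This is a sketch only: the foo/foo2 names come from the question, while the result, first and all wrapper elements are made up for illustration.

<xsl:stylesheet version="2.0"
   xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

   <xsl:template match="/">
      <result>
         <!-- first node only: what the 1.0 expression would have returned -->
         <first>
            <xsl:value-of select="substring-before((//foo/foo2)[1], '-')"/>
         </first>
         <!-- every node: one string per foo2, joined with commas -->
         <all>
            <xsl:value-of
               select="for $x in //foo/foo2 return substring-before($x, '-')"
               separator=","/>
         </all>
      </result>
   </xsl:template>

</xsl:stylesheet>

In XPath 2.0 the second expression can also be written as a path step, //foo/foo2/substring-before(., '-'), which some people find easier to read.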

4.

Wildcards in namespace

Michael Kay


> I'm writing a XSLT stylesheet for processing of UML XMI files. Alas, 
> depending on the UML tool, the elements in the XMI file are placed in 
> a different namespace. E.g. Poseidon (www.gentleware.com) uses the 
> namespace-URI xmlns:UML = "org.omg.xmi.namespace.UML", and Rational 
> Rose uses xmlns:UML = "href://org.omg/UML/1.3". However, all of them 
> generate a prefix UML: for their XMI elements.

> Now my problem is: how can I define a namespace in my XSLT stylesheet 
> so that I can process all XMI files with the SAME stylesheet ? I came 
> across a remark on the XSLT FAQ suggesting that it is possible to use 
> wildcards in the namespace, eg <xsl:stylesheet xmlns:UML="*UML*">.

XPath 2.0 allows constructs of the form *:local which will match local in any namespace, but there's no way of matching a set of namespaces.

I would recommend using a first transformation pass to normalize the namespace URI, to keep this separate from the "real" transformation logic. This is just a variant on the identity template:

<xsl:template match="one-uri:*">
  <xsl:element name="{local-name()}" namespace="two-uri">
    <xsl:copy-of select="@*"/>
    <xsl:apply-templates/>
  </xsl:element>
</xsl:template>
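
Filled in for the two namespaces mentioned in the question, the normalizing pass might look like the sketch below. The choice to map the Rational Rose URI onto the Poseidon one, and the rose prefix, are assumptions made for illustration; the second template is the usual identity template for everything else.

<xsl:stylesheet version="2.0"
   xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
   xmlns:rose="href://org.omg/UML/1.3">

   <!-- move every element from the Rose namespace into the Poseidon one;
        the UML prefix in the name is only a hint to the serializer -->
   <xsl:template match="rose:*">
      <xsl:element name="UML:{local-name()}"
                   namespace="org.omg.xmi.namespace.UML">
         <xsl:copy-of select="@*"/>
         <xsl:apply-templates/>
      </xsl:element>
   </xsl:template>

   <!-- copy everything else unchanged -->
   <xsl:template match="@*|node()">
      <xsl:copy>
         <xsl:apply-templates select="@*|node()"/>
      </xsl:copy>
   </xsl:template>

</xsl:stylesheet>

The "real" stylesheet then only has to declare one UML namespace. Elements copied by the identity template may still carry the old namespace declaration; that does not affect matching in the second pass.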

5.

Boolean tests in 2.0

Andrew Welch



> Can I use a boolean variable in an xsl:if test

> Beware, though, that this will get you burned again when you start using XSLT2 scented water.

> xsl:value-of returns a text node whose string value is the string value of the expression. This is subtly (or not so subtly) different from a string. It doesn't make much difference in XSLT 1.0, where the only way to carry strings around is to put them in text nodes, but in XPath 2.0 you can have sequences of strings, sequences of text nodes, and sequences that contain both. The rules for the two cases (in particular, whether spaces are automatically inserted between adjacent items) are different.

Yes, but I think the 2.0 way makes more sense.

Just to make sure we are talking about the same thing (and to help cement my knowledge), consider:

<root>
  <node>foo</node>
  <node>bar</node>
</root>

In 1.0:

<xsl:template match="root">
  <xsl:value-of select="node"/>
</xsl:template>

Returns:

'foo'

This is because XSLT 1.0 applies 'first item semantics' when value-of is given a node-set: only the first node in document order is used.

In 2.0 the same template would return:

'foo bar'

That is, all items in the sequence with a single space as a separator. In order to remove/control the space, we can use the @separator on value-of:

<xsl:value-of select="node" separator=""/>

Which would produce:

'foobar'

For me, that's much more intuitive than just picking the first one. Another plus for 2.0 :)
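
Put together as a complete stylesheet to run against the <root> document above, the variants look like this. It is a sketch for experimenting; the text output method and the line breaks are my own additions.

<xsl:stylesheet version="2.0"
   xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

   <xsl:output method="text"/>

   <xsl:template match="root">
      <!-- default separator is a single space: foo bar -->
      <xsl:value-of select="node"/>
      <xsl:text>&#10;</xsl:text>
      <!-- explicit separator: foo, bar -->
      <xsl:value-of select="node" separator=", "/>
      <xsl:text>&#10;</xsl:text>
      <!-- the 1.0 result, now selected explicitly: foo -->
      <xsl:value-of select="node[1]"/>
   </xsl:template>

</xsl:stylesheet>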

Of course, if there is another sequence related area to get burned on please post an example - it's good to know the gotchas up front.
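
One such gotcha is the string versus text node difference mentioned at the top of this entry. The sketch below (variable and element names are hypothetical) builds two temporary trees, one from a sequence of strings and one from two adjacent text nodes, and can be run with any source document.

<xsl:stylesheet version="2.0"
   xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

   <!-- adjacent atomic values get a space between them: "a b" -->
   <xsl:variable name="from-strings">
      <xsl:sequence select="('a', 'b')"/>
   </xsl:variable>

   <!-- adjacent text nodes are merged with no space: "ab" -->
   <xsl:variable name="from-text-nodes">
      <xsl:value-of select="'a'"/>
      <xsl:value-of select="'b'"/>
   </xsl:variable>

   <xsl:template match="/">
      <out>
         <strings><xsl:value-of select="$from-strings"/></strings>
         <text-nodes><xsl:value-of select="$from-text-nodes"/></text-nodes>
      </out>
   </xsl:template>

</xsl:stylesheet>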

6.

xsl expressions return type

Michael Kay


>     <xsl:template match="open-office:chapter" 
>     mode="doc:chapter-has-footnotes"
>     as="xs:boolean">
>       <xsl:sequence select="exists(.//footnote)"/> </xsl:template>

> 1. Return the boolean value to whom | what?

To its caller. For example,

<xsl:variable name="x" as="xs:boolean">
  <xsl:apply-templates select="chap" 
    mode="doc:has-footnotes"/> 
</xsl:variable>

If the call on apply-templates returns a boolean, that boolean will be the value of variable $x.

We all tend to think in terms of the 1.0 model where XPath expressions read the source document and XSLT instructions write to the result tree. Thanks largely to Jeni Tennison's intervention half way through the 2.0 design process, that's no longer the processing model: instructions and expressions now both return results to their caller, and the result can be any sequence of atomic values or nodes.

There are basically two ways of getting the result of an XSLT instruction back into the XPath world to make it available for further processing: you can assign it to a variable, as above, or you can return it as the result of a function, as in my earlier example:

<xsl:function name="doc:has-footnotes" as="xs:boolean">
  <xsl:param name="chap" as="element()"/>
  <xsl:apply-templates select="$chap"/>
</xsl:function> 
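
A self-contained sketch that ties the pieces together is given below. The doc namespace URI, the unprefixed chapter and footnote element names, and the has-footnotes output element are assumptions made for illustration, not the poster's actual vocabulary.

<xsl:stylesheet version="2.0"
   xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
   xmlns:xs="http://www.w3.org/2001/XMLSchema"
   xmlns:doc="urn:doc"
   exclude-result-prefixes="#all">

   <!-- the template returns a boolean to its caller, not a result node -->
   <xsl:template match="chapter" mode="doc:has-footnotes" as="xs:boolean">
      <xsl:sequence select="exists(.//footnote)"/>
   </xsl:template>

   <!-- the function hands that boolean back to the XPath world -->
   <xsl:function name="doc:has-footnotes" as="xs:boolean">
      <xsl:param name="chap" as="element()"/>
      <xsl:apply-templates select="$chap" mode="doc:has-footnotes"/>
   </xsl:function>

   <!-- the result can now be used in any XPath expression -->
   <xsl:template match="chapter">
      <xsl:if test="doc:has-footnotes(.)">
         <has-footnotes/>
      </xsl:if>
   </xsl:template>

</xsl:stylesheet>

Note that the function only works for elements that the moded template matches; for any other element the built-in template rule would run and its result would probably fail the xs:boolean type check.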

7.

Is XSLT 2.0 Turing complete

Michael Kay


> I have heard that XSLT 1.0 is Turing complete .. Is
> XSLT 2.0 also? If yes, what will be the proof of this
> fact?

There is a proof that XSLT 1.0 is Turing complete at unidex

The proof clearly applies equally to XSLT 2.0 since the Universal Turing Machine used in the proof is a legal XSLT 2.0 stylesheet.

8.

Copy, without namespaces

Michael Kay

In XSLT 2.0 you can do xsl:copy without copying namespaces by adding the attribute copy-namespaces="no" to the xsl:copy or xsl:copy-of element.
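
As a sketch, an identity transform that drops unused namespace declarations could look like this (the payload element name is hypothetical):

<xsl:stylesheet version="2.0"
   xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

   <!-- copy each element without its namespace nodes -->
   <xsl:template match="*">
      <xsl:copy copy-namespaces="no">
         <xsl:apply-templates select="@*|node()"/>
      </xsl:copy>
   </xsl:template>

   <xsl:template match="@*|text()|comment()|processing-instruction()">
      <xsl:copy/>
   </xsl:template>

   <!-- or copy a whole subtree in one go -->
   <xsl:template match="payload">
      <xsl:copy-of select="." copy-namespaces="no"/>
   </xsl:template>

</xsl:stylesheet>

Namespaces that are actually used in the names of the copied elements and attributes are still declared in the output; only the unused ones disappear.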

9.

Start processing at a named template

Andrew Welch, Mike Kay and Colin Paul Adams


    >> If there's no source file then you can't start processing at
    >> <xsl:template match="/">, because "/" wouldn't match anything -
    >> so you have to start at a named template.
    > ...but if / is a named template then you can?

Yes. FXSL for XSLT 2.0 has all the initial templates declared as:

<xsl:template match="/" name="initial" >

if a source document is not required. That way, Dmitre can specify a dummy source document (as that is the way he prefers to work), and I can specify an initial template (as I like to work from the command line).

    
> which makes me wonder why you couldn't start processing at
> / when no source file and no initial template has been
> given?

> Maybe it's because internally you can't execute a template
> unless it has a name or you have something for it to
> match?  Just thinking out loud now...

Yes. Basically, you have nothing for '/' to match against. When you have no source document, there is no initial context node.
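
A minimal sketch of such a dual-purpose template follows; the numbers output is invented for illustration. The body never touches the context node, so it behaves the same whichever way it is entered.

<xsl:stylesheet version="2.0"
   xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

   <xsl:output indent="yes"/>

   <!-- reachable as a match template (with any source document)
        or as a named initial template (with none) -->
   <xsl:template match="/" name="initial">
      <numbers>
         <xsl:for-each select="1 to 3">
            <n><xsl:value-of select="."/></n>
         </xsl:for-each>
      </numbers>
   </xsl:template>

</xsl:stylesheet>

How the initial template is named on the command line depends on the processor; recent Saxon releases, for example, take an option of the form -it:initial, but check the documentation of the version you are using.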

10.

Wrong version

Michael Kay



>   The only place where the "my" prefix was used is the @name
> of an xsl:function.  So I guess the processor is already an
> XSLT 2.0 processor.  I guess the OP just copy and paste my
> sample without adding the namespace declaration.
>

The stack trace showed that he was using Xalan/XSLTC.

If you specify version="2.0" on a stylesheet and submit it to a 1.0 processor, it runs in forwards-compatibility mode. This means that the <xsl:function> element will be ignored; but if there's an XPath expression that calls my:function without declaring the prefix my, it's quite likely this will give a compile-time error.
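
If the same stylesheet has to survive on both kinds of processor, the call can be guarded with function-available(). This is a sketch under the assumption that the 1.0 processor implements forwards-compatible mode as the spec describes; my:shout is a made-up function, and note that the my prefix is declared.

<xsl:stylesheet version="2.0"
   xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
   xmlns:my="urn:my"
   exclude-result-prefixes="my">

   <!-- ignored by a 1.0 processor in forwards-compatible mode -->
   <xsl:function name="my:shout">
      <xsl:param name="s"/>
      <xsl:sequence select="upper-case($s)"/>
   </xsl:function>

   <xsl:template match="/">
      <out>
         <xsl:choose>
            <!-- true on a 2.0 processor, which knows the stylesheet function -->
            <xsl:when test="function-available('my:shout')">
               <xsl:value-of select="my:shout(string(.))"/>
            </xsl:when>
            <!-- 1.0 processors never evaluate the call above -->
            <xsl:otherwise>
               <xsl:value-of select="."/>
            </xsl:otherwise>
         </xsl:choose>
      </out>
   </xsl:template>

</xsl:stylesheet>

Some 1.0 processors bind prefixed function names to their own extension mechanisms, so behaviour on a particular product is worth testing rather than assuming.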