General

1. What's new in XPath 2.0
2. Comments
3. XSLT 2.0 writing style
4. Format number problem
5. Always call a template after all others
6. Change namespace with copy-of?
7. Decimal precision
8. Get filename of input document
9. Convert Latin characters to plain
10. Identity transform, XSLT 2.0
11. Shortest path between two nodes

1.

What's new in XPath 2.0

Jeni Tennison


>  Anyone care to try and pinpoint what is
> going to make XPath2.0 better rather than just different?

There are a few things that I think are really useful.

General Steps:

Now you don't have to use xsl:for-each to change to the relevant document in order to use a key or ID. You can use the key() or id() function within a path. For example:

  document('icons.xml')/key('icons', current()/@name)

The same goes for other expressions that return nodes from nodes. For example, you can get all the headings in an HTML document with:

  //(h1 | h2 | h3 | h4 | h5 | h6)

rather than using something like:

  //*[self::h1 or self::h2 or self::h3 or
      self::h4 or self::h5 or self::h6]

as you have to in XPath 1.0.

Range Expressions:

Rather than using a recursive template or the Piez Method to iterate a certain number of times, you can now create sequences of a certain length to iterate over. For example, insert five br elements with:

  <xsl:for-each select="(1 to 5)">
    <br />
  </xsl:for-each>

Node Identity Comparisons:

No more generate-id() = generate-id() or count(.|...) = 1 constructions! Now you can do:

  $node1 is $node2

to test whether two nodes are the same node (early working drafts spelled this operator ==). Of course the main reason that we had to use these constructions in the first place was the lack of grouping support, but nevertheless it's a good thing.

Precedes and Follows Comparisons:

Now you can tell whether one node precedes another within a document using a simple syntax:

  $node1 << $node2

I can't bring an example to mind right now, but I'm sure there have been times when I've needed to do that (again, during grouping, I think).

If Expressions:

Much as I object to keywords, being able to use if/then/else within a select attribute rather than having an xsl:choose inside a variable, or using a hideous construction involving concat(), substring() and 1 div 0 is a real godsend :) For example:

  if (contains($string, $delimiter))
  then substring-before($string, $delimiter)
  else $string

And of course there are the functions:

- upper-case() and lower-case()
- matches() and replace() (for all they're currently underspecified)
- min() and max()

Mike adds

To add to the points made by Jeni, I think that the availability of sequences-of-strings and sequences-of-numbers is going to give substantial benefits when writing the more complex stylesheets: they are much more flexible and efficient than using trees as the only data structuring mechanism for working data.
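
As a small illustration of Mike's point, the sketch below keeps working data in sequences rather than in temporary trees. The variable names and values are made up for the example.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <xsl:output method="text"/>

  <!-- Hypothetical working data held as sequences, not as temporary trees -->
  <xsl:variable name="names" as="xs:string*" select="('alpha', 'beta', 'gamma')"/>
  <xsl:variable name="sizes" as="xs:integer*" select="(10, 20, 30)"/>

  <xsl:template match="/">
    <!-- Join the strings with a separator -->
    <xsl:value-of select="string-join($names, ', ')"/>
    <xsl:text>&#10;</xsl:text>
    <!-- Aggregate the numbers directly: total, then largest value -->
    <xsl:value-of select="sum($sizes), max($sizes)" separator=" "/>
  </xsl:template>

</xsl:stylesheet>
```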

2.

Comments

Jeni Tennison



XPath 2.0 allows comments *within* expressions (using the syntax (: ... :); early working drafts used {-- ... --}), which would mean you could do:

  <xsl:variable name="rows"
    select="//row[count(.|key('rows', @code)[1]) = 1]
            (: select unique row elements by their code attribute :)" />

You can see why this might be necessary, given that XPath expressions will contain for and if statements...

3.

XSLT 2.0 writing style

Michael Kay



> Which one of these two styles should be preferred?

> What are the advantages and shortcomings of each style 
> regarding readability, compactness, flexibility, efficiency 
> and maintainability?

I think it will take a while for a consensus to emerge.

My own rule of thumb was until recently "use XPath to find nodes and to compute atomic values, use XSLT to create new nodes". But with the introduction of xsl:sequence, I've started avoiding really long (20-line) path expressions, and have taken to breaking them up either by using xsl:for-each and xsl:choose or by calls on stylesheet functions.


> I understand this as personal preference or is this 
> preference based on some objective criteria?

It's based on instinctive judgements about the engineering quality of the code, but it's far too early to judge whether my instincts are right.


> I would appreciate your opinion on how do these two styles -- 
> long (20-line+) XPath expressions versus xslt-structured style -- score in 
> readability, compactness, flexibility, efficiency and maintainability.

There has been an ongoing debate about the merits of using XML syntax versus non-XML syntax for a while now, and I don't think it's going to go away. It promises to be one of those perennials like "elements vs attributes".

Some people seem to take an instinctive dislike to having attributes in an XML document whose content is 20 lines long. Part of the rationale is that the newlines don't survive XML parsing, but the newlines are essential to the readability of the code.

I think it's going to be quite unusual to see XQuery parsers that report more than one syntax error in a single compile run. The grammar is not robust enough to allow easy recovery from syntax errors, though the introduction of semicolons as separators in the latest draft helps. Reporting multiple errors in XSLT is easy because of the 3-phase parsing approach (XML parsing first, then XSLT, then XPath). This gives a definite advantage when you're doing something on the DocBook scale.

> In other words, why should we prefer the "XSLT style" to the 
> "XQuery style"?

I think the advantages of an XML-based syntax are:

(a) it's useful where the stylesheet includes large chunks of stuff to copy into the result document
(b) it's useful when you want to transform stylesheets or to do any kind of reflection or introspection
(c) it reuses all the XML machinery such as character encodings, base URIs, entity references
(d) it's much easier to provide user or vendor extensions to the language in a controlled way.

But there's no doubt that the XQuery style makes it much easier to write short queries.

Dimitre responds with

Apart from purely stylistic reasons, I have found one case where the XSLT style simply cannot be transformed completely into the non-XML style. This happens when we need to define and use variables whose value is a sequence.

The variables that can be defined in an XPath expression are of the form:

  for $someVar in $XPathExpression return

and it works when we need a variable with an atomic value.

However, it is not possible to use the above syntax to define a variable whose value is a sequence.

Thus, if I needed to have my code in a single XPath expression, I would have to write:

<xsl:sequence select=
              "f:scanIter($arg1 - 1, $arg2, $arg3)
                ,
             f:apply($arg2, f:scanIter($arg1 - 1, $arg2, $arg3)[last()])"
          />

because f:scanIter() returns a sequence and it is impossible to write:

         "for $vIterMinus in f:scanIter($arg1 - 1, $arg2, $arg3) return
               $vIterMinus, f:apply($arg2, $vIterMinus[last()])"

But writing the first expression above will cause

    f:scanIter($arg1 - 1, $arg2, $arg3)

to be calculated twice.

Therefore, the more efficient way to express this is using XSLT-style:

         <xsl:variable name="vIterMinus"
                       select="f:scanIter($arg1 - 1, $arg2, $arg3)"/>

         <xsl:sequence select=
          "$vIterMinus, f:apply($arg2, $vIterMinus[last()])"/>

4.

Format number problem

Michael Kay



> I want to use the function format-number to put a number 
> in a  money  format.  This works when the number is either 
> not signed or negatively signed. The XML we got from our 
> client has a "+" sign like this example:

> <xsl:value-of select="format-number(+00003345351.89,'$#,###.00')"/>

XPath 1.0 doesn't allow a leading plus sign in a number. You can get rid of it using translate($num, '+', '').

XPath 1.0 predates XML Schema: its authors did a good job, but predicting the contents of XML Schema would have been nothing short of miraculous.

XPath 2.0 fixes this.
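
A minimal XSLT 1.0 sketch of the translate() workaround described above. The variable name and its hard-coded value are placeholders for wherever the client's number actually comes from.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <!-- Hypothetical input value carrying the client's leading plus sign -->
  <xsl:variable name="num" select="'+00003345351.89'"/>

  <xsl:template match="/">
    <!-- translate() deletes every '+'; format-number() then converts
         the remaining string to a number before formatting it -->
    <xsl:value-of select="format-number(translate($num, '+', ''), '$#,###.00')"/>
  </xsl:template>

</xsl:stylesheet>
```

A leading minus sign is left alone by this translate() call, so negative amounts still format as before.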

5.

Always call a template after all others

Michael Kay

In XSLT 2.0 you can do

<xsl:template match="*" mode="#all" priority="10">
  <xsl:next-match/>
  <xsl:call-template name="logRowId"/>
</xsl:template>

6.

Change namespace with copy-of?

Michael Kay



> is there a possibility to copy an element into another 
> namespace with means of <xsl:copy-of> ? 

No. xsl:copy-of can only be used to create an exact copy. If you want to change anything, you need to process each node individually using a template rule ("or otherwise", as they say in maths exams).

In XSLT 2.0 there is an <xsl:namespace> instruction designed to fill this gap.

In XSLT 1.0 the usual circumvention is to create a result tree fragment (RTF) containing the required namespace node, and then copy it (which requires xx:node-set):

<xsl:variable name="dummy">
 <xsl:element name="{$prefix}:xxx" namespace="{$uri}"/>
</xsl:variable>

...

<xsl:copy-of 
  select="xx:node-set($dummy)/*/namespace::*[name()=$prefix]"/>

Copying of namespace nodes is defined by an erratum to XSLT 1.0.
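
For the original question, processing each node with a template rule might look like the sketch below. The target namespace URI and the parameter name are assumptions for illustration, and attributes are copied unchanged.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical target namespace -->
  <xsl:param name="target-ns" select="'http://example.com/new'"/>

  <!-- Recreate each element with the same local name in the target namespace -->
  <xsl:template match="*">
    <xsl:element name="{local-name()}" namespace="{$target-ns}">
      <xsl:copy-of select="@*"/>
      <xsl:apply-templates/>
    </xsl:element>
  </xsl:template>

  <!-- Comments and processing instructions are copied as they are;
       text nodes are copied by the built-in template rule -->
  <xsl:template match="comment() | processing-instruction()">
    <xsl:copy/>
  </xsl:template>

</xsl:stylesheet>
```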

7.

Decimal precision

Michael Kay

With XSLT 2.0

(a) a numeric literal such as 1.0 is interpreted as a decimal value

(b) you can force numbers to be treated as decimal values by casting e.g. xs:decimal($x)

(c) the result of an operation on two decimals, e.g. $x div $y, is itself a decimal (though with division, the precision is implementation-defined)

(d) if you have a schema, and the schema defines the type of an element or attribute as decimal, then it's automatically treated as decimal without needing an explicit cast (like in (b)).

So summing a set of money amounts should give you the right answer without any rounding errors.
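
A sketch of point (b) applied to summing, assuming schema-less input with hypothetical amount elements such as <order><amount>0.10</amount><amount>0.20</amount></order>:

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- Cast each untyped amount to xs:decimal before summing, so the
         addition is done in decimal rather than double arithmetic -->
    <xsl:value-of select="sum(for $a in //amount return xs:decimal($a))"/>
  </xsl:template>

</xsl:stylesheet>
```

Without the cast, untyped values are treated as doubles by sum(), which is where binary rounding errors can creep in.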

8.

Get filename of input document

Abel Braaksma


>> Is there a way to get the file name of the document you are
>> processing? If I
>> use document-uri() it returns the whole file path.
>
> I'm using
>
>   tokenize(document-uri(/), '/')[last()]
>
> or '\' if working with backslashes.

A backslash cannot be a literal part of a URI; it must be escaped as %5C. Because a URI may well carry additional information after the 'filename', namely the query part and the fragment part, I recommend using the regular expression that RFC 2396 provides as a convenience. It puts the path in $5:

^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?
12            3  4          5       6  7        8 9

You can use the tokenize function above for splitting $5. Note, however, that a path separator is not necessarily a slash: what is and is not allowed in the path depends on the scheme part ($2).
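
A simplified sketch of the same idea. It does not use the RFC regular expression, which can match a zero-length string and so cannot be passed to replace() as it stands; instead it strips the query and fragment parts and then tokenizes on slashes. The function name f:filename and its namespace are assumptions, and the slash separator is only right for schemes such as file: and http:.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:f="data:,f"
    exclude-result-prefixes="xs f">

  <xsl:output method="text"/>

  <!-- Hypothetical helper: last path segment of a URI -->
  <xsl:function name="f:filename" as="xs:string?">
    <xsl:param name="uri" as="xs:string?"/>
    <!-- Drop everything from the first '?' or '#', then take the
         last slash-separated token of what remains -->
    <xsl:sequence select="tokenize(replace($uri, '[?#].*$', ''), '/')[last()]"/>
  </xsl:function>

  <xsl:template match="/">
    <xsl:value-of select="f:filename(string(document-uri(/)))"/>
  </xsl:template>

</xsl:stylesheet>
```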

9.

Convert Latin characters to plain

Michael Kay


> I am looking for some xslt to take a string of latin script
> unicode characters and flatten them to their plain
> equivalents. For example the E7 would become c, F1 would become
> n, F6 would become o and so on.

In XPath 2.0 I think you can achieve this using

codepoints-to-string(string-to-codepoints(normalize-unicode($in, 'NFKD'))[. &lt; 127])

This splits composite characters into the base character plus modifiers, then strips off the modifiers.

10.

Identity transform, XSLT 2.0

Andrew Welch, Mike Kay

<xsl:template match="element()">
  <xsl:copy>
    <xsl:apply-templates select="@*,node()"/>
   </xsl:copy>
</xsl:template>

<xsl:template match="attribute()|text()|comment()|processing-instruction()">
  <xsl:copy/>
</xsl:template>

11.

Shortest path between two nodes

David Carlisle




>In the xml structure below, imagine:
>- all nodes have name (not necessarily unique) and id (unique)
>attributes
>- you are in the context of node[@id='a3'] and you want to build a path
>from the name attribute to node[@id='b3'] that would result in the
>shortest path, which would be:

>../../../bar/baz/bat

>- and in another case you are in the context of node[@id='c3'] and you
>want the shortest path to node[@id='b3'], which would be:

>../../baz/bat

>- and in another case you are in the context of node[@id='b4'] and you
>want the shortest path to node[@id='b3'], which would be:

>bat

>What is the most efficient to find those paths? (this is to create page
>relative links, rather than root relative, for html pages)

This doesn't give quite the result you ask for: I get ../bat for the last one. I also assume that nodes without a name should not contribute to the path (otherwise I'd need an extra ../ to get out of the unnamed node that contains c2).

<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:f="data:,f">

<xsl:key name="id" match="*" use="@id"/>
<xsl:variable name="r" select="/"/>
<xsl:template match="/">

a3 to b3 <xsl:value-of select="f:path('a3','b3')"/>
c3 to b3 <xsl:value-of select="f:path('c3','b3')"/>
b4 to b3 <xsl:value-of select="f:path('b4','b3')"/>

</xsl:template>


<xsl:function name="f:path">
 <xsl:param name="a"/>
 <xsl:param name="b"/>
 <xsl:variable name="an" select="key('id',$a,$r)"/>
 <xsl:variable name="bn" select="key('id',$b,$r)"/>
 <xsl:value-of>
  <xsl:sequence select="($an/ancestor-or-self::node[@name] except $bn/ancestor::node)/'../'"/>
  <xsl:value-of separator="/" select="($bn/ancestor-or-self::node[@name] except $an/ancestor::node)/@name"/>
 </xsl:value-of>
</xsl:function>

</xsl:stylesheet>