Sequences

1. Understanding xsl:sequence
2. Sequences vs node-sets
3. Date translation in xpath 2
4. Understanding the Relationship of Nodes, Sequences, and Trees
5. Attribute to element, XSLT 2.0
6. On sequences and document nodes.
7. Node identity
8. Comparisons; String and Empty
9. Sequence concatenator
10. Sequences
11. atomization
12. Where does the separator come from?
13. Sequence, node-sets and result tree fragments
14. Is there a separator choice available for AVTs?
15. Length of a sequence
17. Value-of, copy-of and sequence
18. as attribute
19. When is the context item not redundant in a path expression?
20. Merging node-sets & sequences
21. String Serialisation
22. Unique values in attributes
23. Test against preceding values
24. Sequences
25. Sequences
26. When = is not equal

1.

Understanding xsl:sequence

David Carlisle



> Apologies for these seemingly random questions but I have read numerous
> resources and am still struggling to understand why xsl:sequence is so
> important.

It's important, especially in variable and function definitions, as it allows the return of _existing_ nodes rather than copies of nodes.

> <xsl:sequence select="." /> is constructing new nodes in the output that is,
> in effect, a copy-of the context node.

No, it selects the current node (unlike xsl:copy-of, which would construct a copy). If the xsl:sequence is being used to generate an output result tree, the difference is slight, as there is an implied copying anyway in that construction. But if the current template is being used to construct the value of a variable, there is a big difference: in one case the variable will have a reference to the current node, and in the other it will have a reference to a copy.

> 2. I also understand sequence allows you to construct a sequence of
> different datatypes in one expression,
>
> What, then, is the point of the concat function in XSLT 2? ie could

concat serves a completely different purpose, it concatenates strings, producing a single string.

> What, then, is the point of the concat function in XSLT 2? ie could
> <xsl:sequence select="concat(meta/brand/text(), ' | ', genre/text())"/>
> be rewritten:
> <xsl:sequence select="meta/brand/text(), ' | ', genre/text()"/>

The first one makes a sequence of one item, a string.

The second one makes a sequence of three items, a text node, a string and a text node

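Compare

<xsl:value-of separator=",">
 <xsl:sequence select="concat(meta/brand/text(), ' | ',
 genre/text())"/>
</xsl:value-of>

<xsl:value-of separator=",">
 <xsl:sequence select="meta/brand/text(), ' | ',
 genre/text()"/>
</xsl:value-of>

The first outputs its single string unchanged. The second has three items, so the separator goes between each pair: for a brand "X" and a genre "Y" you get "X, | ,Y" rather than "X | Y".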

> Does <xsl:sequence select="@price"/> not assign an attribute node to the
> variable $prices? If so surely this is wasteful?

It references an _existing_ attribute node, but this is coerced to an atomic xs:double value because of the as attribute, which specifies that the variable holds a sequence of doubles. I'm not clear where you see the waste.


> If a variable value is a reference rather than a copy what advantages
> does that provide?

It's a lot faster and uses a lot less memory.

If you go

<xsl:variable name="a" as="document-node()">
<xsl:copy-of select="/"/>
</xsl:variable>

<xsl:variable name="b" as="document-node()">
<xsl:sequence select="/"/>
</xsl:variable>

then $b is simply a pointer to the current document, which takes no time to build and hardly any space to store, and the expression "$b is /" is true.

On the other hand, $a is a document structurally identical to the current document, but it is a copy, so it takes time and memory proportional to the size of the document to build, and the expression "$a is /" is false.
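As a rough analogy in Python (a model, not XSLT; the dictionary standing in for the document is hypothetical), binding a name to an existing object behaves like xsl:sequence, and copy.deepcopy behaves like xsl:copy-of. Python's is operator plays the role of the XPath is operator, which compares node identity.

```python
import copy

# Hypothetical stand-in for the source document: a nested dict plays the tree.
source_doc = {"name": "#document",
              "children": [{"name": "FitnessCenter", "children": []}]}

# Like <xsl:sequence select="/">: the variable refers to the same object.
b = source_doc

# Like <xsl:copy-of select="/">: the variable holds a new, structurally equal object.
a = copy.deepcopy(source_doc)

print(b is source_doc)  # True, analogous to "$b is /"
print(a is source_doc)  # False, analogous to "$a is /"
print(a == source_doc)  # True: same structure, different identity
```

Building b costs nothing regardless of the document's size, while building a walks and duplicates the whole structure.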


> I was thinking that an attribute node contains a text node. Reading up on it
> an attribute node seems to *be* both the attribute name and its value. When
> it is co-erced to an atomic xs:double value it somehow loses the attribute
> name part and just takes on the value - this is what I was referring to as
> waste.

Storing the sequence of doubles may take less space; that depends on the implementation. But logically the sequence of attribute nodes is just a sequence of pointers into the existing tree, so the data doesn't need to be copied. Conversely, if you store the values then you are copying the data and need to store a sequence of doubles, which may or may not be more data than a sequence of attribute-node pointers. Also, of course, you often _need_ to store a list of attribute nodes, for example if you need to know what their parents are: $variable/.. will work on a sequence of attribute nodes, to give a sequence of elements.

But it's wrong to think of xsl:sequence as an optimisation of xsl:copy-of. Normally the important thing is that one works and the other doesn't. (You can usually replace copy-of by sequence, but not always; you usually can't replace xsl:sequence by copy-of.)

2.

Sequences vs node-sets

Evan Lenz

Another significant difference to note is that sequences, unlike node-sets, can contain the same node more than once.

The following expression returns one bar element and one foo element in a document-order sequence (as with XPath 1.0, duplicates are removed):

(bar[1] | foo[1] | bar[1])

However, the following expression (using the new comma operator) returns a sequence of the bar element followed by the foo element followed by the same bar element again, in that order (regardless of their relative document order):

(bar[1], foo[1], bar[1])

Hopefully, these examples clearly illustrate this characteristic of sequences (if not its usefulness (and it is useful)).
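The difference can be modelled in a few lines of Python (an illustration of the semantics, not XSLT). Nodes compare by identity, "|" removes duplicates and sorts into document order, and "," simply concatenates. The document positions are assumptions: foo[1] is taken to precede bar[1].

```python
from dataclasses import dataclass


@dataclass(eq=False)  # eq=False keeps identity comparison, like XDM node identity
class Node:
    name: str
    doc_pos: int  # hypothetical position in document order


def union(*seqs):
    """Model of XPath '|': drop duplicate nodes (by identity), then sort into document order."""
    unique = {id(n): n for seq in seqs for n in seq}
    return sorted(unique.values(), key=lambda n: n.doc_pos)


def comma(*seqs):
    """Model of XPath ',': plain concatenation; order and duplicates are kept."""
    return [n for seq in seqs for n in seq]


foo = Node("foo", 1)  # assumption: foo[1] precedes bar[1] in the document
bar = Node("bar", 2)

print([n.name for n in union([bar], [foo], [bar])])  # ['foo', 'bar']
print([n.name for n in comma([bar], [foo], [bar])])  # ['bar', 'foo', 'bar']
```

In the comma result the first and last items are the same node object, not two equal copies.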

3.

Date translation in xpath 2

Michael Kay

Note: Question was how to translate a date 20021231 to get the month as a word, e.g. December


> Things get even better in XSLT2/Xpath2
> <xsl:stylesheet ...  xmlns:dt="http://www.mySchemas.com/date">
> <xsl:function name=" dt:getMonthName" 
> xmlns:dt="http://www.mySchemas.com/date">
>     <xsl:param name="date"/>
>     <xsl:variable name="months" 
> select="tokenize("January,February,March,April,May,June,July,August,September,October,November,December",',')"/>
>     <xsl:return select="$months[number(substring($monthNum,5,2))]"/>
> </xsl:function>
> ...

No, it's even easier than that in XPath 2

<xsl:variable name="months" select="('January', 'February', ...)"/>
<xsl:sequence select="$months[number(substring($date,5,2))]"/>

i.e., you can write the sequence as a literal sequence, you don't need to create it by tokenizing a string.
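A Python sketch of the same lookup (a model only; xpath_substring and month_name are hypothetical helpers) makes the 1-based positions explicit: substring($date, 5, 2) on "20021231" is "12", and $months[12] is the twelfth month.

```python
MONTHS = ("January", "February", "March", "April", "May", "June",
          "July", "August", "September", "October", "November", "December")


def xpath_substring(s, start, length):
    """XPath substring() with 1-based positions (simplified: positive integers only)."""
    return s[start - 1:start - 1 + length]


def month_name(date):
    # Mirrors $months[number(substring($date, 5, 2))]: XPath predicates count
    # from 1, so the Python index is shifted down by one.
    return MONTHS[int(xpath_substring(date, 5, 2)) - 1]


print(month_name("20021231"))  # December
```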

4.

Understanding the Relationship of Nodes, Sequences, and Trees

Roger Costello

I have been having some excellent exchanges with Michael Kay and have learned a lot. I thought that I would summarize what I learned, so that others can benefit as well.

Understanding the Relationship of Nodes, Sequences, and Trees

(1) A node can belong to only one tree.
(2) A node may belong to any number of sequences.
(3) Axes always apply to the tree that the node is in. Axes never apply to the sequence that the node is in.
(4) When xsl:sequence is used in a sequence which has no parent node then the sequence contains the original node referenced by xsl:sequence and not a copy.
(5) When xsl:sequence is used in a sequence which has a parent node then the element that is referenced by xsl:sequence is copied. Thus, the sequence is comprised of a copy and not the original.
(6) The preceding-sibling and following-sibling axes can only be used in a tree. That is, they cannot be used in a sequence that does not have a parent node. (Nodes are "siblings" iff they have a common parent)

To understand these rules, let's consider an example.

Below is the XML document that my stylesheet operates upon:

<?xml version="1.0"?>
<FitnessCenter>
    <Member>
        <Name>Jeff</Name>
    </Member>
    <Member>
        <Name>David</Name>
    </Member>
    <Member>
        <Name>Roger</Name>
    </Member>
</FitnessCenter>

In my stylesheet I have created this variable:

<xsl:variable name="members" as="element()+">
    <xsl:sequence select="/FitnessCenter/Member[2]"/>
    <Member>
        <Name>Sally</Name>
    </Member>    
    <Member>
        <Name>Linda</Name>
    </Member>    
</xsl:variable>

Note that this variable contains a mix of elements: the first element (the David Member) comes from the FitnessCenter. The second and third elements (Sally and Linda) are defined within the variable itself.

Further, note that this sequence does not have a parent node (due to the presence of as="element()+").

A characteristic of xsl:sequence when used in a sequence that does not have a parent node is that it does not create a copy of the node that it references; instead, it uses the original node. Thus, $members[1] is referencing the original node:

    <Member>
        <Name>David</Name>
    </Member>

from the FitnessCenter.

Now let's consider the above rules in the context of this example.

(1) A node can belong to only one tree.

$members[1] references this node:

    <Member>
        <Name>David</Name>
    </Member>

This node belongs to the FitnessCenter tree.

$members[2] references this node:

    <Member>
        <Name>Sally</Name>
    </Member>

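This node was constructed inside the variable and belongs to the $members sequence. Because of as="element()+", no document node is created to hold it, so it has no parent: it is the root of its own small tree (itself and its Name child).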

(2) A node may belong to any number of sequences.

This node:

    <Member>
        <Name>David</Name>
    </Member>

belongs both to the sequence selected by /FitnessCenter/Member and to the $members sequence.

(3) Axes always apply to the tree that the node is in. Axes never apply to the sequence that the node is in.

Consider this XSLT statement which uses the preceding-sibling axis:

<xsl:copy-of select="$members[1]/preceding-sibling::*[1]"/>

$members[1] references the David node, which is in the FitnessCenter tree. Therefore, it is referencing David's preceding-sibling in the FitnessCenter tree:

    <Member>
        <Name>Jeff</Name>
    </Member>

Likewise, this is referencing David's following-sibling in the FitnessCenter tree:

<xsl:copy-of select="$members[1]/following-sibling::*[1]"/>

Output:

    <Member>
        <Name>Roger</Name>
    </Member>

Note that preceding-sibling and following-sibling return nothing for $members[2] and $members[3]. The Sally Member and Linda Member have no parent: each is the root of its own separate tree, and being next to each other in the $members sequence does not make them siblings.

(4) When xsl:sequence is used in a sequence which has no parent node then the sequence contains the original node referenced by xsl:sequence and not a copy.

Consider this variable declaration:

<xsl:variable name="members" as="element()+">
    <xsl:sequence select="/FitnessCenter/Member[2]"/>
    <Member>
        <Name>Sally</Name>
    </Member>    
    <Member>
        <Name>Linda</Name>
    </Member>    
</xsl:variable>

This variable is comprised of a sequence of nodes. The sequence does not have a parent node. Therefore, the sequence is comprised of the original node.

(5) When xsl:sequence is used in a sequence which has a parent node then the element that is referenced by xsl:sequence is copied. Thus, the sequence is comprised of a copy and not the original.

Consider this variable declaration:

<xsl:variable name="members">
    <xsl:sequence select="/FitnessCenter/Member[2]"/>
    <Member>
        <Name>Sally</Name>
    </Member>    
    <Member>
        <Name>Linda</Name>
    </Member>    
</xsl:variable>

Note the absence of as="element()+". Thus, the three Member elements become children of a new document node. Consequently, a *copy* of /FitnessCenter/Member[2] is made and used in the sequence. Because that copy is the first child of the new document node, this statement produces an empty output:

<xsl:copy-of select="$members[1]/preceding-sibling::*[1]"/>

(6) The preceding-sibling and following-sibling axes can only be used in a tree. That is, they cannot be used in a sequence that does not have a parent node. (Nodes are "siblings" iff they have a common parent)

Therefore, for example, you cannot use following-sibling to get the member that follows Sally:

<xsl:variable name="members" as="element()+">
    <xsl:sequence select="/FitnessCenter/Member[2]"/>
    <Member>
        <Name>Sally</Name>
    </Member>    
    <Member>
        <Name>Linda</Name>
    </Member>    
</xsl:variable>

This statement produces an empty output:

<xsl:copy-of select="$members[2]/following-sibling::*[1]"/>

However, in this version the sequence does have a parent (document) node, so you can use following-sibling to retrieve the member that follows Sally:

<xsl:variable name="members">
    <xsl:sequence select="/FitnessCenter/Member[2]"/>
    <Member>
        <Name>Sally</Name>
    </Member>    
    <Member>
        <Name>Linda</Name>
    </Member>    
</xsl:variable>

This statement:

<xsl:copy-of select="$members[2]/following-sibling::*[1]"/>

yields this output:

    <Member>
        <Name>Linda</Name>
    </Member> 

An alternate form that will also work is this:

<xsl:variable name="members" as="element()+">
    <Members>
          <xsl:sequence select="/FitnessCenter/Member[2]"/>
          <Member>
              <Name>Sally</Name>
          </Member>    
          <Member>
              <Name>Linda</Name>
          </Member>    
    </Members>
</xsl:variable>

Note that the Member elements now have a parent node (<Members>). Therefore, the following-sibling and preceding-sibling axes can be used on them. For example, this statement:

<xsl:copy-of select="$members/Member[2]/following-sibling::*[1]"/>

yields this output:

    <Member>
        <Name>Linda</Name>
    </Member> 
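The rules can be checked against a small Python model (an illustration, not XSLT). Each node carries a parent pointer, and the sibling axes go through that parent, never through a sequence. The Node class is hypothetical, and each Member is named after its Name child to keep the model short.

```python
class Node:
    """Simplified XDM-like node: a name, an optional parent, ordered children."""

    def __init__(self, name, children=()):
        self.name = name
        self.parent = None
        self.children = []
        for child in children:
            self.append(child)

    def append(self, child):
        child.parent = self
        self.children.append(child)

    def deep_copy(self):
        """Like xsl:copy-of: a new node (new identity) with copied descendants, no parent."""
        return Node(self.name, [c.deep_copy() for c in self.children])

    def following_sibling(self):
        # Siblings are reached through the parent: a parentless node has none.
        if self.parent is None:
            return []
        kids = self.parent.children
        return kids[kids.index(self) + 1:]

    def preceding_sibling(self):
        if self.parent is None:
            return []
        kids = self.parent.children
        return kids[:kids.index(self)][::-1]  # reverse axis: nearest sibling first


def names(nodes):
    return [n.name for n in nodes]


# The FitnessCenter tree.
jeff, david, roger = Node("Jeff"), Node("David"), Node("Roger")
fitness_center = Node("FitnessCenter", [jeff, david, roger])

# as="element()+": David is referenced; Sally and Linda are new, parentless nodes.
members_seq = [david, Node("Sally"), Node("Linda")]

# No as attribute: a new document node holds a copy of David plus Sally and Linda.
doc = Node("#document", [david.deep_copy(), Node("Sally"), Node("Linda")])
members_tree = doc.children

print(names(members_seq[0].preceding_sibling()))   # ['Jeff']
print(names(members_seq[1].following_sibling()))   # []
print(names(members_tree[1].following_sibling()))  # ['Linda']
print(members_tree[0].preceding_sibling())         # []
```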

5.

Attribute to element, XSLT 2.0

Michael Kay


> I have an XML tag which looks like this: 
> <text bold=true  italic=true 
> underline=false>Help!</text> 
> In this case, the result in  HTML should be: 
> <font>
>    <b>
>       <i>
>          Help!
>       </i>
>    </b>
> </font>
>
> I simply need to transform the attributes into HTML 
> elements so that any of the possible combinations of 
> the atributes' values would be transformed correctly.

The following solution (using a few 2.0 features) is perhaps a little more elegant.

<xsl:template match="text">
<xsl:apply-templates select="." mode="inline"/>
</xsl:template>

<xsl:template match="*[@bold='true']" mode="inline" priority="10">
<b><xsl:sequence select="my:inline(., @bold)"/></b>
</xsl:template>

<xsl:template match="*[@italic='true']" mode="inline" priority="9">
<i><xsl:sequence select="my:inline(., @italic)"/></i>
</xsl:template>

<xsl:template match="*[@underline='true']" mode="inline" priority="8">
<u><xsl:sequence select="my:inline(., @underline)"/></u>
</xsl:template>

<xsl:template match="*" mode="inline" priority="7">
<xsl:apply-templates/>
</xsl:template>

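<xsl:function name="my:inline" as="node()*">
  <xsl:param name="elem" as="element()"/>
  <xsl:param name="att" as="attribute()"/>
  <!-- node()*: the innermost call returns the element's content (e.g. text), not an element -->
  <xsl:variable name="x">
    <!-- the focus is undefined inside a function, so set it before using xsl:copy -->
    <xsl:for-each select="$elem">
      <xsl:copy>
        <xsl:copy-of select="(node() | @*) except $att"/>
      </xsl:copy>
    </xsl:for-each>
  </xsl:variable>
  <xsl:apply-templates select="$x" mode="inline"/>
</xsl:function>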
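The recursion can be modelled in Python (a sketch of the idea, not XSLT; the attribute dictionary and string output are simplifications). Each step wraps the content in the highest-priority tag whose attribute is "true", drops that attribute, and recurses, just as my:inline copies the element without $att and re-applies the inline templates.

```python
# Template priorities 10, 9, 8 in the stylesheet: bold first, then italic, then underline.
PRIORITY = (("bold", "b"), ("italic", "i"), ("underline", "u"))


def inline(attrs, text):
    """Wrap text once per attribute equal to 'true', in template-priority order."""
    for att, tag in PRIORITY:
        if attrs.get(att) == "true":
            rest = {k: v for k, v in attrs.items() if k != att}  # copy without $att
            return f"<{tag}>{inline(rest, text)}</{tag}>"
    return text  # the priority="7" fallback: just process the content


print(inline({"bold": "true", "italic": "true", "underline": "false"}, "Help!"))
# <b><i>Help!</i></b>
```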

6.

On sequences and document nodes.

Michael Kay

XSLT 2.0 allows you to create free-standing nodes. For example, there was a question from someone who wanted to write a function that returned a set of attribute nodes:

<xsl:function name="my:atts" as="attribute()*">
  <xsl:param name="p"/>
  <xsl:attribute name="a" select="$p+1"/>
  <xsl:attribute name="b" select="$p+2"/>
</xsl:function>

(Essentially the same as an attribute-set, except that attribute sets do not allow parameters.)

You can copy these attributes to an element, like this:

<e>
 <xsl:copy-of select="my:atts(17)"/>
</e>

But you can also apply-templates to them:

<xsl:apply-templates select="my:atts(29)"/>

If you try to find the parent of one of these attributes, you find it hasn't got one:

<xsl:value-of select="count(my:atts(6)/..)"/> displays "0".

And of course the same applies to elements.

You can generate a temporary tree with a document node as its root. That happens automatically most of the time (as in XSLT 1.0): if you xsl:copy-of (for example) an element into a variable, a document node will be formed automatically to contain it. If you use as="..." and specify some type other than a document node, this automatic creation of document nodes is suppressed.




> Implying that a function can't return any node set with a document 
> node?
>   

You can do it according to the Nov draft by

<xsl:function ..
  <xsl:variable name="tree">
    <a>tree</a>
  </xsl:variable>
  <xsl:sequence select="$tree"/>
</xsl:function>

But that's a bit cumbersome, so we've added an xsl:document element to create document nodes explicitly:

<xsl:function ..
  <xsl:document>
    <a>tree</a>
  </xsl:document>
</xsl:function>

This will appear in the next draft.

7.

Node identity

Michael Kay



> Maybe there is already a resource out there, but I can't seem to find 
> a simple explanation.  Maybe because its not so simple?

It's not that simple, certainly not with 2.0 when schemas start to come into the picture.

I've covered it in my XPath 2.0 and XSLT 2.0 books, and it's certainly not easy to condense into a quick message. But let's try a summary:

The value of an XPath expression is always a sequence; a sequence contains zero or more items; an item may be an atomic value or a node.

The nodes fall into the same 7 kinds as XPath 1.0: elements, attributes, text nodes, etc.

Atomic values belong to one of the primitive types defined in XML Schema: xs:integer, xs:string, xs:boolean, xs:date, and so on. Alternatively they may belong to a derived atomic type, which permits a subset of the values of a primitive type (e.g. all strings of length 6). This can be a user-defined type defined in a schema, or a built-in type.

If your source document has been through schema validation, then the elements and attributes will be annotated with a schema type. This may be a simple type or (in the case of elements) a complex type. For example, an attribute annotated as xs:date contains a date. Complex types allow child elements, simple types do not. Simple types may be atomic types (as above) or they may be list types (a sequence of integers, say) or union types (a decimal or a date). When in XPath you use a node in a context where atomic values are expected, e.g. comparison or arithmetic, the typed value of the node is extracted automatically (a process called atomization). This means that if your schema declares attributes to be numbers, they will be processed as numbers.

You can declare the types of all the variables and parameters, function results etc in your stylesheet. You don't have to: the default is item()* which allows any value (any sequence of any items). For example, if you declare a parameter as xs:integer? then the value must be either an integer or nothing (an empty sequence). Where the variable holds nodes, you can declare both the kind of node and the required type annotation: for example a parameter that's an element holding a purchase order might be declared as="element(purchase-order)".

You can write XSLT templates based on types rather than names, for example match="attribute(*, xs:date)" matches all attributes of type date. So you can write one rule for formatting all such attributes, regardless of their name.

When your stylesheet constructs new elements, you can ask for them to be validated against a schema. This both checks them for correctness, and annotates them with types that are used in any subsequent processing. You do this using the validation and/or type attributes on instructions such as xsl:element and xsl:result-document. This means for example that if your stylesheet fails to output a value for a mandatory attribute, you'll get an error message saying so, and telling you exactly where the error in the stylesheet is. In some cases you'll even get this error at stylesheet compile time.

You can ignore most of this and do dynamic typing as in XSLT 1.0 if you prefer. But there are considerable software engineering advantages in declaring your types: it means you get better error messages when you make mistakes. In general, if you make coding errors in XSLT 1.0, your stylesheet produces wrong output. The same mistake in 2.0 will often produce a type error, reported at compile time where possible and otherwise at run time.

OK? If not, there's more in the book...

8.

Comparisons; String and Empty

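As with XPath 1.0, when you use a general comparison operator (=, !=, <, <=, >, >=) to compare a sequence of things to a single thing, the result is true if there is at least one match. This is true whichever of these operators you use. This means that if the sequence is empty, the result of the comparison is false, whether the operator is "=", "!=", or anything else. (The value comparisons eq, ne, lt and so on return an empty sequence when an operand is empty, which a test also treats as false.)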

So if $f is an empty sequence and $s is a string,

  $f = $s is always false
  $f != $s is always false

You can test whether a sequence is empty using the function empty($f).

The result of string($f) when $f is an empty sequence is the zero-length string, "".
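A Python model of the existential rule (an illustration only; general_compare is a hypothetical helper, and atomization and type promotion are not modelled) shows why both "=" and "!=" are false against an empty sequence: there is no pair of items to test.

```python
import operator


def general_compare(op, left, right):
    """Model of an XPath general comparison: true if ANY pair
    (one item from each sequence) satisfies op."""
    return any(op(a, b) for a in left for b in right)


f = []        # $f: an empty sequence
s = ["abc"]   # $s: a single string

print(general_compare(operator.eq, f, s))               # False
print(general_compare(operator.ne, f, s))               # False
print(general_compare(operator.ne, ["a", "b"], ["a"]))  # True, because "b" != "a"
```

The last line also shows the less obvious consequence: with two non-empty sequences, "=" and "!=" can both be true at once.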

9.

Sequence concatenator

Michael Kay


 > If you're really determined you could do
 > 
 > <xsl:variable name="dummy" as="element()">  <dummy/> </xsl:variable>
 > 
 > ...
 > 
 >  <xsl:apply-templates select="(ABC/D, $dummy)[1]"/>
 > 
 > <xsl:template match="dummy">
 >  <something/>
 > </xsl:template>

What does the comma separation achieve in the select?

"," in XPath 2.0 is a sequence concatenation operator. The construct (A, B)[1] selects the first item in the sequence formed by concatenating the sequences A and B: that is, the first A if there is one, or the first B otherwise.

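In Python terms (a model of the idea, not XSLT; first_of is a hypothetical helper), (A, B)[1] is "concatenate, then take the first item if there is one", which gives a fallback when A is empty.

```python
def first_of(a, b):
    """Model of (A, B)[1]: the first item of A followed by B, or None for ()."""
    combined = [*a, *b]
    return combined[0] if combined else None


print(first_of(["D1", "D2"], ["dummy"]))  # D1: ABC/D exists, so its first item wins
print(first_of([], ["dummy"]))            # dummy: ABC/D is empty, so the fallback is used
```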
10.

Sequences

David Carlisle


> Now everything is a sequence.
> Earlier everything was a node-set.

Not quite true. In XSLT 2.0 everything is a sequence, but in XSLT 1.0 atomic values such as numbers and strings were not in a node-set or a sequence.

One advantage of sequences is that you can store sequences of strings etc ("a","b", ....)

The other is that sequences become first class objects that may be saved in variables and re-used. XSLT1 had sequences (called lists there) but these were just transient objects "the current node list" that could not be stored.

If you go select="ancestor::*[2]" then, just inside the step, the current node list is the ancestors in reverse document order, so [2] selects your grandparent. But this could not be stored: <xsl:variable name="x" select="ancestor::*"/> selects the unordered set of ancestors, then select="$x[2]" orders that set in document order, so [2] selects the child of the document element (the document element itself is $x[1]).

Similarly inside

<xsl:for-each select="zzzz">
 <xsl:sort select="mmm"/>

the current node list is the ordered list sorted by mmm, but that list cannot be saved in order; if you need it again later you have to re-sort. In XSLT 2.0 sorted sequences can be saved.
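The two orderings of ancestor::* can be modelled with a Python list (an illustration only; the element names root, a, b, c are hypothetical). Inside the step the axis runs nearest-first; once stored in $x the nodes are in document order.

```python
# Hypothetical chain of elements from the document element down to the context node.
path = ["root", "a", "b", "c", "context"]

ancestors_doc_order = path[:-1]                    # what $x holds: root, a, b, c
ancestors_axis_order = ancestors_doc_order[::-1]   # inside the step: c, b, a, root

print(ancestors_axis_order[2 - 1])  # b: ancestor::*[2], the grandparent
print(ancestors_doc_order[2 - 1])   # a: $x[2], the child of the document element
```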

Other reasons have to do with alignment with XSD schemas, but they are, naturally :-) almost all bad. On balance, though, ordered sequences are probably a good thing (although perhaps they might have been better still if they had allowed sequences of sequences).

It wasn't really the "first node" semantics I was referring to so much as the difference between

<xsl:variable name="x" select="('a','b','c')"/> A sequence of strings

and

<xsl:variable name="y" as="item()*">
 <xsl:value-of select="'a'"/>
 <xsl:value-of select="'b'"/>
 <xsl:value-of select="'c'"/>
</xsl:variable>

a sequence of text nodes.

As examples of getting one's fingers burned by making false assumptions, MK and DC kindly provided the following examples.

One of the common ones was a function returning a sequence of text nodes:

<xsl:function name="f">
  <xsl:text>[</xsl:text>
  <xsl:value-of select="17"/>
  <xsl:text>]</xsl:text>
</xsl:function>

We now have it set up so that if you use this sequence to construct the content of an element or attribute, for example using xsl:value-of select="f()", then no spaces are inserted between the adjacent text nodes. However, you can still scupper yourself if you think of the result as a single string: for example string-length(f()) will give you a type error in 2.0 mode, and the result 1 (the length of the first text node in the sequence) in backwards compatibility mode.

If you do

<xsl:function name="f">
  <xsl:for-each select="1 to 10">
    <xsl:text>[</xsl:text>
    <xsl:value-of select="."/>
    <xsl:text>]</xsl:text>
  </xsl:for-each>
</xsl:function> 

then f()[5] will return a text node containing the string "2"...
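A Python model of the second f() (an illustration only; text nodes are represented as plain strings) shows where "2" comes from: every iteration contributes three items, so position 5 is the number inside the second pair of brackets.

```python
def f():
    """Model of the second xsl:function f(): three text nodes per iteration of 1 to 10."""
    seq = []
    for i in range(1, 11):  # XPath 1 to 10
        seq += ["[", str(i), "]"]
    return seq


result = f()
print(len(result))          # 30 items, not one string
print(result[5 - 1])        # 2 (XPath f()[5]; Python indexes from 0)
print("".join(result)[:9])  # [1][2][3] (what merging the text nodes would give)
```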

I asked

  <xsl:variable name="sortedSequence">
      <xsl:for-each select="zzzz">
       <xsl:sort select="mmm"/>
       <xsl:sequence select="."/>
      </xsl:for-each>
  </xsl:variable>

  Is that the sort of thing?
  the variable now holds the sorted sequence?

No, if you do that the variable will hold a document node which contains copies of the zzzz elements as children.

You need an as="element()*" on the variable to stop it making the document node (which it does mainly for XSLT 1 compatibility).

There are other ways of ending up with an ordered sequence, the most direct being

<xsl:variable name="x" select="(a,b,c)"/>

which stores any elements selected by a followed by elements selected by b followed by elements selected by c, in that order, whatever order they are in in the source.

11.

atomization

Michael Kay



> So, just to be clear about this:

> <xsl:variable name="foo" as="xs:string?">Hello</xsl:variable>
> <xsl:variable name="bar">World</xsl:variable>

> <xsl:value-of select="concat($foo,' ',$bar)"/>

> Here at the point of the concat() $foo is -already- a string and $bar 
> is a nodeset?  That is, $foo is never a nodeset and $bar is nodeset 
> that gets implicitly cast to a string?

Yes.

Technically the conversion isn't a cast, it's atomization, but that's a quibble. Also, XSLT 2.0 doesn't talk about nodesets: the type of $bar is document-node().

> 
> If so, this would mean it's possible to make a choice between long 
> if-then-elses in the select attribute and choose/whens in the body 
> based on readability/maintainability etc and not on performance.

Yes, absolutely. Saxon will probably generate the same internal code for both.

12.

Where does the separator come from?

Michael Kay

The relevant construct is:

<xsl:variable name="result">
      <xsl:sequence select="upper-case(substring($s, 1, 1))"/>
      <xsl:sequence select="substring($s, 2)"/>
</xsl:variable>

The value of the variable is a document node, which has a single text node as its child. When you supply a sequence of strings in this situation, they are space-separated. The reason for the rule is that this is what you would want to happen if you used the equivalent construct:

<xsl:variable name="result">
      <xsl:sequence select="10 to 12"/>
</xsl:variable> 

where you would want the result to be "10 11 12" rather than "101112".

If you want the values concatenated you can

(a) use the concat() function

(b) use xsl:value-of instead of xsl:sequence. The difference is that xsl:value-of creates a text node rather than a string, and text nodes are always concatenated with adjacent text nodes.

Because your function is returning a string, I would suggest you avoid creating document nodes and text nodes, which is unnecessarily expensive. Write the function as:

   <xsl:function name="f:first-upper" as="xs:string">
     <xsl:param name="s" as="xs:string?"/>
     <xsl:sequence select="concat(upper-case(substring($s, 1, 1)), substring($s, 2))"/>
   </xsl:function>

It's quite tricky to get the hang of these differences. The WG has in fact tidied the spec up to remove some of the anomalies, but you still need a clear head to understand what's going on. Broadly speaking, the rules are:

Adjacent text nodes that become siblings of each other are concatenated without whitespace.

When a sequence of atomic values is converted to a node (typically a text or attribute node), they are concatenated with space separators.

In some common constructs you get control over the choice of separator, notably with xsl:attribute and xsl:value-of; in other cases (attribute value templates for example) you don't. In cases where XSLT doesn't give you explicit control over the separator, you can use the string-join() function to control it.

If you clearly understand the difference between an atomic value and a text node then I think the rules should become clear.
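The two rules can be modelled in Python (an illustration, not XSLT; construct_text is hypothetical and converts values with str(), which only matches XPath's casting for simple cases such as strings and integers). Runs of adjacent atomic values are joined with a space; text nodes merge with their neighbours with no separator. With a hypothetical $s of "hello", the two xsl:sequence instructions give "H ello".

```python
def construct_text(items):
    """Model of building text content from ('atomic', value) and ('text', string) items:
    adjacent atomic values are joined with a single space, and adjacent text nodes
    then merge with no separator."""
    parts = []
    prev_atomic = False
    for kind, value in items:
        if kind == "atomic" and prev_atomic:
            parts[-1] += " " + str(value)  # space between adjacent atomic values
        else:
            parts.append(str(value))
        prev_atomic = kind == "atomic"
    return "".join(parts)  # adjacent text nodes merge without whitespace


print(construct_text([("atomic", "H"), ("atomic", "ello")]))    # H ello (xsl:sequence)
print(construct_text([("text", "H"), ("text", "ello")]))        # Hello  (xsl:value-of)
print(construct_text([("atomic", n) for n in range(10, 13)]))   # 10 11 12
```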

13.

Sequence, node-sets and result tree fragments

Michael Kay




> So, in XSLT 2.0 do you get a usable node-set or a result tree 
> fragment?

Neither. XPath2 loses both the node set datatype and the result tree fragment datatype.

Both are replaced by sequences (i.e. ordered finite lists) rather than the unordered node-set type of XPath 1. Also, XPath 2 sequences can contain values as well as nodes, so you can have a sequence (1,2,3) of three integers, for example (with exactly that syntax).

However the important thing here is that in 2.0 if you go

<xsl:variable name="x">
  <xsl:apply-templates/>
</xsl:variable>

then $x holds a sequence of one document node, which is something to which you can apply a second set of templates. Thus the need to use a node-set() function is removed.

14.

Is there a separator choice available for AVTs?

Michael Kay



> Consider the following XSLT:

> <xsl:variable name="test" as="xs:string*">
>       <xsl:value-of select="'Foo'"/>
>       <xsl:value-of select="'Bar'"/>
>       <xsl:value-of select="'Baz'"/>
> </xsl:variable>

> <xsl:template match="/">
>       <div a="{$test}">
>               <xsl:value-of select="$test" separator=""/>
>       </div>
> </xsl:template>

> Produces:

> <div a="Foo Bar Baz">FooBarBaz</div>

> Is there any where to 'turn off' the whitespace separation of the text 
> nodes in $test, or some technique to use within the AVT to mimic 
> separator="" - I really don't want to use xsl:attribute :)


use

  <div a="{string-join($test, '')}">

15.

Length of a sequence

David Carlisle

Reading the spec reveals the answer.

 <xsl:variable name="foo" as="item()*">
   <xsl:text/>abc<xsl:sequence select="'def'"/>
 </xsl:variable>

$foo is a sequence of length three: an empty text node, a text node with string value "abc", and a string "def".

 <xsl:variable name="foo2" as="item()*">
   <xsl:text/>abc<xsl:value-of select="'def'"/>
 </xsl:variable>

$foo2 is a sequence of length three: an empty text node, a text node with string value "abc", and a text node with string value "def".

So what happens when you do

<xsl:value-of select="$foo" separator=","/>
<xsl:value-of select="$foo2" separator=","/>

Well, the six stages of the "Constructing Simple Content" rules in the XSLT 2.0 specification get applied.

stage 1 is discarding zero-length text nodes, so now $foo is a text node with string value "abc" and a string "def"

and $foo2 is a text node with string value "abc" and a text node with string value "def"

stage 2 is merging adjacent text nodes so now $foo is a text node with string value "abc" and a string "def"

and $foo2 is a text node with string value "abcdef"

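stage 3 atomizes the sequence, converting nodes to atomic values, so now $foo is a string "abc" and a string "def"

and $foo2 is a string "abcdef"

stage 4 casts every atomic value to a string, which changes nothing here because they are strings already

stage 5 is concatenating the whole sequence, inserting the separator between strings if one is supplied or a space if not, so now

$foo is a string "abc,def"

and $foo2 is a string "abcdef"

(stage 6 only matters for xsl:processing-instruction, where it strips leading spaces)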

Voila....
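The stages can be replayed with a small Python model (an illustration, not XSLT; items are ('text', string) nodes or ('atomic', value) values, and str() stands in for XPath's cast to string). It reproduces both results above, and also the " is  text" result for p/text() in the next answer.

```python
def simple_content(items, separator=" "):
    """Model of the simple-content stages for ('text', s) and ('atomic', v) items."""
    # Stage 1: discard zero-length text nodes.
    items = [(k, v) for k, v in items if not (k == "text" and v == "")]
    # Stage 2: merge adjacent text nodes.
    merged = []
    for k, v in items:
        if k == "text" and merged and merged[-1][0] == "text":
            merged[-1] = ("text", merged[-1][1] + v)
        else:
            merged.append((k, v))
    # Stages 3 and 4: atomize, then cast each value to a string.
    strings = [str(v) for _, v in merged]
    # Stage 5: join with the separator (stage 6 only affects processing instructions).
    return separator.join(strings)


foo = [("text", ""), ("text", "abc"), ("atomic", "def")]
foo2 = [("text", ""), ("text", "abc"), ("text", "def")]
print(simple_content(foo, ","))   # abc,def
print(simple_content(foo2, ","))  # abcdef
```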

17.

Value-of, copy-of and sequence

David Carlisle & Andrew Welch


> But how else can you retrieve the value of an element when it is text?

Use value-of.

In XSLT 1.0

given <foo>a <!-- b -->c</foo>

then

<xsl:value-of select="foo"/> is a text node "a c" and
<xsl:value-of select="foo/text()"/> is a text node "a "

which is usually not what you want to happen just because someone added a comment.

In XSLT 2.0, value-of produces the value of the whole sequence and re-merges adjacent text nodes, so in this case, in XSLT 2.0,

<xsl:value-of select="foo/text()"/> is text node "a c"

but it's simpler for you and the system just to go

<xsl:value-of select="foo"/>

Of course, if foo has child elements and you only want the top-level text, then you may have to use text() to obtain that, but that's a rather rare requirement. Normally, if there is mixed content you need _all_ the content; the content that just happens to be at the top level isn't often interesting.

xsl:value-of always generates a _text node_ so use it when you want to generate text.

Given

<p><i>This</i> is <b>bold</b> text</p>

then (in XSLT 2.0)

<xsl:value-of select="p"/> produces a text node "This is bold text"
<xsl:value-of select="p/text()"/> produces a text node " is  text"
<xsl:sequence select="p"/> returns the _same_ p node
    <p><i>This</i> is <b>bold</b> text</p>
<xsl:copy-of select="p"/> returns a _new_ p node
    <p><i>This</i> is <b>bold</b> text</p>

xsl:copy-of is there, with the semantics it has, because of XSLT 1.0, although to be honest I can't think offhand of any case where you would need to use xsl:copy-of rather than xsl:sequence.

> My point is that if xsl:sequence can mimic xsl:value-of behavior in this way
> what's the point of using xsl:value-of ever?

xsl:value-of returns the string value of an element, and that isn't obtainable from xsl:sequence (without doing a lot of work that the system has already done). The string value of an element is designed to produce the "right" text in cases where XML is being used as originally designed: as a markup language marking up a text flow.

See the example I showed earlier:

<p><i>This</i> is <b>bold</b> text</p>

The string value of that paragraph is "This is bold text", and that is
what <xsl:value-of select="p"/> returns, whereas

<xsl:sequence select="p/text()"/>

returns the sequence of child text nodes, that is, the two text nodes " is " and " text". This collection of words that happened not to be marked up isn't usually very interesting.
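A small stylesheet sketch that puts these expressions side by side, assuming the input document is just the paragraph above. The result element names are invented for illustration; the values in the comments follow the explanation above.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Assumed input document: <p><i>This</i> is <b>bold</b> text</p> -->
  <xsl:template match="/">
    <result>
      <!-- String value of the whole paragraph: "This is bold text" -->
      <string-value><xsl:value-of select="p"/></string-value>
      <!-- Number of top-level text children: 2 (" is " and " text") -->
      <text-children><xsl:value-of select="count(p/text())"/></text-children>
      <!-- The two text nodes, merged because they are adjacent: " is  text" -->
      <top-level-text><xsl:value-of select="p/text()"/></top-level-text>
    </result>
  </xsl:template>
</xsl:stylesheet>
```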

The functions of xsl:sequence and xsl:value-of are almost completely different, and usually it's clear which you should use. xsl:copy-of and xsl:sequence are much closer: the difference is that copy-of makes a copy. But in most situations where you need to copy, for example copying nodes from the input tree to the result, there is an implied copy operation anyway, so the distinction between xsl:copy-of and xsl:sequence is hidden.

Again, Andrew provides a good example:

The difference between value-of and copy-of is pretty straightforward, the string value of the element vs a deep-copy of the element. The difference between copy-of and xsl:sequence is more subtle and can be highlighted with this example:


Given the input

<foo>a<bar>b</bar>c</foo>

and the stylesheet

<xsl:stylesheet version="2.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:variable name="sequ" as="element()">
   <xsl:sequence select="/foo/bar"/>
</xsl:variable>

<xsl:variable name="copy" as="element()">
   <xsl:copy-of select="/foo/bar"/>
</xsl:variable>

<xsl:template match="/">
   <result>
       <sequence><xsl:copy-of select="$sequ/parent::*"/></sequence>
       <copy><xsl:copy-of select="$copy/parent::*"/></copy>
   </result>
</xsl:template>

</xsl:stylesheet>

Output:

<result>
   <sequence>
      <foo>a<bar>b</bar>c</foo>
   </sequence>
   <copy/>
</result>

Here you can see that the sequence has copied the parent node through, while the copy hasn't.

This is because the variable holding the sequence contains a reference to the element in the original source, and therefore knows its parent. The variable holding the copy of the element holds just a copy; it has no parent, so nothing is copied to the output.

I hope this helps explain it a little better: xsl:sequence refers to elements in their original source location, while xsl:copy-of creates a deep-equal copy.
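The difference can also be tested directly with the XPath 2.0 "is" operator, which compares node identity, and deep-equal(), which compares content. A sketch, assuming the same input <foo>a<bar>b</bar>c</foo>; the variable names follow Andrew's example and the output element names are made up.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Assumed input, as above: <foo>a<bar>b</bar>c</foo> -->
  <xsl:variable name="sequ" as="element()">
    <xsl:sequence select="/foo/bar"/>
  </xsl:variable>

  <xsl:variable name="copy" as="element()">
    <xsl:copy-of select="/foo/bar"/>
  </xsl:variable>

  <xsl:template match="/">
    <result>
      <!-- true: $sequ holds the original bar node itself -->
      <sequ-is-original><xsl:value-of select="$sequ is /foo/bar"/></sequ-is-original>
      <!-- false: $copy holds a new, parentless bar element -->
      <copy-is-original><xsl:value-of select="$copy is /foo/bar"/></copy-is-original>
      <!-- true: the copy still has the same name and content -->
      <copy-deep-equal><xsl:value-of select="deep-equal($copy, /foo/bar)"/></copy-deep-equal>
    </result>
  </xsl:template>
</xsl:stylesheet>
```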

18.

as attribute

David Carlisle





>Could you explain what
> "as='element(mod)*'" means?

It's a "sequence type". Sequence types have two forms: atomic types like xs:integer, and node types like this one, which (with the *) means a sequence of zero or more elements named mod. If you don't specify a type here, then xsl:variable would have made a sequence of one document node (/), and the mod elements would have been children of that. (That would also have been usable in this case, although slightly less efficient and more verbose, since you would generate the extra / node layer and have to step through it.)
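A minimal sketch of the two forms, with literal mod elements standing in for whatever the original question selected (the mod elements, their id attributes and the output names are hypothetical). $mods holds two element nodes, while $tree holds a single document node with the two mod elements as its children, so you have to step down through it with $tree/mod.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="/">
    <!-- With "as": the value is a sequence of mod elements -->
    <xsl:variable name="mods" as="element(mod)*">
      <mod id="1"/>
      <mod id="2"/>
    </xsl:variable>
    <!-- Without "as": the value is one temporary document node -->
    <xsl:variable name="tree">
      <mod id="1"/>
      <mod id="2"/>
    </xsl:variable>
    <result>
      <mods-count><xsl:value-of select="count($mods)"/></mods-count>           <!-- 2 -->
      <tree-count><xsl:value-of select="count($tree)"/></tree-count>           <!-- 1 -->
      <tree-mod-count><xsl:value-of select="count($tree/mod)"/></tree-mod-count> <!-- 2 -->
    </result>
  </xsl:template>
</xsl:stylesheet>
```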

19.

When is the context item not redundant in a path expression?

Michael Kay, Florent Georges





> Some examples:
>
> ./a/b/c
> a/b/c/.
> a/b/./c
> a/b/././c
>

It's redundant in all those cases. It's not redundant in:

./(a,b)     (:because it causes sorting into document order:)
.[@a]/@b
a/.[1]/b    (:useless; but not the same as a[1]/b :)

It's hard to come up with an XPath 1.0 example, though. Other than "." on its own, which is technically a path expression (but then so is "3").
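A quick way to see that a/.[1]/b is not the same as a[1]/b: the predicate in .[1] is applied to each a on its own, where the context item is always the first (and only) item, so it filters nothing out. A sketch with a hypothetical input containing two a elements:

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical input: <r><a><b>1</b></a><a><b>2</b></a></r> -->
  <xsl:template match="/r">
    <result>
      <!-- .[1] is evaluated once per a and keeps every a: "1 2" -->
      <dot-predicate><xsl:value-of select="a/.[1]/b"/></dot-predicate>
      <!-- a[1] keeps only the first a child of r: "1" -->
      <first-a><xsl:value-of select="a[1]/b"/></first-a>
    </result>
  </xsl:template>
</xsl:stylesheet>
```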


> > ./(a,b)     (:because it causes sorting into document order:)
>
>   I don't understand this one.  Could you please explain it a
> little bit further?  It'd be perfect if you have a reference
> within the CRs.
>

XPath section 3.2 describes the effect of the "/" operator as follows:

Each operation E1/E2 is evaluated as follows: Expression E1 is evaluated, and if the result is not a (possibly empty) sequence of nodes, a type error is raised [err:XPTY0019]. Each node resulting from the evaluation of E1 then serves in turn to provide an inner focus for an evaluation of E2, as described in 2.1.2 Dynamic Context. The sequences resulting from all the evaluations of E2 are combined as follows:

1. If every evaluation of E2 returns a (possibly empty) sequence of nodes, these sequences are combined, and duplicate nodes are eliminated based on node identity. The resulting node sequence is returned in document order.

So whereas the sequence (a,b) is not necessarily in document order, the sequence ./(a,b) is, by virtue of the "/" operator. Equally you can force sorting of a sequence S into document order by writing S/., or S|(). Or in this particular case, of course, (a|b)

Florent Georges adds

Ok, thanks. If I understood well, in the following example:

   <xsl:variable name="a" select="elem/a"/>
   <xsl:variable name="b" select="elem/b"/>

   <xsl:sequence select="elem/b, elem/a"/>
   <xsl:sequence select="$b, $a"/>
   <xsl:sequence select="elem/(b, a)"/>

when applied in the context of:

   <elem>
     <a/>
     <b/>
   </elem>

the sequences will be, respectively (in an ad-hoc syntax):

   (<b/>, <a/>)
   (<b/>, <a/>)
   (<a/>, <b/>)

In the first two cases duplicates can also appear; in the last one they cannot.

20.

Merging node-sets & sequences

Wendell Piez




>I find it easier to think of things visually (I always did well in
>geometry but stumbled in algebra). What we're really working with is
>the notion of sets, so I think of a Venn diagram. I picture this:
>
>"x | y[1]"
>
>as "The set of all x-things and the first y-thing."

Right. In XPath, "|" is the union operator: it takes two node-set operands and returns a node-set unifying them.

Note that since these are the nodes themselves, the union ($n | $n) is the same as $n.

These can lead to some funny stuff in XPath 2.0, where we no longer have node sets but instead, sequences of items. So in XPath 2.0 we can say ($s, $s) and have a sequence of all the items in $s (which may include nodes, or rather, references to nodes) followed by all the items in $s again. (There is no ',' operator in XPath 1.0.)

The union operator in XPath 2.0, also "|", is designed so that (a) it tosses duplicate references to the same nodes, and (b), it sorts the nodes into document order in the resulting sequence, so it effectively works the same as "|" in XPath 1.0 although it's technically a different thing.
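A sketch contrasting the comma operator with union in XPath 2.0, using a hypothetical input with two x elements and a y. The last line relies on a path whose final step returns atomic values (the names), so the order shown is the document order imposed by "|".

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical input: <r><x/><x/><y/></r> -->
  <xsl:template match="/r">
    <xsl:variable name="n" select="x"/>
    <result>
      <!-- "," keeps duplicates: 2 + 2 = 4 items -->
      <comma><xsl:value-of select="count(($n, $n))"/></comma>
      <!-- "|" removes duplicate nodes: 2 items -->
      <union><xsl:value-of select="count($n | $n)"/></union>
      <!-- "|" also sorts into document order: x,x,y -->
      <order><xsl:value-of select="(y | x)/name()" separator=","/></order>
    </result>
  </xsl:template>
</xsl:stylesheet>
```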


>and this:
>
>"(x | y)[1]"
>
>as "The first element of the set of all x-things and all y-things."

Right. But note that since sets don't formally have order, an order must be imposed for this to make sense. In XPath 1.0, document order is referred to except in certain special cases (the famous reverse axes).

In XPath 2.0, since sequences do have order, the expression ($s, $s)[1] will get you the first item of the sequence $s, just once, or nothing at all if $s is empty. Hence ($a, $b)[1] is a neat way of providing a default in XPath 2.0, since if $a is empty the first item in $b will be provided.
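The default idiom in a stylesheet, as a sketch (the doc and title element names are hypothetical). Note that it tests whether a title element exists, not whether it is empty: an empty <title/> still comes first and yields an empty string.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical input: <doc><title>Report</title></doc>, or just <doc/> -->
  <xsl:template match="/doc">
    <!-- First item of (title, 'Untitled'): the title element if present,
         otherwise the fallback string -->
    <heading><xsl:value-of select="(title, 'Untitled')[1]"/></heading>
  </xsl:template>
</xsl:stylesheet>
```

With <doc/> as input this gives <heading>Untitled</heading>; with the title present it gives <heading>Report</heading>.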


>Is that about right?

Yup. In both XPath 1.0 and 2.0 the predicate (the thing in [ ]) is described as a "filter expression" which takes a single argument: in 1.0, a node set, in 2.0, a sequence. A numeric filter expression takes the nth item in the sequence (or in 1.0, the set, read in document order) for number n.

This is math only in the general sense, since it's no math you're likely to have learned in school. It's really just a formal notation representing data objects and operations over them.

21.

String Serialisation

David Carlisle


> Am I correct in my understanding that the reason the stylesheet
> below works, separating each item from the <xsl:sequence/> with
> a single whitespace character, is due to
>
>  http://www.w3.org/TR/xslt-xquery-serialization/#serdm

No. The white space is really added to the tree, not just on serialisation (if you save the result in a variable and query it, you will see that the spaces are already there even if no serialisation occurs).

See step 5 in w3c

MK expands

That text describes what happens if you serialize a sequence of nodes, which never happens in XSLT - in XSLT you always serialize a single document node. But the serializer spec is mimicking the XSLT specification for what happens when you form the content of a document or text node, which is described in w3c

The important difference is that the spaces are part of the result tree, they are not simply added during serialization.
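A sketch showing that the spaces are already in the tree: three adjacent strings are placed in an element inside a variable, and the element is then queried without ever being serialized. The variable and element names are made up.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="/">
    <!-- Three adjacent atomic values placed in a new element -->
    <xsl:variable name="e" as="element()">
      <e><xsl:sequence select="'a', 'b', 'c'"/></e>
    </xsl:variable>
    <result>
      <!-- The content is already "a b c", so string-length is 5 -->
      <length><xsl:value-of select="string-length($e)"/></length>
      <!-- ...held in one text node, not three -->
      <text-nodes><xsl:value-of select="count($e/text())"/></text-nodes>
    </result>
  </xsl:template>
</xsl:stylesheet>
```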

22.

Unique values in attributes

Michael Kay



> My XSLT 2.0 solution below is working. I am just wondering if
> there is an easier way that could perhaps bring
> distinct-values() into play?
>
> <xsl:variable name="uniqueAttributeNames">
>     <xsl:for-each-group select="//@*" group-by="local-name()">
>         <xsl:sequence select="local-name()"/>
>       </xsl:for-each-group>
> </xsl:variable>
> <xsl:sequence
> select="string-length(replace(replace($uniqueAttributeNames,
> '\c+',  'x'), '\C+', ''))"/>
>

Firstly, if you make uniqueAttributeNames hold a sequence of strings rather than a single string, then you can use the much simpler expression count($uniqueAttributeNames) to count them.

You can do this (a) by adding an "as" clause to the variable declaration

<xsl:variable name="uniqueAttributeNames" as="xs:string*">

or (b) by using distinct-values. So the whole thing is just

<xsl:sequence select="count(distinct-values(//@*/local-name()))"/>
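For comparison, both of MK's suggestions in one stylesheet sketch. Option (a) keeps the asker's xsl:for-each-group but types the variable as xs:string*; this version returns current-grouping-key(), which here gives the same value as local-name() on the first attribute of each group. Option (b) is the one-line distinct-values version. Both should give the same count.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <xsl:template match="/">
    <!-- (a) "as" makes the variable a sequence of strings, not a text tree -->
    <xsl:variable name="uniqueAttributeNames" as="xs:string*">
      <xsl:for-each-group select="//@*" group-by="local-name()">
        <xsl:sequence select="current-grouping-key()"/>
      </xsl:for-each-group>
    </xsl:variable>
    <result>
      <a><xsl:value-of select="count($uniqueAttributeNames)"/></a>
      <!-- (b) the same count using distinct-values() -->
      <b><xsl:value-of select="count(distinct-values(//@*/local-name()))"/></b>
    </result>
  </xsl:template>
</xsl:stylesheet>
```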

23.

Test against preceding values

Abel Braaksma



> I was wondering about the following: using some/every XPath
> expressions, is it possible to test values in a sequence
> against preceding values?
> The requirement pops up when you want to test pairs of
> values, where, say, the even indexed value must be greater
> than the odd indexed value, etc. I often have this
> requirement, and although it is resolvable with for-each
> and/or nested for-in-return, I was looking for a simpler solution.
>
> Since many people understand the requirements of Fibonacci
> numbers, I think it is a good example use case and represents
> my actual use case pretty well (without the blur):
>
> 0, 1, 1, 2, 3, 5, 8, 13, 21...
>
> Now, for each given Fibonacci number, it is correct if:
>
> F(n) = F(n-1) + F(n-2)
>
> For positions 1 and 2, if the value for F(n-1) and F(n-2) are zero.

The 'every' expression has lower precedence than 'and', so it needs parentheses (otherwise you will get a syntax error), and the '.. to ..' range should start with 3:

$fib[1] = 0 and $fib[2] = 1 and
   (every $i in 3 to count($fib) satisfies
        $fib[$i] = $fib[$i - 1] + $fib[$i - 2])

This returns true for example for the following Fibonacci sequence: (0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987, 1597, 2584, 4181, 6765, 10946, 17711, 28657, 46368, 75025)

Here's another version, which eases the expression a bit by removing the special cases:

every $i in 1 to count($fib) satisfies
      $fib[$i] = (0, $fib)[$i] + (0, 1, $fib)[$i]
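Both versions can be tried out with a stylesheet like the following sketch; the test sequences are hypothetical. The first two results should be true, and the third false, because 4 is not 2 + 1.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <xsl:template match="/">
    <!-- Hypothetical test data -->
    <xsl:variable name="fib" as="xs:integer*" select="0, 1, 1, 2, 3, 5, 8, 13, 21"/>
    <xsl:variable name="notfib" as="xs:integer*" select="0, 1, 1, 2, 4"/>
    <result>
      <!-- First version: explicit start values, range starting at 3 -->
      <v1-fib>
        <xsl:value-of select="$fib[1] = 0 and $fib[2] = 1 and
            (every $i in 3 to count($fib) satisfies
               $fib[$i] = $fib[$i - 1] + $fib[$i - 2])"/>
      </v1-fib>
      <!-- Second version: prepending 0 and (0, 1) supplies the start values -->
      <v2-fib>
        <xsl:value-of select="every $i in 1 to count($fib) satisfies
            $fib[$i] = (0, $fib)[$i] + (0, 1, $fib)[$i]"/>
      </v2-fib>
      <v2-notfib>
        <xsl:value-of select="every $i in 1 to count($notfib) satisfies
            $notfib[$i] = (0, $notfib)[$i] + (0, 1, $notfib)[$i]"/>
      </v2-notfib>
    </result>
  </xsl:template>
</xsl:stylesheet>
```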

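MK suggested the following; note that, as explained above, it needs parentheses around the 'every' expression and a range starting at 3 (with $i = 2, $fib[$i - 2] is $fib[0], the empty sequence, so the sum is empty and the comparison is false):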

 $fib[1] = 0 and $fib[2] = 1 and
 every $i in 2 to count($fib) satisfies
   $fib[$i] = $fib[$i - 1] + $fib[$i - 2]

24.

Sequences

Colin Adams et al



>The following variable definition is giving me the error
>"XTTE0570: A sequence of more than one item is not allowed as the value of
>variable normsrc"
>
>I'm using Saxon 8.6.1 on Windows XP.
>
>====
><xsl:variable name="normsrc" as="xs:string">
><xsl:choose>
><xsl:when test="contains($xsrc,'/images/images/')">
><xsl:text>images/</xsl:text>
><xsl:value-of select="substring-after($xsrc,'/images/images/')" />
></xsl:when>
><xsl:when test="contains($xsrc,'/images/')">
><xsl:text>images/</xsl:text>
><xsl:value-of select="substring-after($xsrc,'/images/')" />
></xsl:when>
><xsl:otherwise>
><xsl:value-of select="$xsrc" />
></xsl:otherwise>
></xsl:choose>
></xsl:variable>

You are generating more than one string - in fact you are generating text nodes, but these have a string value. I think you want to use xsl:sequence with a concat function in the select attribute.

Mike Kay adds

xsl:text creates a text node, xsl:value-of creates a second text node, and you can't convert a sequence of two text nodes to a single string.

One of the more intricate parts of the XSLT 2.0 spec is the distinction between strings and text nodes, and the question of when text nodes are concatenated implicitly and when they aren't.

A single text node is created whenever you use xsl:text or xsl:value-of (or xsl:number).

The basic rules are:

(a) you can coerce ("atomize") a single text node to a string, but not a sequence of text nodes

(b) text nodes are concatenated only if they are used within a node constructor instruction, for example

<elem>
  <xsl:text>123</xsl:text>
  <xsl:value-of select="456"/>
</elem>

or

<xsl:value-of><xsl:text>123</xsl:text><xsl:value-of select="456"/></xsl:value-of>

Personally if I'm creating a string then I try to avoid doing it by creating text nodes and then atomizing them. I prefer to create the string directly, in the select expression of xsl:sequence, using concat() if appropriate.
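Following that advice, the variable from the question could be written with xsl:sequence and concat(), so that each branch returns exactly one string. This is a sketch: the $xsrc parameter and its default value are hypothetical stand-ins for wherever the original value came from.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <!-- Hypothetical input value -->
  <xsl:param name="xsrc" as="xs:string" select="'/site/images/images/logo.png'"/>

  <!-- Each branch returns a single xs:string, so the "as" type is satisfied -->
  <xsl:variable name="normsrc" as="xs:string">
    <xsl:choose>
      <xsl:when test="contains($xsrc, '/images/images/')">
        <xsl:sequence select="concat('images/', substring-after($xsrc, '/images/images/'))"/>
      </xsl:when>
      <xsl:when test="contains($xsrc, '/images/')">
        <xsl:sequence select="concat('images/', substring-after($xsrc, '/images/'))"/>
      </xsl:when>
      <xsl:otherwise>
        <xsl:sequence select="$xsrc"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:variable>

  <xsl:template match="/">
    <normsrc><xsl:value-of select="$normsrc"/></normsrc>
  </xsl:template>
</xsl:stylesheet>
```

With the default value above, $normsrc is "images/logo.png".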

25.

Sequences

Mike Kay

The content of xsl:for-each is an xsl:value-of instruction. xsl:value-of constructs a text node, therefore xsl:for-each constructs several text nodes.

If you use this construct inside another instruction such as a literal result element, then the several text nodes will be concatenated. But used directly within a function, there's no containing instruction to do the concatenation, so the sequence of text nodes is returned as is.

Very often in 2.0 you should be using xsl:sequence rather than xsl:value-of. The difference is that xsl:sequence returns the result of its select expression unchanged, whereas xsl:value-of flattens it into a string and then wraps that string in a text node.
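A sketch of the effect described above, using two hypothetical functions in a made-up namespace: one builds its result with xsl:value-of (a sequence of text nodes), the other with xsl:sequence (a sequence of strings). Both return three items; the difference only shows up when the result is placed inside an element, where adjacent text nodes are merged with nothing between them, while adjacent strings are joined with single spaces.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:f="urn:example:functions"
    exclude-result-prefixes="xs f">

  <!-- xsl:value-of: one new text node per item -->
  <xsl:function name="f:as-text-nodes">
    <xsl:param name="items" as="item()*"/>
    <xsl:for-each select="$items">
      <xsl:value-of select="."/>
    </xsl:for-each>
  </xsl:function>

  <!-- xsl:sequence: the strings themselves, unchanged -->
  <xsl:function name="f:as-strings" as="xs:string*">
    <xsl:param name="items" as="item()*"/>
    <xsl:for-each select="$items">
      <xsl:sequence select="string(.)"/>
    </xsl:for-each>
  </xsl:function>

  <xsl:template match="/">
    <result>
      <!-- No concatenation inside a function: 3 items each -->
      <counts text-nodes="{count(f:as-text-nodes(('a', 'b', 'c')))}"
              strings="{count(f:as-strings(('a', 'b', 'c')))}"/>
      <!-- In an element constructor, adjacent text nodes merge: "abc" -->
      <merged><xsl:sequence select="f:as-text-nodes(('a', 'b', 'c'))"/></merged>
      <!-- ...while adjacent strings are separated by spaces: "a b c" -->
      <spaced><xsl:sequence select="f:as-strings(('a', 'b', 'c'))"/></spaced>
    </result>
  </xsl:template>
</xsl:stylesheet>
```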

26.

When = is not equal

Abel Braaksma

Note that this is XSLT 2.0 only (comparing sequences):

(1, 2, 3) = (1, 5, 6)
===> returns true, because comparing *all* pairs of items
finds one match: 1 = 1
(4, 2, 7) = (1, 1, 1, 5, 7, 3)
===> returns true, because comparing *all* pairs of items
finds one match: 7 = 7

The odd thing is that != is NOT the inverse of =. In fact, most of the time the result of comparing sequences with != is the same as with '='!!! That is because != is defined the same way: all pairs of items are compared, and if any one pair is unequal, the whole expression is true. Hence, all the following return true:

(1, 2, 3) != (1, 5, 6)
(4, 2, 7) != (1, 1, 1, 5, 7, 3)
(1, 2) != (1, 2)
===> true; for the last one, because there is a pair, 1 != 2, which returns true.

The only way to make != return false is for no pair of items to be unequal: either every item on the left equals every item on the right, or one side is empty:

(2, 2) != (2, 2)
(2, 2) != (2, 2, 2, 2, 2)
===> false, because every left/right pair compares 2 != 2
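These comparisons can be checked with a stylesheet like the following sketch (the output element names are made up). It also includes the empty-sequence case: with nothing on one side there are no pairs to compare, so both = and != return false.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="/">
    <result>
      <!-- = is true if SOME pair of items is equal -->
      <a><xsl:value-of select="(1, 2, 3) = (1, 5, 6)"/></a>          <!-- true -->
      <!-- != is true if SOME pair of items is unequal -->
      <b><xsl:value-of select="(1, 2, 3) != (1, 5, 6)"/></b>         <!-- true -->
      <c><xsl:value-of select="(1, 2) != (1, 2)"/></c>               <!-- true -->
      <!-- no unequal pair exists -->
      <d><xsl:value-of select="(2, 2) != (2, 2, 2, 2, 2)"/></d>      <!-- false -->
      <!-- an empty side means there are no pairs at all -->
      <e><xsl:value-of select="() = (1, 2)"/></e>                    <!-- false -->
      <f><xsl:value-of select="() != (1, 2)"/></f>                   <!-- false -->
    </result>
  </xsl:template>
</xsl:stylesheet>
```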

The OP wanted to know if all people have the *same* sex. Naively, one could argue that the following:

somenode/sex = somenode/sex

returns true when all sex nodes are the same. Yes, that is correct. But it also returns true when *any* pair of sex nodes is equal (see above), so it returns true far too often. To fix this, your best option is to reverse the expression and negate it (see above: '!=' only returns false when every item on one side equals every item on the other):

not(somenode/sex != somenode/sex)

returns true only when all sexes are equal.

(Ed. note: if the path somenode/sex returns more than one item, then the comparison is between one sequence and another, so this is an example of the type above.)

I believe the reasoning behind this is that comparisons must behave the same way in predicates such as:

group[person = "john"]

will select all groups where *any* person has the name "john". People expect a comparison against each item in the sequence of all 'person' nodes, and not only the first, or the last.

Once you are aware of this, you can stop writing this:

   group[person  = "john" or person = "mary"]

and start writing this instead:

   group[person = ("john", "mary")]

which will return the same node set (ehrmn, sequence of nodes) but is easier to write. Note that this is different from:

   group[person = "john" and person = "mary"]

Eventually, it becomes more intuitive...

> The odd thing is that != is NOT the inverse of =.
> Which it is in most languages?
> =   <> not (!=) is plain weird :-)

YES! Which is hard not only for first-timers, but for others too. It becomes more complex still if you mix in backwards-compatibility mode, which in some contexts uses only the first item of a sequence of nodes rather than the whole sequence.

>
>
>>
>> There is only one way to make != return false, if all combinations of
>> all items are unequal:
>> (2,2) != (2,2)
>> (2,2) != (2,2,2,2,2)
>> ===> false, because any combination left/right will give 2 != 2
>
> This kind of example shows the 'complexity' of the issue?

It shows the stupidity and uselessness of the != operator, which even returns 'false' (which should mean 'equal') when the sequences on the left and right are truly not equal, i.e., when their lengths differ.

> Also note that 
>
> somenode/sex = somenode/sex
>
> considers *all* nodes in both paths, so that brings it into the
> problem area of the OP

Consider the following:

<groups>
  <group id="1">
     <person sex="m" name="a" />
     <person sex="m" name="a" />
  </group>
  <group id="2">
     <person sex="f" name="a" />
     <person sex="m" name="a" />
     <person sex="m" name="a" />
  </group>
  <group id="3">
     <person sex="f" name="a" />
     <person sex="f" name="a" />
  </group>
</groups>

Now, you need all group elements where all persons have the same sex, i.e., either all persons in a group are male or all persons are female. You are not interested in mixed groups (as if you were searching for boys-only or girls-only schools by looking at the gender data of the pupils).

<!-- Method 1: naive, returns 1, 2, 3 -->
<xsl:copy-of select="group[person/@sex = person/@sex]" />
<!-- Method 2: still naive; you try the opposite, just to be sure: returns nothing -->
<xsl:copy-of select="group[not(person/@sex = person/@sex)]" />
<!-- Method 3: you got lost and try the opposite again: returns 2 -->
<xsl:copy-of select="group[person/@sex != person/@sex]" />
<!-- Method 4: bingo, returns 1 and 3 -->
<xsl:copy-of select="group[not(person/@sex != person/@sex)]" />

When you are after something like "all values of the sequences on the left and right must be the same", you will have to use the construct not($firstvalue != $secondvalue).
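As an alternative not taken from the discussion above, XPath 2.0 also lets you state the requirement directly by counting the distinct values. A sketch against the groups example, printed next to Method 4 for comparison; both should print "1 3". (Both forms also accept a group with no person elements at all.)

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="groups">
    <result>
      <!-- At most one distinct sex value among the persons of the group -->
      <distinct>
        <xsl:value-of select="group[count(distinct-values(person/@sex)) le 1]/@id"/>
      </distinct>
      <!-- Method 4 from above, for comparison -->
      <method4>
        <xsl:value-of select="group[not(person/@sex != person/@sex)]/@id"/>
      </method4>
    </result>
  </xsl:template>
</xsl:stylesheet>
```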

You may also want to try something like this, but again, it is naive and brings trouble:

<!-- more readable, you think? Returns 1, 2, 3, because $s/../@sex is $s itself -->
<xsl:copy-of select="group[every $s in person/@sex satisfies $s = $s/../@sex]" />
<!-- getting 'every' right ends up like this: returns 1 and 3, but is hard to read -->
<xsl:copy-of select="group[every $s in person/@sex satisfies every $s2 in person/@sex satisfies $s2 = $s]" />

This last version of 'every' does, however, illustrate what happens 'under the hood': not(person/@sex != person/@sex) is equivalent to this nested 'every', while the naive person/@sex = person/@sex is the same nesting with 'some' in place of 'every'. And don't consider writing it with 'some' instead, as you end up with the same loop-in-loop problem and it really does not add to readability, I believe.

Running the following templates against the XML example above shows this:

  <xsl:template match="/">
    <xsl:apply-templates/>
  </xsl:template>

  <xsl:template match="groups">
    <!-- Method 1: naive, returns 1, 2, 3 -->
    <method1>
      <xsl:value-of select="group[person/@sex = person/@sex]/@id" />
    </method1>
    <!-- Method 2: still naive; you try the opposite, just to be sure: returns nothing -->
    <method2>
      <xsl:value-of select="group[not(person/@sex = person/@sex)]/@id" />
    </method2>
    <!-- Method 3: you got lost and try the opposite again: returns 2 -->
    <method3>
      <xsl:value-of select="group[person/@sex != person/@sex]/@id" />
    </method3>
    <!-- Method 4: bingo, returns 1 and 3 -->
    <method4>
      <xsl:value-of select="group[not(person/@sex != person/@sex)]/@id" />
    </method4>
    <!-- Method 5: returns 1, 2, 3 -->
    <method5>
      <xsl:value-of select="group[every $s in person/@sex satisfies
                                  $s = $s/../@sex]/@id" />
    </method5>
    <!-- Method 6: returns 1 and 3 -->
    <method6>
      <xsl:value-of select="group[every $s in person/@sex satisfies
                                  every $s2 in person/@sex satisfies $s2 = $s]/@id" />
    </method6>
  </xsl:template>