Data Types

1. Data types
2. XSD Builtin simple datatypes
3. Rules for < have changed
4. Type safety in xslt 2
5. Two functions, same number of parameters
6. Default parameter type
7. On xslt 2.0 types
8. Data typing can be useful
9. data() type
10. Understanding the Relationship of Nodes, Sequences, and Trees
11. Typed match
12. getting node type in xsl
13. Test For Numeric Values?
14. testing for string and number
15.
16. Check for integer
17. Co-constraints
18. Convert to a number
19. Types and variables
20. Variables, siblings or orphans

1.

Data types

Michael Kay

The only documentation for these features that's currently available is the XPath 2.0 working draft at W3C

XPath 2.0 changes the type system. Whereas XPath 1.0 allows values of four types (number, string, boolean, or node-set), in XPath 2.0 the value of every expression is a sequence of items (a single item is treated as a sequence of length one). The items in a sequence may be nodes or atomic values, and the atomic values may be any of the simple types of XML Schema, for example boolean, string, decimal, double, date, QName, or anyURI. Node-sets in XPath 1.0 are replaced by node sequences; the difference is that a node sequence may be in any order, not necessarily document order.
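
The sequence model described above can be sketched as follows. This is a minimal illustration, assuming a processor that implements XSLT 2.0; the variable name `items` and the values in it are invented for the example.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- A sequence mixing atomic values of several schema types -->
    <xsl:variable name="items"
                  select="(1, 'two', xs:date('2004-01-01'), 4.5)"/>
    <!-- count(42) treats the single item 42 as a sequence of length one -->
    <xsl:value-of select="count($items), count(42)" separator=" "/>
  </xsl:template>
</xsl:stylesheet>
```

It can be run against any well-formed input document, since the template only matches the document node.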

2.

XSD Builtin simple datatypes

Michael Kay

Ednote. Although this question relates directly to Saxon, the answers enlighten us on these datatypes. SA is the Schema aware version of Saxon, Saxon-B is the non-Schema aware XSLT processor.


> I've played a little with the XSD Builtin simple datatypes as 
> supported in Saxon 8.4 and found the following:

>   1. The majority of the derived types produce a warning that they 
> will only be supported in the SA processor.

The XSLT 2.0 specification states that a "Basic XSLT Processor" supports only the primitive types plus xs:integer. (There's been some confusion over this, but that's the current situation.) It's my intention that Saxon-B "out-of-the-box" should conform at this conformance level. This is why I put the warnings into the current release to advise users when they are relying on facilities that are only available with the "schema-aware" conformance level.

The actual code for the derived types will remain present in the open source product (it's needed for XQuery, and once code is released as open source, all subsequent modifications have to be published). I may therefore decide to offer a switch that enables these types in Saxon-B. There's an escape clause in the conformance rules that probably permits this:

"An atomic value may also belong to an implementation-defined type that has been added to the context for use with extension functions or extension instructions."

But this is not really within the spirit of the rules, so the derived types will (at some stage) be disabled by default, in the interests of interoperability. This is assuming, of course, that there are no further changes to the spec.



>   2. The following datatypes are unknown and produce an error message:

>      xs:ENTITIES

>      xs:IDREFS

>      xs:NMTOKENS

These are list types, and as such they can be used only as type annotations on nodes, not as the type of an XPath value. (If a node is annotated as having type xs:NMTOKENS, then the result of atomizing the node is a value of type xs:NMTOKEN*.) Saxon-B doesn't allow type annotations on nodes, so it doesn't support these types.

The distinction between "schema types" and "sequence types" is not always well understood, even by WG members, and it is confusingly explained in the specs. There are basically two type hierarchies.

* Schema types are types as defined in XML Schema: they divide into simple types and complex types, and simple types further divide into union types, list types, and atomic types. Schema types appear in XSLT/XPath in the form of type annotations on nodes.

* Sequence types are the types of XPath values: a sequence type consists (usually) of an item type and a cardinality. The item types divide into atomic types and node types.

Atomic types are common to the two hierarchies, and the node types element(N,T) and attribute(N,T) may reference a schema type to identify the type annotation appearing on the node. So a type such as xs:IDREFS may appear in XSLT/XPath only in a construct such as attribute(*, xs:IDREFS). For a longer explanation, see the "stylesheets and schemas" chapter of my book.

3.

Rules for < have changed

Mike Kay

In the current draft XPath 2.0 specs, the rules for "<" and ">" have changed when the arguments are strings (or untyped nodes).

In XPath 1.0, both operands were converted to numbers, and were compared numerically. Note this is different from "=", where they are compared as strings. In XPath 1.0, "1" = "1.0" is false, but "1" <= "1.0" is true.

In XPath 2.0 WD, a numeric comparison happens if one or both operands is a number, but if both are strings (or untyped nodes), you get an alphabetic comparison (using the default collating sequence). This means that "10" < "2" (both operands strings) will be true. The rules are also generalized to allow comparison of other types such as dates and times.

So you need to rewrite the expression by wrapping one or both operands in the number() function. Or better, initialize the $lev attribute so its type is numeric.
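
The change can be seen with two literal comparisons. This is a minimal sketch, assuming an XSLT 2.0 processor whose default collation orders "1" before "2"; the $lev value from the original question is not reproduced here.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- Both operands are strings, so XPath 2.0 compares them as strings -->
    <xsl:value-of select="'10' &lt; '2'"/>
    <xsl:text> </xsl:text>
    <!-- number() on each operand forces a numeric comparison -->
    <xsl:value-of select="number('10') &lt; number('2')"/>
  </xsl:template>
</xsl:stylesheet>
```

Under the rules described above, the first test is true and the second is false.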

4.

Type safety in xslt 2

Jeni Tennison

Here's an example that illustrates what happens when you multiply two values under XSLT 2.0.

Say you had the following unvalidated (and hence untyped) XML:

  <problem risk="3" severity="4">...</problem>

and you wanted to create a number of exclamation marks equal to the value of @risk * @severity. @risk and @severity are both always integers, but the XML is untyped so the XSLT processor doesn't know this.

Since neither @risk nor @severity is typed, an XSLT 2.0 processor will assume that since you want to multiply them together they must be of type xs:double. When you multiply two doubles, the result is a value of type xs:double. If you try to use a double as an argument or operand to a function or operator that expects an integer (or indeed most other types), you will get a type error. For example, if you try to do:

  <xsl:value-of select="string-pad('!', @risk * @severity)" />

or:

  <xsl:for-each select="1 to @risk * @severity">!</xsl:for-each>

then you will get errors because string-pad() expects an integer as its second argument and the 'to' operator expects integers for its arguments.

You have to do one of:

- add types to your document by validating it against a schema or by creating a temporary tree in which the attributes are assigned types
- cast the values to integers explicitly within the code
- use a variable to create an untyped node containing the value which will then be automatically cast to the right type

Explicit casts look like:

  <xsl:value-of select="string-pad('!', xs:integer(@risk * @severity))" />

Creating the variable looks like:

  <xsl:variable name="danger">
    <xsl:value-of select="@risk * @severity" />
  </xsl:variable>
  <xsl:value-of select="string-pad('!', $danger)" />

This latter works because the $danger variable holds the document node of a tree that contains the value of @risk * @severity. The typed value of the document node is the value 12 (3 * 4) of the type xdt:untypedAtomic. Since the type is xdt:untypedAtomic, the value is cast automatically to the required type of xs:integer when the variable is used.
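
The explicit-cast option can also be sketched without string-pad(). The stylesheet below is an illustration only, assuming an XSLT 2.0 processor and the unvalidated <problem> element shown earlier; the variable name `n` and the use of string-join() with a "for" expression are choices made for this example, not part of the original post.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xsl:output method="text"/>

  <xsl:template match="problem">
    <!-- For untyped attributes the product is an xs:double,
         so cast it to xs:integer before using it with 'to' -->
    <xsl:variable name="n" as="xs:integer"
                  select="xs:integer(@risk * @severity)"/>
    <!-- One exclamation mark per unit of risk * severity -->
    <xsl:value-of select="string-join(for $i in 1 to $n return '!', '')"/>
  </xsl:template>
</xsl:stylesheet>
```

Removing the xs:integer() cast (and the as attribute) should reproduce the type error described above, because the 'to' operator does not accept an xs:double.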

On a related subject,


  > Just to help clarify, then, if I have an expression in XSLT 2.0 like
  > this: 4.5 + 5.5
  >
  > it should yield 10.0 in XPath 2.0, right?
  >
  > This would have yielded 10 in XSLT 1.0.

Well, if you had the literals 4.5 and 5.5 they would both be interpreted as values of the type xs:decimal. If you add two decimals together, you get another decimal as a result, so the result of the expression "4.5 + 5.5" is the decimal value 10.0.

On the other hand, if you had two untyped attributes, one with the value "4.5" and the other with the value "5.5" then when you added them together they'd be converted to doubles and the result would be the double value 10E0.

If you use <xsl:value-of> to convert either a decimal or a double to a string then the way it gets serialised now depends on the value of the decimal or double. This isn't yet in the public drafts, but in Saxon's implementation the idea is that if there aren't any significant digits after the decimal point then it is serialised as an integer, so that you get "10". (Also, I think that if it's a fairly small or very large double then an appropriate exponent will be used; I couldn't find these large/small numbers in testing with Saxon 7.4 so either Mike hasn't implemented that or I'm misremembering.)

  
  > If I am using an XSLT 2.0 compliant processor, and I want to
  > maintain my original results, is the solution as you described to
  > Gunther, to keep the version attribute value at "1.0" and that
  > otherwise XPath 2.0 casting takes over and therefore the result will
  > be an xs:decimal type? But I'll still have the benefit of using, for
  > example, xsl:for-each-group?

Setting the XPath 1.0 compatibility mode on using version="1.0" should mean that any XPath 1.0 expression will give the same results as it used to. The places where it doesn't are listed in Appendix F of the XPath 2.0 specification.

If you find something that isn't listed there, you should let the WG know by writing to public-qt-comments@w3.org.

Using version="1.0" with XPath 2.0 expressions gives you:

- First item semantics when passing a sequence to a function that expects a single item. For example, if you pass a function that expects a single node a sequence of five nodes then it will pick the first one whereas under XPath 2.0 it will give an error.

- Automatic conversion to a string when passing a value to a function that expects a string. For example, if you pass substring() the current-dateTime() as the first argument then it will be converted to a string in backwards compatible mode whereas under XPath 2.0 it will give an error.

- Automatic conversion to a double when passing a value to a function that expects a double.

- Automatic conversion to a double of operands in arithmetic expressions. For example if you try to subtract one xs:date from another then you'll get the double NaN under the backwards compatibility rules whereas under XPath 2.0 you will get the xdt:dayTimeDuration between the two dates.

- Automatic conversion to a double of the items in operands in general comparisons when either operand sequence contains a numeric value. For example, 1 = '1' should, I think, be true in backwards compatibility mode as it is in XPath 1.0, whereas in XPath 2.0 it will give an error because integers cannot be compared to strings.

Using version="1.0" with an XSLT 2.0 processor does not prevent you from using XSLT 2.0 and XPath 2.0 constructs such as conditional expressions and <xsl:for-each-group>. Note that this means that if you want to make sure that a stylesheet works under XSLT 1.0, you have to test it with an XSLT 1.0 processor rather than an XSLT 2.0 processor.

5.

Two functions, same number of parameters

Jeni Tennison

This prevents you from defining two functions with the same name and number of arguments. If you have a stylesheet with the two function definitions as above, you will get an error.

What you *can* do is have the function accept a very general type and then have internal tests that determine the behaviour based on the type of the argument. For example:

<xsl:function name="ol:func">
  <xsl:param name="arg1" as="xdt:anyAtomicType" />
  <xsl:choose>
    <xsl:when test="$arg1 instance of xs:integer">
      <h1>This is an integer</h1>
    </xsl:when>
    <xsl:when test="$arg1 instance of xs:string">
      <h1>This is a string</h1>
    </xsl:when>
    <xsl:otherwise>
      <xsl:message terminate="yes">
        ol:func() expects an integer or a string
      </xsl:message>
    </xsl:otherwise>
  </xsl:choose>
</xsl:function>

Of course this way you won't get any static error checking to tell you that the type of the argument you're passing to the function is wrong...

6.

Default parameter type

Mike Kay




> <xsl:function name="njp:forwardURL">
> <xsl:param name="placeholder"/>

> I _think_ that makes this param a string 

No, the default type is "item()*" which accepts anything (any sequence of items). The default value of this param is a zero-length string, but the supplied value can be anything you like.
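
A small sketch of what "accepts anything" means in practice, assuming an XSLT 2.0 processor. The function my:item-count and the namespace http://example.com/my are hypothetical names introduced for this example.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:my="http://example.com/my">
  <xsl:output method="text"/>

  <xsl:function name="my:item-count" as="xs:integer">
    <!-- No 'as' attribute: the parameter type is implicitly item()* -->
    <xsl:param name="placeholder"/>
    <xsl:sequence select="count($placeholder)"/>
  </xsl:function>

  <xsl:template match="/">
    <!-- An empty sequence, a single string, and a mixed sequence
         (two numbers plus the document node) are all accepted -->
    <xsl:value-of select="my:item-count(()),
                          my:item-count('a string'),
                          my:item-count((1, 2.5, .))"
                  separator=" "/>
  </xsl:template>
</xsl:stylesheet>
```

Adding as="xs:string" to the parameter would make the first and third calls type errors.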

7.

On xslt 2.0 types

Jeni Tennison and Mike Kay


>   Its the 'as' attribute on xsl:variable that determines it to be
> an 'element' in this case?

Yes, or at least partly. If you have an 'as' attribute on <xsl:variable> then it indicates the static type of the variable, but the variable itself can be of a more specific type.

The "static type" is the type that the XSLT processor knows about when it first goes through the stylesheet, or that an XSLT editor might be able to use: even without having an XML document to work on, the 'as' attribute tells you what type of value the variable must hold. For example, if I have:

  <xsl:variable name="foo" as="item()*">...</xsl:variable>

then the static type, indicated by the 'as' attribute, is any number of any kind of item -- in other words, the variable could hold anything at all.

If I have:

  <xsl:variable name="foo" as="xs:decimal" select="..." />

then the static type, indicated by the 'as' attribute, is a decimal number.

Contrast this with the "dynamic type", which is the type of the actual value of the variable when you actually do the transformation on a particular XML document. For example, if you have:

  <xsl:variable name="foo" as="item()*">
    <a /><b /><c />
  </xsl:variable>

then the static type is any number of any item but the value of the variable is a sequence of three elements.

Similarly, if you have:

  <xsl:variable name="foo" as="xs:decimal" select="2" />

then the static type is a xs:decimal but the type of the variable's value is a xs:integer.

As long as the dynamic type (the type of the value that the variable gets set to) is a subset of the static type (the type of the variable as declared by the 'as' attribute), you're OK.
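
The difference between the declared type and the type of the value can be checked with "instance of". A minimal sketch, assuming an XSLT 2.0 processor:

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xsl:output method="text"/>

  <!-- Declared (static) type is xs:decimal; the value supplied is the xs:integer 2 -->
  <xsl:variable name="foo" as="xs:decimal" select="2"/>

  <xsl:template match="/">
    <!-- Tests the dynamic type of the value held by $foo -->
    <xsl:value-of select="$foo instance of xs:decimal,
                          $foo instance of xs:integer"
                  separator=" "/>
  </xsl:template>
</xsl:stylesheet>
```

Both tests should be true: the value is still an xs:integer, and every xs:integer is also an xs:decimal.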


> Ok. So (and I think I've asked this before and Mike quoted ?I think?
> xsd rec) where is the full list of ' data types' in the current raft
> of specs?
>
> http://www.dpawson.co.uk/xsl/rev2/exampler.html#iof
> lists the ones I guessed at.
>
> I couldn't find one, so I guess as with Dimitres 'misnaming',
> I (we?) will be fumbling with data types until we've guessed them all?

The SequenceType syntax is specified in the XPath 2.0 spec at: http://www.w3.org/TR/xpath20/#id-sequencetype

It's impossible to give a complete list of sequence types because user-defined data types can be imported into a stylesheet from a schema, and of course there are infinite numbers of element and attribute names.

I am trying to group the possible sequence types for explanatory purposes, simply to make the list easier to understand. If you'd prefer a flat list, just look at the things in quotes.

However, to summarize, a SequenceType can be:


  - "empty()"
  - an item type with an occurrence indicator; item types are:
    - "item()"
    - a node kind test, which are:
      - "node()"
      - a document node test, which can be:
        - "document()"
        - "document(element())"
        - "document(element(Name))"
        - "document(element(Name, *))"
        - "document(element(Name, Type))"
        - "document(element(*, Type))"
        - "document(element(SchemaPath))"
      - an element node test, which can be:
        - "element()"
        - "element(Name)"
        - "element(Name, *)"
        - "element(Name, Type)"
        - "element(*, Type)"
        - "element(SchemaPath)"
      - an attribute node test, which can be:
        - "attribute()"
        - "attribute(Name)"
        - "attribute(Name, *)"
        - "attribute(Name, Type)"
        - "attribute(*, Type)"
        - "attribute(SchemaPath)"
      - a processing instruction node test, which can be:
        - "processing-instruction()"
        - "processing-instruction(Name)"
      - "comment()"
      - "text()"
    - an atomic type, which can be any QName; common ones are:
      - "xdt:anyAtomicType"
      - "xs:string"
      - "xs:decimal"
      - "xs:double"
      - "xs:integer"
      - "xs:boolean"
      - "xs:date"
      - "xs:time"
      - "xs:dateTime"
      - "xdt:dayTimeDuration"
      - "xdt:yearMonthDuration"


 >     E.g. you had 
 >              "a node()
 >                 "element()
 >                    "element(Name, Type)
 >
 > Can I read that as element is a subtype of node,

No, you can't read the list like that. But it is true that an element node would match a SequenceType of "node()".

Just because document-node() comes earlier in the list than element() doesn't mean that you can provide an element where a document-node() is expected.

For a full list of the built-in atomic data types, look in the F&O spec at W3C.

As I said, these atomic data types can be augmented with ones that you define yourself, within a schema, so it's not possible to generate an exhaustive list.

> Yes, you're probably right Jeni... but I'm not surprised,
> its got to be closer to xsl-fo complexity than xslt 1?

There's certainly more to learn in XSLT 2.0 than there was in XSLT 1.0, which isn't surprising considering that it can do that much more.

The 'as' attribute sets limits on what the content or select attribute of the <xsl:variable> must evaluate to, but as long as evaluating the content (or select attribute) is within those limits, it can vary.


>> Similarly, if you have:
>> 
>>   <xsl:variable name="foo" as="xs:decimal" select="2" />
>> 
>> then the static type is a xs:decimal but the type of the variable's
>> value is a xs:integer.
>
> ?? Dynamic is integer, static is decimal??

Yes. The value returned from evaluating the XPath expression "2" is an xs:integer with the value 2. The 'as' attribute of the <xsl:variable> element says that the type of the variable is a xs:decimal. So the static type is xs:decimal, the dynamic type is xs:integer.


> Assuming such a cast is viable/allowed, any use of $foo will be as a
> decimal thereafter in the stylesheet?

The value of the variable is an xs:integer, and it will remain so: the xs:integer isn't cast to a xs:decimal because an xs:integer can always be used where a xs:decimal is expected because xs:integer is a subtype of xs:decimal.


>> As long as the dynamic type (the type of the value that the
>> variable gets set to) is a subset of the static type (the type of
>> the variable as declared by the 'as' attribute), you're OK.
>
> 'Can be cast to' I guess.

Well, "matches" would be the correct terminology, I guess. I should rephrase to:

"As long as the value the variable gets set to matches the static type, you're OK."

Whether a value "matches" a particular SequenceType is determined according to the SequenceType matching rules in the XPath 2.0 specification at W3C.

"Can be cast to" isn't the correct terminology because the only casting that's supported in XPath is the casting of a single atomic value to a different atomic type. So for example "a sequence of three elements" can't be "cast to" the SequenceType "element()+", but it *matches* the SequenceType "element()+".

>> The SequenceType syntax is specified in the XPath 2.0 spec at:
>> 
>>   http://www.w3.org/TR/xpath20/#id-sequencetype
>> 
>> It's impossible to give a complete list of sequence types
>
> Is there a difference (to a user) between sequence types and data
> types?

The term "data type" is usually used to refer to atomic data types such as xs:decimal, xs:date and so on. A "sequence type" refers to the type of a sequence, such as "one or more elements". So yes, there is a meaningful difference.
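
The distinction between matching a sequence type and casting to an atomic data type can be sketched with "instance of" and "castable as". This is an illustration only, assuming an XSLT 2.0 processor; the variable name `elems` is invented for the example.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <xsl:variable name="elems" as="element()+">
      <a/><b/><c/>
    </xsl:variable>
    <!-- A sequence of three elements matches the sequence type element()+ -->
    <xsl:value-of select="$elems instance of element()+"/>
    <xsl:text> </xsl:text>
    <!-- Casting applies only to a single atomic value and an atomic type -->
    <xsl:value-of select="'2004-02-29' castable as xs:date,
                          'Bangkok' castable as xs:integer"
                  separator=" "/>
  </xsl:template>
</xsl:stylesheet>
```

The first two tests should be true and the last one false, since 'Bangkok' is not in the lexical space of xs:integer.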


> The less said about this bunch, the better IMO.
> I'm hoping that the request to castr... reduce to a minimum will be
> supported in the WG :-)

In basic XSLT processors, we might well only require support for atomic values with a subset of the data types, probably:


  - xs:string
  - xs:boolean
  - xs:decimal
  - xs:integer
  - xs:double
  - xs:date
  - xs:time
  - xs:dateTime
  - xs:QName
  - xs:anyURI
  - xdt:dayTimeDuration
  - xdt:yearMonthDuration
  - xdt:untypedAtomic

When declaring the type of a variable (i.e. in a SequenceType), you will also be able to refer to the more general type xdt:anyAtomicType.

If you're using a full schema-aware XSLT processor, then you'll be able to use all the built-in types from XML Schema and XPath 2.0, as well as the ones that you import from a schema yourself.


>> So, you *can* use the 'as' attribute to do implicit casting.
> But only between a subtype and super of that subtype?

No. There is never a need to cast a value from a subtype to a supertype, because a value of the subtype is *by*definition* a value of the supertype already. For example, the xs:integer 2 is a xs:decimal value.

The implicit casting is mainly used when casting from an untyped value (of a node) to a particular type. For example, casting the value of the 'dob' attribute to a xs:date. In this case, the type of the value of the node (xdt:untypedAtomic) is not a subtype of the type to which it's being cast (xs:date).

(The other time it's used is when promoting values from xs:decimal to xs:float or xs:double, and from xs:float to xs:double.)
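
The 'dob' case can be sketched as follows, assuming an XSLT 2.0 processor and an unvalidated input such as <person dob="1970-01-31"/>. The element name `person` is hypothetical; only the 'dob' attribute comes from the discussion above.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xsl:output method="text"/>

  <xsl:template match="person">
    <!-- @dob is untyped; the 'as' attribute causes its value
         to be converted to xs:date when the variable is bound -->
    <xsl:variable name="dob" as="xs:date" select="@dob"/>
    <!-- $dob can now be used wherever an xs:date is required -->
    <xsl:value-of select="year-from-date($dob)"/>
  </xsl:template>
</xsl:stylesheet>
```

If the attribute value is not a valid date, the conversion fails with an error when the variable is evaluated.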


>> >     the last line I'm less sure of.
>
> I meant the line "                   "element(Name, Type)    "

Ah. Basically, "element(Name, Type)" only matches elements that are called "Name" and have a type "Type". For example, "element(Start, xs:dateTime)" will match elements called <Start> with a type of xs:dateTime. (It's a bit more complicated than that, because of element substitution groups, but I won't go into that because you've said you don't care about schema-aware processing.) The type is assigned when the element is validated against a schema (which it might be when it's generated using XSLT).

In basic XSLT, all elements have the type xdt:untypedAny (xs:anyType in the current specs, but I think that's going to change), so if you're using basic XSLT then you don't have to worry about this kind of test. The only element node tests you will be interested in are:

  - element()
  - element(Name, *)

which match all elements, and all elements with a particular name, respectively.


> Which raises another question.
>   I can see how the simpler ones from your list might be used
> as a value for the as attribute,
> <xsl:variable name="fred" as="xs:integer">
>
> but how might element(elementName, xs:integer)  be used?

If you wanted to say that the variable holds an element called "elementName" whose type is xs:integer. For example, when setting the variable with:

  <xsl:variable name="fred" as="element(elementName, xs:integer)">
    <elementName xsl:type="xs:integer">4</elementName>
  </xsl:variable>

Note that SequenceTypes are used in other places as well as in the 'as' attribute on variable-binding elements. For example, you might want to say that a function returns a sequence of <value> elements of type xs:integer:

<xsl:function name="my:get-values" as="element(value, xs:integer)*">
  ...
</xsl:function>

And note that this currently only applies in schema-aware XSLT processors. If you try to use a SequenceType like this in a basic XSLT processor, you will get an error.

Mike Kay also responded:

"as" on xsl:variable is an assertion about the type. For example, if you say

<xsl:variable name="x" as="xs:integer" select="my:prime-number()"/>

then you are asserting that the function my:prime-number() will return an integer, or a value that can be (loosely speaking) treated as an integer. If it returns an xs:unsignedInt, your assertion is correct, because xs:unsignedInt is a subtype of xs:integer. If it returns an attribute node that contains an integer, or that contains an untyped value that can be read as an integer, then you're also OK. But if the function returns a string or a date, then you'll get an error. The system can report this error at compile time if it can detect it then, otherwise it will be a run-time error.

With a simpler case such as


<xsl:variable name="x" as="xs:integer" select="'Bangkok'"/>

one would hope that most systems will report the error at compile time, but the rules don't require this. (This is because we haven't tried to define the concept of a "constant expression").

 > Similarly, if you have:
 > 
 >   <xsl:variable name="foo" as="xs:decimal" select="2" />
 > 
 > then the static type is a xs:decimal but the type of the variable's 
 > value is a xs:integer.
 
> ?? Dynamic is integer, static is decimal??

> Assuming such a cast is viable/allowed, any use of $foo
> will be as a decimal thereafter in the stylesheet?

You don't get casting here, only the weaker kind of conversion allowed in function call and assignment contexts. This allows (a) extraction of the content of a node, (b) numeric promotion (e.g. decimal to double, but not double to decimal), and (c) casting of untyped values only. An integer needs no conversion at all where a decimal is expected, because xs:integer is a subtype of xs:decimal.

The static type of $foo is xs:decimal, but the dynamic type of its value is integer. Static types aren't likely to affect XSLT processors significantly; they are much more important for some XQuery processors (such as Microsoft's) which are proposing to implement "conservative" static typing: which means the static type of an expression has to be right for the context where it is used, not just the dynamic type of its value. This might sound complicated but it's exactly what happens in many programming languages like Java: it's an error to write

Node n = xyz.getElement();
abc.setElement(n);

if setElement expects an Element; you have to write

abc.setElement((Element)n);

This kind of cast is written "treat as" in XPath/XQuery (because SQL uses "cast" to mean something different).
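
The XPath counterpart of the Java cast can be sketched like this, assuming an XSLT 2.0 processor. The variable names `n` and `e` are invented for the example; a processor that does not do conservative static typing would accept the binding without "treat as".

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- Static type node(); at run time the value is the document element -->
    <xsl:variable name="n" as="node()" select="/*"/>
    <!-- 'treat as' asserts the type without converting the value;
         it raises a dynamic error if $n is not an element -->
    <xsl:variable name="e" as="element()" select="$n treat as element()"/>
    <xsl:value-of select="local-name($e)"/>
  </xsl:template>
</xsl:stylesheet>
```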

8.

Data typing can be useful

Dimitre Novatchev.

In XSLT, as in other programming languages, typing can be very useful.

I personally have already benefited from XSLT 2.0 typing. In particular, it lets me eliminate code that was necessary in XSLT 1.0, e.g. to check whether a node-set passed as a parameter is empty and then issue an error message:

Instead of:

<xsl:template name="foldl1">
  <xsl:param name="pFunc" select="/.."/>
  <xsl:param name="pList" select="/.."/>

  <xsl:choose>
    <xsl:when test="not($pList)">
      <xsl:message terminate="yes">Some strong words!!!</xsl:message>
    </xsl:when>
    <xsl:otherwise><!-- useful code here --></xsl:otherwise>
  </xsl:choose>
</xsl:template>

I now can simply write:

<xsl:function name="f:foldl1">
  <xsl:param name="pFunc" as="element()"/>
  <xsl:param name="pList" as="item()+"/>

  <!-- useful code here -->
</xsl:function>

This is a significant reduction in the complexity of the code and in the total number of lines. As a result the code is simpler, easier to write and understand, and programmer productivity is increased.

9.

data() type

Michael Kay


> How can a sequence of anything be an instance of xs:boolean Mike?

In the XPath data model true() is both a boolean and a sequence containing a single boolean. There is no distinction between an item and a sequence of length one containing that item.

This reflects the way list-valued attributes work in XML Schema (and in DTDs): you wouldn't expect the attribute value "red" to behave differently when you change the type from NMTOKEN to NMTOKENS.

 
> <xsl:if test=". instance of xs:boolean">

> I'd interpret that as 'the context node value is a boolean'. 
> What's the difference between . and data(.) please?

data(.) forces atomization (i.e. extracting the value of a node). If X is an element then it cannot be a boolean, but its content can be a boolean. Many operators such as "+" and "=" force atomization of their operands, but some, like count() and "instance of", do not. For example with an NMTOKENS attribute a="red green blue", count(@a) is 1 but count(data(@a)) is 3.
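
The difference between a node and its atomized value can be sketched as follows. This assumes an XSLT 2.0 processor and an unvalidated input such as <flag>true</flag>; the element name `flag` is hypothetical, and the type name xs:untypedAtomic is the one used in the final XPath 2.0 recommendation (the drafts discussed here call it xdt:untypedAtomic).

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xsl:output method="text"/>

  <xsl:template match="flag">
    <!-- "instance of" does not atomize: the context item is an element node -->
    <xsl:value-of select=". instance of element(),
                          . instance of xs:boolean"
                  separator=" "/>
    <xsl:text> </xsl:text>
    <!-- data(.) atomizes: for an unvalidated element the result is untyped;
         an explicit cast is needed to obtain an xs:boolean -->
    <xsl:value-of select="data(.) instance of xs:untypedAtomic,
                          xs:boolean(.) instance of xs:boolean"
                  separator=" "/>
  </xsl:template>
</xsl:stylesheet>
```

Only the second of the four tests should be false: the element itself is never a boolean, although its content can be converted to one.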



> Are  you are assuming the context is a sequence of one item?
>  

There is a thing called the "context item" which is either a single item (=a sequence of one item) or is undefined (loosely, null).

10.

Understanding the Relationship of Nodes, Sequences, and Trees

Roger Costello

I have been having some excellent exchanges with Michael Kay and have learned a lot. I thought that I would summarize what I learned, so that others can benefit as well.

Understanding the Relationship of Nodes, Sequences, and Trees

(1) A node can belong to only one tree.
(2) A node may belong to any number of sequences.
(3) Axes always apply to the tree that the node is in. Axes never apply to the sequence that the node is in.
(4) When xsl:sequence is used in a sequence which has no parent node then the sequence contains the original node referenced by xsl:sequence and not a copy.
(5) When xsl:sequence is used in a sequence which has a parent node then the element that is referenced by xsl:sequence is copied. Thus, the sequence is comprised of a copy and not the original.
(6) The preceding-sibling and following-sibling axes can only be used in a tree. That is, they cannot be used in a sequence that does not have a parent node. (Nodes are "siblings" iff they have a common parent)

To understand these rules, let's consider an example.

Below is the XML document that my stylesheet operates upon:

<?xml version="1.0"?>
<FitnessCenter>
    <Member>
        <Name>Jeff</Name>
    </Member>
    <Member>
        <Name>David</Name>
    </Member>
    <Member>
        <Name>Roger</Name>
    </Member>
</FitnessCenter>

In my stylesheet I have created this variable:

<xsl:variable name="members" as="element()+">
    <xsl:sequence select="/FitnessCenter/Member[2]"/>
    <Member>
        <Name>Sally</Name>
    </Member>    
    <Member>
        <Name>Linda</Name>
    </Member>    
</xsl:variable>

Note that this variable contains a mix of elements - the first element (the David Member) comes from the FitnessCenter. The second and third elements (Sally and Linda) are defined within the variable itself.

Further, note that this sequence does not have a parent node (due to the presence of as="element()+").

A characteristic of xsl:sequence when used in a sequence that does not have a parent node is that it does not create a copy of the node that it references; instead, it uses the original node. Thus, $members[1] is referencing the original node:

    <Member>
        <Name>David</Name>
    </Member>

from the FitnessCenter.

Now let's consider the above rules in the context of this example.

(1) A node can belong to only one tree.

$members[1] references this node:

    <Member>
        <Name>David</Name>
    </Member>

This node belongs to the FitnessCenter tree.

$members[2] references this node:

    <Member>
        <Name>Sally</Name>
    </Member>

This node belongs to the $members sequence. ($members does not create a tree. It is only creating a sequence of nodes.)

(2) A node may belong to any number of sequences.

This node:

    <Member>
        <Name>David</Name>
    </Member>

belongs both to the sequence of Member elements in the FitnessCenter document and to the $members sequence.

(3) Axes always apply to the tree that the node is in. Axes never apply to the sequence that the node is in.

Consider this XSLT statement which uses the preceding-sibling axis:

<xsl:copy-of select="$members[1]/preceding-sibling::*[1]"/>

$members[1] references the David node, which is in the FitnessCenter tree. Therefore, it is referencing David's preceding-sibling in the FitnessCenter tree:

    <Member>
        <Name>Jeff</Name>
    </Member>

Likewise, this is referencing David's following-sibling in the FitnessCenter tree:

<xsl:copy-of select="$members[1]/following-sibling::*[1]"/>

Output:

    <Member>
        <Name>Roger</Name>
    </Member>

Note that you can use neither preceding-sibling nor following-sibling on $members[2] or $members[3], because these axes only apply to nodes in a tree. The Sally Member and Linda Member are not in a tree - they are only in a sequence.

(4) When xsl:sequence is used in a sequence which has no parent node then the sequence contains the original node referenced by xsl:sequence and not a copy.

Consider this variable declaration:

<xsl:variable name="members" as="element()+">
    <xsl:sequence select="/FitnessCenter/Member[2]"/>
    <Member>
        <Name>Sally</Name>
    </Member>    
    <Member>
        <Name>Linda</Name>
    </Member>    
</xsl:variable>

This variable is comprised of a sequence of nodes. The sequence does not have a parent node. Therefore, the sequence is comprised of the original node.

(5) When xsl:sequence is used in a sequence which has a parent node then the element that is referenced by xsl:sequence is copied. Thus, the sequence is comprised of a copy and not the original.

Consider this variable declaration:

<xsl:variable name="members">
    <xsl:sequence select="/FitnessCenter/Member[2]"/>
    <Member>
        <Name>Sally</Name>
    </Member>    
    <Member>
        <Name>Linda</Name>
    </Member>    
</xsl:variable>

Note the absence of as="element()+". Thus, this Member sequence has a document node as its parent. Consequently, a *copy* of /FitnessCenter/Member[2] is made and used in the sequence. So this statement produces an empty output:

<xsl:copy-of select="$members[1]/preceding-sibling::*[1]"/>
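
The difference between rules (4) and (5) can be observed with the XPath 2.0 "is" operator, which tests node identity. The following is a minimal sketch, assuming an input document whose FitnessCenter element has at least two Member children; the variable names "original" and "copied" are hypothetical and chosen only for illustration.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- Parentless sequence: xsl:sequence contributes the original node -->
    <xsl:variable name="original" as="element()+">
      <xsl:sequence select="/FitnessCenter/Member[2]"/>
      <Member><Name>Sally</Name></Member>
    </xsl:variable>

    <!-- No "as" attribute: a document node is built and the Member is copied into it -->
    <xsl:variable name="copied">
      <xsl:sequence select="/FitnessCenter/Member[2]"/>
      <Member><Name>Sally</Name></Member>
    </xsl:variable>

    <!-- Expected to be true: both operands are the same node -->
    <xsl:value-of select="$original[1] is /FitnessCenter/Member[2]"/>
    <xsl:text>&#10;</xsl:text>
    <!-- Expected to be false: the copy has its own identity -->
    <xsl:value-of select="$copied/Member[1] is /FitnessCenter/Member[2]"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```

Because the copy lives in a new tree, its siblings are the other children of the new document node, not the siblings of the source node in the FitnessCenter tree.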

(6) The preceding-sibling and following-sibling axes can only be used in a tree. That is, they cannot be used in a sequence that does not have a parent node. (Nodes are "siblings" iff they have a common parent)

Therefore, for example, you cannot use following-sibling to get the member that follows Sally:

<xsl:variable name="members" as="element()+">
    <xsl:sequence select="/FitnessCenter/Member[2]"/>
    <Member>
        <Name>Sally</Name>
    </Member>    
    <Member>
        <Name>Linda</Name>
    </Member>    
</xsl:variable>

This statement produces an empty output:

<xsl:copy-of select="$members[2]/following-sibling::*[1]"/>

However, in this version the sequence does have a parent (document) node, so you can use following-sibling to retrieve the member that follows Sally:

<xsl:variable name="members">
    <xsl:sequence select="/FitnessCenter/Member[2]"/>
    <Member>
        <Name>Sally</Name>
    </Member>    
    <Member>
        <Name>Linda</Name>
    </Member>    
</xsl:variable>

This statement:

<xsl:copy-of select="$members[2]/following-sibling::*[1]"/>

yields this output:

    <Member>
        <Name>Linda</Name>
    </Member> 

An alternate form that will also work is this:

<xsl:variable name="members" as="element()+">
    <Members>
          <xsl:sequence select="/FitnessCenter/Member[2]"/>
          <Member>
              <Name>Sally</Name>
          </Member>    
          <Member>
              <Name>Linda</Name>
          </Member>    
    </Members>
</xsl:variable>

Note that the member sequence has a parent node (<Members>). Therefore, the following-sibling and preceding-sibling axes can be used on the member sequence, e.g.,

This statement:

<xsl:copy-of select="$members/Member[2]/following-sibling::*[1]"/>

yields this output:

    <Member>
        <Name>Linda</Name>
    </Member> 

11.

Typed match

Michael Kay



>     <xsl:template match="element(*, my:postal-address-type)">

> What does this match on please?
>   An element,
>   any name ????
>   with type (from some schema) my:postal-address-type

Yes:


   element() means it must be an element node

   element(*) means it can have any name

element(*, my:postal-address-type) means that it must be an element that has been validated by a schema processor as conforming to the global schema-defined type my:postal-address-type, which must be defined in a schema that has been imported using <xsl:import-schema>. This can be either a simple type or a complex type.

Some other variations:


match="element(X)" means the same as match="X" (not very useful in this context)

<xsl:variable name="v" as="element(X)">

declares a variable that will always contain an element named X

match="element(billing-address, my:postal-address-type)"

matches an element whose name is "billing-address" that has been validated against a particular schema type

match="schema-element(my:billing-address)" matches an element that's been validated against a global element declaration called "my:billing-address" in an imported schema; the element needn't actually have this name, it could be a member of the substitution group.
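
As a rough sketch of how such a typed match might sit in a stylesheet, assuming a schema-aware XSLT 2.0 processor and an input document that has been validated: the namespace URI http://example.com/my, the schema file addresses.xsd, and the output element "address" are all hypothetical.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:my="http://example.com/my"
    exclude-result-prefixes="my">

  <!-- Hypothetical schema; it is assumed to define the global type
       my:postal-address-type -->
  <xsl:import-schema namespace="http://example.com/my"
                     schema-location="addresses.xsd"/>

  <!-- Matches any element, whatever its name, whose type annotation is
       my:postal-address-type or a type derived from it -->
  <xsl:template match="element(*, my:postal-address-type)">
    <address kind="{local-name()}">
      <xsl:apply-templates/>
    </address>
  </xsl:template>

</xsl:stylesheet>
```

If the input has not been validated, its elements are annotated as untyped and this template should never fire.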

12.

getting node type in xsl

David Carlisle

An alternative that would work is to pre-process your schema using xslt to produce an xslt file that "knows" the element types. If your schema is highly complex this might be difficult but for many schema documents it is relatively easy to write an xslt file that inputs a schema and (say) collects all the element names that have type xsd:integer and outputs

<xsl:template mode="type" match="elem1|elem2|....|last-element">
 <xsl:text>integer</xsl:text>
</xsl:template>

then, having produced this xsl file you can import that into your main xslt file and whenever you need to know the type of an element just apply this type mode to get the type name of the current element.

<xsl:variable name="type">
 <xsl:apply-templates mode="type" select="."/>
</xsl:variable>
<xsl:if test="$type='integer'">
  do something about integers...
</xsl:if>

13.

Test For Numeric Values?

Michael Kay

> I want to test a node value to see whether 
> it is numeric or not? (Decides HTML align formatting)

". castable as xs:decimal" (or xs:integer, xs:double etc. if preferred)


> but doesnt this scenario sort of 'scream out' for a boolean function 
> that tests for type, both basic (node, string, number) and schema ?

There is such a construct: e.g. ($x instance of xs:decimal)

But in this case, I interpreted the requirement as being not to test whether the attribute had a numeric type (ie. was defined as a number in the schema), but whether its value had the lexical form of a number. The XPath 2.0 construct for this is ($x castable as xs:decimal).
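
For the alignment use case in the question, a minimal sketch might look like this. The element name "cell" is a hypothetical stand-in for whatever node holds the value in the input.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <!-- "cell" is a hypothetical element name for a table cell in the input -->
  <xsl:template match="cell">
    <!-- Right-align values whose lexical form is a valid xs:decimal -->
    <td align="{if (. castable as xs:decimal) then 'right' else 'left'}">
      <xsl:value-of select="."/>
    </td>
  </xsl:template>

</xsl:stylesheet>
```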

14.

testing for string and number

Michael Kay




> take for example xsl:sort

> <xsl:sort select="." data-type="number"/>

> what does 'number' mean here?

It's retained for backwards compatibility with XSLT 1.0; the "native" way of doing this in 2.0 would be

<xsl:sort select="xs:double(.)"/>

if they are doubles, or more likely

<xsl:sort select="xs:integer(.)"/>

if they are integers.

15.

Michael Kay


> <?xml version="1.0" encoding="UTF-8"?> 
> <example>
>     <test>132131</test>
> </example>

I'll assume there is no schema, that is, this is an untyped/unvalidated document.

>     <xsl:variable name="x" select="example/test" as="xs:integer"/>
>     <xsl:variable name="y" select="example/test"/>

>         y variable value(so we know we are selecting it): 132131
>         Test y as string:false
>         Test y as integer:false
>         x variable value:132131
>         test x as integer: true

> not sure if this is what I would expect normally, the issue is related 
> to an element if it has no explicitly declared data-type..

I don't know what your expectations are but these results are correct according to the spec. If you don't validate the input document against a schema, then the "typed value" of its nodes is untypedAtomic. If you test ($y instance of xdt:untypedAtomic) you will get the answer true. untypedAtomic behaves essentially like XSLT 1.0 - if you use the value where a string is expected, it's treated as a string, if you use it where an integer is expected, it's converted to an integer.


> it doesn't make much sense to me to *have* to declare something as an 
> integer datatype to test if its value is a number... what's the point?

You need to distinguish "instance of" and "castable as". The "instance of" operator is useful if you write a function that can accept arguments of several different types and you want to test which type you have been given (just like "instanceof" in Java). The "castable as" operator is useful when you are given untyped data and you want to see whether its lexical form makes it suitable for casting to a particular type such as xs:integer or xs:date - which is where this thread started.
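
The distinction can be sketched against the example document above, assuming it is processed without schema validation. The output labels are illustrative only.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <xsl:output method="text"/>

  <xsl:template match="/">
    <xsl:variable name="y" select="example/test"/>

    <!-- The typed value of an unvalidated node is untypedAtomic: expected true -->
    <xsl:text>untypedAtomic: </xsl:text>
    <xsl:value-of select="data($y) instance of xs:untypedAtomic"/>

    <!-- "instance of" performs no conversion: expected false -->
    <xsl:text>&#10;instance of xs:integer: </xsl:text>
    <xsl:value-of select="data($y) instance of xs:integer"/>

    <!-- "castable as" looks at the lexical form: expected true for 132131 -->
    <xsl:text>&#10;castable as xs:integer: </xsl:text>
    <xsl:value-of select="$y castable as xs:integer"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```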

16.

Check for integer

Mike Kay



I want to check the Altitude's value, to see if it
is an integer.

<Flight xmlns="http://www.aviation.org">
   <Aircraft>
       <Altitude>3300</Altitude>
   </Aircraft>
</Flight>

My stylesheet uses this statement:

  <xsl:value-of select="data(flt:Aircraft/flt:Altitude) instance of xsd:integer"/>

The output I get is: "false"

(The output I seek is "true", as the Altitude element does have an integer value.)

Can someone tell me the correct way to do this?

If there's no schema, then the Altitude element is untyped, so applying data() to it gives an instance of xs:untypedAtomic, not an integer.

Answer. The expression you want is

flt:Aircraft/flt:Altitude castable as xsd:integer

which tests not whether the value is an integer, but whether conversion to an integer would succeed.

You could also test this with a regular expression

matches(flt:Aircraft/flt:Altitude, '[0-9]+')

Abel points out

Be aware that this does not check for an xs:integer; it only checks for the existence of one or more digits inside an item. It matches '123ABC', 'ABC123', 'ABC1ZYX', etc. To match only digits, you must anchor the pattern at the start and end, like so:

matches(flt:Aircraft/flt:Altitude, '^\s*[+-]?\d+\s*$')

Furthermore, it does not do the same as 'castable as', because a value like 1E10 is an xs:double, which is castable as xs:integer. To make matters worse, the xs:string containing '1E10' cannot be cast to xs:integer directly (meaning 'castable as' would return false); it must first be converted to xs:double. Since you can only use matches() on strings, stuff like this cannot be mimicked with it.

What about numeric values expressed in exponential notation? These are (with a side step to xs:double) easily castable as integer. Of course, that should only apply to values that have no decimals (not sure of reqs). If you need it, you can expand your expression like so:

matches(flt:Aircraft/flt:Altitude, '^\s*[+-]?\d+([eE]\+?\d+)?\s*$')

Abel Braaksma
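
To compare the anchored regular expression with 'castable as' side by side, a small sketch such as the following could be run against any input document. The four sample strings are made up for illustration. For '123ABC' and '1E10' both tests should report false, since neither string has the lexical form of an integer.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- Illustrative sample values, not taken from any real data -->
    <xsl:for-each select="('3300', ' +42 ', '123ABC', '1E10')">
      <!-- Anchored pattern: optional sign, digits only, surrounding whitespace allowed -->
      <xsl:value-of select="concat('[', ., '] anchored regex: ',
                                   matches(., '^\s*[+-]?\d+\s*$'))"/>
      <!-- Would a direct cast from this string to xs:integer succeed? -->
      <xsl:value-of select="concat(', castable as xs:integer: ',
                                   (. castable as xs:integer))"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```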

17.

Co-constraints

Roger Costello

Schematron + XPath 2.0 is extremely powerful. In fact, one could argue that it can do everything that XML Schema (or RelaxNG) can do, plus a lot more.

For example, below is an XML document showing information about an aircraft and vertical obstructions on its flight path. One critical operational constraint is:

"Check that the aircraft's altitude is at least 500 feet above all the vertical obstructions"

This "co-constraint" cannot be expressed using XML Schemas (or RelaxNG).

But with Schematron + XPath 2.0 the co-constraint can be expressed using this XPath:

every $j in flt:VerticalObstruction satisfies
    if ($j/flt:Height)
    then number(flt:Aircraft/flt:Altitude) gt number($j/flt:Height + $j/flt:Elevation + 500)
    else number(flt:Aircraft/flt:Altitude) gt number($j/flt:Elevation + 500)

As best I can tell, the functionality of Schematron + XPath 2.0 is a superset of XML Schemas (and RelaxNG). However, I am still researching this. The findings in this discussion will be incorporated into a paper I am writing.

I appreciate all your input.  /Roger

<?xml version="1.0"?>
<Flight xmlns="http://www.aviation.org">
   <Aircraft type="Boeing 747">
       <Altitude units="feet" reference="MSL">3300</Altitude>
       <Location>
           <Latitude>42.371</Latitude>
           <Longitude>-71.000</Longitude>
       </Location>
   </Aircraft>
   <VerticalObstruction type="tower">
       <!-- The top of the tower is 1500 feet -->
       <Elevation units="feet">1000</Elevation>
       <Height units="feet">500</Height>
       <Location>
           <Latitude>42.371</Latitude>
           <Longitude>-71.025</Longitude>
       </Location>
   </VerticalObstruction>
   <VerticalObstruction type="mountain">
       <Elevation units="feet">2600</Elevation>
       <Location>
           <Latitude>42.371</Latitude>
           <Longitude>-71.155</Longitude>
       </Location>
   </VerticalObstruction>
   <VerticalObstruction type="building">
       <!-- The top of the building is 700 feet -->
       <Elevation units="feet">500</Elevation>
       <Height units="feet">200</Height>
       <Location>
           <Latitude>42.371</Latitude>
           <Longitude>-71.299</Longitude>
       </Location>
   </VerticalObstruction>
</Flight>
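
Outside Schematron, the same XPath can be tried in a plain XSLT 2.0 stylesheet. This is only a sketch: the variable name "clear" and the output wording are hypothetical, and the flt prefix is assumed to be bound to http://www.aviation.org. For the sample document the result should be true, since the highest obstruction (the 2600-foot mountain) plus 500 feet is 3100, below the aircraft's 3300.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:flt="http://www.aviation.org">

  <xsl:output method="text"/>

  <xsl:template match="/flt:Flight">
    <!-- The co-constraint, evaluated with the Flight element as context node -->
    <xsl:variable name="clear" select="
        every $j in flt:VerticalObstruction satisfies
          if ($j/flt:Height)
          then number(flt:Aircraft/flt:Altitude) gt number($j/flt:Height + $j/flt:Elevation + 500)
          else number(flt:Aircraft/flt:Altitude) gt number($j/flt:Elevation + 500)"/>
    <xsl:text>Altitude clears every obstruction by more than 500 feet: </xsl:text>
    <xsl:value-of select="$clear"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```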

18.

Convert to a number

Abel Braaksma


> Is something like this possible?:
>
> //Node[xs:int(@number-att)=$myNum]
>
> My problem is that I don't know if @number-att will be padded with zeros
> or not

You mean, I think, xs:integer(@number-att), which is indeed possible. It will fail with an error if @number-att contains any [^0-9.+-] (with some exceptions). However, there are several ways to prevent this (unrecoverable) error from being raised:

(: number() itself never fails, but beware: it returns NaN for non-numeric input, and xs:integer(NaN) raises an error :)
xs:integer(number(@number-att))
(: more cleanly, gives you more control :)
if (@number-att castable as xs:integer) then xs:integer(@number-att) else 0
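
Applied to the predicate from the question, the guarded form might be sketched as follows. The default value 7 for $myNum is a hypothetical placeholder. Nodes whose attribute is missing or not castable are simply not selected, and a padded value such as 007 compares equal to 7.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <!-- Hypothetical default; pass the real number in as a stylesheet parameter -->
  <xsl:param name="myNum" as="xs:integer" select="7"/>

  <xsl:template match="/">
    <!-- The if/then/else keeps the cast away from values that cannot be converted -->
    <xsl:copy-of select="//Node[
        if (@number-att castable as xs:integer)
        then xs:integer(@number-att) eq $myNum
        else false()]"/>
  </xsl:template>

</xsl:stylesheet>
```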

19.

Types and variables

David Carlisle


> Hi all, if I have declared the following variable:
>
> <xsl:variable name="test" as="element()">
>         <one>
>             <two>hello</two>
>         </one>
> </xsl:variable>
>
> I don't understand why <xsl:sequence select="$test/two"/> retrieves the
> value of <two/> while <xsl:sequence select="$test/one/two"/> does not. 
> If I omit the 'as' attribute it works the other way round. What exactly is
> happening when I assign a node() or element() or whatever type to this
> variable?
>
> Also could you advise what type I should be using for this kind of task?

if you use an as attribute the variable is bound to the sequence constructed, so in your case $test is an element node with name one (this is an element node with no parent, something that can not exist in xslt1)

so $test is element one and $test/two selects its child element with name two.

If you do not use an as attribute and use content rather than a select attribute the xsl:variable works as in xslt1 and always generates a single document node / and any generated content is made a child of that node (by copying).

so in the second case $test is / $test/one is its child and $test/one/two is its child.
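
The two cases can be put side by side in a small sketch. The variable names "typed" and "untyped" are hypothetical, and any input document will do, since only the variables are inspected.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <!-- With "as": the variable is the parentless element "one" itself -->
  <xsl:variable name="typed" as="element()">
    <one><two>hello</two></one>
  </xsl:variable>

  <!-- Without "as": the variable is a document node with "one" as its child -->
  <xsl:variable name="untyped">
    <one><two>hello</two></one>
  </xsl:variable>

  <xsl:template match="/">
    <!-- Expected: 1 0 -->
    <xsl:value-of select="count($typed/two), count($typed/one/two)"/>
    <xsl:text>&#10;</xsl:text>
    <!-- Expected: 0 1 -->
    <xsl:value-of select="count($untyped/two), count($untyped/one/two)"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```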

> Also could you advise what type I should be using for this kind of
> task?

it doesn't make much difference in your case with a single constructed element (except it changes the way you access it, as you found) but consider

<xsl:variable name="test" as="element()*">
 <a/>
 <b/>
</xsl:variable>

That's a sequence of two parentless elements, so having no parents they are not siblings so $test/self::a/following-sibling::b is empty

<xsl:variable name="test">
 <a/>
 <b/>
</xsl:variable>

is a / node with a and b children so $test/a/following-sibling::b is the b node.

so, if you think you might want to wander around via axis paths parentless nodes can be confusing, but there is sometimes a big, big win for using as="element()*"

if you have

<xsl:variable name="test" as="element()*">
 <xsl:sequence select="foo/bar"/>
</xsl:variable>

then it's like

<xsl:variable name="test"  select="foo/bar"/>

and selects all the foo/bar elements but selects the existing nodes so selecting

$test/foo[1]/bar[1]/../../../x/y

may well work and select something in the original tree

<xsl:variable name="test">
 <xsl:sequence select="foo/bar"/>
</xsl:variable>

on the other hand generates a new / node and creates children of this node by _copying_ the nodes

so now

$test/foo[1]/bar[1]/../../../x/y

will definitely be empty as going up two from the bar elements will get you to the / at the top of this element.

obviously you don't want to copy whole document trees when you don't need to, but often the system won't really copy it anyway (if I understand MK correctly) but using as= makes you less reliant on the optimiser spotting that it can reuse nodes without actually copying them.

20.

Variables, siblings or orphans

Michael Kay

<xsl:variable name="v" as="element()+">
  <a/>
  <a/>
  <a/>
</xsl:variable>

The elements in the resulting sequence are not siblings. They are parentless, whereas siblings always share a parent. To make them siblings you need to add a document node, which you can do simply by leaving out the "as" attribute:

<xsl:variable name="v">
  <a/>
  <a/>
  <a/>
</xsl:variable>
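
A quick way to see the difference is to count following siblings in each case. In this sketch the variable names "orphans" and "siblings" are hypothetical, and any input document will do.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <!-- Three parentless elements -->
  <xsl:variable name="orphans" as="element()+">
    <a/>
    <a/>
    <a/>
  </xsl:variable>

  <!-- Three children of an implicit document node -->
  <xsl:variable name="siblings">
    <a/>
    <a/>
    <a/>
  </xsl:variable>

  <xsl:template match="/">
    <!-- Expected: 0, since a parentless element has no siblings -->
    <xsl:value-of select="count($orphans[1]/following-sibling::a)"/>
    <xsl:text>&#10;</xsl:text>
    <!-- Expected: 2, the other two children of the document node -->
    <xsl:value-of select="count($siblings/a[1]/following-sibling::a)"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```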