xslt sort unique

Sorting

1. How to sort the unique elements
2. Sort on more than one element
3. Sorting
4. How to find out if a node is the first of its kind after a sort
5. Sort Order vs Document Order
6. First occurrence of many
7. How to select only the unique elements from an xml document
8. How to sort by attribute
9. sort count number and group
10. Sorting text-number strings
11. Checking sort order
12. The ultimate unique sort
13. Removing duplicates Muenchian solution
14. sorting and counting
15. arbitrary sorting
16. Arbitrary sorting
17. Sorting into a variable
18. Ultimate arbitrary sort algorithm
19. Maximum value of a list
20. Case insensitive sorting
21. Case Insensitive Sorted Index with Headings
22. Sorting time values
23. Topological Sort
24. Special Alpha sort
25. Ordering and iteration problem
26. sorting code available
27. dynamically set the sort order
28. Can xsl:sort use a variable
29. Sorting.
30. Sort, upper case then lower case
31. Sorting on near numeric data
32. Sort by date
33. Sorting problems
34. How do I sort Hiragana and Katakana Japanese characters?
35. Sorting problems.
36. Sorting problems with whitespace (wrong order ).
37. Sorting Upper-Case first, or in *your* way
38. Topological sort
39. Sorting on names with and without spaces

1.

How to sort the unique elements

Oliver Becker

From

 <A>
  <D>
   <C/>
   <A>
    <B/>
   </A>
  </D>
  <B/>
 </A>

I want an output of

 ABCD

Applying the Muenchian technique, I get the following stylesheet:

<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:key name="first-id" match="*"
        use="generate-id((preceding::* | ancestor::*)
  [name() = name(current())])"/>

<xsl:template match="/">
  <xsl:apply-templates select="key('first-id', '')">
  <xsl:sort select="name()"/>
  </xsl:apply-templates>
</xsl:template>

<xsl:template match="*">
  <xsl:value-of select="name()"/>
</xsl:template>

</xsl:stylesheet>
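
For comparison, here is a minimal sketch of the more conventional form of the Muenchian technique for the same input, keyed directly on the element name. The key name by-name is an assumption made for this illustration; it is not part of Oliver's stylesheet.

```xml
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <!-- index every element under its own name (hypothetical key name) -->
  <xsl:key name="by-name" match="*" use="name()"/>

  <xsl:template match="/">
    <!-- keep only the element that is first, in document order,
         among all elements with the same name -->
    <xsl:for-each
      select="//*[generate-id() = generate-id(key('by-name', name())[1])]">
      <xsl:sort select="name()"/>
      <xsl:value-of select="name()"/>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

For the sample document above this also gives ABCD. Oliver's version finds the first occurrences with a single key lookup on the empty string; this version tests each element against the first member of its key group.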

2.

Sort on more than one element

Ben Robb


Put as many <xsl:sort> elements as you need:

<xsl:sort select="col1"/>
<xsl:sort select="col2"/>
...
	

where the first <xsl:sort> element specifies the primary sort key, the second specifies the secondary sort key, and so on. When the apply-templates or for-each builds its node list, it sorts according to the sort keys; if two or more nodes compare equal on every sort key, they are returned in document order.

Steve Muench adds:

List each sort key in its own <xsl:sort> element. The first one that appears in document order is the "primary" sort, the second one that appears is the "secondary" sort, etc.

<xsl:for-each select="customer-list/customer">
  <!-- Sort (alphabetically) on customer @name attr -->  
  <xsl:sort select="@name"/>
  <!-- Sort (numerically, descending) on sum of their orders -->
  <xsl:sort select="sum(orders/order/total)" 
         data-type="number" order="descending"/>

  <!-- etc. -->

</xsl:for-each>

Jeni T adds:

The advantage of this syntax over a comma-separated list is that you can have different properties attached to the two sorts, such as the order in which the list is sorted by these cols, or whether the cols are treated as text or numbers:

  <xsl:sort select="col1" order="ascending" data-type="text" />
  <xsl:sort select="col2" order="descending" data-type="number" />

You can add as many xsl:sorts as you want within an xsl:for-each or an xsl:apply-templates.

3.

Sorting

Mike Kay

Q: Expansion.

> I'm a bit confused by the interaction of xsl:sort and the various
> axes.  I suppose basically my question is: does xsl:sort affect the
> ordering of nodes for the purpose of reference within the stylesheet,
> or just for the purpose of the output?

xsl:sort affects the order in which the nodes are processed. It does not affect the position of the nodes on any axis, such as the following-sibling axis.

4.

How to find out if a node is the first of its kind after a sort

David Carlisle


<xsl:template match="test">
<xsl:for-each select="a">
<xsl:sort select="."/>
[<xsl:number value="position()"/>: 
	<xsl:value-of select="."/>]
<xsl:if test="position()=1">This is First</xsl:if>
</xsl:for-each>
</xsl:template>

produces

[1: a]
This is First
[2: e]

[3: f]

[4: g]

[5: x]

[6: z]


from 

<test>
<a>e</a>
<a>x</a>
<a>f</a>
<a>a</a>
<a>g</a>
<a>z</a>
</test>

            

5.

Sort Order vs Document Order

David Carlisle

Even when a node list is in sorted order (so the values returned by position() reflect sorted order) the axis specifiers like preceding-sibling refer to _document_ order.
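
A small sketch that makes the difference visible, assuming the <test> input from question 4. It prints position() (sorted order) next to a count taken along the preceding-sibling axis (document order).

```xml
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <xsl:template match="test">
    <xsl:for-each select="a">
      <xsl:sort select="."/>
      <!-- position() counts within the sorted list -->
      <xsl:value-of select="position()"/>
      <xsl:text>: </xsl:text>
      <xsl:value-of select="."/>
      <!-- preceding-sibling still walks the source tree -->
      <xsl:text> (document position </xsl:text>
      <xsl:value-of select="count(preceding-sibling::a) + 1"/>
      <xsl:text>)&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

With that input the first line of output is "1: a (document position 4)": the node comes first after the sort, but it still has three preceding siblings in the source.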

6.

First occurrence of many

Olivier Corby

Q expansion:

I'm trying to select all <term> elements in a document (multiple of which may have the same content), for which the element is the first containing its content.

You can try this :

<xsl:for-each select="term[not(preceding::term=.)]">
<xsl:value-of select="."/>
</xsl:for-each>

Mike Brown adds:

Hopefully this will work. The questioner wanted a comparison of "content" of term elements, which is more difficult to test than string values because descendant nodes would be considered "content". If each <term> contains only text, it will be fine.

7.

How to select only the unique elements from an xml document

Michael Kay


<xsl:for-each select="//CUSTOMER[not(.=preceding::CUSTOMER)]">
<xsl:value-of select="."/>
</xsl:for-each>


            

8.

How to sort by attribute

Steve Muench



 For the source file
 
 <sender> 
 	<a clli="200" b="20"/> 
 	<a clli="100" b="10"/> 
 </sender> 
 
 the output needed is
 <wrapper>
   <a> @clli</a>
   <b> @b </b>
 </wrapper>

 
 
 <wrapper 
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xsl:version="1.0">
   <xsl:for-each select="sender/a">
     <xsl:sort select="@clli"/>
     <a><xsl:value-of select="@clli"/></a>
     <b><xsl:value-of select="@b"/></b>
   </xsl:for-each>
 </wrapper>
   
 This produces:
 
 <wrapper>
<a>100</a>
<b>10</b>
<a>200</a>
<b>20</b>
</wrapper>
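
 One caveat: xsl:sort compares its keys as text unless told otherwise, so the stylesheet above works for 100 and 200 but would put 1000 before 200. The following sketch is the same stylesheet with data-type="number" added, on the assumption that clli always holds a number.

```xml
<wrapper
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xsl:version="1.0">
  <xsl:for-each select="sender/a">
    <!-- data-type="number" compares 200 and 1000 as numbers, not as strings -->
    <xsl:sort select="@clli" data-type="number"/>
    <a><xsl:value-of select="@clli"/></a>
    <b><xsl:value-of select="@b"/></b>
  </xsl:for-each>
</wrapper>
```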
 

9.

sort count number and group

quagly

After perusing the FAQ I can sort, count, number, and group. But I cannot do them all at once. Please help.

Example:

xml:

<root>
    <foo>
          <bar>bard</bar>
          <bar>bark</bar>
    </foo>
    <foo>
          <bar>bark</bar>
          <bar>barb</bar>
     </foo>
</root>

Sample xsl that selects distinct <bar>

<xsl:template match="//bar[not(. = following::bar)]">
     <xsl:value-of select="."/>
</xsl:template>

produces:

bard bark barb

What I want is to number these, sort them, and count the number of times they appear in the XML source.

Desired output:

1.  barb  -1
2.  bard  -1
3.  bark  -2

I can't seem to get there from here. Do I need to use for-each?

Solution:

<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

 <xsl:template match="/">
 <DL>
 <!-- pick the first bar for each distinct value, then sort on its value -->
 <xsl:apply-templates select="//bar[not(. = preceding::bar)]">
 <xsl:sort select="."/>
 </xsl:apply-templates>
 </DL>
</xsl:template>


<xsl:template match="bar">
 <DT>
 <xsl:number value="position()" format="1."/>
 <xsl:value-of select="."/>-
 <xsl:value-of select="count(//bar[.=current()])"/>
 </DT>
</xsl:template>
</xsl:stylesheet>

10.

Sorting text-number strings

David Carlisle

> Is there a way to achieve a sort with results?:
> title 1, title2, title 3,..., title 9, title 10, title 11

XSL isn't especially good (actually it's normally hopeless) at inferring structure from character data, so you would have a much easier time if the characters and numbers had been separated in the input:

<title
name="this string"
number="42"/> 
or some such.

I'll give an example that sorts strings of the form "abc 123", i.e. characters, space, numbers, first on the first word, then numerically on the digits.

<xsl:for-each select="whatever">
  <xsl:sort data-type="text" select="substring-before(.,' ')"/>
  <xsl:sort data-type="number" select="substring-after(.,' ')"/>
  <!-- process each item here -->
</xsl:for-each>

Mike Kay adds: Saxon allows you to supply a user-defined collator implemented as a Java class.
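
To see the two sort keys at work, here is a minimal complete sketch. The <list> and <item> element names are assumptions made for the illustration, e.g. <list><item>title 10</item><item>title 2</item><item>title 1</item></list>.

```xml
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <!-- list/item are hypothetical names for the input -->
  <xsl:template match="list">
    <xsl:for-each select="item">
      <!-- primary key: the word before the space, compared as text -->
      <xsl:sort data-type="text" select="substring-before(., ' ')"/>
      <!-- secondary key: the digits after the space, compared as a number -->
      <xsl:sort data-type="number" select="substring-after(., ' ')"/>
      <xsl:value-of select="."/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

For that input the items come out as title 1, title 2, title 10, where a plain text sort would put title 10 before title 2.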

11.

Checking sort order

Mike Kay

The simplest way to check that a list of strings is in sorted order is to sort it and see if the output equals the input. It's probably possible to improve the following:

<xsl:template name="is-sorted">
   <!-- test whether the document-order of the supplied $nodes
        is the same as the sorted order of their string-values -->
   <xsl:param name="nodes"/>
   <xsl:variable name="unsorted-nodes">
      <xsl:for-each select="$nodes">
          <xsl:value-of select="."/>
      </xsl:for-each>
   </xsl:variable>
   <xsl:variable name="sorted-nodes">
      <xsl:for-each select="$nodes">
          <xsl:sort/>
          <xsl:value-of select="."/>
      </xsl:for-each>
   </xsl:variable>
   <xsl:if test="string($sorted-nodes) != string($unsorted-nodes)">
      <xsl:message terminate="yes">Data is not correctly
sorted</xsl:message>
   </xsl:if>
</xsl:template>

E.g. to check that all qna's are sorted by topic order.

<xsl:template match="section">
  <xsl:call-template name="is-sorted">
    <xsl:with-param name="nodes" select="qna/topic"/>
  </xsl:call-template>
</xsl:template>

This passes all topics in this section to the named template, which will bomb out with the message if the two 'orders' are not equal.

> I'm curious why you cast them to string prior to the comparison? Is it
> not possible to compare a result tree fragment held in the two variables?

The cast to a string was there mainly for clarity, and also for robustness: the current version of MSXML doesn't follow the rules correctly when casting from a result-tree-fragment, though I think this example would be OK.

12.

The ultimate unique sort

Steve Muench via Mike Kay, tweaked DaveP

<?xml version='1.0'?>

<Tasks>
   <Task><Desc>Task1</Desc><Owner>Steve</Owner></Task>
   <Task><Desc>Task2</Desc><Owner>Mike</Owner></Task>
   <Task><Desc>Task3</Desc><Owner>Dave</Owner></Task>
   <Task><Desc>Task4</Desc><Owner>Steve</Owner></Task>
   <Task><Desc>Task5</Desc><Owner>Mike</Owner></Task>
   <Task><Desc>Task9</Desc><Owner>Mike</Owner></Task>
   <Task><Desc>Task9</Desc><Owner>Fred</Owner></Task>
   <Task><Desc>Task9</Desc><Owner>Joe</Owner></Task>

</Tasks>

    
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="1.0">
  <xsl:output indent="yes"/>
  <!--Create a key for the unique element required
      with the given example try Desc or Owner
      since both are duplicated.
  -->

    <xsl:key name="xxx" match="/Tasks/Task/Desc" use="."/>


  <xsl:template match="/">
    <Outer-Wrapper>
      <!-- Select the first Desc for each distinct value (the unique items) -->
      <xsl:for-each
        select="/Tasks/Task/Desc[generate-id(.)=generate-id(key('xxx',.)[1])]">
        <xsl:sort select="."/> <!-- Only if the list should be sorted -->
        <Unique-Item-List Element-Name="{.}"> <!-- Optional inner wrapper -->
          <!-- Visit every Task that has this value -->
          <xsl:for-each select="key('xxx',.)/..">
            <xsl:comment>Present any associated data. Context is the parent Task element</xsl:comment>
          </xsl:for-each>
        </Unique-Item-List> <!-- Close inner wrapper -->
      </xsl:for-each>
    </Outer-Wrapper>
  </xsl:template>
</xsl:stylesheet>
    

Ken Holman adds another example.

T:\ftemp>type tests.xml
<?xml version="1.0"?>
<names>
<name><given>Julie</given><surname>Holman</surname></name>
<name><given>Margaret</given><surname>Mahoney</surname></name>
<name><given>Ted</given><surname>Holman</surname></name>
<name><given>John</given><surname>Mahoney</surname></name>
<name><given>Kathryn</given><surname>Holman</surname></name>
<name><given>Ken</given><surname>Holman</surname></name>
</names>
T:\ftemp>type tests.xsl
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                 version="1.0">

<xsl:output method="text"/>

             <!--prepare to examine all names valued by surname-->
<xsl:key name="surnames" match="name" use="surname"/>

<xsl:template match="/">                         <!--root rule-->
             <!--select only those name elements whose unique
                 generated id is equal to the generated id of the
                 first of the key members with the same surname-->
   <xsl:for-each
         select="//name[generate-id(.)=
                        generate-id(key('surnames',surname)[1])]">
     <xsl:value-of select="surname"/>     <!--show the grouping-->
     <xsl:text>
</xsl:text>
                         <!--select only those for the grouping-->
     <xsl:for-each select="//name[surname=current()/surname]">
       <xsl:sort select="given"/>    <!--sorted within grouping-->
       <xsl:text>   </xsl:text>
       <xsl:value-of select="given"/>   <!--member distinctions-->
       <xsl:text>
</xsl:text>
     </xsl:for-each>
   </xsl:for-each>
</xsl:template>

</xsl:stylesheet>
    

T:\ftemp>saxon tests.xml tests.xsl
Holman
    Julie
    Kathryn
    Ken
    Ted
Mahoney
    John
    Margaret
    

And Sebastian Rahtz adds: you can speed this up again, I believe, by using the key again:

     <xsl:for-each
           select="//name[generate-id(.)=
                         generate-id(key('surnames',surname)[1])]">
     <xsl:variable name="surname" select="surname"/>
     <xsl:for-each select="key('surnames',$surname)">

 ...
    

That is to say, for every unique surname, consult the key again to get the list of people with that surname. That way, you do not have to navigate the tree again at all, since the list of relevant nodes is already known.

13.

Removing duplicates Muenchian solution

Jeni Tennison

I'm going to assume you *were* actually referring to removing duplicate elements and, to make the answer more general and more accurate, I'm also going to assume that you have a number of different elements within your content. Finally, I'm going to assume that it is the content of the element that makes it a duplicate (rather than the value of an attribute, say), so something like:

	<doc>
	<employee>Bill</employee>
	<employee>Andy</employee>
       <director>Amy</director>
	<employee>Bill</employee>
       <director>Louise</director>
       <director>Louise</director>
	<employee>Bill</employee>
	<employee>Andy</employee>
	<employee>John</employee>
       <director>Amy</director>
       <director>Louise</director>
	</doc>

To produce something like:

	<doc>
	<employee>Bill</employee>
	<employee>Andy</employee>
       <director>Amy</director>
       <director>Louise</director>
	<employee>John</employee>
	</doc>

Rather than using the preceding-sibling axis, I'm going to use the Muenchian technique to identify the first unique elements, because it's a lot easier to use in this case, as well as being more efficient generally.

First, define a key so that you can index on the unique features of the particular elements that you want. In this case, there are two unique features: the name of the element, and the content of the element. To make a key that includes both, I'm concatenating these two bits of information together (with a separator to hopefully account for odd occurrences that could generate the same key despite having different element/content combinations):

<xsl:key name="elements" match="*" use="concat(name(), '::', .)" />

So all the <employee>Bill</employee> elements are indexed under 'employee::Bill'. The unique elements are those that appear first in the list of elements that are indexed by the same key. Identifying those involves testing to see whether the node you're currently looking at is the same node as the first node in the list that is indexed by the key for the node. So if the <employee>Bill</employee> node that we're looking at is the first one in the list that we get when we retrieve the 'employee::Bill' nodes from the 'elements' key, then we know it hasn't been processed before.

<xsl:template match="doc">
  <xsl:for-each select="*[generate-id(.) =
      generate-id(key('elements', concat(name(), '::', .))[1])]">
    <xsl:copy-of select="." />
  </xsl:for-each>
</xsl:template>
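
Put together as a complete stylesheet, the pieces above might look like the following sketch. The <doc> wrapper in the template is an addition made here so that the result matches the desired output shown earlier; it is not in the fragment above.

```xml
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes"/>

  <!-- index each element under "name::content" -->
  <xsl:key name="elements" match="*" use="concat(name(), '::', .)"/>

  <xsl:template match="doc">
    <!-- literal wrapper added for this sketch -->
    <doc>
      <!-- copy an element only if it is the first one in its key group -->
      <xsl:for-each select="*[generate-id(.) =
          generate-id(key('elements', concat(name(), '::', .))[1])]">
        <xsl:copy-of select="."/>
      </xsl:for-each>
    </doc>
  </xsl:template>

</xsl:stylesheet>
```

Because the for-each has no xsl:sort, the surviving elements stay in document order: Bill, Andy, Amy, Louise, John.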

14.

sorting and counting

Jeni Tennison


>1. sort them by 'priority'
>2. leave, say, only 3 nodes in the result

Here's a solution. First, specify the number of nodes you want in a parameter, so that you can change it whenever you like:

  <xsl:param name="nodes" select="'3'" />

Next, you want to treat the nodes individually despite them being nested inside each other, and you want to sort them within your output in order of priority. You can use either xsl:for-each or xsl:apply-templates to select the nodes within the document, whatever their level (using //node) and xsl:sort within whichever you use to sort in order of priority. For example:

  <xsl:for-each select="//node">
    <xsl:sort select="@priority" order="ascending" />
    ...
  </xsl:for-each>

Within that, you only want to output anything if the position of the node within that sorted list is less than or equal to the number of nodes you want in the result. In other words:

  <xsl:for-each select="//node">
    <xsl:sort select="@priority" order="ascending" />
    <xsl:if test="position() &lt;= number($nodes)">
      <xsl:value-of select="name" />
    </xsl:if>
  </xsl:for-each>
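
As a complete stylesheet this could be sketched as follows. Two things here are assumptions: that each node has a name child (as in the fragment above), and that priority is numeric, which is why data-type="number" has been added to the sort.

```xml
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <!-- how many nodes to keep; can be overridden when the stylesheet is run -->
  <xsl:param name="nodes" select="3"/>

  <xsl:template match="/">
    <xsl:for-each select="//node">
      <!-- assumption: priorities are numbers, so compare them as numbers -->
      <xsl:sort select="@priority" data-type="number" order="ascending"/>
      <!-- position() is the position in the sorted list -->
      <xsl:if test="position() &lt;= number($nodes)">
        <xsl:value-of select="name"/>
        <xsl:text>&#10;</xsl:text>
      </xsl:if>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```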

15.

arbitrary sorting

Oliver Becker

<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:m ="urn:non-null-namespace">

<m:month name="Jan" value="1" />
<m:month name="Feb" value="2" />
<m:month name="Mar" value="3" />
<m:month name="Apr" value="4" />
<m:month name="May" value="5" />
<m:month name="Jun" value="6" />
<m:month name="Jul" value="7" />
<m:month name="Aug" value="8" />
<m:month name="Sep" value="9" />
<m:month name="Oct" value="10" />
<m:month name="Nov" value="11" />
<m:month name="Dec" value="12" />

<xsl:template match="report-list">
   <xsl:apply-templates>
      <xsl:sort select="document('')//m:month[@name=current()/@month]/@value" 
                data-type="number" />
   </xsl:apply-templates>
</xsl:template>

</xsl:stylesheet>

To sort descending, xsl:sort has an 'order' attribute, with possible values 'ascending' (the default) or 'descending'.

16.

Arbitrary sorting

David Marston


>supposing I have elements with a month attribute
  <report month="Jan" />
  <report month="Feb" />
>and so on.
>Of course unordered :-)
>Now I want them in chronological order....
>I know how to translate Jan->1, Feb->2 etc via a named template
>[and] xsl:choose, but that doesn't help much in this case.
    

Naturally, what you want is to map names to numbers using keys, which can be very efficient. Keys were made for just this purpose! So far, I've been able to get this to work if the month table is in the input document.

Consider this input document:
<?xml version="1.0"?>
<doc>
<monthtab>
  <entry><name>Jan</name><number>1</number></entry>
  <entry><name>January</name><number>1</number></entry>
  <entry><name>Feb</name><number>2</number></entry>
  <entry><name>February</name><number>2</number></entry>
  <entry><name>Mar</name><number>3</number></entry>
  <entry><name>March</name><number>3</number></entry>
  <entry><name>Apr</name><number>4</number></entry>
  <entry><name>April</name><number>4</number></entry>
  <entry><name>May</name><number>5</number></entry>
  <entry><name>Jun</name><number>6</number></entry>
  <entry><name>June</name><number>6</number></entry>
  <entry><name>Jul</name><number>7</number></entry>
  <entry><name>July</name><number>7</number></entry>
  <entry><name>Aug</name><number>8</number></entry>
  <entry><name>August</name><number>8</number></entry>
  <entry><name>Sep</name><number>9</number></entry>
  <entry><name>Sept</name><number>9</number></entry>
  <entry><name>September</name><number>9</number></entry>
  <entry><name>Oct</name><number>10</number></entry>
  <entry><name>October</name><number>10</number></entry>
  <entry><name>Nov</name><number>11</number></entry>
  <entry><name>November</name><number>11</number></entry>
  <entry><name>Dec</name><number>12</number></entry>
  <entry><name>December</name><number>12</number></entry>
</monthtab>
<bday person="Linda"><month>Apr</month><day>22</day></bday>
<bday person="Marie"><month>September</month><day>9</day></bday>
<bday person="Lisa"><month>March</month><day>31</day></bday>
<bday person="Harry"><month>Sep</month><day>16</day></bday>
<bday person="Ginny"><month>Jan</month><day>22</day></bday>
<bday person="Pedro"><month>November</month><day>2</day></bday>
<bday person="Bill"><month>Apr</month><day>4</day></bday>
<bday person="Frida"><month>July</month><day>5</day></bday>
</doc>

The first part of the above document is the month table. For demonstration purposes, I have both abbreviated and full month names (look at September) as synonyms, and you could easily add names in other languages. There's a many-to-one structure: look up a name, get back the correct month number. The rest of the document is the set of records that we want to sort in chronological order. The <day> elements will work as a simple numeric sort, but that's secondary to the sort by months. Following Oliver's request, we want <xsl:sort select="key('MonthNum',month)" data-type="number"/> where we will take the <month> as a string and get its number out of the 'MonthNum' keyspace.

I'll supply an example of the above sort working in an apply-templates situation, but it can work similarly in a for-each loop. The current node will be the outer <doc> element at the time we sort, so the keyspace definition will also be based on that context:

<xsl:key name="MonthNum"
   match="monthtab/entry/number" use="../name" />

With that background, check out this stylesheet:

<?xml version="1.0"?>
<xsl:stylesheet
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  version="1.0">

<xsl:key name="MonthNum" match="monthtab/entry/number"
  use="../name" />

<xsl:template match="doc">
  <out>
    <xsl:text>Birthdays in chronological order...
</xsl:text>
    <xsl:apply-templates select="bday">
      <xsl:sort select="key('MonthNum',month)"
        data-type="number" />
      <xsl:sort select="day" data-type="number" />
    </xsl:apply-templates>
  </out>
</xsl:template>

<xsl:template match="bday">
  <xsl:value-of select="@person"/><xsl:text>: </xsl:text>
  <xsl:value-of select="month"/><xsl:text> </xsl:text>
  <xsl:value-of select="day"/><xsl:text>
</xsl:text>
</xsl:template>

</xsl:stylesheet>

The <out> element is just something we commonly use as a tracing aid. The above works on both Xalan and Saxon.

Ideally, one would want to put the month table in a completely separate file, so it could be shared among all stylesheets that needed it. Depending on your situation, you might prefer to have the month table right in the stylesheet. Either way, you have to use document(), which certainly complicates the procedure. The point of this message is to show that key() can be used in sort keys.

17.

Sorting into a variable

Jeni Tennison

What's wrong with this?

  <xsl:variable name="fns">
    <xsl:for-each select="functions[.!='']">
      <xsl:sort data-type="text" select="pair/word"/>
    </xsl:for-each>
  </xsl:variable>

The xsl:for-each loops over each of the functions, sorted in terms of pair/word and then... does nothing with them! Since there's nothing there that actually produces any output, the variable $fns is set to an empty result tree fragment (RTF). When it's passed to the next for-each, there's nothing to iterate over.

What you wanted was:

  <xsl:variable name="fns">
    <xsl:for-each select="functions[.!='']">
      <xsl:sort data-type="text" select="pair/word"/>
      <xsl:copy-of select="." />
    </xsl:for-each>
  </xsl:variable>

i.e. to produce a copy of each of the sorted functions to process later on.
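
In XSLT 1.0 the variable then holds a result tree fragment, which has to be converted to a node set before you can iterate over it. The following is only a sketch: it assumes a processor that supports the EXSLT exsl:node-set() extension function, and it assumes that the functions elements are children of the document element.

```xml
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:exsl="http://exslt.org/common"
  exclude-result-prefixes="exsl">

  <xsl:output method="text"/>

  <!-- assumption: functions elements are children of the document element -->
  <xsl:template match="/*">
    <xsl:variable name="fns">
      <xsl:for-each select="functions[.!='']">
        <xsl:sort data-type="text" select="pair/word"/>
        <xsl:copy-of select="."/>
      </xsl:for-each>
    </xsl:variable>

    <!-- exsl:node-set() is an extension function; it only works
         where the processor implements it -->
    <xsl:for-each select="exsl:node-set($fns)/functions">
      <xsl:value-of select="pair/word"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

The copies in $fns are in sorted order, so the second for-each needs no xsl:sort of its own.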

[I guess there's a good reason for storing the sorted list and using an extension function to access it rather than just doing:

  <xsl:for-each select="functions[.!='']">
    <xsl:sort data-type="text" select="pair/word" />
    <xsl:value-of select="child::*/text()"/> <br />
  </xsl:for-each>]

18.

Ultimate arbitrary sort algorithm

Oliver Becker /Jeni T

Mike Kay answered something like "it's not possible to construct a key expression that requires conditional processing".

As we've learned from the intersection expression: never say never! :-)

In the following I try to explain a method to construct such an expression - only by XPath means. Decide yourself whether the result is rather pretty or rather ugly ...

xsl:sort requires an expression used as the sort key. What we want is the following:

   if condition1 then use string1 as sort key
   if condition2 then use string2 as sort key
   etc.

How to achieve that? The following expression gives $string if $condition is true, otherwise it results in an empty string:

   substring($string, 1 div number($condition))

If the condition is true, number($condition) is 1 and the substring starts at position 1, i.e. it is the whole string. If the condition is false, number($condition) is 0, 1 div 0 is Infinity, and a substring that starts at infinity is empty. According to Mike's book this is perfectly valid. (Note: works with Saxon and XT, but not with my versions of Xalan and Oracle XSL - but I've not installed the latest versions ...)

If you don't like "infinity" - here's another solution:

   substring($string, 1, number($condition)*string-length($string))

but then you need $string twice ...

The concatenation of all these substring expressions forms the sort key. Requirement: all conditions must be mutually exclusive.

That's all! :-)

Here's my example which demonstrates the handling of leading "Re: "s. If the string starts with "Re: ", an equivalent string without this prefix but with an appended ", Re" forms the key, otherwise the original string is used:

<xsl:sort select="concat(
   substring(concat(substring-after(.,'Re: '), ', Re'), 
             1 div number(starts-with(.,'Re: '))),
   substring(., 1 div number(not(starts-with(.,'Re: ')))))" />

As you may imagine these expressions could become very complex the more arbitrary you want to sort.

Jeni Tennison adds

This is obviously the 'Becker Method' :)

It is, of course, hideous when you actually use it. You can make it a little less hideous by dropping the number() - 'div' automatically converts its arguments to a number anyway. Do make sure, as well, to use boolean() if the condition is a node set to convert it into a boolean value (true if such a node exists, false if it doesn't). In other words, the pattern is:

  concat(substring($result1,
                   1 div $condition1),
         substring($result2,
                   1 div $condition2),
         ...)
where the conditions are all boolean and the results are all strings.

Matt's original problem was:
>I have some data:
><item>MacBean</item>
><item>McBarlow</item>
><item>Re MacBart</item>
><item>Re McBeanie</item>
>
>Which needs to be sorted and transformed as follows:
><item>McBarlow</item>
><item>Re McBart</item>
><item>MacBean</item>
><item>Re McBeanie</item>


This is solved by:


<xsl:template match="list">
  <xsl:for-each select="item">
    <xsl:sort
      select="concat(
               substring(concat('Mac', substring-after(., 'Re Mc'), ', Re'),
                           1 div starts-with(., 'Re Mc')),
                 substring(concat(substring-after(., 'Re '), ', Re'),
                           1 div (starts-with(., 'Re ') and
                                not(starts-with(., 'Re Mc')))),
                 substring(concat('Mac', substring-after(., 'Mc')),
                           1 div (not(starts-with(., 'Re ')) and
                                starts-with(., 'Mc'))),
                 substring(.,
                           1 div not(starts-with(.,'Mc')
                                   or starts-with(., 'Re '))))" />
                <xsl:copy-of select="." />
        </xsl:for-each>
</xsl:template>

This assumes that 'Re ' is the only thing that can precede the name. You can nest the conditions if you want to (actually this makes it even more complex!).

This is some of the ugliest XSLT I have ever seen :) :)

19.

Maximum value of a list

Jeni Tennison

If the list is declared in XML, you can sort the list of values in descending order and pick off the first value:

<xsl:variable name="maximum">
  <xsl:for-each select="$list">
    <xsl:sort select="." data-type="number" order="descending" />
    <xsl:if test="position() = 1">
      <xsl:value-of select="." />
    </xsl:if>
  </xsl:for-each>
</xsl:variable>

If the list is instead a string of values separated by commas, say, then you have to use recursion. The current node doesn't matter in that case, so a named template is the natural choice, but you can use xsl:apply-templates with a mode instead if you want to:

<xsl:variable name="maximum">
  <xsl:apply-templates select="." mode="maximum">
    <xsl:with-param name="list" select="concat($list, ', ')" />
  </xsl:apply-templates>
</xsl:variable>


<xsl:template match="node()|/" mode="maximum">
  <xsl:param name="list" />
  <xsl:variable name="first" select="substring-before($list, ',')" />
  <xsl:variable name="rest" select="substring-after($list, ',')" />
  <xsl:choose>
    <xsl:when test="not(normalize-space($rest))">
      <xsl:value-of select="$first" />
    </xsl:when>
    <xsl:otherwise>
      <xsl:variable name="max">
        <xsl:apply-templates select="." mode="maximum">
          <xsl:with-param name="list" select="$rest" />
        </xsl:apply-templates>
      </xsl:variable>
      <xsl:choose>
        <xsl:when test="$first > $max">
          <xsl:value-of select="$first" />
        </xsl:when>
        <xsl:otherwise>
          <xsl:value-of select="$max" />
        </xsl:otherwise>
      </xsl:choose>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>
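
The named-template form of the same recursion could be sketched like this. The sample-list variable and its value are made up for the illustration; normalize-space() has been added so that the result carries no leading space.

```xml
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <!-- hypothetical sample data -->
  <xsl:variable name="sample-list" select="'3, 17, 5'"/>

  <xsl:template match="/">
    <xsl:call-template name="maximum">
      <!-- the trailing comma lets the last value be cut off like the others -->
      <xsl:with-param name="list" select="concat($sample-list, ',')"/>
    </xsl:call-template>
  </xsl:template>

  <xsl:template name="maximum">
    <xsl:param name="list"/>
    <xsl:variable name="first"
      select="normalize-space(substring-before($list, ','))"/>
    <xsl:variable name="rest" select="substring-after($list, ',')"/>
    <xsl:choose>
      <!-- nothing left after the first value: it is the maximum -->
      <xsl:when test="not(normalize-space($rest))">
        <xsl:value-of select="$first"/>
      </xsl:when>
      <xsl:otherwise>
        <!-- maximum of the remaining values -->
        <xsl:variable name="max">
          <xsl:call-template name="maximum">
            <xsl:with-param name="list" select="$rest"/>
          </xsl:call-template>
        </xsl:variable>
        <xsl:choose>
          <!-- the comparison converts both sides to numbers -->
          <xsl:when test="$first &gt; $max">
            <xsl:value-of select="$first"/>
          </xsl:when>
          <xsl:otherwise>
            <xsl:value-of select="$max"/>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

</xsl:stylesheet>
```

For the sample list this gives 17. The recursion depth grows with the length of the list, so very long lists may hit a processor's stack limit.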

20.

Case insensitive sorting

Jeni Tennison

In the comparison that does not involve case-insensitive translation, you select:

//LEAGUE[not(@NAME = preceding::*/@NAME)]

Within equality tests, the way the result is worked out depends on the type of the nodes that are involved. When they involve node sets (as in this case), the equality expression returns true if there are nodes within the node set(s) for which the equality expression will be true. In other words, "@NAME = preceding::*/@NAME" returns true if *any* of the preceding elements has a NAME attribute that matches the NAME attribute of the current node.

When you select with the translation:

//LEAGUE[not(translate(@NAME,$lower,$upper) = translate(preceding::*/@NAME,$lower,$upper))]

things work differently because the translate() function returns a string. Doing:

translate(preceding::*/@NAME, $lower, $upper)

translates the string value of the node set preceding::*/@NAME from lower case to upper case, and returns this string. The string value of a node set is the string value of the node in the set that comes first in document order, here the NAME attribute of the earliest preceding element that has one. That means that you're testing the equality of the translated @NAME of the current element with the translated @NAME of that one element, not comparing it with the @NAME of every preceding element.

I don't *think* it's possible to do the selection you're after with a single select expression, but you could get around it by doing a xsl:for-each on all the //LEAGUE elements, and containing within it an xsl:if that only retrieved those who don't have a preceding element with the same (translated) name:

<xsl:for-each select="//LEAGUE">
  <xsl:sort select="@NAME" />
  <xsl:variable name="name" select="translate(@NAME, $lower, $upper)" />
  <xsl:if 
   test="not(preceding::*[translate(@NAME, $lower, $upper) = $name])">
    <!-- do stuff -->
  </xsl:if>
</xsl:for-each>
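
The fragment above relies on $lower and $upper being defined elsewhere. A self-contained sketch, with those two variables declared at the top level, might look like this. Sorting on the translated name (rather than on @NAME as above) and narrowing preceding::* to preceding::LEAGUE are choices made for this illustration; the latter assumes the duplicates you care about are all LEAGUE elements.

```xml
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <xsl:variable name="lower" select="'abcdefghijklmnopqrstuvwxyz'"/>
  <xsl:variable name="upper" select="'ABCDEFGHIJKLMNOPQRSTUVWXYZ'"/>

  <xsl:template match="/">
    <xsl:for-each select="//LEAGUE">
      <!-- sort on the upper-cased name so case does not affect the order -->
      <xsl:sort select="translate(@NAME, $lower, $upper)"/>
      <xsl:variable name="name" select="translate(@NAME, $lower, $upper)"/>
      <!-- keep a LEAGUE only if no earlier LEAGUE has the same
           name once case is ignored -->
      <xsl:if
        test="not(preceding::LEAGUE[translate(@NAME, $lower, $upper) = $name])">
        <xsl:value-of select="@NAME"/>
        <xsl:text>&#10;</xsl:text>
      </xsl:if>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

Here the translate() call sits inside the predicate, so it is applied to each preceding element in turn instead of to the string value of the whole node set.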

If possible, for efficiency, you should probably give a more exact indication of the LEAGUE elements you're interested in (e.g. '/SCORES/LEAGUE') and if you're only interested in the preceding-sibling::LEAGUE elements, you should use this rather than the general preceding::*. Generally, more specific XPath expressions are more efficient.

David Carlisle generalises case insensitive comparisons:

If you are writing in English,

<xsl:variable name="u" select="'ABCDEFGHIJKLMNOPQRSTUVWXYZ'"/>
<xsl:variable name="l" select="'abcdefghijklmnopqrstuvwxyz'"/>

<xsl:when test= "self::node()
             [translate(. ,$u,$l) = 
              translate($reference,$u,$l)]">

21.

Case Insensitive Sorted Index with Headings

Eric Taylor

Case Insensitive Sorted Index with Headings (including combined heading for all numbers/symbols)

PROBLEM: Need sorted index with a single subheading for numbers and symbols and subheadings for each letter represented. Important that upper and lowercase letters be intermingled, that is:

  <H2>A</H2> 
    aa 
    Ab 
    Ac 
    ad 

rather than 

  <H2>a</H2> 
    aa 
    ad 

  <H2>A</H2> 
    Ab 
    Ac 

My XML looked like this:

  <?xml version="1.0"?> 
  <?xml-stylesheet type="text/xsl" href="test.xsl"?> 

  <pages> 
    <page name="name1" location="file1.xml">
      <index entry="aa"/>
    </page> 
    <page name="name2" location="file2.xml">
      <index entry="Ac"/>
      <index entry="Ab"/>
      <index entry="ad"/>
    </page> 
    ... 
  </pages> 

The solution:

Use sort with a key to group items based on their first letter.
Use translate() when creating the key to convert lowercase to uppercase letters (e.g., so that 'a' and 'A' are grouped together), and to convert all digits and relevant symbols to a single symbol (so 1, 2, 3, etc. all fall in the same group).
For each group, create a heading.
For each group, sort the entries in the group, again using translate() to convert lowercase to uppercase (e.g., so 'aa' is sorted as if it were 'AA', giving a case insensitive sort).

This resulted in XSL that looked like this:

  <?xml version="1.0"?> 
  <xsl:stylesheet version="1.0" 
     xmlns:xsl="http://www.w3.org/1999/XSL/Transform" 
     xmlns:HTML="http://www.w3.org/Profiles/XHTML-transitional">

  <xsl:key name="letters" match="//index" 
     use="translate (substring(@entry,1,1),
     'abcdefghijklmnopqrstuvwxyz1234567890@',
     'ABCDEFGHIJKLMNOPQRSTUVWXYZ###########')" />

   ... 

  <xsl:template match="pages"> 
    <xsl:for-each select="//index[count(. | key('letters', 
     translate (substring(@entry,1,1),
     'abcdefghijklmnopqrstuvwxyz1234567890@',
     'ABCDEFGHIJKLMNOPQRSTUVWXYZ###########'))[1]) = 1]">

      <xsl:sort select="translate(@entry,
     'abcdefghijklmnopqrstuvwxyz',
     'ABCDEFGHIJKLMNOPQRSTUVWXYZ')" /> 
      <xsl:variable name="initial" 
     select="translate (substring(@entry,1,1),
     'abcdefghijklmnopqrstuvwxyz1234567890@',
     'ABCDEFGHIJKLMNOPQRSTUVWXYZ###########')" />

      <a name="{$initial}" /> 
      <xsl:choose> 
        <xsl:when test ="$initial = '#'"> 
          <h2>Numbers &amp; symbols</h2> 
        </xsl:when> 
        <xsl:otherwise> 
          <h2><xsl:value-of select="$initial" /></h2> 
        </xsl:otherwise> 
      </xsl:choose> 
      <xsl:for-each select="key('letters', $initial)"> 
        <!-- sort on the upper-cased entry so 'aa' and 'Ab' intermingle -->
        <xsl:sort select="translate(@entry,
     'abcdefghijklmnopqrstuvwxyz',
     'ABCDEFGHIJKLMNOPQRSTUVWXYZ')" /> 
        <p><a href="{../@location}"><xsl:value-of 
     select="@entry"/></a></p>

      </xsl:for-each> 
    </xsl:for-each> 
  </xsl:template> 

  </xsl:stylesheet>

22.

Sorting time values

Mike Kay




> I have the following xml:
> 
> <times>
> 	<time value="10:45"/>
> 	<time value="1:15"/>
> 	<time value="9:43"/>
> 	<time value="35:27"/>
> 	<time value="20:48"/>
> </times>
	

Break up the time using substring-before() and substring-after(), and use the two parts as major and minor sort key, both with data-type="number".

Dimitre Novatchev provides the example


<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" omit-xml-declaration="yes"/>
  <xsl:template match="/times">
    <xsl:copy>
      <xsl:apply-templates select="time">
        <xsl:sort data-type="number"
                  select="substring-before(@value,':')"/>
        <xsl:sort data-type="number"
                  select="substring-after(@value,':')"/>
      </xsl:apply-templates>
    </xsl:copy>
  </xsl:template>
  <xsl:template match="/ | @* | node()">
    <xsl:copy>
      <xsl:apply-templates  select="@* | node()"/>
    </xsl:copy>
  </xsl:template>
  
</xsl:stylesheet>

23.

Topological Sort

Joerg Pietschmann

I want to implement a topological sort in XSLT. This is necessary for generating program code (IDL, for example) out of an XML file.

Here is a stylesheet for processing elements in topologically sorted order. The trick is to carefully select elements from the document into variables so that they are node sets and can be selected later on.

The complete problem is stated in an earlier post (2000-11-04) and can be found in the archive.

Pseudocode:
  select structs with no dependencies
  process them
  repeat
    if not all structs are processed
      select structs which are
        not processed
        have only dependencies which are processed
      if empty
        stop
      else
        process them
  done

This translates into the following stylesheet:

<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  
  <xsl:template match="structs">
    <xsl:call-template name="process">
      <xsl:with-param name="nodes" select="struct[not(field/type/ref)]"/>
      <xsl:with-param name="finished" select="/.."/>
    </xsl:call-template>
  </xsl:template>

  <xsl:template name="process">
    <xsl:param name="nodes"/>
    <xsl:param name="finished"/>
    <xsl:variable name="processed" select="$nodes|$finished"/>
    <xsl:for-each select="$nodes">
      <xsl:value-of select="name"/>
    </xsl:for-each>
    <xsl:if test="count(struct)>count($processed)">
      <xsl:variable name="nextnodes"
         select="struct[not($processed/name=name)
                 and not(field/type/ref[not(. = $processed/name)])]"/>
      <xsl:if test="$nextnodes">
        <xsl:call-template name="process">
          <xsl:with-param name="nodes" select="$nextnodes"/>
          <xsl:with-param name="finished" select="$processed"/>
        </xsl:call-template>
      </xsl:if>
    </xsl:if>
  </xsl:template>

</xsl:stylesheet>

The structs are processed in order of increasing distance from the leaves of the dependency graph. Can one of the gurus please comment on the "count(field/type/ref)=count(...)" construct, and whether this could be substituted by a possibly more efficient condition? In my real-world examples, with some 200+ structs, it takes quite some time; if there are cheap optimisations I would appreciate them. Applying the stylesheet to the following example document outputs "ACBED", which is the correct dependency order.

<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE structs [

<!ELEMENT structs (struct*)>
<!ELEMENT struct (name,field*)>
<!ELEMENT field (name,type)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT type (ref|long)>
<!ELEMENT ref (#PCDATA)>
<!ELEMENT long EMPTY>
]>

<structs>
  <struct>
    <name>A</name>
    <field>
      <name>A1</name>
      <type><long/></type>
    </field>
  </struct>
  <struct>
    <name>B</name>
    <field>
      <name>B1</name>
      <type><ref>A</ref></type>
    </field>
    <field>
      <name>B2</name>
      <type><ref>C</ref></type>
    </field>
  </struct>
  <struct>
    <name>C</name>
    <field>
      <name>C1</name>
      <type><long/></type>
    </field>
  </struct>
  <struct>
    <name>D</name>
    <field>
      <name>D1</name>
      <type><ref>E</ref></type>
    </field>
    <field>
      <name>D2</name>
      <type><ref>A</ref></type>
    </field>
  </struct>
  <struct>
    <name>E</name>
    <field>
      <name>E1</name>
      <type><ref>C</ref></type>
    </field>
  </struct>
</structs>

24.

Special Alpha sort

Dimitre Novatchev, Jeni Tennison

I need to create an alphabetical list of words. If a word contains a dash, it means it has to be joined with the following word that has an attribute type="end".

xml
<?xml version='1.0'?>
<root>
    <line lineID="1">
      <word wordID="1">ABC-</word>
      <word wordID="2">ABCD</word>
      <word wordID="2">ABCDE</word>
    </line>
    <line lineID="2">
      <word wordID="1" type="end">DEF</word>
      <word wordID="2">XYZ</word>
      <word>ABC-</word>
      <word>spirit-level</word>
      <word type="end">DEF</word>
    </line>
  </root>
(which combines your test case)

and xsl


<?xml version="1.0" ?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="1.0">
<xsl:output method="text"/>


<xsl:template match="/">
  <xsl:for-each select="root/line/word[not(@type = 'end')]">
    <xsl:sort 
      select="translate(concat(.,self::*[contains(.,'-')]
              /following::word[@type='end']), '-', '')"/>
    <xsl:choose>
      <xsl:when test="substring(., string-length()) = '-'">
          <xsl:value-of select="substring(., 1, string-length() - 1)"/>
        <xsl:value-of select="following::word[@type='end']"/>
      </xsl:when>
      <xsl:otherwise>
        <xsl:value-of select="."/>
      </xsl:otherwise>
    </xsl:choose>
    <xsl:if test="position() != last()">
      <xsl:text>&#xA;</xsl:text>
    </xsl:if>
  </xsl:for-each>
</xsl:template>
</xsl:stylesheet>

Which includes your changes to test for the extra cases. (and your minor correction)

this gives

ABCD
ABCDE
ABCDEF
ABCDEF
spirit-level
XYZ

25.

Ordering and iteration problem

Jeni Tennison


> My thinking is that I need to do something like
>
>    for each row
>        for each column
>            output the <circuit-breaker> with that row and column

I'd probably do this using the Piez Method/Hack of having an xsl:for-each iterate over the correct number of random nodes and using the position of the node to give the row/column number.

You need to define some random nodes - I usually use nodes from the stylesheet:

<xsl:variable name="random-nodes" select="document('')//node()" />

And since you'll be iterating over them, you need some way of getting back to the data:

<xsl:variable name="data" select="/" />

I've used two keys to get the relevant circuit breakers quickly. One just indexes them by column (this is so you can work out whether you need to add a cell or whether there's a circuit breaker from higher in the column that covers it). The other indexes them by row and column.

<xsl:key name="breakers-by-column" match="b:circuit-breaker"
         use="@column" />
<xsl:key name="breakers" match="b:circuit-breaker"
         use="concat(@row, ':', @column)" />

I've assumed that you've stored the maximum number of rows in a global variable called $max-row and the maximum number of columns in a global variable called $max-col. Here's the template that does the work:

<xsl:template match="/">
   <!-- store the right number of nodes for the rows in a variable -->
   <xsl:variable name="rows"
                 select="$random-nodes[position() &lt;= $max-row]" />
   <!-- store the right number of nodes for the columns in a variable
        -->
   <xsl:variable name="columns"
                 select="$random-nodes[position() &lt;= $max-col]" />
   <!-- create the table -->
   <table>
      <!-- iterate over the right set of nodes to get the rows -->
      <xsl:for-each select="$rows">
         <!-- store the row number -->
         <xsl:variable name="row" select="position()" />
         <!-- create the row -->
         <tr>
            <!-- iterate over the right set of nodes to get the
                 columns -->
            <xsl:for-each select="$columns">
               <!-- store the column number -->
               <xsl:variable name="col" select="position()" />
               <!-- change the current node so that the key works -->
               <xsl:for-each select="$data">
                  <!-- identify the relevant circuit breaker -->
                  <xsl:variable name="breaker"
                                select="key('breakers',
                                            concat($row, ':', $col))" />
                  <xsl:choose>
                     <!-- if there is one, apply templates to get the
                          table cell -->
                     <xsl:when test="$breaker">
                        <xsl:apply-templates select="$breaker" />
                     </xsl:when>
                     <xsl:otherwise>
                        <!-- find other breakers that start higher in
                             the column -->
                        <xsl:variable name="column-breakers"
                                select="key('breakers-by-column', $col)
                                                 [@row &lt; $row]" />
                        <!-- output an empty cell if there isn't one
                             that overlaps -->
                        <xsl:if test="not($column-breakers
                                     [@row + @height &gt; $row])">
                           <td />
                        </xsl:if>
                     </xsl:otherwise>
                  </xsl:choose>
               </xsl:for-each>
            </xsl:for-each>
         </tr>
      </xsl:for-each>
   </table>
</xsl:template>

<!-- template to give the cell for the circuit breaker -->
<xsl:template match="b:circuit-breaker">
   <td rowspan="{@height}">
      <xsl:value-of select="b:amps" />
   </td>
</xsl:template>

If you don't like the 'one big template' approach, then you could split it down by applying templates to the row and column nodes in 'row' and 'column' modes to distinguish between the two.
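
The row/column-mode variant can be sketched in isolation. The stylesheet below is a minimal illustration, not the full circuit-breaker solution: it assumes $max-row and $max-col are stylesheet parameters (the default values 3 and 2 are made up), and each cell just shows its own row:column pair where the real stylesheet would look up the breaker with the 'breakers' key.

<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>

  <!-- hypothetical defaults; the caller can override both -->
  <xsl:param name="max-row" select="3"/>
  <xsl:param name="max-col" select="2"/>

  <!-- nodes of the stylesheet itself, used only for counting -->
  <xsl:variable name="random-nodes" select="document('')//node()"/>

  <xsl:template match="/">
    <table>
      <xsl:apply-templates mode="row"
          select="$random-nodes[position() &lt;= $max-row]"/>
    </table>
  </xsl:template>

  <!-- one table row per node; position() is the row number -->
  <xsl:template match="node()" mode="row">
    <tr>
      <xsl:apply-templates mode="column"
          select="$random-nodes[position() &lt;= $max-col]">
        <!-- evaluated here, so this is still the row position -->
        <xsl:with-param name="row" select="position()"/>
      </xsl:apply-templates>
    </tr>
  </xsl:template>

  <!-- one cell per node; position() is now the column number -->
  <xsl:template match="node()" mode="column">
    <xsl:param name="row"/>
    <td><xsl:value-of select="concat($row, ':', position())"/></td>
  </xsl:template>
</xsl:stylesheet>

As with the single-template version, this only works while the stylesheet contains at least as many nodes as the larger of the two limits.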

26.

sorting code available

Francis Norton

There is now a new version of a sort function at redrice.com which implements a mix-and-match architecture.

You can now import and call either the simplesort or mergesort template with exactly the same parameters including one which specifies which of your project-specific compare templates you want used.

Both sort templates return their result in the same way, as an ordered list of node-ids, eg

	"[1:cr423][2:cd342]..."

which can be de-referenced very conveniently within for-each loops or even XPath expressions.

The demo can be run from the command line:

C:\test>saxon sort.xml sortcall.xslt
<?xml version="1.0" encoding="UTF-8"?>
<product id="a_a_00_01">                2       </product>
<product id="a_a_00_03">                1       </product>
<product id="a_a_00_05">                4       </product>
<product id="a_a_00_9">                 w       </product>
<product id="a_a_00_9">                 x       </product>
<product id="a_a_00_b">                 y       </product>
<product id="a_a_00_b">                 z       </product>
<product id="a_b_00_02">                3       </product>
<product id="a_a_30_50">                5       </product>
<product id="a_a_60_20">                6       </product>
<product id="a_a_30_20">                7       </product>
<product id="a_a_100_30">               8       </product>

To switch sortcall.xslt from using mergesort to simplesort, change line 19 from

			<xsl:call-template name="mergesort">

to 

			<xsl:call-template name="simplesort">


27.

dynamically set the sort order

Dimitre Novatchev

> I am trying to dynamically set sort order and sort column in my 
> XSLT. It seems that I can not use an expression for "order".
>
> <xsl:apply-templates select="Uow">
>   <xsl:sort select="$sortColumn" order="$sortOrder"/>
>   <xsl:with-param name="from" select="$startRow"/>
>   <xsl:with-param name="to" select="$endRow"/>
> </xsl:apply-templates>

All attributes of xsl:sort, with the exception of "select", can be specified as attribute value templates (AVTs).

The "select" attribute can be any XPath expression. However, putting an XPath expression into a variable and passing that variable as the (complete) value of the "select" attribute will not work, for xsl:sort as anywhere else in XSLT, because XPath expressions held in strings are not evaluated dynamically.

However, if you need to specify the name of a child element, then you can use an expression like this:

*[name()=$sortColumn]

Therefore, one possible way to achieve your wanted results is:

<xsl:sort order="{$sortOrder}" select="*[name()=$sortColumn]"/>
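
Put together as a complete stylesheet, this might look like the sketch below. It assumes that $sortColumn and $sortOrder arrive as stylesheet parameters and that each Uow has child elements to sort on; the default column name 'Name' is invented for the example.

<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- both values can be overridden by whoever runs the transform -->
  <xsl:param name="sortColumn" select="'Name'"/>
  <xsl:param name="sortOrder" select="'ascending'"/>

  <xsl:template match="/*">
    <xsl:apply-templates select="Uow">
      <!-- order is an AVT; select picks the child by its name -->
      <xsl:sort select="*[name() = $sortColumn]"
                order="{$sortOrder}"/>
    </xsl:apply-templates>
  </xsl:template>

  <!-- print the value that was sorted on, one per line -->
  <xsl:template match="Uow">
    <xsl:value-of select="*[name() = $sortColumn]"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>

Passing 'descending' for sortOrder reverses the order without touching the stylesheet. Any value other than 'ascending' or 'descending' is an error.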

28.

Can xsl:sort use a variable

Mike Kay


> Can the <xsl:sort/> use a variable directly? e.g.
> <xsl:sort select="$orderBy"/>

The value of $orderBy doesn't depend on the current node, so you'll get the same sort key value for every node. You probably want select="*[name()=$orderBy]".

29.

Sorting.

Thomas B Passin



> Given the xml source:
>
> <?xml version="1.0" encoding="iso-8859-1"?>
> <root>
> <wrap>
> <joe>Apples</joe>
> </wrap>
> <wrap>
> <joe>Bananas</joe>
> </wrap>
> <wrap>
> <ann>Pears</ann>
> </wrap>
> <wrap>
> <joe>Oranges</joe>
> </wrap>
> </root>
>
> And the desired output:
>
> Joe says: "Apples, Bananas."
> Ann says: "Pears."
> Joe says: "Oranges."
>

I did a slightly different take on this and assumed that you would want to collect all of joe's preferences together, like this:

 Ann says: "Pears."
 Joe says: "Apples, Bananas, Oranges."

Even if this is not what you really want, it's interesting to see how it works out. I have not completely handled putting in commas everywhere except a period for the last item - I leave this to the reader. I also haven't translated the first character of the name to upper case. I did sort the result by name. The solution is very compact without those refinements:

<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method='text'/>

<!-- key for using the Muenchian method of getting unique node sets-->
<xsl:key name='wrappers' match='wrap' use='name(*)'/>

<!-- Elements with unique person names -->
<xsl:variable name='unique'
select='/root/wrap[generate-id(key("wrappers",name(*))[1])=generate-id(.)]'/>

<xsl:template match="/">
 <xsl:for-each select='$unique'>
  <xsl:sort select='name(*)'/>
  <xsl:variable name='theName' select='name(*)'/>
  <xsl:value-of select='$theName'
  /> says: <xsl:for-each select='key("wrappers",$theName)'
   ><xsl:value-of select='normalize-space(.)'
  />, </xsl:for-each><xsl:text>&#10;&#13;</xsl:text>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>

The slightly odd formatting is an easy way to control the output format while still having short line lengths in the stylesheet (better for emailing), and the &#13; character reference is necessary on my (Windows) system to get the line feed to display. Here is the result (I added another person, bob, to the data, just for fun):

===========================
ann says: Pears,
bob says: Peaches,
joe says: Apples, Bananas, Oranges,

===========================

This was interesting because the usual examples for getting unique node-sets assume you know what the target elements are named, but not in this case.
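
The two refinements left open above (comma or period after each item, and an upper-case first letter for the name) could be filled in roughly as follows. This is one possible sketch rather than the original poster's code; it assumes ASCII element names and uses position() and last() inside the inner loop to spot the final item.

<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:variable name="lower" select="'abcdefghijklmnopqrstuvwxyz'"/>
  <xsl:variable name="upper" select="'ABCDEFGHIJKLMNOPQRSTUVWXYZ'"/>

  <xsl:key name="wrappers" match="wrap" use="name(*)"/>

  <xsl:template match="/">
    <xsl:for-each
        select="/root/wrap[generate-id(key('wrappers', name(*))[1])
                           = generate-id(.)]">
      <xsl:sort select="name(*)"/>
      <xsl:variable name="theName" select="name(*)"/>
      <!-- upper-case the first character of the element name -->
      <xsl:value-of select="concat(
          translate(substring($theName, 1, 1), $lower, $upper),
          substring($theName, 2))"/>
      <xsl:text> says: "</xsl:text>
      <xsl:for-each select="key('wrappers', $theName)">
        <xsl:value-of select="normalize-space(.)"/>
        <!-- comma between items, period after the last one -->
        <xsl:choose>
          <xsl:when test="position() != last()">, </xsl:when>
          <xsl:otherwise>.</xsl:otherwise>
        </xsl:choose>
      </xsl:for-each>
      <xsl:text>"&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>

Run against the four-wrap source document from the question, this should print Ann says: "Pears." followed by Joe says: "Apples, Bananas, Oranges."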

30.

Sort, upper case then lower case

Andrew Welch



>Is there any way to cause the sort command to sort all the upper case
>first and then the lower case? I don't mean using the upper-case or
>lower-case settings, because that just determines which order they are
>in. But I want the following results:

This should work, although to me it looks a bit long-winded.

So, to sort this xml by case (upperfirst) and then by value:

<root>
<node>apple</node>
<node>Orange</node>
<node>bannana</node>
<node>Pear</node>
<node>peach</node>
<node>Monkey</node>
</root>

Output required:

Monkey
Orange
Pear
apple
bannana
peach

The XSL:

<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:variable name="lowercase" select="'abcdefghijklmnopqrstuvwxyz'"/>
<xsl:variable name="uppercase" select="'ABCDEFGHIJKLMNOPQRSTUVWXYZ'"/>
<xsl:variable name="hashes" select="'##########################'"/>
<xsl:variable name="lowercasenodes"
select="root/node[starts-with(translate(.,$lowercase,$hashes),'#')]"/>
<xsl:variable name="uppercasenodes"
select="root/node[starts-with(translate(.,$uppercase,$hashes),'#')]"/>

<xsl:template match="/">
  <xsl:apply-templates select="root/node" mode="upper">
    <xsl:sort select="."/>
  </xsl:apply-templates>
  <xsl:apply-templates select="root/node" mode="lower">
    <xsl:sort select="."/>
  </xsl:apply-templates>
</xsl:template>

<xsl:template match="node" mode="lower">
  <xsl:if test=". = $lowercasenodes">
  <xsl:value-of select="."/><br/>
  </xsl:if>
</xsl:template>

<xsl:template match="node" mode="upper">
  <xsl:if test=". = $uppercasenodes">
    <xsl:value-of select="."/><br/>
  </xsl:if>
</xsl:template>

</xsl:stylesheet>
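
A shorter alternative, offered here only as a sketch, is to do the whole thing in one pass with two sort keys: a numeric major key that is 0 when the first character is an upper-case letter and 1 otherwise, then the value itself as the minor key. It assumes the same <root>/<node> input and ASCII letters only; an empty node would count as upper-case because contains() is true for the empty string.

<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:variable name="uppercase" select="'ABCDEFGHIJKLMNOPQRSTUVWXYZ'"/>

  <xsl:template match="/">
    <xsl:for-each select="root/node">
      <!-- major key: 0 = starts with an upper-case letter, 1 = anything else -->
      <xsl:sort data-type="number"
          select="number(not(contains($uppercase,
                                      substring(., 1, 1))))"/>
      <!-- minor key: the text itself -->
      <xsl:sort select="."/>
      <xsl:value-of select="."/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>

For the sample data this should give Monkey, Orange, Pear, apple, bannana, peach, one per line.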

31.

Sorting on near numeric data

Jeni T and Mike Kay

In a transform, is it possible to correctly sort the poorly formed ids listed below?

Currently my standard sort:

<xsl:apply-templates>
	<xsl:sort select="node()/@id"/>
</xsl:apply-templates>

Returns this:

<someNode id="CM09.1"/>
<someNode id="CM09.1.5"/>
<someNode id="CM09.10"/>
<someNode id="CM09.10.10.3"/>
<someNode id="CM09.10.15"/>
<someNode id="CM09.18.2"/>
<someNode id="CM09.2"/>
<someNode id="CM09.2.2"/>
<someNode id="CM09.22"/>
<someNode id="CM09.22.1"/>

it's the old classic... 1 then 10 before 2 etc.

I really need them sorted like the following:

<someNode id="CM09.1"/>
<someNode id="CM09.1.5"/>
<someNode id="CM09.2"/>
<someNode id="CM09.2.2"/>
<someNode id="CM09.10"/>
<someNode id="CM09.10.10.3"/>
<someNode id="CM09.10.15"/>
<someNode id="CM09.18.2"/>
<someNode id="CM09.22"/>
<someNode id="CM09.22.1"/>

I'm looking now to see if I can work this out and I was wondering if anybody would be able to help me with the correct sort selection.

The only other issue to be aware of is that the dotted components can go on indefinitely, and I don't know until runtime what the highest number in any one id will be.

> In a transform, is it possible to correctly sort these poorly formed
> id's listed below

Tricky. I did have a recursive solution for you until I noticed that just because there's an ID CM09.18.2 doesn't mean that there's an ID CM09.18. This irregularity makes the task very difficult.

I think that I'd pick one of the following general approaches:

1. Decide that your stylesheet is only going to cope with IDs that have 5 components; or 10 components; or however many seems to be a reasonable maximum. You can always test the XML to make sure that this assumption holds and generate an error if it doesn't. But this allows you to do:

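  <xsl:apply-templates select="someNode">
    <!-- concat(@id, '.') guarantees a trailing '.', so that
         substring-before() also finds the last component -->
    <xsl:sort data-type="number"
      select="substring-before(
                substring-after(concat(@id, '.'), '.'), '.')" />
    <xsl:sort data-type="number"
      select="substring-before(
                substring-after(
                  substring-after(concat(@id, '.'), '.'), '.'), '.')" />
    ...
  </xsl:apply-templates>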

2. Create an extension function that can select the Nth component from an ID. Then create a recursive template that groups and sorts the nodes based on their Nth component.

3. Have a pre-processing phase that changes the IDs such that the number in each component of the ID is formatted with an appropriate number of leading zeros. You will then be able to sort the nodes by ID using alphabetical sorting.

4. Generate the stylesheet dynamically based on the data, creating a stylesheet that contains the appropriate number of sorts for the depth of the IDs that you encounter in the XML.

Jeni

Mike offers

If you're prepared to write some recursive XSLT code to transform the keys, you could achieve this by the technique of prefixing each numeric component with a digit indicating its length. Thus 1 becomes 11, 10 becomes 210, 15 becomes 215, 109 becomes 3109. This will give you a key that collates alphabetically.
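
A sketch of this length-prefix idea in XSLT 1.0 follows. Several things in it are assumptions rather than part of the original answer: it relies on the EXSLT exsl:node-set() extension to sort in a second pass, it treats the part of the id before the first '.' (CM09 here) as constant and ignores it, it omits the dots from the generated key, and the sortkey attribute and length-prefix template names are invented. Components longer than 9 digits would need a wider length field.

<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:exsl="http://exslt.org/common"
    exclude-result-prefixes="exsl">

  <xsl:template match="/*">
    <!-- pass 1: copy each someNode, adding a collating key -->
    <xsl:variable name="keyed">
      <xsl:for-each select="someNode">
        <someNode id="{@id}">
          <xsl:attribute name="sortkey">
            <xsl:call-template name="length-prefix">
              <xsl:with-param name="rest"
                  select="substring-after(@id, '.')"/>
            </xsl:call-template>
          </xsl:attribute>
        </someNode>
      </xsl:for-each>
    </xsl:variable>
    <!-- pass 2: an ordinary text sort on the generated key -->
    <xsl:copy>
      <xsl:for-each select="exsl:node-set($keyed)/someNode">
        <xsl:sort select="@sortkey"/>
        <someNode id="{@id}"/>
      </xsl:for-each>
    </xsl:copy>
  </xsl:template>

  <!-- turns "10.15" into "210215": every component is
       prefixed with its own length -->
  <xsl:template name="length-prefix">
    <xsl:param name="rest"/>
    <xsl:if test="$rest != ''">
      <xsl:variable name="head"
          select="substring-before(concat($rest, '.'), '.')"/>
      <xsl:value-of select="concat(string-length($head), $head)"/>
      <xsl:call-template name="length-prefix">
        <xsl:with-param name="rest"
            select="substring-after($rest, '.')"/>
      </xsl:call-template>
    </xsl:if>
  </xsl:template>
</xsl:stylesheet>

With this scheme CM09.2 gets the key 12 and CM09.10 gets 210, so the text sort puts 2 ahead of 10 as required.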

32.

Sort by date

Jarno.Elovirta




> I need to sort records in my XSL stylesheet descending by 
> date (i.e. newest
> date first).

> The problem is that the dates are in a text field in 
> localized form, i.e.

> <date>24. April 2003</date>

> I have no clue how to approach this or if it will be possible at all.

<xsl:sort data-type="number" order="descending" 
select="concat(substring-after(substring-after(., ' '), ' '),
format-number(document('')/*/x:months/month[@name =
substring-before(substring-after(current(), ' '), ' ')]/@number, '00'),
format-number(substring-before(., '.'), '00'))" />

with

<x:months>
  <month name="January" number="1" />
  <month name="February" number="2" />
  <month name="March" number="3" />
  <month name="April" number="4" />
  <month name="May" number="5" />
  <month name="June" number="6" />
  <month name="July" number="7" />
  <month name="August" number="8" />
  <month name="September" number="9" />
  <month name="October" number="10" />
  <month name="November" number="11" />
  <month name="December" number="12" />
</x:months>

as a top-level element in your stylesheet.
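
For context, here is how the pieces might sit together in one stylesheet. This is a sketch under assumptions: the namespace URI bound to the x prefix and the <records>/<record>/<date> input structure are invented for the example, and every date is assumed to have exactly the form "24. April 2003" with no surrounding whitespace.

<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:x="urn:x-example:months"
    exclude-result-prefixes="x">
  <xsl:output method="text"/>

  <!-- lookup table; the month elements are in no namespace -->
  <x:months>
    <month name="January" number="1" />
    <month name="February" number="2" />
    <month name="March" number="3" />
    <month name="April" number="4" />
    <month name="May" number="5" />
    <month name="June" number="6" />
    <month name="July" number="7" />
    <month name="August" number="8" />
    <month name="September" number="9" />
    <month name="October" number="10" />
    <month name="November" number="11" />
    <month name="December" number="12" />
  </x:months>

  <xsl:template match="/records">
    <xsl:for-each select="record/date">
      <!-- the key is yyyymmdd, e.g. 20030424 for "24. April 2003" -->
      <xsl:sort data-type="number" order="descending"
        select="concat(substring-after(substring-after(., ' '), ' '),
          format-number(document('')/*/x:months/month[@name =
            substring-before(substring-after(current(), ' '), ' ')
            ]/@number, '00'),
          format-number(substring-before(., '.'), '00'))" />
      <xsl:value-of select="."/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>

For localized month names, only the name attributes in the lookup table need to change.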

33.

Sorting problems

David Carlisle


<vendor_name>AAAAAAAA</vendor_name>
<vendor_name>
ZZZZZZZ
</vendor_name>

ZZZZZ would come before AAAAAAAA. The sort was being performed by IE 6.0. After much hair pulling, I finally figured out it was because of the carriage return that preceded the ZZZZZZ (the actual XML doc was much bigger, hiding the problem).

First question: It seems odd to me that the newline character would be considered significant and not get stripped. Why is this not so?

Answer

Newlines that are followed by non-white space characters are _never_ considered insignificant by XML or stripped by XSLT.

(Mike Kay adds; But they may be ignored when sorting - the spec leaves detailed decisions on how strings are sorted to the implementation.)

You want the normalize-space function,

select="normalize-space(vendor_name)"
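
In context, the sort might look like the sketch below. The <vendors> and <vendor> wrapper elements are assumptions made for the example, since the question only shows the vendor_name elements.

<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/vendors">
    <xsl:for-each select="vendor">
      <!-- leading and trailing newlines no longer affect the key -->
      <xsl:sort select="normalize-space(vendor_name)"/>
      <xsl:value-of select="normalize-space(vendor_name)"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>

Note that normalize-space() also collapses runs of white space inside the name to a single space, which is usually what you want for a sort key.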

34.

How do I sort Hiragana and Katakana Japanese characters?

Eric Promislow

First, a note on what exactly these two kinds of characters are. Written Japanese text uses four kinds of characters:

  • Kanji, the so-called Chinese characters. These are originally based on written Chinese text.

  • Hiragana, used for writing out Japanese words phonetically, as opposed to in Kanji.

  • Katakana, a syllabary that indicates that a word or phrase has been borrowed from a non-Japanese alphabet.

  • Romaji, Roman characters.

Children's books often put small Hiragana characters below a Kanji character, so the student can subvocalize the word and learn the Kanji that way.

The Hiragana and Katakana alphabets are called "syllabaries". With one exception, each member is either a single-vowel syllable or a consonant-vowel syllable. The exception is "-n", as in "san", which would be written [sa][n]. Some of the syllables can be complex, such as the "kyo" in [To][kyo].

Both syllabaries follow the same order, following the a-i-u-e-o (vocalized as short 'u', long 'e', 'oo' as in "shoe", short e, semi-long o as in "beau") form horizontally, and the a-ka-sa-ta-na ha-ya-ma-wa-n order vertically (consonants like the hard g, d, sh, ch (as in chew), b, and p come by modifying the so-called unvoiced consonants). Modified syllables sort immediately after their base syllable. For example, "ga" sits between "ka" and "ki".

Hiragana characters occupy Unicode code points U+3042 - U+3094.

Katakana characters occupy Unicode code points U+30A0 - U+30FF.

The Katakana alphabet is growing, at a slow rate, and contains syllables that are not in the Hiragana table.

From what I know, the Unicode tables follow Japanese dictionary sorting order, as long as you stay within either the Hiragana or Katakana table. If all the items in your list are either one or the other, you should be able to use XSLT's simple Unicode-based xslt:sort element. Otherwise, you would need to write an extension function that would map Hiragana characters to their corresponding Katakana values (since the former is a proper subset of the latter).

Here's an example, where I have a list of Japanese characters. The attribute "a" is used to indicate where I would expect each item to appear in a sort based on Unicode-values only.

The input:


<?xml version="1.0"?>
<items>
<item a='k5'>&#12532;&#12449;</item>
<item a='h3'>&#12377;</item>
<item a='h4'>&#12418;</item>
<item a='k2'>&#12461;</item>
<item a='h1'>&#12363;</item>
<item a='k4'>&#12514;</item>
<item a='k1'>&#12459;</item>
<item a='h2'>&#12365;</item>
<item a='k3'>&#12473;</item>
</items>

The XSLT:


<?xml version="1.0"?> 
<xslt:stylesheet xmlns:xslt="http://www.w3.org/1999/XSL/Transform"
version="1.0" >

<xslt:output indent='yes' method='xml' encoding='utf-8' />
  
<xslt:template match='items'>
    <outitems what='Starting sorting'>
        <xslt:apply-templates select='item'>
            <xslt:sort select='.'/>
        </xslt:apply-templates>
    </outitems>
</xslt:template>

<xslt:template match='item'>
    <outitem>
        <xslt:attribute name='ord'>
	    <xslt:value-of select='@a'/>
        </xslt:attribute>
        <xslt:value-of select='.'/>
    </outitem>
</xslt:template>

</xslt:stylesheet>

The output:


<?xml version="1.0" encoding="utf-8"?>
<outitems what="Starting sorting">
<outitem ord="h1">&#12363;</outitem>
<outitem ord="h2">&#12365;</outitem>
<outitem ord="h3">&#12377;</outitem>
<outitem ord="h4">&#12418;</outitem>
<outitem ord="k1">&#12459;</outitem>
<outitem ord="k2">&#12461;</outitem>
<outitem ord="k3">&#12473;</outitem>
<outitem ord="k4">&#12514;</outitem>
<outitem ord="k5">&#12532;&#12449;</outitem>
</outitems>
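
For lists that mix the two syllabaries, one option that stays within plain XSLT 1.0 is to map Hiragana to Katakana with translate() in the sort key instead of writing an extension function. The sketch below is deliberately partial and is not the author's code: the two mapping variables, whose names are invented here, cover only the four Hiragana characters that occur in the sample input, and a real table would have to list every pair. How the processor then orders the Katakana keys is still up to the implementation.

<?xml version="1.0"?> 
<xslt:stylesheet xmlns:xslt="http://www.w3.org/1999/XSL/Transform"
version="1.0" >

<xslt:output indent='yes' method='xml' encoding='utf-8' />

<!-- partial mapping: ka, ki, su, mo; extend both strings in step -->
<xslt:variable name='hira'
    select="'&#12363;&#12365;&#12377;&#12418;'"/>
<xslt:variable name='kata'
    select="'&#12459;&#12461;&#12473;&#12514;'"/>

<xslt:template match='items'>
    <outitems what='Sorting with Hiragana mapped to Katakana'>
        <xslt:apply-templates select='item'>
            <!-- the key is the Katakana form; the output is unchanged -->
            <xslt:sort select='translate(., $hira, $kata)'/>
        </xslt:apply-templates>
    </outitems>
</xslt:template>

<xslt:template match='item'>
    <outitem ord='{@a}'>
        <xslt:value-of select='.'/>
    </outitem>
</xslt:template>

</xslt:stylesheet>

With this key, each Hiragana item in the sample sorts next to the Katakana item for the same syllable rather than in a separate block.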

If you are doing any work in this area, Ken Lunde's book "CJKV Information Processing" (ISBN 1565922247) is a worthwhile investment. I supplement it with a copy of Unicode 3.0 I found at a local discount store for remaindered computer books. The unicode.org site is useful, too, but I prefer turning pages in hard copy to waiting for PDF files to open.

But XML data is supposed to all be Unicode. C# supposedly stores all chars as "Unicode" (whatever that means, probably 16-bit ignoring issues with surrogates), so I'm surprised this sort didn't occur. Sort works fine with ascii text. And it seems to work when I mix Ascii and Japanese. Looks like I found a boundary condition violation. I wouldn't call it processor-specific.

Ednote: Saxon on sourceforge indicates how saxon extends the sorting capability to other languages.

35.

Sorting problems.

David Carlisle

The problem was that when sorting on this XML snippet using XSL:

<vendor_name>AAAAAAAA</vendor_name>
<vendor_name>
ZZZZZZZ
</vendor_name>

ZZZZZ would come before AAAAAAAA.  The sort was being performed by IE 6.0.
After much hair pulling, I finally figured out it was because of the
carriage return
that preceded the ZZZZZZ (the actual XML doc was much bigger, hiding the
problem).

First question: It seems odd to me that the newline character would be
considered significant and not get stripped.  Why is this not so? 

Newlines that are followed by non-white space characters are _never_ considered insignificant by XML or stripped by XSLT.

[
But they may be ignored when sorting - the spec leaves detailed
decisions on how strings are sorted to the implementation.

Michael Kay
]

You want the normalize-space function,

<xsl:value-of select="normalize-space(source)"/>

36.

Sorting problems with whitespace (wrong order ).

David Carlisle

The problem was that when sorting on this XML snippet using XSL:

<vendor_name>AAAAAAAA</vendor_name>
<vendor_name>
ZZZZZZZ
</vendor_name>

ZZZZZ would come before AAAAAAAA.  The sort was being performed by IE 6.0.
After much hair pulling, I finally figured out it was because of the
carriage return
that preceded the ZZZZZZ (the actual XML doc was much bigger, hiding the
problem).

First question: It seems odd to me that the newline character would be
considered significant and not get stripped.  Why is this not so? 

Newlines that are followed by non-white space characters are _never_ considered insignificant by XML or stripped by XSLT.

[
But they may be ignored when sorting - the spec leaves detailed
decisions on how strings are sorted to the implementation.

Michael Kay
]

You want the normalize-space function,

<xsl:value-of select="normalize-space(vendor_name)"/>

37.

Sorting Upper-Case first, or in *your* way

Mike Kay et al

I don't know exactly what the intent of the XSLT 1.0 spec for case-order was, but you need to read the definition in the light of the two (non-normative) notes that follow it.

The first says that two implementations may produce different results - in other words, the spec does not attempt to be completely prescriptive about the output order (therefore, by definition, this is not a Microsoft non-conformance).

The second note points to Unicode TR-10: http://www.unicode.org/unicode/reports/tr10/index.html

Section 6.6 of this report recommends that implementations should allow the user to decide whether lower-case should sort before or after upper-case, and my guess is that the xsl:sort parameter was intended to implement this recommendation.

In turn this should be read in the context of the collation algorithm given in the report, which sorts strings in three phases:

- alphabetic ordering
- diacritic ordering
- case ordering

The key thing here is that case is only considered if the two strings (as a whole) are the same except in case. So Xaaaa will sort before xaaaa if upper-case comes first; but Xaaaa will always sort before xaaab, regardless of case order.
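
In a stylesheet this choice is requested with the case-order attribute of xsl:sort. A minimal sketch, assuming a hypothetical doc element with word children; how strings that differ in more than case are ordered remains up to the processor:

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <!-- "doc" and "word" are assumed element names -->
  <xsl:template match="doc">
    <xsl:for-each select="word">
      <!-- under a TR10-style collation, upper-first only decides between
           strings that are the same except in case -->
      <xsl:sort select="." data-type="text" case-order="upper-first"/>
      <xsl:value-of select="."/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```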

It looks to me from this evidence as if Microsoft is implementing something close to the Unicode TR10 algorithm.



> Of course XSLT 1.0 doesn't actualy define "lexicographic" but 
> my understanding is that it always implies a direct extension 
> on an ordering on characters to an ordering on strings by 
> comparing the first different position. If that isn't what is 
> intended I think XSLT shouldn't use this term and should just 
> directly refer to TR10.

My dictionary defines "lexicographical" [sic] as "pertaining to the making of dictionaries", so on that basis "lexicographic order" means "the order that headwords might appear in a dictionary". And in my dictionary, "Johnsonian" comes after "johnny" and before "joie-de-vivre". I think the great man would have been surprised if he had appeared before "a" or after "zymotic".

I know that the word lexicographic is also used to describe a class of sorting algorithms, but I don't think the XSLT 1.0 spec is using the word in that sense. This is clear from the phrase "lexicographically in the culturally correct manner for the language..." and from the fact that it recommends Unicode TR10, which is not a lexicographic sort in that sense.

David C adds. See for example the definition given here: http://mathworld.wolfram.com/LexicographicOrder.html

Note that (despite the etymology) "lexicographic order" doesn't necessarily mean "the order used in a dictionary", as dictionaries are compiled by human compilers and words can appear in whatever order the compiler chooses, which may reflect personal and cultural preferences as much as logic. However, lexicographic ordering is used in a technical sense as a method of extending the ordering on one set (the alphabet) to a derived set (strings over that alphabet). I don't believe that the first note authorises this behaviour. It does not give a blanket licence to produce any result; it is an observation that because character order is language and system dependent, the resulting lexicographic ordering will be too. The exact places that are system dependent are listed in the normative text above.

MK continues

The second note points to Unicode TR-10: http://www.unicode.org/unicode/reports/tr10/index.html

The key thing here is that case is only considered if the two strings (as a whole) are the same except in case.

DC retorts. You mean that this is a feature of the algorithm in TR-10 (I didn't follow it closely enough to derive this property just now)?

Of course XSLT 1.0 doesn't actually define "lexicographic" but my understanding is that it always implies a direct extension of an ordering on characters to an ordering on strings by comparing the first different position. If that isn't what is intended I think XSLT shouldn't use this term and should just directly refer to TR10.

Yours truly adds:

With more than a little help from Eliot, below is a means of providing your own sort order. It's Saxon-specific, tested with 6.5.2. Sorry.

<?xml version="1.0" encoding="utf-8" ?>
<doc>
 <word>Hello</word>
 <word>hello</word>
 <word>[hello]</word>
<word>Mword</word>
<word>Nword</word>

<word>Mother</word>
<word>Nato</word>
<word>:Mother</word>
<word>5Mother</word>
<word>&#x00FC;nter</word>
<word>unter</word>
<word>!unter</word>
<word>$unter</word>

</doc>

xslt:

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:saxon="http://icl.com/saxon"
 
                version="1.0">

  
  

     <xsl:output method="xml" encoding="utf-8"/>



  <xsl:template match="/">
    <html>
      <head>
        <title>Collation</title>
        <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=utf-8"
/>
      </head>
      <body>
        <xsl:apply-templates/>
      </body>
    </html>
  </xsl:template>


  <xsl:template match="doc">

    <h3>Using an external collator class that implements
java.util.Comparator, with the rules in an external text file.</h3>

    <p>The rules file is straightforward, with a couple of notable
exceptions. </p>

    <ol>
      <li>N sorts before M</li>
      <li>punctuation sorts before alpha which sorts before Numeric</li>
    </ol>


    <p>
      Before: <br />
 

  <xsl:for-each select="word">
      <xsl:value-of select="."/> <br />
  </xsl:for-each>
</p>
<hr />

  <p>
    <b>After</b> <br />


  <xsl:for-each select="word">
    <xsl:sort select="."  
     data-type="text"
     lang="er"/>
    <xsl:value-of select="."/> <br />
  </xsl:for-each>

</p>

  </xsl:template>



  <xsl:template match="*">
    *****<xsl:value-of select="name(..)"/>/<xsl:value-of
select="name()"/>******
</xsl:template>


</xsl:stylesheet>

Note the 'lang' attribute. This sends Saxon off looking for a class com.icl.saxon.sort.Compare_er, which provides the necessary items. (Note that the default is _en, the English 'sort'.)

For convenience, the actual sort order is kept externally and read in at runtime from a text file, as UTF-8.

For this test case it reads,

  ' ' ,  ':' ,  ';' ,  '<'  , '=' ,  '>'  , '?' ,  '@', '!',
 '[' ,  '\' ,  ']' ,  '^' ,  '_' ,  '`',
 '{' ,  '|' ,  '}' ,  '~'
 '!' ,  '"' ,  '#' ,  '$' ,  '%' ,  '&',  ''' ,  '(' ,  ')' ,  '*'  , '+'  ,
','  , '-' , '.' ,  '/'

< 'A',a  < 'B',b  < 'C',c  < 'D',d  < 'E',e  < 'F',f  < 'G',g
< 'h' < 'H'  < 'I',i  < 'J',j  < 'K',k  < 'L',l  < 'N',n < 'M',m
< 'O',o  < 'P',p  < 'Q',q  < 'R',r  < 'S',s  < 'T',t
< 'U',u < ü  < 'V',v  < 'W',w  < 'X',x  < 'Y',y  < 'Z',z
< '0'  < '1'  < '2'  < '3'  < '4'  < '5'  
< '6'  < '7'  < '8'  < '9'

The file has two basic blocks. The first runs up to the first < sign; these characters are ignorable. Next comes the 'sequence' of characters, e.g. < 'A',a < 'B',b, implying that A sorts before B.

Note < 'N',n < 'M',m which is the test case. I.e. N should sort before M.

This lot is held in file collator.txt

Spec is http://java.sun.com/products/jdk/1.2/docs/api/java/text/RuleBasedCollator.html

The java is below.

package com.icl.saxon.sort;

import java.text.Collator;
import java.text.RuleBasedCollator;
import java.text.ParseException;
import java.lang.StringBuffer;
import java.io.FileReader;
import java.io.BufferedReader;


import java.io.Serializable;
import com.icl.saxon.sort.TextComparer;
import java.io.File;



/**
  * Custom Saxon collator implementation.
  **/


public class Compare_er extends com.icl.saxon.sort.TextComparer {

     static Collator collator;

    //static final String collatorRules = "< a < b < c";
     // String containing collation rules as defined by Java
     // RulesBasedCollator class. This could come from an
     // external resource of some sort, including from a Java
     // property or read from an application-specific configuration
     // file.

     public  Compare_er() {
         super();
	 String rulesFile="collator.txt";
         try {
             collator = new RuleBasedCollator(getRules(rulesFile));
         } catch (Exception e) {
             // Saxon will not report an exception thrown at this point
             e.printStackTrace();
         }
     }

     public int compare(java.lang.Object obj, java.lang.Object obj1) {
         return collator.compare(obj, obj1);
     }

 /**
     *Read a set of rules into a String
     *@param filename  name of the file containing the rules
     *@return String, the rules
     *
     **/
    private static String getRules(String filename) {
	String res="";
	try{
	    BufferedReader reader = 
		new BufferedReader (new FileReader (filename));
	    StringBuffer buf=new StringBuffer();
	    String text;
	    try {
		while ((text=reader.readLine()) != null)
		    buf.append(text + "\n");
		reader.close();
	    }catch (java.io.IOException e) {
		System.err.println("Unable to read from rules file "+
filename);
		System.exit(2);
		
		}
	    res=buf.toString();


	}catch (java.io.FileNotFoundException e) {
	    System.out.println("Unable to read Rules file, quitting");
	    System.exit(2);
	}
	
	return res;
    }// end of getRules()

}

Note the read from the rules file. (Also note that if it's not found, Saxon doesn't report the error.)

Finally, testing with

<doc>
 <word>Hello</word>
 <word>hello</word>
 <word>[hello]</word>
<word>Mword</word>
<word>Nword</word>
</doc>

gives output

Hello
hello
[hello]
Nword
Mword

I.e. the M and N are re-arranged.

Caution.

Assuming that the java file is in location com/icl/saxon/sort/Compare_er.java

then make sure that '.' is in the classpath, so it finds it.

With your own collator.txt file you can then sort text to your heart's content and to your own rules.

It's even easier in Saxon 7, but that's another story.

Last word goes to Mike Kay.


> *It would be interesting to know how Saxon implements
> this behaviour..* if M. Kay will be kind to answer..

I thought you would never ask. I'm an optimist ;-)

The answer is different for Saxon 6.x and Saxon 7.x.

In Saxon 6.x, you can write your own collating functions as a plug-in, but if you don't, then two strings are compared as follows:

1. The two strings are compared with case normalized and accents stripped, using Unicode codepoint order of the normalized characters.
2. If step (1) finds that the strings are equal, they are compared with case normalized but without accents stripped, again using codepoint order.
3. If step (2) finds that the strings are equal, the outcome depends on the case of the first character that differs in the two strings, taking account of the case-order option on xsl:sort.

Case normalization relies on the Java method toLowerCase. Accent stripping is implemented only for characters in the upper half of the Latin-1 set.

The above is essentially a simplified implementation of the Unicode Collation Algorithm.
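
The three steps can be approximated in plain XSLT 1.0 with three sort keys. This is only a sketch of the idea, not Saxon's code: the folding tables cover just A-Z and u-umlaut as an illustration, the doc and word element names are assumed, and each key is still compared by whatever collation the processor uses rather than by codepoint.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <!-- illustrative folding tables; only u-umlaut is handled as an accented letter -->
  <xsl:variable name="upper" select="'ABCDEFGHIJKLMNOPQRSTUVWXYZ&#xDC;'"/>
  <xsl:variable name="lower" select="'abcdefghijklmnopqrstuvwxyz&#xFC;'"/>
  <xsl:variable name="accented" select="'&#xFC;'"/>
  <xsl:variable name="plain" select="'u'"/>

  <xsl:template match="doc">
    <xsl:for-each select="word">
      <!-- key 1: case folded and accents stripped -->
      <xsl:sort select="translate(translate(., $upper, $lower), $accented, $plain)"/>
      <!-- key 2: case folded, accents kept; used only when key 1 ties -->
      <xsl:sort select="translate(., $upper, $lower)"/>
      <!-- key 3: original string; case-order decides the remaining ties -->
      <xsl:sort select="." case-order="upper-first"/>
      <xsl:value-of select="."/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```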

In Saxon 7.x, Saxon uses the collation capabilities of JDK 1.4. You can select any collation supported by the JDK. The default is selected according to your locale, or according to the language if lang is specified on xsl:sort. If case-order is upper-first, then the action of the selected Java collation is modified as follows: if the Java collation decides that two strings collate as equal, then Saxon examines the two strings, looking for the first character that differs between the two strings. If one of these is upper case, then that string comes first in the sorted order.

38.

Topological sort

Bill Keese and Dimitre Novatchev

Regarding the post from two years ago about topological sorting (Archive), here is another approach that I came up with. To me it seems to be more in the spirit of XSLT, ie, writing functionally rather than procedurally. Tell me what you think.

Topological sort refers to printing the nodes in a graph such that you print a node before you print any nodes that reference that node. Here's an example of a topologically sorted list:

        <element id="bar"/>
        <element id="bar2"/>
        <element id="foo">
            <idref  id="bar"/>
        </element>

My algorithm is simply:

1. each node gets a weight which is greater than the weight of any nodes it references
2. sort by weight

The algorithm is O(n^2) for a simple XSLT processor, but it would be O(n) if the XSLT processor was smart enough to cache the values returned from the computeWeight(node) function. Does saxon do this? Maybe it would if I used keys.

Here is the code. Note that it's XSLT V2 (although it could be written more verbosely in XSLT V1).

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:bill="http://bill.org"
    version="2.0">

Here's the code to compute the weight of a node. (This code doesn't detect circular dependencies, but it should be easy to add. That's left as an exercise to the reader. :-)

    <xsl:function name="bill:computeWeight" as="xs:integer*">
        <xsl:param name="node"/>
        <!-- generate a sequence containing the weights of each node I reference -->
        <xsl:variable name="referencedNodeWeights" as="xs:integer*">
            <!-- start with 0 so that max() has a value for nodes with no references -->
            <xsl:sequence select="0"/>
            <xsl:for-each select="$node/idref/@id">
                <!-- xsl:sequence requires a select attribute in XSLT 2.0 -->
                <xsl:sequence select="bill:computeWeight(//element[@id=current()])"/>
            </xsl:for-each>
        </xsl:variable>
        <!-- make my weight higher than any of the nodes I reference -->
        <xsl:sequence select="max($referencedNodeWeights)+1"/>
    </xsl:function>

Here's the driver code, that sorts the elements according to their weight.

    <xsl:template match="/">
        <xsl:for-each select="top/element">
            <xsl:sort select="bill:computeWeight(.)" data-type="number"/>
            <xsl:copy-of select="."/>
        </xsl:for-each>
    </xsl:template>
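
On the question of keys: a key does not cache the computed weights, but it can replace the //element scan with an indexed lookup. A minimal sketch under that assumption; the key name element-by-id is made up for illustration, and the input is assumed to be a document with a top element as in the driver code above. Like the original, it does not detect circular dependencies.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:bill="http://bill.org"
    version="2.0">

  <!-- index every element by its id (hypothetical key name) -->
  <xsl:key name="element-by-id" match="element" use="@id"/>

  <xsl:function name="bill:computeWeight" as="xs:integer">
    <xsl:param name="node" as="element()"/>
    <!-- weights of all referenced elements, resolved through the key;
         the leading 0 keeps max() defined for elements with no references -->
    <xsl:variable name="referencedNodeWeights" as="xs:integer+"
        select="(0, for $ref in key('element-by-id', $node/idref/@id, root($node))
                    return bill:computeWeight($ref))"/>
    <!-- one more than the heaviest referenced element -->
    <xsl:sequence select="max($referencedNodeWeights) + 1"/>
  </xsl:function>

  <xsl:template match="/">
    <xsl:for-each select="top/element">
      <xsl:sort select="bill:computeWeight(.)" data-type="number"/>
      <xsl:copy-of select="."/>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```

The weights are still recomputed for each element that is sorted, so this only removes the cost of the repeated document scan.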

See archive 1, archive 2 and archive 3. The last of these is a stable topological sort -- "keeps the cliques together".

39.

Sorting on names with and without spaces

Michael Kay


> I am attempting to sort an a list of personal names. All of the names 
> consist of either a first name followed by a last name or of a last 
> name only (there are no middle names). Both parts of the name, when 
> present, are enclosed within the one tag (span) which has a 
> class='person'
> attribute, the same tag is used to enclose a last name only. I am 
> attempting to sort by last name like so

> <xsl:for-each select="html/body//span[@class='person']">
> <xsl:sort select="substring-after(., ' ')"/> <xsl:sort select="."/> 
> <xsl:sort select="substring-before(., ' ')"/>

> The problem is that names consisting of a last name only appear first 
> in my alphabetical sequence and are sorted; these are followed by 
> names with a first name and a last name and these are also sorted. I 
> require one alphabetical list rather than two.

> Can this be done in one fell swoop, without having to write an XSL 
> style sheet for the file consisting of two alphabetical sequences?

As is so often the case, it's easy in 2.0:

<xsl:sort select="if (contains(., ' ')) 
    then substring-after(., ' ') 
     else string(.)"/>

The only workaround I can think of for 1.0 is the "infinite-substring" hack. This relies on the fact that if B is a boolean expression, then

substring(X, 1 div B, string-length(X))
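returns [if (B) then X else ""], because 1 div true() is 1 and 1 div false() is positive infinity, which lies beyond the end of any string.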

So you get something like

<xsl:sort select="concat(
   substring(., 1 div not(contains(., ' ')), string-length(.)),
   substring(substring-after(., ' '), 1 div contains(., ' '), string-length(.)))"/>
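
Put into a complete XSLT 1.0 stylesheet, the hack might look like the following. This is a sketch that assumes the span elements sit directly under a names element; the second sort key (the whole string) is an addition for illustration, so that people sharing a last name come out in a defined order.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <xsl:template match="names">
    <xsl:for-each select="span[@class='person']">
      <!-- exactly one of the two substring() calls is non-empty:
           the whole string when there is no space,
           otherwise the part after the first space -->
      <xsl:sort select="concat(
          substring(., 1 div not(contains(., ' ')), string-length(.)),
          substring(substring-after(., ' '), 1 div contains(., ' '), string-length(.)))"/>
      <!-- tie-breaker: the full name -->
      <xsl:sort select="."/>
      <xsl:value-of select="."/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```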

Dimitre offers

This xml

<names>
  <span class="person">Jenofsky</span>
  <span class="person">Jones</span>
  <span class="person">Zubbard</span>
  <span class="person">Bob Madison</span>
  <span class="person">Oscar Madison</span>
  <span class="person">Felix Unger</span> 
</names>

with this template

<xsl:template match="names">
  <xsl:apply-templates select="span">
    <xsl:sort select="substring(translate(., ' ', ''),  string-length(substring-before(.,' '))  + 1) "/> 
    </xsl:apply-templates>
 </xsl:template>

This calculates the index of the first space, then removes the
space(s) from the string, then uses what follows the space as the sorting key.

and Mike Kay offers the 2.0 solution

 <xsl:template match="names">
  <xsl:apply-templates select="span">
           <xsl:sort select="if (contains(., ' ')) then 
                substring-after(., ' ') else string(.)" 
                data-type="text" case-order="upper-first"/>
                      
    </xsl:apply-templates>
 </xsl:template>