Sorting

1. Copying with Sorting
2. Topological sort
3. Sorting on names with and without spaces
4. Sort based on numbers and characters.
5. Indented output for human readability
6. Month lookup
7. Sorting, with variables
8. Sort a tokenized list

1.

Copying with Sorting

Michael Kay

XSLT 2.0 allows you to create a sorted sequence of nodes using the sort() function, with named sort keys:

<xsl:sort-key name="sk1">
  <xsl:sort select="exp2"/>
</xsl:sort-key>

<xsl:copy-of select="sort('sk1', exp)"/>

2.

Topological sort

Bill Keese and Dimitre Novatchev

Regarding the post from two years ago about topological sorting (Archive), here is another approach that I came up with. To me it seems to be more in the spirit of XSLT, ie, writing functionally rather than procedurally. Tell me what you think.

Topological sort refers to printing the nodes in a graph such that you print a node before you print any nodes that reference that node. Here's an example of a topologically sorted list:

        <element id="bar"/>
        <element id="bar2"/>
        <element id="foo">
            <idref  id="bar"/>
        </element>

My algorithm is simply:

1. each node gets a weight which is greater than the weight of any nodes it references
2. sort by weight

The algorithm is O(n^2) for a simple XSLT processor, but it would be O(n) if the XSLT processor was smart enough to cache the values returned from the computeWeight(node) function. Does saxon do this? Maybe it would if I used keys.

Here is the code. Note that it's XSLT V2 (although it could be written more verbosely in XSLT V1).

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:bill="http://bill.org"
    version="2.0">

Here's the code to compute the weight of a node (This code doesn't detect circular dependencies, but it should be easy to add. That's left as an exercise to the reader. :-)

    <xsl:function name="bill:computeWeight" as="xs:integer*">
        <xsl:param name="node"/>
        <!-- generate a sequence containing the weights of each node I 
reference -->
        <xsl:variable name="referencedNodeWeights" as="xs:integer*">
            <xsl:sequence select="0"/>
            <xsl:for-each select="$node/idref/@id">
                <xsl:sequence>
                    <xsl:value-of 
select="bill:computeWeight(//element[@id=current()])"/>
                </xsl:sequence>
            </xsl:for-each>
        </xsl:variable>
        <!-- make my weight higher than any of the nodes I reference -->
        <xsl:value-of select="max($referencedNodeWeights)+1"/>
    </xsl:function>

Here's the driver code, that sorts the elements according to their weight.

    <xsl:template match="/">
        <xsl:for-each select="top/element">
            <xsl:sort select="bill:computeWeight(.)" data-type="number"/>
            <xsl:copy-of select="."/>
        </xsl:for-each>
    </xsl:template>

3.

Sorting on names with and without spaces

Michael Kay


> I am attempting to sort an a list of personal names. All of the names 
> consist of either a first name followed by a last name or of a last 
> name only (there are no middle names). Both parts of the name, when 
> present, are enclosed within the one tag (span) which has a 
> class='person'
> attribute, the same tag is used to enclose a last name only. I am 
> attempting to sort by last name like so

> <xsl:for-each select="html/body//span[@class='person']">
> <xsl:sort select="substring-after(., ' ')"/> <xsl:sort select="."/> 
> <xsl:sort select="substring-before(., ' ')"/>

> The problem is that names consisting of a last name only appear first 
> in my alphabetical sequence and are sorted; these are followed by 
> names with a first name and a last name and these are also sorted. I 
> require one alphabetical list rather than two.

> Can this be done in one fell swoop, without having to write an XSL 
> style sheet for the file consisting of two alphabetical sequences?

As is so often the case, it's easy in 2.0:

<xsl:sort select="if (contains(., ' ')) 
    then substring-after(., ' ') 
     else string(.)"/>

This xml

<names>
  <span class="person">Jenofsky</span>
  <span class="person">Jones</span>
  <span class="person">Zubbard</span>
  <span class="person">Bob Madison</span>
  <span class="person">Oscar Madison</span>
  <span class="person">Felix Unger</span> 
</names> 
<xsl:template match="names">
  <xsl:apply-templates select="span">
           <xsl:sort select="if (contains(., ' ')) then 
                substring-after(., ' ') else string(.)" 
                data-type="text" case-order="upper-first"/>
                      
    </xsl:apply-templates>
 </xsl:template>

4.

Sort based on numbers and characters.

Andrew Welch

With this input

<doc>
<n>01</n>
<n>10</n>
<n>Alpha</n>
<n>30</n>
<n>100</n>
<n>2</n>


</doc>

With this styleseet

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
   <xsl:strip-space elements="*"/>
   <xsl:output method="xml" indent="yes"/>


   <xsl:template match="/doc">
     <xsl:apply-templates>
       <xsl:sort select="replace(., '\s\D*', '')" data-type="number"/>
       <xsl:sort select="replace(., '\d*\s', '')" data-type="text"/>

     </xsl:apply-templates>
   </xsl:template>


   <xsl:template match="n">
     <xsl:message>
       <xsl:value-of select="replace(.,'^ *0+','')"/>
     </xsl:message>
   </xsl:template>

</xsl:stylesheet>

5.

Indented output for human readability

David Carlisle

If I needed more control in a pure xslt way I'd probably do a 2nd pass over the result tree either as a 2nd transform or by using a variable and then a special mode to apply templates, something like

<xsl:template match="*" mode="indent">
<xsl:text>&#10;</xsl:text>
<xsl:for-each 
  select="ancestor::*"><xsl:text>  </xsl:text></xsl:for-each>
<xsl:copy>
<xsl:copy-of select="@*"/>
<xsl:apply-templates mode="indent"/>
<xsl:text>&#10;</xsl:text>
<xsl:for-each 
  select="ancestor::*"><xsl:text>  </xsl:text></xsl:for-each>
<xsl:copy>
</xsl:template>

Then you can ealy add special templates

<xsl:template match="foo" mode="indent">
<xsl:text>&#10;&#10;</xsl:text>
<xsl:comment>double spaced</xsl:comment>
<xsl:for-each select="ancestor::*"><xsl:text>  </xsl:text></xsl:for-each>
<xsl:copy>
<xsl:copy-of select="@*"/>
<xsl:apply-templates mode="indent"/>
<xsl:text>&#10;</xsl:text>
<xsl:for-each select="ancestor::*"><xsl:text>  </xsl:text></xsl:for-each>
<xsl:copy>
</xsl:template>

This way the indenting doesn't intefere with your main transform and can easily be put in a common stylesheet that you include when needed.

6.

Month lookup

David Carlisle

I have for quite a few years used this primitive way of translating from a month number to a month name.

<xslt:variable name="$months">
 <m>January</m><m>February</m><m>March</m><m>April</m>
 <m>May</m><m>June</m><m>July</m><m>August</m>
 <m>September</m><m>October</m><m>November</m><m>December</m>
</xsl:variable>

The actual lookup was done using a this expression

<xsl:value-of
          select="$months/m[number($month)]"/>

Where $month can be the string 1 .. 12.

In XSLT 2.0 you can use

In xslt2 you could also (more efficiently) do

<xslt:variable name="months" as="item()*" select=
"('January','February','March','April','May','June','July','August',
'September','October','November','December')"/>
<xsl:value-of select="$months[xs:integer($month)]"/>

7.

Sorting, with variables

Abel Braaksma



My requirement is that I have an unknown number of sort conditions
that I want to apply to a for-each, for-each-group, or apply-templates
construct.

Is there a way to dynamically define the number of <xsl:sort>
statements to use based on selecting parameters from an xml?

For example, <list> might have 0 to n <sort/> nodes:

<list>
 <address>
   <name/>
   <street/>
   <city/>
 </address>
 <!-- more address nodes -->

 <sort order="1">name</sort>
 <sort order="2">city</sort>
<!-- might have more sort nodes -->
</list>

If I have no nodes, I would want to omit the <xsl:sort> statement, etc...

Answer

The easy part is the number of sort statements. Just limit the amount to, say, 10, and create 10 sort statements. Then:

1. The 'order' and 'data-type' are AVT, so you can easily get them from any variable or data structure. Likely something like: $sortkeys/sort[1]/order etc. This will give you fine granularity control over descending/ascending, numeric/string. Keep in mind that it is an error when the order or data-type evaluates to empty, so make sure to use defaults in that case.

2. The select statements are a bit tricky. You can leave them simple if you only need to sort on one node with a given name (from your sortkey) and use local-name() or name() functions to match for it. You can evaluate an xpath (not sure saxon:evaluate is the way to go) or you can decide to create a simple function that takes the current node and resolves your (simplified) xpath if it is more then just a node name.

3. The order of the sortkeys is something you can resolve in several XSLT native ways. That shouldn't be that hard.

Combined, this looks something like this:

<!-- order of these elements is the order for the sort-key -->
<xsl:variable name="sortkey">
  <key nodename='name' order='ascending' type='string' />
  <key nodename='street' order='ascending' type='string' />
  <key nodename='birth-year' order='ascending' type='numberic' />
</xsl:variable>

<xsl:apply-templates select="somenode">
   <xsl:sort select="*[local-name() = $sortkey/key[1]/@nodename"
       order="{$sortkey/key[1]/@order}"
data-type="{$sortkey/key[1]/@type}"/>
   <xsl:sort select="*[local-name() = $sortkey/key[1]/@nodename"
       order="{$sortkey/key[2]/@order}"
data-type="{$sortkey/key[2]/@type}"/>
   <xsl:sort select="*[local-name() = $sortkey/key[1]/@nodename"
       order="{$sortkey/key[3]/@order}"
data-type="{$sortkey/key[3]/@type}"/>
.... etc (10x) ...
</xsl:apply-templates>

You can generalize this to some further extend, of course.

If you want even more control, you should consider making a two-pass transformation where in the first pass you create the XSLT that contains the correct sortkeys. FXSL has many examples of how to do multi-pass on XSLT alone.

Andrew Welch offers

Here's a way of sorting dynamically without extensions:

<?xml version="1.0"?>
<xsl:stylesheet version="2.0"
xmlns:xs="http://www.w3.org/2001/XMLSchema";
xmlns:xsl="http://www.w3.org/1999/XSL/Transform";
xmlns:fn="fn">

<xsl:variable name="employees" select="/employees/employee"/>
<xsl:variable name="sortValues" select="/employees/sort" as="xs:string+"/>

<xsl:function name="fn:performSort" as="element()+">
   <xsl:param name="list" as="element()+"/>
   <xsl:param name="sortKey" as="xs:string+"/>

   <xsl:variable name="sortedList" as="element()+">
       <xsl:for-each select="$list">
           <xsl:sort select="*[name() = $sortKey[1]]"/>
           <xsl:copy-of select="."/>
       </xsl:for-each>
   </xsl:variable>

   <xsl:sequence select="if ($sortKey[2]) then
                           fn:performSort($sortedList,
subsequence($sortKey, 2))
                           else $sortedList"/>
</xsl:function>

<xsl:template match="/">
   <xsl:sequence select="fn:performSort($employees, $sortValues)"/>
</xsl:template>

</xsl:stylesheet>

You call the performSort() function with a list of elements to sort, and a list of child element names to sort with, and it recursively sorts by each one.

8.

Sort a tokenized list

David Carlisle




I've been attempting to write a template which sorts a tokenized list of
pairs of numbers with little success.

Here's what I've been attempting to do:

Given a string like:
"1 0 2 1 1 2 1 3 1 4 2 0 1 1 2 3 2 4 6 0 5 0 10 0 10 1 10 2"

I want to tokenize and build pairs like:

"1 0", "2 1", "1 2", "1 3", "1 4", "2 0", "1 1", "2 3", "2 4", "6 0", "5
0", "10 0", "10 1", "10 2"

answer


xsl:stylesheet version="2.0" 
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

%lt;xsl:param name="s" select="'1 0 2 1 1 2 1 3 1 4 2 0 1 1 2 3 2 4 6 0 5 0 10 0 10 1 10 2'"/>

%lt;xsl:template name="main">
%lt;xsl:for-each-group select="tokenize($s,'\s+')" group-by="(position()-1) idiv 2">
  %lt;xsl:sort select="number(current-group()[1])"/>
  %lt;xsl:sort select="number(current-group()[2])"/>
: %lt;xsl:value-of select="current-group()"/>
%lt;/xsl:for-each-group>
%lt;/xsl:template>

%lt;/xsl:stylesheet>

$ saxon9 -it main sort.xsl
%lt;?xml version="1.0" encoding="UTF-8"?>
: 1 0
: 1 1
: 1 2
: 1 3
: 1 4
: 2 0
: 2 1
: 2 3
: 2 4
: 5 0
: 6 0
: 10 0
: 10 1
: 10 2