Cross reference in xslt

Cross Reference

1. Numbering lines
2. Numbered cross references
3. Cross Referencing between multiple output HTML files
4. Handling punctuation in a cross reference
5. Cross referencing, using keys
6. Testing for matching id, idrefs

1.

Numbering lines

James Cummings, David Carlisle



I have some poems marked up something like:
----Input----
<body>
<div type="poem">
<lg type="stanza">
<l>This is a line of verse</l>
<l>This is a line of verse</l>
<l>This is a line of verse</l>
<l>This is a line of verse</l>
</lg>
<lg type="stanza">
<l>This is a line of verse</l>
<l>This is a line of verse</l>
<l>This is a line of verse</l>
<l>This is a line of verse</l>
</lg>
<p>This is something not counted as a line</p>
<lg type="stanza">
<l>This is a line of verse</l>
<l>This is a line of verse</l>
<l>This is a line of verse</l>
<l>This is a line of verse</l>
</lg>
</div>
<!-- and multiple poem div's like this here... -->
</body>

What I want to end up with is for each line to be 
given an @id in the html output like:

<span id="poem3line10">This is a line of verse <span class="number">10</span></span>

The complications are:
a) the line numbering needs to take account of the existence of the line groups (lg), and it should only count back within its own <div> ancestor;
b) I want to show a line number only on every fifth line.
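
A minimal sketch of one way to get this numbering. It is not taken from the thread, and it assumes XSLT 1.0, that poem divs are not nested inside one another, and that the poem number is simply the position of the poem <div> in document order. With level="any", xsl:number counts every <l> since the start of the enclosing poem, so the <lg> grouping and the intervening <p> do not disturb the count:

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="html"/>

  <xsl:template match="l">
    <!-- poem number: this poem div plus all earlier poem divs in document order -->
    <xsl:variable name="poem">
      <xsl:number count="div[@type='poem']" level="any"/>
    </xsl:variable>
    <!-- line number: all l elements since the start of the enclosing poem div -->
    <xsl:variable name="line">
      <xsl:number count="l" level="any" from="div[@type='poem']"/>
    </xsl:variable>
    <span id="poem{$poem}line{$line}">
      <xsl:apply-templates/>
      <!-- show the visible number only on every fifth line -->
      <xsl:if test="$line mod 5 = 0">
        <xsl:text> </xsl:text>
        <span class="number"><xsl:value-of select="$line"/></span>
      </xsl:if>
    </span>
  </xsl:template>
</xsl:stylesheet>
```

Every line gets an id, but only lines 5, 10, 15 and so on carry the visible <span class="number">.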

James Cummings: What I want to produce is a word-index of poem number and line number, something like:

a (4) -- 1:1, 1:2, 1:3, 1:4, 2:3, 2:5 (well, no poem 2 here ;-) )
be (5) -- 1:head, 1:1, 1:2, 1:3, 1:4
...
really (2) -- 1:1, 1:3, 2:1, 2:3 (if it was in poem 2 as well)

----Input----
<div type="poem">
<head>headers should be included in word index</head>
<lg>
<l>This is a line that really should be included</l>
<l>This is a line that should be included</l>
</lg>
<p>This shouldn't be included</p>
<lg>
<l>This is a line that really should be included</l>
<l>This is a line that should be included</l>
</lg>
</div>
... (more poems)
----

David Carlisle:

You really want to make yourself a temporary tree first, something like:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
<xsl:output indent="yes"/>
<xsl:key name="w" match="word" use="."/>
<xsl:template match="/">
<xsl:variable name="x">
<xsl:apply-templates mode="a" select="div[@type='poem']"/>
</xsl:variable>
<!-- Just for debugging, copy of temporary tree -->
[
<xsl:copy-of  select="$x"/>
]
<!-- remove above in reality -->
<xsl:for-each-group select="$x/div/l/word" group-by=".">
 <xsl:sort />
  <xsl:text>&#10;</xsl:text>
  <xsl:value-of select="."/>
  <xsl:for-each select="key('w',.)">
  <xsl:text> </xsl:text>
  <xsl:value-of select="../../@poem"/>:<xsl:value-of select="../@n"/>
  </xsl:for-each>
</xsl:for-each-group>
</xsl:template>

<xsl:template mode="a" match="div">
<div poem="{position()}">
<xsl:apply-templates mode="a" select="head"/>
<xsl:apply-templates mode="a" select="lg/l"/>
</div>
</xsl:template>

<xsl:template mode="a" match="head">
<l n="head">
<xsl:for-each select="tokenize(.,'(\s|[,\.!])+')">
<word><xsl:value-of select="lower-case(.)"/></word>
</xsl:for-each>
</l>
</xsl:template>

<xsl:template mode="a" match="l">
<l n="{position()}">
<xsl:for-each select="tokenize(.,'\s+')">
<word><xsl:value-of select="."/></word>
</xsl:for-each>
</l>
</xsl:template>
</xsl:stylesheet>
----Output----
$ saxon8 poem.xml poem.xsl
<?xml version="1.0" encoding="UTF-8"?>
[

<div poem="1">
   <l n="head">
      <word>headers</word>
      <word>should</word>
      <word>be</word>
      <word>included</word>
      <word>in</word>
      <word>word</word>
      <word>index</word>
   </l>
   <l n="1">
      <word>This</word>
      <word>is</word>
      <word>a</word>
      <word>line</word>
      <word>that</word>
      <word>really</word>
      <word>should</word>
      <word>be</word>
      <word>included</word>
   </l>
   <l n="2">
      <word>This</word>
      <word>is</word>
      <word>a</word>
      <word>line</word>
      <word>that</word>
      <word>should</word>
      <word>be</word>
      <word>included</word>
   </l>
   <l n="3">
      <word>This</word>
      <word>is</word>
      <word>a</word>
      <word>line</word>
      <word>that</word>
      <word>really</word>
      <word>should</word>
      <word>be</word>
      <word>included</word>
   </l>
   <l n="4">
      <word>This</word>
      <word>is</word>
      <word>a</word>
      <word>line</word>
      <word>that</word>
      <word>should</word>
      <word>be</word>
      <word>included</word>
   </l>
</div>
]

a 1:1 1:2 1:3 1:4
be 1:head 1:1 1:2 1:3 1:4
headers 1:head
in 1:head
included 1:head 1:1 1:2 1:3 1:4
index 1:head
is 1:1 1:2 1:3 1:4
line 1:1 1:2 1:3 1:4
really 1:1 1:3
should 1:head 1:1 1:2 1:3 1:4
that 1:1 1:2 1:3 1:4
This 1:1 1:2 1:3 1:4
word 1:head
----

2.

Numbered cross references

Sebastian Rahtz

Q: I want to create a cross-reference to a certain <title>, but I
don't just want to take its content ("Introduction"); I want its
number as well, so my text comes out as "See Section 1.1 Introduction".
How do I do this?


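Suppose you have <ptr target="foo"> and
<title id="foo">Introduction</title>; then something like the following should do it:

 <xsl:template match="ptr">
  <xsl:text>See Section </xsl:text>
  <xsl:apply-templates mode="xref" select="id(@target)" />
 </xsl:template>

 <xsl:template mode="xref" match="title">
   <xsl:number/>
   <!-- whitespace-only text in the stylesheet is stripped, so emit the space explicitly -->
   <xsl:text> </xsl:text>
   <xsl:apply-templates/>
 </xsl:template>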
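
Note that a bare <xsl:number/> counts only sibling <title> elements, so a multi-level number such as "1.1" needs a count pattern on the containers. A sketch, assuming each title is the child of a nested <section> element (a hypothetical name; the question does not show the markup) and that a DTD declares id as type ID so that id() can resolve the target:

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:template match="ptr">
    <xsl:text>See Section </xsl:text>
    <xsl:apply-templates mode="xref" select="id(@target)"/>
  </xsl:template>

  <xsl:template mode="xref" match="title">
    <!-- one number per enclosing section level, joined as 1.1, 1.2, 2.1 ... -->
    <!-- assumption: "section" is the container element name -->
    <xsl:number count="section" level="multiple" format="1.1"/>
    <xsl:text> </xsl:text>
    <xsl:apply-templates/>
  </xsl:template>
</xsl:stylesheet>
```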

            

3.

Cross Referencing between multiple output HTML files

David Carlisle

If you use an attribute on "chapter" to form the HTML page name, you can build links across the output files from it.

If you want to link to the id given by the ref attribute of the current node, and know that the result will be in a file determined by the `file' attribute of the chapter which is an ancestor of the referenced node, you can use something like

<a href="{id(@ref)/ancestor-or-self::chapter/@file}.html#{@ref}">
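
id() only works when a DTD declares the attribute as type ID. Where no DTD is available, the same link can be built with a key. A sketch, in which the <link> element name and the key name by-id are assumptions made for illustration:

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <!-- index every element that carries an id attribute -->
  <xsl:key name="by-id" match="*[@id]" use="@id"/>

  <!-- assumption: references are <link ref="..."> elements -->
  <xsl:template match="link[@ref]">
    <!-- the file name comes from the chapter that contains the target -->
    <a href="{key('by-id', @ref)/ancestor-or-self::chapter/@file}.html#{@ref}">
      <xsl:apply-templates/>
    </a>
  </xsl:template>
</xsl:stylesheet>
```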

4.

Handling punctuation in a cross reference

Jeni+Kate Tennison



> I'm looking for some advice on how to deal with punctuation following 
> cross-references, where:
>
> - The cross-reference style should include surrounding quotes
> - The cross-reference may be followed by punctuation, but the
>    punctuation should be placed inside the closing quote.

I think that you're asking how to cope when you have something like:

  For more information, see Chapter 3, <xref linkend="xxx" />.

You want to replace the <xref> element with the relevant text, quoted, but want the full stop (period) at the end of the sentence to be included in the quotes.

To do this for simple cases, you need to test the first character of the text node that follows the <xref> to see if it's a punctuation character, and if it is, put it inside the quotes. You can set up a variable that holds a string containing the punctuation characters you're interested in:

<xsl:variable name="punctuation" select="'.?!,:;'" />

And, within the template matching the <xref> element, get the first character of the immediately following text node with:

<xsl:variable name="char"
  select="substring(following-sibling::text()[1], 1, 1)" />

You can test whether that character is punctuation by seeing if $punctuation contains $char, using:

  contains($punctuation, $char)

and if so, include $char within the quotes; something like:

  <xsl:text>"</xsl:text>
  <xsl:value-of select="key('xrefs', @linkend)" />
  <xsl:if test="contains($punctuation, $char)">
    <xsl:value-of select="$char" />
  </xsl:if>
  <xsl:text>"</xsl:text>

Of course, you don't want to have the punctuation character appear twice, so you also need to have a template that matches text nodes that appear immediately after <xref> elements:

<xsl:template match="text()[preceding-sibling::*[1][self::xref]]">
  ...
</xsl:template>

Inside this template, you need to test whether the first character is a punctuation character and, if so, output only the remaining characters in the text node:

<xsl:template match="text()[preceding-sibling::*[1][self::xref]]">
  <xsl:choose>
    <xsl:when test="contains($punctuation, substring(., 1, 1))">
      <xsl:value-of select="substring(., 2)" />
    </xsl:when>
    <xsl:otherwise>
      <xsl:value-of select="." />
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>

I said that this approach would work for simple cases. You might have more complex cases where the text node after the <xref> element is nested within another element. In that case, I suggest that you adopt the British style, and leave the punctuation outside the quotes -- it will make the stylesheet a lot simpler! ;)
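
Put together, the simple case might look like the sketch below. The definition of the xrefs key is an assumption (the answer does not show it; here it indexes <title> elements by an id attribute). The sketch also looks only at the node immediately after the <xref>, so that punctuation is never pulled in from a text node further along:

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <!-- assumption: cross-reference targets are title elements with an id -->
  <xsl:key name="xrefs" match="title" use="@id"/>
  <xsl:variable name="punctuation" select="'.?!,:;'"/>

  <xsl:template match="xref">
    <!-- first character of the text node directly after this xref, if any -->
    <xsl:variable name="char"
      select="substring(following-sibling::node()[1][self::text()], 1, 1)"/>
    <xsl:text>"</xsl:text>
    <xsl:value-of select="key('xrefs', @linkend)"/>
    <!-- contains() is true for an empty string, so rule that out first -->
    <xsl:if test="$char != '' and contains($punctuation, $char)">
      <xsl:value-of select="$char"/>
    </xsl:if>
    <xsl:text>"</xsl:text>
  </xsl:template>

  <!-- text directly after an xref: drop a leading punctuation character -->
  <xsl:template match="text()[preceding-sibling::node()[1][self::xref]]">
    <xsl:choose>
      <xsl:when test="contains($punctuation, substring(., 1, 1))">
        <xsl:value-of select="substring(., 2)"/>
      </xsl:when>
      <xsl:otherwise>
        <xsl:value-of select="."/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>
```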

5.

Cross referencing, using keys

Wendell Piez



>I'm developing a stylesheet that converts XML to html to display 
>research articles. The articles contains three citation types, 
>bibliographical, table call, and figure call. Upon encountering a table 
>call or figure call, I would like to display the table or figure 
>referred to immediately following the paragraph that contains the call. 
>I want the table or figure to appear in the order they were referred to 
>in the paragraph and I want each table or figure to only appear once in 
>the outputted document. Tables and figures are numbered in order of 
>their reference, though at any point you can refer to 
>a table or figure that has been previously called.
>
>Citations look like this:
><xref ref-type="bibr" rid="B1">1</xref> <xref ref-type="table" 
>rid="T1">Table 1</xref> <xref ref-type="fig" rid="F1">Figure 1</xref>
>
>Sample Input:
>
>[A paragraph that includes a citation for Table 1.] [A paragraph that 
>includes citations for Table 2, Table 1, Figure 1, and Table 3.]
>
>Sample Output:
>
>[A paragraph that includes a citation for Table 1.]
>
>Table 1
>
>[A paragraph that includes citations for Table 2, Table 1, Figure 1, 
>and Table 3.]
>
>Table 2
>Figure 1
>Table 3
>
>  My initial thought is to create a set of keys:
>Key: Last Table Processed
>Key: Last Figure Processed
>Key: Last Table Encountered
>Key: Last Figure Encountered
>
>Since the tables and figures are numbered in order, a comparison of the 
>two keys should be in order. This comparison should be made at the end 
>of processing a paragraph. However, I'm not quite sure how I'd make 
>such a comparison or even if I can use keys in that manner. I'm 
>thinking I might need to generate some sort of array to keep track of 
>the multiple citations encountered so that in the sample provided the 
>output is (Table 2, Figure 1, Table 3) and not (Table 2, Table 3, 
>Figure 1) or (Figure 1, Table 2, Table 3). If I were to build an array, 
>since at this point I don't need to process <xref> citations of "bibr" 
>type, those should be ignored. Any suggestions would be greatly appreciated. 


This problem is a bit difficult, not because any one of the methods you will use is inherently hard, but because it is complex and requires a couple of XSLT 1.0 tricks. Disentangling it shows a way forward.

You actually have several problems here:

* Assigning each table or figure its correct number (based on order of citation, not order of appearance in the source)

* Citing the tables or figures where xrefs appear in line, each with its correct number

* Placing the tables and figures each after the paragraph where it's first cited, and not elsewhere

You will use keys for this, though not perhaps in exactly the way you're imagining. Likewise, it'd be nice if we could construct an array, and in XSLT 2.0 we could (or at least the functional equivalent thereof), but in XSLT 1.0 we can't. So we have to fake a couple of things. As you'll see, this faking may potentially get us into a bit of trouble with performance. The usual XSLT 1.0 approach when this happens is to split a problem into two or more passes, which generally gives us opportunities to optimize for efficiency.

In order to simplify this explanation I'm going to assume you have only tables. Figures will work just the same.

Assigning each table or figure its number: we could do this either by counting the references (filtering out repeated references) or by counting the tables, sorted by their first reference. While the latter would be nice, the former is easier in XSLT 1.0. We start by giving ourselves a means to filter out the repeats:

<xsl:key name="tablerefs-by-rid" match="xref[@ref-type='table']" use="@rid"/>

Given $rid, we can then get all the references to any table by calling

key('tablerefs-by-rid', $rid)

and the first one only by calling

key('tablerefs-by-rid', $rid)[1]

In addition, we can get all the first references by saying, e.g.

//xref[@ref-type='table'][count(.|key('tablerefs-by-rid', @rid)[1])=1]

This XPath traverses the entire document from the root, collecting all xrefs that are the first reference to their table. If they aren't to a table, the first predicate filters them out. If they are not a first reference, the count of their union with the first reference will be 2 not 1, and the second predicate will filter them out.

This uses an XPath 1.0 idiom (the count() trick) to test node identity. The generate-id() function is also sometimes used for this, so this would also work:

//xref[generate-id()=generate-id(key('tablerefs-by-rid', @rid)[1])]

Notice that here we don't need the first predicate, since xrefs that are not to tables are also thrown out by the predicate given, so you may prefer this.

It would be very convenient to have all these particular nodes collected together so we don't have to collect them over and over (an expensive traversal). So:

<xsl:variable name="first-table-refs"
   select="//xref[generate-id()=generate-id(key('tablerefs-by-rid',
@rid)[1])]"/>

(This is awfully close to an array, isn't it?)

Consequently, we can also get the proper number for any given xref[@ref-type='table'] with the expression

count($first-table-refs
   [count(.|current()/preceding::xref) = count(current()/preceding::xref)]) + 1

which looks, and is, awfully obnoxious and expensive (using the costly preceding:: axis twice), but which can be optimized slightly as a template call:

<xsl:template match="xref" mode="assign-table-number">
   <xsl:for-each select="key('tablerefs-by-rid', @rid)[1]">
     <!-- switching context to the first reference 
                to this reference's table -->
     <xsl:variable name="preceding-refs" select="preceding::xref"/>
     <xsl:value-of select="count($first-table-refs
       [count(.|$preceding-refs) = count($preceding-refs)]) + 1"/>
     <!-- counting the first table references before this one, 
       and adding 1 -->
   </xsl:for-each>
</xsl:template>

I wish this were easier, but in XSLT 1.0 it just isn't. In 2.0, it is (and maybe Mike or Jeni or someone will show us how).

But it does solve problem 1, and you can see how any given xref[@ref-type='table'] can call <xsl:apply-templates select="." mode="assign-table-number"/> and get its number, thereby solving problem 2.

Problem 3 is a matter of selecting, after you create a paragraph, those references in it that are first references to their targets (tables, figures, what not), which again you can do (in the case of tables) using this same idiom:

<xsl:template match="para">
   <p>
     <xsl:apply-templates/>
   </p>
   <xsl:apply-templates mode="get-target"
    select=".//xref[generate-id()=generate-id(key('tablerefs-by-rid',
@rid)[1])]"/>
   <!-- do the same with any other keys for xrefs you have, e.g. to figures, perhaps
        unifying the select -->
</xsl:template>

To actually get the target you're going to need another key:

<xsl:key name="target-by-rid" match="table|figure" use="@id"/>

and then

<xsl:template match="xref" mode="get-target">
   <xsl:apply-templates select="key('target-by-rid', @rid)"
     mode="show"/>
</xsl:template>

which will go apply templates to the table, figure or whatever. Note I've put this call also in a special mode, "show", enabling you to say in the default mode

<xsl:template match="table|figure"/>

so the tables only come out where you actually want them.

Whew! not bad for a bit of work, eh?

This should work fine for input at the scale of most human-readable documents. For higher performance (that numbering is a beast), you'll want to split out an analytic/sorting pass before processing, or get out the big rotary saw (XSLT 2.0).

Note: I just typed this up, and haven't tested, but I have used such code and it works. Beware particularly of missing parentheses in my XPaths, etc.
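
For comparison, a sketch of how the numbering might look in XSLT 2.0. It is not part of the original reply. It relies on xsl:for-each-group processing its groups in order of first appearance, so the position of a table's rid in the resulting sequence is its citation number. The variable name table-rids is made up for the example:

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                exclude-result-prefixes="xs" version="2.0">

  <!-- distinct table rids, in order of first citation -->
  <xsl:variable name="table-rids" as="xs:string*">
    <xsl:for-each-group select="//xref[@ref-type='table']" group-by="@rid">
      <xsl:sequence select="string(current-grouping-key())"/>
    </xsl:for-each-group>
  </xsl:variable>

  <xsl:template match="xref[@ref-type='table']" mode="assign-table-number">
    <!-- position of this rid in the sequence = table number -->
    <xsl:value-of select="index-of($table-rids, string(@rid))"/>
  </xsl:template>
</xsl:stylesheet>
```

The sequence is built once, so each lookup avoids the repeated preceding:: traversals of the XSLT 1.0 version.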

6.

Testing for matching id, idrefs

David Carlisle

The problem is how to check a match between references and ID values, that is, whether every token in an IDREFS attribute resolves to an ID in the document.


idrefs.xml

<!DOCTYPE x [
<!ELEMENT x (x*)>
<!ATTLIST x id ID #IMPLIED>
<!ATTLIST x ref IDREFS #IMPLIED>
]>
<x>
<x id="a"/>
<x id="b"/>
<x id="c"/>
<x ref=" a b "/>
<x ref="a  z c  "/>
</x>



idrefs.xsl

 

 <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

 <xsl:template match="x[@ref]">
  ===
  <xsl:variable name="x">
   <xsl:for-each select="id(@ref)">
    <xsl:value-of select="@id"/>
    <xsl:if test="position()!=last()"><xsl:text> </xsl:text></xsl:if>
   </xsl:for-each>
  </xsl:variable>
  <xsl:variable name="y" select="string-length(normalize-space(@ref))-string-length($x)"/>
  <xsl:choose>
   <xsl:when test="$y > 0"> ref="<xsl:value-of select="@ref"/>" has extra tokens, matching tokens are "<xsl:value-of select="$x"/></xsl:when>
  <xsl:when test="$y &lt;= 0"> ref="<xsl:value-of select="@ref"/>" is lovely</xsl:when>
  </xsl:choose>
  ====
 </xsl:template>
 
</xsl:stylesheet>







$ saxon idrefs.xml idrefs.xsl
<?xml version="1.0" encoding="utf-8"?>




  ===
   ref="a b" is lovely
  ====
 

  ===
   ref="a z c" has extra tokens, matching tokens are "a c
  ====
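
The length comparison says that something failed to match, but not which token. A sketch of a per-token check in XSLT 2.0, using the two-argument form of id(). It still depends on the DTD declaring id as type ID, and the variable names here and unmatched are made up for the example:

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
  <xsl:output method="text"/>

  <xsl:template match="x[@ref]">
    <xsl:variable name="here" select="."/>
    <!-- tokens of @ref that do not resolve to any ID in this document -->
    <xsl:variable name="unmatched"
      select="for $t in tokenize(normalize-space(@ref), ' ')
              return if (empty(id($t, $here))) then $t else ()"/>
    <xsl:value-of select="concat('ref=&quot;', @ref, '&quot; ')"/>
    <xsl:choose>
      <xsl:when test="empty($unmatched)">is lovely</xsl:when>
      <xsl:otherwise>
        <xsl:text>has unmatched tokens: </xsl:text>
        <xsl:value-of select="$unmatched" separator=" "/>
      </xsl:otherwise>
    </xsl:choose>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

Unlike the length test, this check is not fooled by a token that is repeated within the same ref value.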