XSLT and text nodes

Text Nodes

1. Matching text nodes
2. Selecting text child elements
3. Truncating output of a node
4. First child is a text node?
5. How to get the text of all children without whitespace
6. Finding quote and apostrophe in text
7. Highlight a substring
8. Selecting the first sentence of a paragraph
9. Insert a character every nth character
10. Insert a character every nth character
11. Acronym and abbreviation substitution (OT?)
12. Removing non-alphanumeric characters from attribute
13. How to find the longest node in a node-set
14. How to wrap lines at n characters?
15. Matching text nodes and whitespace
16. Selecting last text() from fragment of unknown depth?
17. Selecting text nodes
18. Remove non numeric content
19. Search for a word in the entire input file
20. Access to text content
21. Removing characters from some elements

1.

Matching text nodes

Steve Muench


|   <xsl:template match="text()" >
| 
| nothing is matched, but when I do it this way
| (AFAIK this does the same)
| 
|   <xsl:template match="*/text()" >
| 
| everything is OK. I can understand why the second works, but why not the first?
	

Hard to say without seeing the rest of your stylesheet's templates, but I'm going to guess that you have the following situation going on...

You probably have:

  <xsl:template match="text()">

then later in the stylesheet:

  <xsl:template match="node()|@*">

with an identity transformation template.

Since match="text()" has the same default priority as match="node()" (both are -0.5), and since both text() and node() match a text node, the match="node()" template fires instead of the match="text()" one because it occurs later in the stylesheet. Changing the match pattern to match="*/text()" raises its default priority to 0.5, which makes it more specific than match="node()", so your match="*/text()" rule gets noticed by the XSLT engine where it was previously being ignored.

The solution is to change your template to have an explicit priority like this:

   <xsl:template match="text()" priority="2">

which makes it take priority over the match="node()" template.
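
A minimal sketch of the guessed situation, assuming the stylesheet contains only these two rules. The square brackets written around each text node are purely illustrative and are not part of the original question:

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Default priority of text() is -0.5, the same as node() below.
       The explicit priority="2" lets this rule win whatever the order. -->
  <xsl:template match="text()" priority="2">
    <xsl:text>[</xsl:text>
    <xsl:value-of select="."/>
    <xsl:text>]</xsl:text>
  </xsl:template>

  <!-- Identity transformation: copies every other node unchanged -->
  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

Removing priority="2" from this sketch should reproduce the reported behaviour on processors that recover from the conflict by choosing the last matching rule.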

2.

Selecting text child elements

Evan Lenz

There have been three possibilities discussed as shown below. A concrete example follows.

1) string-value of an element, which is the concatenation of all *descendant* text nodes (what Anthony tried)

<xsl:value-of select="."/> (where "." is the current node, an element)

2) string-value of first child text node (what you suggested)

<xsl:value-of select="text()"/>

3) string-value of each child text node (the correct solution to Anthony's problem)

<xsl:for-each select="text()">
  <xsl:value-of select="."/>
</xsl:for-each>
-OR-
<xsl:copy-of select="text()"/>

Example (where the foo element is the current node in each of the above):

<foo>The <bar>quick</bar> brown fox jumped <bat>over</bat> the lazy dog.</foo>

#1 will produce "The quick brown fox jumped over the lazy dog."
#2 will produce "The "
#3 will produce "The  brown fox jumped  the lazy dog."

As it happens, the example he gave would result in the difference of only one line break between #2 and #3, but the way he specified the problem indicated it was in fact #3 that he wanted.

3.

Truncating output of a node

Jeni Tennison

Here's my approach - step through the nodes one by one, keeping track of the number of sentences that have been output so far (as Mike suggested). Be careful, though - text nodes might hold more than one sentence, so you need to have some recursion over the string in there as well.

So have one template for text nodes:

<xsl:template match="text()" name="text">
   <!-- $sentences holds the number of sentences covered so far -->
   <xsl:param name="sentences" select="0" />
   <!-- $text holds the text of the text node (or a portion of the text
        node when this template is called recursively) -->
   <xsl:param name="text" select="." />
   <!-- $next holds the next node to move on to -->
   <xsl:param name="next" select="following-sibling::*[1]" />
   <xsl:choose>
      <!-- if this text contains a sentence delimiter... -->
      <xsl:when test="contains($text, '. ')">
         <!-- ... finish the sentence ... -->
         <xsl:value-of select="substring-before($text, '. ')" />
         <xsl:text>. </xsl:text>
         <!-- ... and if we've output less than 2 sentences before
              (i.e. we've not just finished the third) ... -->
         <xsl:if test="$sentences &lt; 2">
            <!-- ... call the template on the rest of the string ...
                 -->
            <xsl:call-template name="text">
               <!-- ... adding one to the sentences count -->
               <xsl:with-param name="sentences" select="$sentences + 1" />
               <!-- ... passing the rest of the text node -->
               <xsl:with-param name="text"
                               select="substring-after($text, '. ')" />
               <!-- ... and keeping the same next node -->
               <xsl:with-param name="next" select="$next" />
            </xsl:call-template>
         </xsl:if>
      </xsl:when>
      <xsl:otherwise>
         <!-- ... otherwise just give the value of the string ... -->
         <xsl:value-of select="$text" />
         <!-- ... and apply templates to the following span element,
              keeping the number of sentences the same -->
         <xsl:apply-templates select="$next">
            <xsl:with-param name="sentences" select="$sentences" />
         </xsl:apply-templates>
      </xsl:otherwise>
   </xsl:choose>
</xsl:template>

Have another template for span elements. It can be a lot simpler: it copies the span and then applies templates to the next node, passing through the same count. This takes advantage of the fact that sentences never end within span elements:

<xsl:template match="span">
   <xsl:param name="sentences" select="0" />
   <xsl:copy-of select="." />
   <xsl:apply-templates select="following-sibling::node()[1]">
      <xsl:with-param name="sentences" select="$sentences" />
   </xsl:apply-templates>
</xsl:template>

Then you can start the process by just applying templates to the first node under the summary:

<xsl:template match="summary">
   <xsl:apply-templates select="node()[1]" />
</xsl:template>

4.

First child is a text node?

Mike Kay


   > I wrote the following predicate, which is to be true if the
   > first child is a
   > text node:
   >
   >  not(name(*[1]))
   >
   > It seems to work, but is this correct?
   

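No, it's wrong. "*" will always select an element node, and an element node always has a name. not(name(node()[1])) would work, as would node()[1][not(self::*)]. Note that neither test distinguishes a text node from a comment: both are also true when the first child is a comment.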
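
Where the first child must be a text node and nothing else, self::text() states that directly. A minimal sketch, assuming a hypothetical para element as the context; the element name and the output strings are illustrative only:

```xml
<xsl:template match="para">
  <xsl:choose>
    <!-- true only if the first child node exists and is a text node -->
    <xsl:when test="node()[1][self::text()]">
      <xsl:text>starts with text</xsl:text>
    </xsl:when>
    <xsl:otherwise>
      <xsl:text>starts with something else</xsl:text>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>
```

A whitespace-only text node still counts as a text node here, unless the stylesheet strips it with xsl:strip-space.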

5.

How to get the text of all children without whitespace

Jeni Tennison



> how can I get string which contains text nodes of all children of
> given element?

Usually, just getting the string value of a node will do that.

> for example what will return
>
> <a>
>   aaa
>   <b>
>     bbb
>     <c>
>       ccc
>     </c>
>     BBB
>   </b>
>   AAA
> </a>

It's very easy from this example to get the string:


"  aaa

    bbb

      ccc

    BBB

  AAA
"

Just do:

  <xsl:value-of select="/a" />

But to get the string that you want:

> aaabbbcccBBBAAA

you need to get rid of the whitespace that's been added to the tree. The easiest thing to do here is to have a template in 'string' mode that matches text nodes and gives the normalized value of the string:

<xsl:template match="text()" mode="string">
  <xsl:value-of select="normalize-space()" />
</xsl:template>

and then apply templates to the a element in 'string' mode; the built in templates will move through the tree to get to the text nodes, and their normalized values will be returned, concatenated together:

  <xsl:apply-templates select="/a" mode="string" />

Mike Brown adds:

The string-value of a node-set is the string-value of the node-set's node that comes first, in document order.

6.

Finding quote and apostrophe in text

Michael Kay


> How do I test in a choose if the xml node contains a " (&quot;)
> or a ' (&apos;) in the text value?

<xsl:variable name="quot">"</xsl:variable>
<xsl:variable name="apos">'</xsl:variable>

<xsl:when test="contains(., $quot) or contains(., $apos)">
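
The fragment above can be placed in a complete stylesheet roughly as follows. This is a sketch only: the item element and the two output strings are assumptions made for illustration.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <!-- Holding each quote character in a variable avoids having to
       escape it inside the test attribute -->
  <xsl:variable name="quot">"</xsl:variable>
  <xsl:variable name="apos">'</xsl:variable>

  <xsl:template match="item">
    <xsl:choose>
      <xsl:when test="contains(., $quot) or contains(., $apos)">
        <xsl:text>has a quote or apostrophe</xsl:text>
      </xsl:when>
      <xsl:otherwise>
        <xsl:text>has neither</xsl:text>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

</xsl:stylesheet>
```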

7.

Highlight a substring

Mike Brown

Ednote: this was just about his last post to the list. Thanks, Mike. I valued your input.

> I'd like to match DEF in ABCDEFGHIJ...  then, I'd like to wrap some
>special HTML code around it "<strong>".

XPath provides the functions contains(), substring-before(), and substring-after() which you will find quite helpful. To highlight the first occurrence of DEF:

<xsl:variable name="substringToHighlight" select="'DEF'"/>
<xsl:if test="contains(.,$substringToHighlight)">
  <xsl:value-of select="substring-before(.,$substringToHighlight)"/>
  <strong>
    <xsl:value-of select="$substringToHighlight"/>
  </strong>
  <xsl:value-of select="substring-after(.,$substringToHighlight)"/>
</xsl:if>

If you want to highlight all occurrences of DEF, you can turn this into a recursive template that feeds the substring-after(...) part to its next invocation, rather than adding it to the result tree. If your XSLT processor detects tail recursion and optimizes for it, it should be safe and efficient (sadly, most processors don't).

> <xsl:template name="HighlightMatches">
>   <xsl:with-param name="c" select="current()"/>
>   <xsl:with-param name="match"/>

You would use xsl:param here, not xsl:with-param. Instead of current() you probably mean ".", but this will work better:

<xsl:template name="HighlightMatches">
  <xsl:param name="stringToSearchIn"/>
  <xsl:param name="substringToHighlight"/>
  <xsl:choose>
    <xsl:when test="contains($stringToSearchIn, $substringToHighlight)">
      <xsl:value-of select="substring-before($stringToSearchIn,
                                             $substringToHighlight)"/>
      <strong>
        <xsl:value-of select="$substringToHighlight"/>
      </strong>
      <xsl:call-template name="HighlightMatches">
        <xsl:with-param name="stringToSearchIn"
          select="substring-after($stringToSearchIn, $substringToHighlight)"/>
        <xsl:with-param name="substringToHighlight"
          select="$substringToHighlight"/>
      </xsl:call-template>
    </xsl:when>
    <xsl:otherwise>
      <!-- no (further) match: output the remaining text unchanged -->
      <xsl:value-of select="$stringToSearchIn"/>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>

then you can invoke it like

      <xsl:call-template name="HighlightMatches">
        <xsl:with-param name="stringToSearchIn" select="."/>
        <xsl:with-param name="substringToHighlight" select="'DEF'"/>
      </xsl:call-template>

all untested but should be very close if not 100% correct

8.

Selecting the first sentence of a paragraph

Dimitre Novatchev

This transformation:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

 <xsl:output omit-xml-declaration="yes"/>

  <xsl:template match="/">
    <xsl:apply-templates select="p" mode="fstSent"/>
  </xsl:template>

  <xsl:template match="@* | node()" mode="fstSent">
    <xsl:choose>
      <xsl:when test="not(self::p)
                     and
                      preceding::text()
                        [generate-id(ancestor::p[1])
                        =
                         generate-id(current()/ancestor::p[1])
                        ]
                        [contains(., '.')]"/>

      <xsl:when test="self::text()[contains(., '.')]">
        <xsl:value-of
        select="concat(substring-before(., '.'), '.')"/>
      </xsl:when>
      <xsl:otherwise>
        <xsl:copy>
          <xsl:apply-templates select="@* | node()"
                               mode="fstSent"/>
        </xsl:copy>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

</xsl:stylesheet>

when applied on this source xml (a complicated variant of the one originally provided):

<p> Identical
  <u>and similar</u>
  <b>
    <i>to</i>
    <keyword>systat</keyword>.
   The optional argument
  </b>
  <arg> n</arg> specifies the level of detail.
</p>

produces the wanted result:

<p> Identical
  <u>and similar</u>
  <b>
    <i>to</i>
    <keyword>systat</keyword>.</b></p>

9.

Insert a character every nth character

Mike Kay / DaveP

E.g. insert a space every tenth character or, as in this example, insert a comma every 4th character. gpSz is the group size, s is the source string to be split.

Mike gave an untested 2.0 solution, which I show here after testing it.

  <xsl:variable name="s" select="'A long string with commas inserted every 4th character'"/>
  <xsl:variable name="gpSz" select="4"/>

  <xsl:value-of select="string-join(
      for $i in 0 to (string-length($s) - 1) idiv $gpSz
      return substring($s, $i * $gpSz + 1, $gpSz), ',')"/>

Mukul Gandhi offers a 1.0 solution

Here is another solution to the problem:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text" encoding="UTF-8"/>

<xsl:template match="/name">
  <xsl:call-template name="process-string">
    <xsl:with-param name="string-subset" select="."/>
    <xsl:with-param name="n" select="10"/>
  </xsl:call-template>
</xsl:template>

<!-- Outputs $string-subset with a space after every $n characters -->
<xsl:template name="process-string">
  <xsl:param name="string-subset"/>
  <xsl:param name="n"/>

  <xsl:variable name="rest" select="substring($string-subset, $n + 1)"/>

  <xsl:value-of select="substring($string-subset, 1, $n)"/>
  <xsl:text> </xsl:text>
  <xsl:choose>
    <!-- at least one more full group remains: recurse on the rest -->
    <xsl:when test="string-length($rest) >= $n">
      <xsl:call-template name="process-string">
        <xsl:with-param name="string-subset" select="$rest"/>
        <xsl:with-param name="n" select="$n"/>
      </xsl:call-template>
    </xsl:when>
    <!-- fewer than $n characters remain: output them as they are -->
    <xsl:otherwise>
      <xsl:value-of select="$rest"/>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>

</xsl:stylesheet>

10.

Insert a character every nth character

Mike Kay /DaveP

E.g. insert a space every tenth character, or as per this example, insert X every 4th character. gpSz is the group size, s is the source string to be split.

  <xsl:variable name="s" select="'A long string, I want X inserted every 4th character'"/>
  <xsl:variable name="gpSz" select="4"/>

  <xsl:value-of select="string-join(
      for $i in 0 to (string-length($s) - 1) idiv $gpSz
      return substring($s, $i * $gpSz + 1, $gpSz), 'X')"/>

11.

Acronym and abbreviation substitution (OT?)

Matthew Smith



I have just written (for my own use), a small command-line Perl programme that can maintain a database of abbreviations and their definitions and substitute them into files. This may be helpful for anyone who has a lot of HTML without expanded abbreviations that needs correcting.

Two types of conversion may be applied, HTML and text. Due to much tiresome debate in the past and the fact that acronyms ARE abbreviations, as far as I am concerned (and this programme), ALL contractions are rendered as <abbr> in HTML mode. In text mode, the definition is given, followed by the abbreviation in parentheses.

Abbreviations are loaded from a text file containing abbreviation/definition pairs separated by a hash #.

If anyone thinks they would find this useful, you are more than welcome to use this code. Standard disclaimers apply...

The code may be picked up at: tivis.net. I had to rename it from .pl to .txt, otherwise Apache would execute the code rather than display it. I have made use of this quirk to provide an online help page (text/plain): abbr.pl

Notes:

1) I have yet to get around to doing much "user-proofing", such as raising errors when mutually exclusive arguments are given, but it's good enough for me.

2) This is NOT efficient code - with huge files and abbreviation lists, this could be slow on older machines.

3) Currently uses Berkeley DB - I may be persuaded to do a CGI/MySQL version for intranet/workgroup use.

Hope this can help someone

12.

Removing non-alphanumeric characters from attribute

Mukul Gandhi



> I was wondering if there is any way possible of stripping any 
> non-alphanumeric characters from an attribute. ie keep anything that 
> is A-Z/0-9 and strip all other characters like ",*-+. etc etc?

Please try this XSL:

<?xml version="1.0"?>
<xsl:stylesheet
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="1.0">
  
<xsl:variable name="str"
select="'ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789'" />
  
<xsl:template match="node()">
   <xsl:copy>       
      <xsl:for-each select="@*">
        <xsl:attribute name="{name()}">
          <xsl:variable name="att-val" select="." />
          <xsl:call-template name="stripchars">
             <xsl:with-param name="x"
select="substring($att-val, 1, 1)" /> 
             <xsl:with-param name="y"
select="substring(., 2, string-length($att-val))" /> 
          </xsl:call-template>
        </xsl:attribute>
      </xsl:for-each>
      <xsl:apply-templates />
   </xsl:copy>
</xsl:template>  
  
<xsl:template name="stripchars">
  <xsl:param name="x" />
  <xsl:param name="y" />
    
  <xsl:if test="contains($str, $x)">
    <xsl:value-of select="$x" />
  </xsl:if>
    
  <xsl:if test="string-length($y) > 0">
     <xsl:call-template name="stripchars">
       <xsl:with-param name="x" select="substring($y, 1, 1)" /> 
       <xsl:with-param name="y" select="substring($y, 2, string-length($y))" />
     </xsl:call-template>
  </xsl:if>
</xsl:template>
 
</xsl:stylesheet>

For example, when it is applied to this XML:

<?xml version="1.0"?>
<root>
  <a x="123ABC+-" />
  <b y="ABC12" />
  <c z="+-1" />
</root>

it produces this output:

<?xml version="1.0"?>
<root>
  <a x="123ABC" />
  <b y="ABC12" />
  <c z="1" />
</root>
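
A non-recursive alternative, offered as a sketch rather than as part of Mukul's answer, is the double translate() idiom: the inner call finds the characters to delete and the outer call deletes them. The variable name keep is an assumption made for illustration.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- the characters that are allowed to survive -->
  <xsl:variable name="keep"
      select="'ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789'"/>

  <xsl:template match="@*">
    <xsl:attribute name="{name()}">
      <!-- inner translate(): the characters of the value NOT in $keep;
           outer translate(): removes exactly those characters -->
      <xsl:value-of select="translate(., translate(., $keep, ''), '')"/>
    </xsl:attribute>
  </xsl:template>

  <!-- copy everything else, sending attributes through the rule above -->
  <xsl:template match="node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

Applied to the sample input above, this should give the same attribute values as the recursive version. Like that version, it keeps upper-case letters and digits only.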

13.

How to find the longest node in a node-set

Phil Lanch


> Does anyone know a way I could define a variable that would
> contain the number of characters in the longest node in a
> node-set? Let the node set in question be
> //DIV[@type='Chapter']: if I have three, with string
> lengths 88888, 99999, and 111110, I want my variable to be
> 111110.
 

<xsl:template name="getlongest">
  <xsl:param name="nodeset"/>
  <xsl:param name="longest" select="0"/>
  <xsl:choose>
    <xsl:when test="$nodeset">
      <xsl:choose>
        <xsl:when test="string-length($nodeset[1]) > $longest">
          <xsl:call-template name="getlongest">
            <xsl:with-param name="nodeset"
                            select="$nodeset[position() > 1]"/>
            <xsl:with-param name="longest"
                            select="string-length($nodeset[1])"/>
          </xsl:call-template>
        </xsl:when>
        <xsl:otherwise>
          <xsl:call-template name="getlongest">
            <xsl:with-param name="nodeset"
                            select="$nodeset[position() > 1]"/>
            <xsl:with-param name="longest" select="$longest"/>
          </xsl:call-template>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:when>
    <xsl:otherwise>
      <xsl:value-of select="$longest"/>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>
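
To put the result into a variable, the template can be called inside xsl:variable. The sketch below also shows a sort-based alternative that needs no recursion; the variable names maxlen and maxlen2 are assumptions made for illustration.

```xml
<!-- 1. Calling the recursive template above -->
<xsl:variable name="maxlen">
  <xsl:call-template name="getlongest">
    <xsl:with-param name="nodeset" select="//DIV[@type='Chapter']"/>
  </xsl:call-template>
</xsl:variable>

<!-- 2. Alternative: sort by length, longest first, and keep the first -->
<xsl:variable name="maxlen2">
  <xsl:for-each select="//DIV[@type='Chapter']">
    <xsl:sort select="string-length(.)" data-type="number"
              order="descending"/>
    <xsl:if test="position() = 1">
      <xsl:value-of select="string-length(.)"/>
    </xsl:if>
  </xsl:for-each>
</xsl:variable>
```

In both cases the variable holds a result tree fragment whose string value is the length, so it can be used wherever a number is expected.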

            

14.

How to wrap lines at n characters?

Dimitre Novatchev


>Can I wrap plain text output at n characters?

Yes, use the str-split-to-lines template from FXSL.

Below is a complete solution, using FXSL downloaded from SourceForge.

Extract str-foldl.xsl into the same directory as the stylesheet below.

Using Dimitre's example input text,

<text> Dec. 13 ? As always for a presidential inaugural, security and surveillance were extremely tight in Washington, DC, last January. But as George W. Bush prepared to take the oath of office, security planners installed an extra layer of protection: a prototype software system to detect a biological attack. The U.S. Department of Defense, together with regional health and emergency-planning agencies, distributed a special patient-query sheet to military clinics, civilian hospitals and even aid stations along the parade route and at the inaugural balls. Software quickly analyzed complaints of seven key symptoms ? from rashes to sore throats ? for patterns that might indicate the early stages of a bio-attack. There was a brief scare: the system noticed a surge in flulike symptoms at military clinics. Thankfully, tests confirmed it was just that ? the flu.</text>

The stylesheet outputs:

Dec. 13 ? As always for a presidential inaugural, security and surveillance were extremely tight in Washington, DC, last January. But as George W. Bush prepared to take the oath of office, security planners installed an extra layer of
protection: a prototype software system to detect a biological attack. The U.S. Department of Defense, together with regional health and emergency-planning agencies, distributed a special patient-query sheet to military clinics, civilian hospitals and even aid stations along the parade route and at the inaugural balls. Software quickly analyzed complaints of seven key symptoms ? from rashes to sore throats ? for patterns that might indicate the early stages of a bio-attack. There was a brief
scare: the system noticed a surge in flulike symptoms at military clinics. Thankfully, tests confirmed it was just that ? 
the 

The stylesheet is:

<?xml version="1.0" ?>
<xsl:stylesheet version="1.1"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:str-split2lines-func="f:str-split2lines-func"
exclude-result-prefixes="xsl  str-split2lines-func"
>
   <xsl:import href="str-foldl.xsl"/>
<d:doc xmlns:d="rnib.org.uk/tbs#">
 <revhistory>
   <purpose><para>This stylesheet works with file XXXX.xml to produce YYYY.html</para></purpose>
   <revision>
    <revnumber>1.0</revnumber>
    <date> 2004</date>
    <authorinitials>DaveP</authorinitials>
    <revdescription>
     <para></para>
    </revdescription>
    <revremark></revremark>
   </revision>
  </revhistory>
  </d:doc>



   <str-split2lines-func:str-split2lines-func/>

   <xsl:output indent="yes" omit-xml-declaration="yes"/>
   
  <xsl:template match="/">
      <xsl:call-template name="str-split-to-lines">
        <xsl:with-param name="pStr" select="/*"/>
        <xsl:with-param name="pLineLength" select="64"/>
        <xsl:with-param name="pDelimiters" select="' &#9;&#10;&#13;'"/>
      </xsl:call-template>
    </xsl:template>



    <xsl:template name="str-split-to-lines">
      <xsl:param name="pStr"/>
      <xsl:param name="pLineLength" select="60"/>
      <xsl:param name="pDelimiters" select="' &#9;&#10;&#13;'"/>
      
      <xsl:variable name="vsplit2linesFun"
                    select="document('')/*/str-split2lines-func:*[1]"/>
                    
      <xsl:variable name="vrtfParams">
       <delimiters><xsl:value-of select="$pDelimiters"/></delimiters>
       <lineLength><xsl:copy-of select="$pLineLength"/></lineLength>
      </xsl:variable>

      <xsl:variable name="vResult">
	      <xsl:call-template name="str-foldl">
	        <xsl:with-param name="pFunc" select="$vsplit2linesFun"/>
	        <xsl:with-param name="pStr" select="$pStr"/>
	        <xsl:with-param name="pA0" select="$vrtfParams"/>
	      </xsl:call-template>
      </xsl:variable>
      
      <xsl:for-each select="$vResult/line">
        <xsl:for-each select="word">
          <xsl:value-of select="concat(., ' ')"/>
        </xsl:for-each>
        <xsl:value-of select="'&#10;'"/>
      </xsl:for-each>
    </xsl:template>

    <xsl:template match="str-split2lines-func:*">
      <xsl:param name="arg1" select="/.."/>
      <xsl:param name="arg2"/>
         
      <xsl:copy-of select="$arg1/*[position() &lt; 3]"/>
      <xsl:copy-of select="$arg1/line[position() != last()]"/>
      
	  <xsl:choose>
	    <xsl:when test="contains($arg1/*[1], $arg2)">
	      <xsl:if test="string($arg1/word)">
	         <xsl:call-template name="fillLine">
	           <xsl:with-param name="pLine" 
                   select="$arg1/line[last()]"/>
	           <xsl:with-param name="pWord" select="$arg1/word"/>
	           <xsl:with-param name="pLineLength" select="$arg1/*[2]"/>
	         </xsl:call-template>
	      </xsl:if>
	    </xsl:when>
	    <xsl:otherwise>
	      <xsl:copy-of select="$arg1/line[last()]"/>
	      <word><xsl:value-of select=
         "concat($arg1/word, $arg2)"/></word>
	    </xsl:otherwise>
	  </xsl:choose>
	</xsl:template>
      
      <!-- Test if the new word fits into the last line -->
	<xsl:template name="fillLine">
      <xsl:param name="pLine" select="/.."/>
      <xsl:param name="pWord" select="/.."/>
      <xsl:param name="pLineLength" />
      
      <xsl:variable name="vnWordsInLine" select="count($pLine/word)"/>
      <xsl:variable name="vLineLength" select="string-length($pLine) 
                                             + $vnWordsInLine"/>
      <xsl:choose>
        <xsl:when test="not($vLineLength + 
           string-length($pWord) > $pLineLength)">
          <line>
            <xsl:copy-of select="$pLine/*"/>
            <xsl:copy-of select="$pWord"/>
          </line>
        </xsl:when>
        <xsl:otherwise>
          <xsl:copy-of select="$pLine"/>
          <line>
            <xsl:copy-of select="$pWord"/>
          </line>
          <word/>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:template>

</xsl:stylesheet>

15.

Matching text nodes and whitespace

Michael Kay



> I need to write a template matching text nodes which consist only of
> multiple whitespace characters (tabs in most cases).
> I can't use normalize-space since I need leading and trailing
> whitespace in some cases.
> I suppose matches() would help me here, but I actually don't know how
> to formulate the regular expression for that.

If you want to match a text node that consists entirely of whitespace you can use

<xsl:template match="text()[normalize-space()='']">

The value of "." inside the template rule will still be the original text node; testing its value using normalize-space does not modify the contents of the node.
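
A small sketch of that point: the rule below matches whitespace-only text nodes and reports their real, unnormalized length. The ws element and its length attribute are hypothetical names chosen for illustration.

```xml
<xsl:template match="text()[normalize-space()='']">
  <!-- "." is still the untouched whitespace, so string-length(.)
       counts every tab, space and newline in the node -->
  <ws length="{string-length(.)}"/>
</xsl:template>
```

This only fires for whitespace text nodes that are still in the tree, that is, ones not removed by xsl:strip-space or by the parser.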

16.

Selecting last text() from fragment of unknown depth?

Michael Kay



> So far all my recursion attempts have not worked


> It's meant to take an XHTML string and replace the end of 
> the last text() -- after the last occurring space character -- with an 
> ellipsis, which I've now heard is 'impossible' in XSLT 1.0




> With this template

> <xsl:template
> match="submission.excerpt//node()/@*|submission.excerpt//node()">
> <xsl:copy><xsl:apply-templates select="./@*|./node()" /></xsl:copy> 
> </xsl:template>

> have I made it impossible to also transform (inline) the last 
> occurring textnode beneath submission.excerpt/node()?

Nothing is impossible, but selecting whether you're the last text node in a subtree is a little tricky in 1.0. I often find it useful to work back from a 2.0 solution, which you could do by adding the template rule:

<xsl:template match="text()[. is
    (ancestor::submission.excerpt//text())[last()]]" priority="5">
  <xsl:value-of select="."/>...
</xsl:template>

You haven't got the "is" operator in 1.0, but you can replace (A is B) by (generate-id(A) = generate-id(B)).

This solution could be rather expensive if the number of text nodes in a submission.excerpt is large.
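
Applying that substitution gives the following 1.0 form of the rule. This is a sketch derived mechanically from the 2.0 rule and has not been run against the poster's data:

```xml
<xsl:template priority="5"
    match="text()[generate-id(.) =
                  generate-id((ancestor::submission.excerpt//text())[last()])]">
  <!-- this is the last text node under submission.excerpt:
       output it followed by the ellipsis -->
  <xsl:value-of select="."/>
  <xsl:text>...</xsl:text>
</xsl:template>
```

Like the 2.0 rule, it appends the ellipsis to the whole text node; cutting the text back to the last space would be a further step using the string functions.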

17.

Selecting text nodes

Michael Kay



> The problem I seem to be having with this is that text() for instance 
> the text "Chapter on Testing:" can not be considered a 
> preceding-sibling of <AddText>Testing FAQ</AddText> even though they 
> have the same <Center> parent tag since preceding-sibling does not 
> grab the text node.

In your example the text node "Chapter on Testing:" is indeed a preceding-sibling of the element node <AddText>Testing FAQ</AddText>, so the problem must be somewhere else.

Note that preceding-sibling::* selects only elements; to select text nodes use preceding-sibling::text(), and to select both use preceding-sibling::node(). You probably want only the immediately preceding sibling, which is preceding-sibling::node()[1].

However, if you select a text node then its string-length will always be >0. Zero-length text nodes do not exist. Perhaps you really want to test for the existence of the text node rather than its length? Fortunately, though, string-length() applied to an empty node-set returns 0.
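
A sketch of an existence test along those lines, using the AddText element from the question. What the template outputs is an assumption made for illustration:

```xml
<xsl:template match="AddText">
  <!-- true only if the node immediately before this element
       exists and is a text node -->
  <xsl:if test="preceding-sibling::node()[1][self::text()]">
    <xsl:value-of select="preceding-sibling::node()[1]"/>
  </xsl:if>
  <xsl:value-of select="."/>
</xsl:template>
```

If whitespace-only text should not count, the test can be changed to normalize-space(preceding-sibling::node()[1][self::text()]), which is false for an empty result.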

18.

Remove non numeric content

Ken Holman


> I need to strip out non-numerical values from a string.
> Here is a sample input value:  TUV0062
> And what I want is :  0062  (or just 62)
> What is the correct way to do this using XSLT 2?

neff.xsl

<?xml version="1.0" encoding="iso-8859-1"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">

<xsl:output method="text"/>

<xsl:template match="/">
    <xsl:variable name="target" select="'TUV0062'"/>

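    <!-- inner translate(): strips the digits, leaving the unwanted characters;
         outer translate(): strips those unwanted characters from $target -->
    <xsl:value-of select="number(
                           translate($target,
                            translate($target,'0123456789',''),''))"/>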
</xsl:template>

</xsl:stylesheet>

provides output of

62
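
Since the question asked about XSLT 2, here is a sketch of the same thing using the XPath 2.0 replace() function. It is an alternative to, not part of, Ken's answer:

```xml
<?xml version="1.0" encoding="iso-8859-1"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="2.0">

<xsl:output method="text"/>

<xsl:template match="/">
    <xsl:variable name="target" select="'TUV0062'"/>

    <!-- delete every character that is not an ASCII digit -->
    <xsl:value-of select="replace($target, '[^0-9]', '')"/>
</xsl:template>

</xsl:stylesheet>
```

This keeps the leading zeros and gives 0062; wrapping the expression in number(), as in the 1.0 version, gives 62.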

19.

Search for a word in the entire input file

Michael Kay



> I am trying to find out whether the word blah exists in the entire document.

contains( /, "blah" )


>Is there any difference between
>//*[contains(text(), "blah")]
>and
>//*[contains( . , "blah")]

//*[contains(text(), "blah")] 

selects all elements whose first child text node contains "blah".

//*[contains( . , "blah")]

selects all elements whose string value contains "blah".

Consider the following elements:

<a>blah</a>
<b><x>blah</x></b>
<c>bl<i>ah</i></c>
<d>bl<!--c-->ah</d>
<e>foo <x/> blah</e>
<f>blah <x/> foo</f>

The first expression selects only <a> and <f>. The second expression selects all six of these elements.

Generally, it's best to work with the string value (or in 2.0, the typed value) of an element, rather than working directly with its text nodes.
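
For the original question, the whole-document test can be used like this. It is a minimal sketch and the two output strings are illustrative:

```xml
<xsl:template match="/">
  <xsl:choose>
    <!-- the string value of the root node is all the text
         in the document, concatenated -->
    <xsl:when test="contains(/, 'blah')">
      <xsl:text>found</xsl:text>
    </xsl:when>
    <xsl:otherwise>
      <xsl:text>not found</xsl:text>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>
```

Because it works on the concatenated string value, this is a substring test that ignores markup: it is also true for a document containing only <c>bl<i>ah</i></c>.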

20.

Access to text content

Andrew Welch


> How do I extract the value of an element, without text in its children. E.g.
> how do I extract from the next example the line 'This text is some test .'.
>
> <text>This text is some test <emph>text</emph>.</text>
>
> I think I am overlooking a simple function, but can't seem to find it.

The crucial thing to grasp here is that <text> contains the text node "This text is some test ", the element <emph> and then another text node "."

When you do value-of on <text> you get the text nodes of the element and all its descendants concatenated together, so you would also get the text node child of <emph>, which isn't what you want.

When you do value-of select="text()" on <text> to only get the text node children you get a sequence of two items - the two text nodes. In 1.0 when you have a list of more than one item the first is used and the rest discarded (to allow functions that expect a single item to not fail if a list is supplied - XSLT 1.0 was designed to always produce some output).

What all this means is that in 1.0 you can do either:

<xsl:for-each select="text()">
 <xsl:value-of select="."/>
</xsl:for-each>

or

<xsl:apply-templates select="text()"/>

Either technique will process each text node in turn.

In 2.0 all the items in the sequence get concatenated together using a separator (if one's not supplied the default of a single space is used), so a simple value-of select="text()" does the job.
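
If the default space between the two text nodes is not wanted, the separator attribute of xsl:value-of can be set to the empty string. A sketch for the example element, assuming an XSLT 2.0 processor:

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <xsl:template match="text">
    <!-- joins the text node children with nothing between them -->
    <xsl:value-of select="text()" separator=""/>
  </xsl:template>

</xsl:stylesheet>
```

For the example in the question this gives 'This text is some test .', the line that was asked for.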

Wendell Piez offers

<text>This text is some test <emph>text</emph>.</text>

is parsed into a model that looks like this:

element 'text'
  text node (value: 'This text is some test ')
  element 'emph'
    text node (value: 'text')
  text node (value: '.')

These XML document structures are traversed and queried in XSLT not with functions but with XPath expressions (XPath includes a function library, but is more than that).

In this case, given the 'text' element as context, the XPath expression "text()", short for "child::text()", will return both the text node children of the element (its first and third children). If you want the value of only the first of these, you can write "text()[1]". In XPath 1.0, simply "text()" will often get you that result, but since the rules have changed in XPath 2.0 it's perhaps best to learn not to rely on that.

How you use that depends on why you need the expression, which wasn't given in the question.

21.

Removing characters from some elements

David Carlisle



>What I'm trying to do is process some xml of which this is a fragment:

> <aff id="aj266199af2">
>   <label>2</label>National Radio Astronomy Observatory, 
> Chile<xref ref-type="fn" 
> rid="aj266199afn4"><sup>9</sup></xref>; <ext-link 
> ext-link-type="email" id="aj266199em4">email address 
> here</ext-link> </aff>

> <!-- snip -->
> <fn id="aj266199afn4">
>   <label>9</label>
>   <p>The National Radio Astronomy Observatory is a facility 
> of the National Science Foundation operated under cooperative 
> agreement by Associated Universities, Inc.
>   </p>
> </fn>
> The problem is the trailing semi-colon between the <xref> & 
> <ext-link> elements.

If you only want to lose punctuation that was next to one of those elements, just do so.

<xsl:template match="text()[following-sibling::*[1][self::ext-link]]">
  <xsl:value-of select="replace(.,'[,;]\s*$','')"/>
</xsl:template>

<xsl:template match="text()[preceding-sibling::*[1][self::ext-link]]">
  <xsl:value-of select="replace(.,'^\s*[,;]','')"/>
</xsl:template>

which, if I got that right, just removes , and ; (and following white space) from the end of a text node that's followed by ext-link, and similarly zaps punctuation from the start of text nodes that follow ext-link.