Replacing text in XSLT

Replace

1. Replace new line character with element
2. String replacement
3. Replace string
4. Removing smart quotes
5. Search and replace acronyms
6. Using keys to lookup an acronym from current stylesheet
7. ? OT ? replace acronyms | abbreviations with marked up HTML
8. Check for non-ASCII characters & remove
9. Adding markup from a lookup file
10. Replace escaped items
11. Removing newlines and apostrophe
12. Replace, Using translate() for single quotes in XSL1.0

1.

Replace new line character with element

Mike Kay

> Is it possible using xsl  to go through a document and replace
> linebreaks with a tag (like <br/>)

For XSLT the answer is along the lines of:

<xsl:template match="text()">
   <xsl:call-template name="break"/>
</xsl:template>

<xsl:template name="break">
   <xsl:param name="text" select="."/>
   <xsl:choose>
   <xsl:when test="contains($text, '&#xa;')">
      <xsl:value-of select="substring-before($text, '&#xa;')"/>
      <br/>
      <xsl:call-template name="break">
          <xsl:with-param name="text" select="substring-after($text, '&#xa;')"/>
      </xsl:call-template>
   </xsl:when>
   <xsl:otherwise>
	<xsl:value-of select="$text"/>
   </xsl:otherwise>
   </xsl:choose>
</xsl:template>
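
With an XSLT 2.0 processor the same effect can probably be had without recursion. The following is only a sketch under that assumption: tokenize() splits the text node at each line feed, and a <br/> is written before every piece except the first.

```xml
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="text()">
    <!-- split the text node at every line feed -->
    <xsl:for-each select="tokenize(., '&#xa;')">
      <!-- one <br/> per line feed, placed between the pieces -->
      <xsl:if test="position() > 1"><br/></xsl:if>
      <xsl:value-of select="."/>
    </xsl:for-each>
  </xsl:template>

</xsl:stylesheet>
```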

2.

String replacement

Mike Brown

<!-- pretend this is in a template -->
  <xsl:variable name="myString" select="'This%20is%20Test'"/>
  <xsl:variable name="myNewString">
    <xsl:call-template name="replaceCharsInString">
      <xsl:with-param name="stringIn" select="string($myString)"/>
      <xsl:with-param name="charsIn" select="'%20'"/>
      <xsl:with-param name="charsOut" select="' '"/>
    </xsl:call-template>
  </xsl:variable>
  <!-- $myNewString is a result tree fragment, which should be OK. -->
  <!-- If you really need a string object, do this: -->
  <xsl:variable name="myNewRealString" select="string($myNewString)"/>

<!-- here is the template that does the replacement -->
<xsl:template name="replaceCharsInString">
  <xsl:param name="stringIn"/>
  <xsl:param name="charsIn"/>
  <xsl:param name="charsOut"/>
  <xsl:choose>
    <xsl:when test="contains($stringIn,$charsIn)">
      <xsl:value-of select="concat(substring-before($stringIn,$charsIn),$charsOut)"/>
      <xsl:call-template name="replaceCharsInString">
        <xsl:with-param name="stringIn" select="substring-after($stringIn,$charsIn)"/>
        <xsl:with-param name="charsIn" select="$charsIn"/>
        <xsl:with-param name="charsOut" select="$charsOut"/>
      </xsl:call-template>
    </xsl:when>
    <xsl:otherwise>
      <xsl:value-of select="$stringIn"/>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>
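
For comparison, a sketch of the same replacement in XSLT 2.0, where the built-in replace() function makes the named template unnecessary. Note that its second argument is a regular expression, so characters that are special in regular expressions would need escaping; '%20' contains none.

```xml
<!-- XSLT 2.0 only; $myString is the variable declared in the example above -->
<xsl:variable name="myNewString" select="replace($myString, '%20', ' ')"/>
```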


    

3.

Replace string

Evan Lenz

The task is to replace one specified string with another specified string, within a specified string. For example, replace-string("Hello there", "Hello", "Hi") would return "Hi there". I use the following stylesheet (adapted from an example in Kay's book) to do this for me now.

<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template name="replace-string">
    <xsl:param name="text"/>
    <xsl:param name="replace"/>
    <xsl:param name="with"/>
    <xsl:choose>
      <xsl:when test="contains($text,$replace)">
        <xsl:value-of select="substring-before($text,$replace)"/>
        <xsl:value-of select="$with"/>
        <xsl:call-template name="replace-string">
          <xsl:with-param name="text" select="substring-after($text,$replace)"/>
          <xsl:with-param name="replace" select="$replace"/>
          <xsl:with-param name="with" select="$with"/>
        </xsl:call-template>
      </xsl:when>
      <xsl:otherwise>
        <xsl:value-of select="$text"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

</xsl:stylesheet>
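
A minimal sketch of a call, using the "Hello there" example from the text; it is assumed to sit inside some template of a stylesheet that includes or imports the one above.

```xml
<!-- writes "Hi there" to the result tree -->
<xsl:call-template name="replace-string">
  <xsl:with-param name="text" select="'Hello there'"/>
  <xsl:with-param name="replace" select="'Hello'"/>
  <xsl:with-param name="with" select="'Hi'"/>
</xsl:call-template>
```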

Here's an example of where I need to escape curly braces in attribute values in a generated stylesheet (admittedly not the most frequent use-case, but a simple requirement nevertheless):

  <!-- brace-escaping to guard against interpretation as AVTs in the output -->
  <xsl:template match="@*" mode="rootRule">
    <xsl:variable name="leftBraceReplaced">
      <xsl:call-template name="replace-string"> <!-- imported template -->
        <xsl:with-param name="text" select="."/>
        <xsl:with-param name="replace" select="'{'"/>
        <xsl:with-param name="with" select="'{{'"/>
      </xsl:call-template>
    </xsl:variable>
    <xsl:variable name="bothBracesReplaced">
      <xsl:call-template name="replace-string">
        <xsl:with-param name="text" select="$leftBraceReplaced"/>
        <xsl:with-param name="replace" select="'}'"/>
        <xsl:with-param name="with" select="'}}'"/>
      </xsl:call-template>
    </xsl:variable>
    <xsl:attribute name="{name()}">
      <xsl:value-of select="$bothBracesReplaced"/>
    </xsl:attribute>
  </xsl:template>

4.

Removing smart quotes

David Carlisle


> What is the best way to get rid of curly quotes/smart quotes in XML to
> XML transformation. I have no control over incoming XML document.

Replace by nothing or replace by straight quotes?

translate(.,'&#x201C;&#x201D;','&quot;&quot;')

will replace the left and right curly double quotes by straight double quotes (")

translate(.,'&#x201C;&#x201D;','')

will remove them
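
One way to apply this across a whole XML to XML transformation is sketched below: an identity template copies everything, and a template for text nodes runs translate() on them. The stylesheet layout and the explicit priority are assumptions made for illustration; attribute values are copied unchanged here.

```xml
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- identity template: copy everything unchanged by default -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- text nodes: curly double quotes become straight double quotes;
       the priority avoids a conflict with the node() pattern above -->
  <xsl:template match="text()" priority="1">
    <xsl:value-of select="translate(., '&#x201C;&#x201D;', '&quot;&quot;')"/>
  </xsl:template>

</xsl:stylesheet>
```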

5.

Search and replace acronyms

Jeni Tennison



> I have an XML journal source, and an XML list of acronyms. I wish to
>automatically replace any occurrence of an acronym within the XML source,
>with the appropriate <acronym title="blah">acronym</acronym> tag. It's easy
>to replace one acronym, using simple XSL recursive find/replace, but when I
>try to do more than one, I hit multiple difficulties. 

OK. First off, you need to design the parameters for the template. The things that matter are the list of acronyms that you want to be replaced and the text in which you want to replace them:

<xsl:template name="replace-acronyms">
  <xsl:param name="acronyms"
    select="document('../xml/acronyms.xml')/acronyms/acronym" />
  <xsl:param name="text" />
  ...
</xsl:template>

The first tests to make are the stopping conditions: if $text is empty, then the template shouldn't generate anything; if $acronyms is empty, then the template should just return $text:

  <xsl:choose>
    <xsl:when test="not($acronyms)">
      <xsl:value-of select="$text" />
    </xsl:when>
    <xsl:when test="not(string($text))" />
    <xsl:otherwise>
      ...
    </xsl:otherwise>
  </xsl:choose>

Now we've confirmed that we actually have some text to process and some acronyms to replace within it, we'll set about our first task: to replace the first occurrence in $text of the first acronym in $acronyms. We'll store the first acronym that we want to find in $acronyms in a variable called $acronym:

  <xsl:variable name="acronym" select="$acronyms[1]/@acronym" />

Note that I'm assuming the <acronym> elements are of the form:

  <acronym acronym="XML">Extensible Markup Language</acronym>

What we do then depends on whether $acronym appears in $text or not:

  <xsl:choose>
    <xsl:when test="contains($text, $acronym)">
      ...
    </xsl:when>
    <xsl:otherwise>
      ...
    </xsl:otherwise>
  </xsl:choose>

If $acronym *doesn't* appear in $text, then we want to call the template again on the unadjusted text, with $acronyms this time set to the *rest* of the acronyms (all but the first):

  <xsl:otherwise>
    <xsl:call-template name="replace-acronyms">
      <xsl:with-param name="text" select="$text" />
      <xsl:with-param name="acronyms"
                      select="$acronyms[position() >  1]" />
    </xsl:call-template>
  </xsl:otherwise>

If $acronym *does* appear in text, then we need to break the text into two parts: the part before $acronym and the part after $acronym:

  <xsl:variable name="before"
                select="substring-before($text, $acronym)" />
  <xsl:variable name="after"
                select="substring-after($text, $acronym)" />

Now, we know that $acronym doesn't appear in $before (because $before is, by definition, the text before the first occurrence of $acronym), but $before might contain other acronyms. So we need to call the template on $before with the 'rest' of the acronyms:

  <xsl:call-template name="replace-acronyms">
    <xsl:with-param name="text" select="$before" />
    <xsl:with-param name="acronyms"
                    select="$acronyms[position() >  1]" />
  </xsl:call-template>

Then we need to generate the <acronym> element. The title attribute needs to hold the value of the first <acronym> element in $acronyms, and the value of the <acronym> element is the acronym $acronym itself:

  <acronym title="{$acronyms[1]}">
    <xsl:value-of select="$acronym" />
  </acronym>

Then we need to do something with $after. Now, $after could contain $acronym again, so the recursive call needs to pass *all* the $acronyms through to the next call:

  <xsl:call-template name="replace-acronyms">
    <xsl:with-param name="text" select="$after" />
    <xsl:with-param name="acronyms" select="$acronyms" />
  </xsl:call-template>

And there we have it. The complete template looks like:

<xsl:template name="replace-acronyms">
  <xsl:param name="acronyms"
    select="document('../xml/acronyms.xml')/acronyms/acronym" />
  <xsl:param name="text" />
  <xsl:choose>
    <xsl:when test="not($acronyms)">
      <xsl:value-of select="$text" />
    </xsl:when>
    <xsl:when test="not(string($text))" />
    <xsl:otherwise>
      <xsl:variable name="acronym" select="$acronyms[1]/@acronym" />
      <xsl:choose>
        <xsl:when test="contains($text, $acronym)">
          <xsl:variable name="before"
                        select="substring-before($text, $acronym)" />
          <xsl:variable name="after"
                        select="substring-after($text, $acronym)" />
          <xsl:call-template name="replace-acronyms">
            <xsl:with-param name="text" select="$before" />
            <xsl:with-param name="acronyms"
                            select="$acronyms[position() >  1]" />
          </xsl:call-template>
          <acronym title="{$acronyms[1]}">
            <xsl:value-of select="$acronym" />
          </acronym>
          <xsl:call-template name="replace-acronyms">
            <xsl:with-param name="text" select="$after" />
            <xsl:with-param name="acronyms" select="$acronyms" />
          </xsl:call-template>
        </xsl:when>
        <xsl:otherwise>
          <xsl:call-template name="replace-acronyms">
            <xsl:with-param name="text" select="$text" />
            <xsl:with-param name="acronyms"
                            select="$acronyms[position() >  1]" />
          </xsl:call-template>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>
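
A minimal sketch of how the template might be invoked, assuming it should run over every text node of the journal source. The match pattern is an assumption; the acronyms parameter is left to its default, so the list is read from acronyms.xml.

```xml
<xsl:template match="text()">
  <xsl:call-template name="replace-acronyms">
    <!-- only the text is passed; $acronyms falls back to its default -->
    <xsl:with-param name="text" select="string(.)" />
  </xsl:call-template>
</xsl:template>
```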

So, the key points here are:

1. work through the acronyms using recursion rather than iteration
2. recurse on the before and after portions of the text
3. treat generated XML as XML rather than as a string (so don't use disable-output-escaping to create it)

To complete this email, I'll just mention that in XSLT 2.0, you can use <xsl:analyze-string> to do this. Something along the lines of:

  <xsl:variable name="acronyms" as="element(acronym)+"
    select="document('../xml/acronyms.xml')/acronyms/acronym" />

  <xsl:variable name="acronym-regex" as="xs:string"
    select="string-join($acronyms/@acronym, '|')" />

  <xsl:analyze-string select="$text" regex="{$acronym-regex}">
    <xsl:matching-substring>
      <xsl:variable name="acronym" as="xs:string" select="." />
      <acronym title="{$acronyms[@acronym = $acronym]}">
        <xsl:value-of select="$acronym" />
      </acronym>
    </xsl:matching-substring>
    <xsl:non-matching-substring>
      <xsl:value-of select="." />
    </xsl:non-matching-substring>
  </xsl:analyze-string>

6.

Using keys to lookup an acronym from current stylesheet

Justin Makeig


<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:a="http://cde.berkeley.edu/docbook/constant/acronym"
    exclude-result-prefixes="a">
    
    <xsl:key name="AcronymKey" match="a:acronymItem" use="a:acronym"/>
    
    <xsl:template name="AcronymnStandsFor">
        <xsl:param name="acronym"/>
        <!-- change context to the stylesheet document, document(''), so the key finds the lookup data -->
        <xsl:for-each select="document('')">
            <xsl:value-of select="key('AcronymKey',$acronym)/a:standsFor"/>
        </xsl:for-each>
    </xsl:template>
    
    <!-- acronym lookups -->
    <a:acronymList>
        <a:acronymItem>
            <a:acronym>Ant</a:acronym>
            <a:standsFor>Another Neat Tool</a:standsFor>
        </a:acronymItem>
        ...
    </a:acronymList>
</xsl:stylesheet>
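
A sketch of a call, assuming the source document marks acronyms up with an <acronym> element (that element name is an assumption, not part of the original post). The template would go inside the stylesheet above and writes the acronym followed by its expansion in parentheses.

```xml
<xsl:template match="acronym">
  <xsl:value-of select="."/>
  <xsl:text> (</xsl:text>
  <xsl:call-template name="AcronymnStandsFor">
    <!-- pass the acronym as a string, e.g. 'Ant' -->
    <xsl:with-param name="acronym" select="string(.)"/>
  </xsl:call-template>
  <xsl:text>)</xsl:text>
</xsl:template>
```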

7.

? OT ? replace acronyms | abbreviations with marked up HTML

Matthew Smith

(Nearly on topic, and potentially very useful, DaveP)

I have just written (for my own use), a small command-line Perl programme that can maintain a database of abbreviations and their definitions and substitute them into files. This may be helpful for anyone who has a lot of HTML without expanded abbreviations that needs correcting.

Two types of conversion may be applied, HTML and text. Due to much tiresome debate in the past and the fact that acronyms ARE abbreviations, as far as I am concerned (and this programme), ALL contractions are rendered as <abbr> in HTML mode. In text mode, the definition is given, followed by the abbreviation in parentheses.

Abbreviations are loaded from a text file containing abbreviation/definition pairs separated by a hash #.

If anyone thinks they would find this useful, you are more than welcome to use this code. Standard disclaimers apply...

The code may be picked up at: <http://www.tivis.net.au/tools/abbr.txt>

I had to re-name from .pl to .txt otherwise Apache would execute the code rather than displaying it. I have made use of this quirk to provide an online help page (text/plain): <http://www.tivis.net.au/tools/abbr.pl>

Notes:

1) I have yet to get around to doing much "user-proofing", such as raising errors when mutually exclusive arguments are given, but it's good enough for me.

2) This is NOT efficient code - with huge files and abbreviation lists, this could be slow on older machines.

3) Currently uses Berkeley DB - I may be persuaded to do a CGI/MySQL version for intranet/workgroup use.

Hope this can help someone - any queries, let me know.

8.

Check for non-ASCII characters & remove

Michael Müller-Hillebrand



>I have to check if I have some of the extended ASCII (UTF8) characters in
>the input XML file.
>So, I am using stylesheet for transformation XML input file to the
>different format of the XML file.
>Can I do something efficient to check for special characters in some of the
>fields in the input XML file and to convert those specific characters, (for
>example, è to e) in the output file?

To check if you have non-ASCII characters you can remove all ASCII chars from a string and check whether the result is empty. First create a string variable which contains all valid characters:

<xsl:variable name="ascii-chars">&#32;!...0123456789..ABCD...abcd...</xsl:variable>

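For the test you would use the translate() function. The test below is true when the string contains only the characters listed in $ascii-chars; compare with != '' instead to detect non-ASCII characters: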

<xsl:if test="translate(., $ascii-chars, '') = ''"> ...
</xsl:if>

To change non-ASCII chars to ASCII chars you have to know which characters to change. This is easy if you know the input is ISO-8859-1 or some other limited range of characters. You would use the translate() function.
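
As a sketch of that second case, assuming the only characters of concern are the five accented letters listed below (the variable names and the character list are illustrative): translate() maps each character of the second argument to the character at the same position in the third.

```xml
<!-- the characters à è é ö ü, written as character references -->
<xsl:variable name="accented" select="'&#xE0;&#xE8;&#xE9;&#xF6;&#xFC;'"/>
<!-- their ASCII stand-ins, in the same order -->
<xsl:variable name="plain" select="'aeeou'"/>

<xsl:template match="text()">
  <xsl:value-of select="translate(., $accented, $plain)"/>
</xsl:template>
```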

If any valid UTF-8 char could be in the input, there is no easy solution, because in many languages single characters can only be transliterated to more than one ASCII character.

9.

Adding markup from a lookup file

David Carlisle



Given this highly contrived paragraph of text:

<p>words apple words juice, orange words apple</p>

...and a lookup file:

<word>apple</word>
<word>juice, orange</word>

I need to wrap the first occurrence of each item in the lookup file with an anchor:

<p>words <a>apple</a> words <a>juice, orange</a>
    words apple
</p>

The space in the keyword makes this a hard problem... any cunning solutions?

<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:template match="p">
 <p>
   <xsl:apply-templates select="doc('wordlist.xml')/list/word[1]">
     <xsl:with-param name="p" select="."/>
   </xsl:apply-templates>
 </p>
</xsl:template>

<xsl:template match="word">
 <xsl:param name="p"/>
 <xsl:choose>
   <xsl:when test="following-sibling::word">
     <xsl:apply-templates select="following-sibling::word[1]">
       <xsl:with-param name="p" select="replace($p,.,concat('[',.,']'))"/>
     </xsl:apply-templates>
   </xsl:when>
   <xsl:otherwise>
     <xsl:variable name="x">
       <xsl:analyze-string select="replace($p,.,concat('[',.,']'))" regex="\[(.*?)\]">
         <xsl:matching-substring>
           <a><xsl:value-of select="regex-group(1)"/></a>
         </xsl:matching-substring>
         <xsl:non-matching-substring>
           <xsl:value-of select="."/>
         </xsl:non-matching-substring>
       </xsl:analyze-string>
     </xsl:variable>
     <xsl:apply-templates select="$x/node()"/>
   </xsl:otherwise>
 </xsl:choose>
</xsl:template>

<xsl:template match="a[not(preceding-sibling::a=.)]">
 <xsl:copy-of select="."/>
</xsl:template>
</xsl:stylesheet>

10.

Replace escaped items

Jeni Tennison




Input content
<?xml version="1.0" encoding="UTF-8"?>
<broadcast>
  <content_vars>
   <content name="subject"><html>Hello [[BUYERS_NAME]] <p>REF Order
[WEB_ORDER_NUMBER]</p></html></content><!--encoded-->
  </content_vars>

        <ORDER_FEED>
<ORDER>
<ORDER_HEADER>
<BUYERS_NAME>Senthil</BUYERS_NAME>
<WEB_ORDER_NUMBER>W12345</WEB_ORDER_NUMBER>
</ORDER_HEADER>
<!--Line Items-->
</ORDER>
</ORDER_FEED>
</broadcast>

Desired output
<?xml version="1.0" encoding="UTF-8"?>
 <content name="subject"><html>Hello Senthil<p> REF Order  W12345 </p></html></content>




> I have difficulty in recursive replacement of escapes.

Thanks for showing your sample XML and desired output so clearly. One thing confuses me, though: your sample XML contains element names that are escaped with both double square brackets, such as [[BUYERS_NAME]] and single ones, such as [WEB_ORDER_NUMBER]. I'm going to assume you just want to use single square brackets.

First, I'd set up a global variable that holds the elements that hold the values you want to insert, such as <BUYERS_NAME>.

<xsl:variable name="replacements"
              select="/broadcast/ORDER_FEED/ORDER/ORDER_HEADER/*" />

Then I'd write a recursive template that takes a string, looks for the first [, and outputs the string up to the first [, then the replacement value for the substring between the first [ and the next ], and finally calls itself on the substring after the ]:

<xsl:template name="replace-escapes">
  <xsl:param name="string" />
  <xsl:choose>
    <xsl:when test="contains($string, '[')">
      <!-- substring before the escaped sequence -->
      <xsl:value-of select="substring-before($string, '[')" />

      <!-- replacement for the string between the []s -->
      <xsl:variable name="ename"
        select="substring-before(substring-after($string, '['), ']')" />
      <xsl:value-of select="$replacements[name() = $ename]" />

      <!-- recursive call on the rest of the string -->
      <xsl:call-template name="replace-escapes">
        <xsl:with-param name="string"
          select="substring-after(substring-after($string, '['), ']')"/>
      </xsl:call-template>
    </xsl:when>
    <xsl:otherwise>
      <xsl:value-of select="$string" />
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>

I'd call this template from a template matching text nodes in the <content> of your XML. I'd use a 'copy' mode to copy all the elements in this <content> while replacing the escapes in the text. The templates would look something like:

<xsl:template match="broadcast">
  <xsl:apply-templates select="content_vars/content/*"
                       mode="copy" />
</xsl:template>

<xsl:template match="*" mode="copy">
  <xsl:copy>
    <xsl:copy-of select="@*" />
    <xsl:apply-templates mode="copy" />
  </xsl:copy>
</xsl:template>

<xsl:template match="text()" mode="copy">
  <xsl:call-template name="replace-escapes">
    <xsl:with-param name="string" select="." />
  </xsl:call-template>
</xsl:template>

You could use keys instead of a global variable holding the replacement values, but I hope this gets you on the right track.
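
A sketch of that key-based variant, assuming the same input structure; the key name is an assumption. The key indexes each child of <ORDER_HEADER> by its element name, and the lookup takes the place of $replacements[name() = $ename] inside replace-escapes.

```xml
<!-- top level: index each child of ORDER_HEADER by its element name -->
<xsl:key name="replacements" match="ORDER_HEADER/*" use="name()"/>

<!-- inside replace-escapes, in place of the $replacements lookup -->
<xsl:value-of select="key('replacements', $ename)" />
```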

11.

Removing newlines and apostrophe

David Carlisle





> I'm trying to get rid of the ' and the line feeds in my string but can't
> get it to work.
>
> That's what I have so far:
>
> <xsl:value-of select="
> translate(translate(/node,'&#xA;',''),'&apos;', '&apos;')"/>
>
> The first translation of line feeds works just fine, but I can't get rid
> of the apostrophe. The parser gives me an error because of the &apos;. I
> tried different stuff but couldn't get the proper solution.

After it has been expanded by the XML parser,


<xsl:value-of select="
translate(translate(/node,'&#xA;',''),'&apos;', '&apos;')"/>

means that the XSLT engine has to evaluate the XPath (the first string literal really holds a line feed character, which is not visible here)

translate(translate(/node,'',''),''', ''')

and ''' is a syntax error.

The intention is to turn "'" into "" (I think) so you want the XPath

translate(translate(/node,'',''),"'", '')

which you then need to quote using XML entity references in order to put it in an XML attribute

<xsl:value-of
select="translate(translate(/node,'&#xA;',''),&quot;'&quot;, '')"/>

although you can remove both characters at once with

<xsl:value-of
select="translate(/node,&quot;&#xA;'&quot;,'')"/>
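
Another common way round the quoting problem, sketched here as an alternative to the entity references, is to hold the apostrophe in a variable. The variable name and the match="/" template are assumptions made for illustration.

```xml
<!-- top level: hold the apostrophe in a variable so no quoting tricks are needed -->
<xsl:variable name="apos">'</xsl:variable>

<xsl:template match="/">
  <!-- concat() builds the two-character list: line feed plus apostrophe -->
  <xsl:value-of select="translate(/node, concat('&#xA;', $apos), '')"/>
</xsl:template>
```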

12.

Replace, Using translate() for single quotes in XSL1.0

David Carlisle


>  Following the solution in the
> faqs i thought this would work:
> <xsl:value-of select="replace(.,'&#34;','&#39;')"/>

No, after it has been through the xml parser that is the XPath expression

replace(.,'"',''')

and that last term is a syntax error, you need the XPath expression

replace(.,'"',"'")

which may be put into an XML attribute like so:


<xsl:value-of select="replace(.,'&#34;',&quot;'&quot;)"/>