XSLT, text filter

Filters

1. Extracting file name
2. How to eliminate duplicates
3. Apply-templates question
4. How can I filter out certain words

1.

Extracting file name

David Carlisle


> I need to extract the "file name" from a string returned by
> unparsed-entity-uri. Obviously, the path depends on
> where the file sits, so it can be of arbitrary length.
> The string I am dealing with is something
> like file:/C:/foo/bar/somefile.gif, where I need the
> somefile.gif string.



<xsl:call-template name="filename">
  <xsl:with-param name="x" select="$path"/>
</xsl:call-template>
    

Here $path is the expression that yields your path. The filename template calls itself on the text after each '/' until no '/' remains, and looks something like this:

<xsl:template name="filename">
<xsl:param name="x"/>
<xsl:choose>
<xsl:when test="contains($x,'/')">
  <xsl:call-template name="filename">
    <xsl:with-param name="x" select="substring-after($x,'/')"/>
  </xsl:call-template>
</xsl:when>
<xsl:otherwise>
  <xsl:value-of select="$x"/>
</xsl:otherwise>
</xsl:choose>
</xsl:template>
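
As a usage sketch for the original question, the template can be fed the result of unparsed-entity-uri() directly. The graphic element and its entity attribute below are hypothetical names chosen for illustration; substitute whatever element in your document carries the ENTITY-typed attribute:

<!-- hypothetical input: <graphic entity="logo"/>, where logo is an unparsed entity -->
<xsl:template match="graphic">
  <xsl:call-template name="filename">
    <xsl:with-param name="x" select="unparsed-entity-uri(@entity)"/>
  </xsl:call-template>
</xsl:template>

For an entity whose URI is file:/C:/foo/bar/somefile.gif this should output somefile.gif, since each recursive call drops one more path step.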

            

2.

How to eliminate duplicates

Phil Lanch

My question is really how to eliminate duplicates, counting
    <handle>FOO</handle>
and
   <handle>foo</handle>
as duplicates.

<xsl:variable name="up" select="'ABCDEFGHIJKLMNOPQRSTUVWXYZ'"/>
<xsl:variable name="lo" select="'abcdefghijklmnopqrstuvwxyz'"/>

<xsl:template match="//handle" name="handle">
  <xsl:param name="i" select="1"/>
  <xsl:choose>
    <xsl:when test="position() + $i > last()">
      <handle><xsl:value-of select="translate(.,$up,$lo)"/></handle>
    </xsl:when>
    <xsl:when test="not( translate(.,$up,$lo) =
                         translate(following::handle[$i],$up,$lo) )">
      <xsl:call-template name="handle">
        <xsl:with-param name="i" select="$i + 1"/>
      </xsl:call-template>
    </xsl:when>
  </xsl:choose>
</xsl:template>

David Carlisle offered this improvement.

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0"
                >


<xsl:variable name="up" select="'ABCDEFGHIJKLMNOPQRSTUVWXYZ'"/>
<xsl:variable name="lo" select="'abcdefghijklmnopqrstuvwxyz'"/>

<xsl:template match="/">
<xsl:apply-templates select="//handle"/>
</xsl:template>

<xsl:template match="handle">
<xsl:if test="not(following::handle[translate(.,$up,$lo)=
                                    translate(current(),$up,$lo)])">
  <xsl:copy-of select="."/>
</xsl:if>
</xsl:template>

</xsl:stylesheet>

To do this in a single XPath expression you would need a generalisation of current() that returned the context node just outside the current predicate, rather than the node that was current at the start of the whole expression.

Of course, the above probably lowercases the current node many times over. It would be better not to use current() at all and simply put the lowercased value of the current node into a variable.
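
A minimal sketch of that variable-based variant, assuming the same $up and $lo variables as in the stylesheet above. The variable name this is an arbitrary choice, not part of the original answer:

<xsl:template match="handle">
  <!-- lowercase the current handle once, before entering the predicate -->
  <xsl:variable name="this" select="translate(.,$up,$lo)"/>
  <!-- inside the predicate, . is the candidate following handle -->
  <xsl:if test="not(following::handle[translate(.,$up,$lo) = $this])">
    <xsl:copy-of select="."/>
  </xsl:if>
</xsl:template>

As in the current() version, the last occurrence of each handle is the one that gets copied.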

3.

Apply-templates question

David Carlisle


Is it possible, when you apply-templates, to make it apply only
to the siblings from position()=3 onwards?

XML:
<section>
 <item>I am item number 1</item>
 <item>I am item number 2</item>
 <item>I am item number 3</item>
 <item>I am item number 4</item>
</section>

><xsl:template match="section">
>   <xsl:apply-templates
>      select="following-sibling::item[position() = 3]"/>

You are sitting at a section node, and you have asked to go from there to the third item node that is a following sibling of the current node. That isn't what you want: the item nodes are children of section, not siblings, and you want all items after position 2, not just the third.

<xsl:template match="section">
 <xsl:apply-templates select="item[position() > 2]"/>
</xsl:template>

            

4.

How can I filter out certain words

Jeni Tennison

For example: with input

<xmlfile>
<book>
<title>The quick brown</title>
  </book>
<book>
<title>A little knowledge is a dangerous thing</title>
  </book>
<book>
<title>Is this the real thing</title>
  </book>
 </xmlfile>

How to get output like

 <result>
   <before>The quick brown</before>
   <after>quick brown</after>
   <before>A little knowledge is a dangerous thing</before>
   <after>little knowledge is a dangerous thing</after>
   <before>Is this the real thing</before>
   <after>this the real thing</after>
 </result>

Adapting Eric's solution:

The xsl:stylesheet element carries the usual declarations, plus the additional namespace 'sw' that is used for the internal data (the list of stop words). To prevent this namespace being declared on your output, use 'exclude-result-prefixes':

<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:sw="mailto:vdv@dyomedea.com"
                exclude-result-prefixes="sw">
  ...
</xsl:stylesheet>

Then the declaration of the stop words that you want to filter out. I've put these in a variable so that they can be accessed easily:

<sw:stop>
  <word>the</word>
  <word>a</word>
  <word>is</word>
</sw:stop>

<xsl:variable name="stop-words" 
              select="document('')/xsl:stylesheet/sw:stop/word" />

Declaration of two variables so that we can translate between upper and lower case fairly easily:

<xsl:variable name="lowercase" select="'abcdefghijklmnopqrstuvwxyz'" />
<xsl:variable name="uppercase" select="'ABCDEFGHIJKLMNOPQRSTUVWXYZ'" />

Now the template. I've only used one for brevity, but of course you can split it up into several by calling and applying templates. Within this template, I iterate through each of the titles. For each title, I find all the stop words such that the current title starts with that stop word (plus a space, and all ignoring case). If there is such a match, substring() takes the matching word and the space after it off the start of the title to give the resulting title.

<xsl:template match="/">
  <result>
    <xsl:for-each select="xmlfile/book/title">
      <before><xsl:value-of select="." /></before>
      <xsl:variable name="begins-with"
                    select="$stop-words[starts-with(
                              translate(current(), $uppercase, $lowercase),
                              concat(translate(., $uppercase, $lowercase), ' '))]" />
      <after>
        <xsl:choose>
          <xsl:when test="$begins-with">
            <xsl:value-of
              select="substring(., string-length($begins-with) + 2)" />
          </xsl:when>
          <xsl:otherwise>
            <xsl:value-of select="." />
          </xsl:otherwise>
        </xsl:choose>
      </after>
    </xsl:for-each>
  </result>
</xsl:template>

This strips leading stop words in SAXON and MSXML (July). It works in Xalan-C++ v.0.40.0 except for the exclude-result-prefixes thing, which is ignored.
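
As a sketch of the split into several templates mentioned earlier, the for-each can become an apply-templates with a separate template for title. This version is an untested illustration rather than part of the original answer. It relies on the $stop-words, $uppercase and $lowercase variables declared above, and the variable name lc-title is an assumption introduced here so that current() is not needed inside the predicate:

<xsl:template match="/">
  <result>
    <xsl:apply-templates select="xmlfile/book/title" />
  </result>
</xsl:template>

<xsl:template match="title">
  <!-- lowercase the title once; inside the predicate, . is a stop word -->
  <xsl:variable name="lc-title"
                select="translate(., $uppercase, $lowercase)" />
  <xsl:variable name="begins-with"
                select="$stop-words[starts-with($lc-title,
                          concat(translate(., $uppercase, $lowercase), ' '))]" />
  <before><xsl:value-of select="." /></before>
  <after>
    <xsl:choose>
      <xsl:when test="$begins-with">
        <!-- skip the stop word and the space that follows it -->
        <xsl:value-of
          select="substring(., string-length($begins-with) + 2)" />
      </xsl:when>
      <xsl:otherwise>
        <xsl:value-of select="." />
      </xsl:otherwise>
    </xsl:choose>
  </after>
</xsl:template>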

However...

>How do you XSL-create a sort criterion? 

...you can't (at the moment) use a template to create a string to use as a sort criterion. Sort criteria have to be XPath select expressions. This problem will go away when (a) you can convert result tree fragments (RTFs) to node sets and/or (b) you can use something like saxon:function to declare extension functions within XSLT.

In the meantime, then, you have to use something really horrible like:

<xsl:template match="/">
  <result>
    <xsl:for-each select="xmlfile/book/title">
      <xsl:sort select="concat(
          substring(substring-after(., ' '),
                    0 div boolean($stop-words[starts-with(
                            translate(current(), $uppercase, $lowercase),
                            concat(translate(., $uppercase, $lowercase), ' '))])),
          substring(.,
                    0 div not($stop-words[starts-with(
                            translate(current(), $uppercase, $lowercase),
                            concat(translate(., $uppercase, $lowercase), ' '))])))" />
      <title><xsl:value-of select="." /></title>
    </xsl:for-each>
  </result>
</xsl:template>

(Honestly, it doesn't look that much clearer even when it *is* indented ;)

This works in SAXON, MSXML (July) and Xalan (with the exception of the exclude-result-prefixes thing).
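
The sort expression leans on an XPath 1.0 idiom for choosing between two strings without xsl:choose: substring($s, 0 div $test) gives the whole of $s when $test is true (0 div 1 is 0) and the empty string when it is false (0 div 0 is NaN). A deliberately simplified sketch of the idiom on its own, assuming a single case-sensitive stop word 'A' in place of the $stop-words list; the variable name has-a is hypothetical:

<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="title">
    <!-- true when the title starts with the single stop word 'A' -->
    <xsl:variable name="has-a" select="starts-with(., 'A ')" />
    <!-- first arm: text after the first space, kept only if $has-a is true;
         second arm: the whole title, kept only if $has-a is false -->
    <xsl:value-of select="concat(
        substring(substring-after(., ' '), 0 div $has-a),
        substring(., 0 div not($has-a)))" />
  </xsl:template>
</xsl:stylesheet>

The two arms are mutually exclusive, so exactly one of them contributes to the result. For the title "A little knowledge is a dangerous thing" the first arm supplies "little knowledge is a dangerous thing", and for "Is this the real thing" the second arm supplies the title unchanged.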