Whitespace

1. Spaces turn to %20, how to restore them?
2. Whitespace treatment in 2.0
3. Not-quite normalize-space()

1.

Spaces turn to %20, how to restore them?

Jeni Tennison



> This will get easier in XSLT 2.0, right?  Something like:
>
> <xsl:analyze-string select="text()" regex="%20">
>   <xsl:non-matching-substring>
>     <xsl:value-of select="."/>
>   </xsl:non-matching-substring>
>   <xsl:matching-substring>
>     <xsl:text> </xsl:text>
>   </xsl:matching-substring>
> </xsl:analyze-string>
>

Easier would be to use the replace() function in XPath 2.0:

  <xsl:value-of select="replace(text(), '%20', ' ')" />

You only need to use <xsl:analyze-string> if you want to do further manipulation of the matching (or non-matching) strings.
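
Where only an XSLT 1.0 processor is available, replace() does not exist, and the same substitution is usually done with a recursive named template. The following is a minimal sketch, not taken from the original thread; the template name unescape-spaces and the choice to process the string value of the whole input document are assumptions made for illustration.

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Hypothetical helper: copies $text, writing a space for each literal "%20". -->
  <xsl:template name="unescape-spaces">
    <xsl:param name="text"/>
    <xsl:choose>
      <xsl:when test="contains($text, '%20')">
        <!-- Emit everything before the first "%20", then a space, then recurse on the rest. -->
        <xsl:value-of select="substring-before($text, '%20')"/>
        <xsl:text> </xsl:text>
        <xsl:call-template name="unescape-spaces">
          <xsl:with-param name="text" select="substring-after($text, '%20')"/>
        </xsl:call-template>
      </xsl:when>
      <xsl:otherwise>
        <!-- No "%20" left: emit the remainder unchanged. -->
        <xsl:value-of select="$text"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

  <!-- Assumption: the whole string value of the input document is to be unescaped. -->
  <xsl:template match="/">
    <xsl:call-template name="unescape-spaces">
      <xsl:with-param name="text" select="string(.)"/>
    </xsl:call-template>
  </xsl:template>
</xsl:stylesheet>
```

Applied to an input such as <path>my%20file.txt</path>, this writes my file.txt. It handles only the %20 escape; other percent-encoded characters are passed through untouched.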

2.

Whitespace treatment in 2.0

Evan Lenz


> An XSLT processor is at liberty to use character references for any
> character in the output file, rather than using the native character.
> There's nothing in the spec that prevents this, and any XML parser
> (or HTML user agent) will treat the character reference and the
> native character identically.

There is one notable exception to this: character references to whitespace characters other than space (#x20) in attribute values, namely #xD, #xA, and #x9.

"Note that if the unnormalized attribute value contains a character reference to a white space character other than space (#x20), the normalized value contains the referenced character itself (#xD, #xA or #x9). This contrasts with the case where the unnormalized value contains a white space character (not a reference), which is replaced with a space character (#x20) in the normalized value..."[1]

For example, using a new XSLT 2.0 construct, we can see the difference in behavior very clearly:

<xsl:value-of select="(1,2,3)" separator="
"/>

will output:

1 2 3

whereas this:

<xsl:value-of select="(1,2,3)" separator="&#xA;"/>

will output:

1
2
3

In the first case, the XML parser reads the attribute value as the space character #x20 (according to attribute value normalization rules), and in the second example the XML parser reads the attribute value as the line feed character #xA. Note that serialization algorithms are also required to make this distinction in order to correctly round-trip attribute values.
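
The serialization side of this can be observed with a literal result element whose attribute contains a line feed. The stylesheet below is a small sketch written for illustration; the element name out and the attribute name note are made up, and it should run against any input document.

```xml
<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml"/>

  <xsl:template match="/">
    <!-- The parser reads &#xA; as a real line feed, so the attribute value
         in the result tree contains the character #xA, not a space. -->
    <out note="line1&#xA;line2"/>
  </xsl:template>
</xsl:stylesheet>
```

A conforming serializer is expected to write that line feed back out as a character reference (for example &#xA; or &#10;) rather than as a literal line break. If it wrote a literal line break, the next parser to read the output would normalize it to a space and the original value would be lost.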

[1] XML 1.0 Recommendation, section 3.3.3, "Attribute-Value Normalization"

3.

Not-quite normalize-space()

David Carlisle


In 2.0 this would normally be a simple call to replace(), but in 1.0,
if by "retain" you mean keep a single whitespace character wherever
there is any whitespace:

a) strip off leading WS, retain trailing WS, and normalize the rest

<xsl:variable name="x" select="concat(.,'!@@@!')"/>
<xsl:value-of select="substring-before(normalize-space($x),'!@@@!')"/>

b) retain leading WS, strip off trailing WS, normalize the rest

<xsl:variable name="x" select="concat('!@@@!',.)"/>
<xsl:value-of select="substring-after(normalize-space($x),'!@@@!')"/>

c) retain leading and trailing WS, normalize the rest

<xsl:variable name="x" select="concat('!+++!',.,'!@@@!')"/>
<xsl:value-of select="substring-after(substring-before(normalize-space($x),'!@@@!'),'!+++!')"/>
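
For comparison, the replace() approach mentioned for 2.0 might look like the sketch below. It is not from the original post. It assumes the string to process is the string value of the input document, and it wraps each result in square brackets (an illustrative choice) so that the retained spaces are visible. Note that the 1.0 versions above rely on the sentinel strings '!@@@!' and '!+++!' never occurring in the data; the regex version has no such restriction.

```xml
<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <xsl:variable name="s" select="string(.)"/>

    <!-- a) strip leading WS, keep trailing WS as one space, collapse inner runs -->
    <xsl:value-of select="concat('[', replace(replace($s, '^\s+', ''), '\s+', ' '), ']')"/>
    <xsl:text>&#xA;</xsl:text>

    <!-- b) keep leading WS as one space, strip trailing WS, collapse inner runs -->
    <xsl:value-of select="concat('[', replace(replace($s, '\s+$', ''), '\s+', ' '), ']')"/>
    <xsl:text>&#xA;</xsl:text>

    <!-- c) keep leading and trailing WS as one space each, collapse inner runs -->
    <xsl:value-of select="concat('[', replace($s, '\s+', ' '), ']')"/>
    <xsl:text>&#xA;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

For a string value of "  a   b  " the three lines come out as [a b ], [ a b] and [ a b ]. In XPath 2.0 regular expressions \s matches space, tab, line feed and carriage return, the same characters that normalize-space() treats as whitespace.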