CDATA Sections

1. CDATA sections
2. Understanding CDATA
3. CDATA Help
4. Can I test for a CDATA section?
5. Copying CData sections from the source to destination documents
6. Displaying document( ) output within CDATA
7. CDATA sections in the output
8. Showing source code program listings
9. CDATA, How to preserve boundaries after XSL transformation
10. How can I pass CDATA sections through to the output of a transform

1.

CDATA sections

Michael Kay

> I am trying to transform XML to XML, and the target schema dictates that I have to put all of my actual content (an HTML page) into an element like this:
>
> <property name="body"><![CDATA[ potentially nasty,
> ill-formed HTML ]]></property>

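Use the cdata-section-elements attribute of xsl:output. It takes a whitespace-separated list of element names, e.g.

cdata-section-elements="a b c d e"

In your case, cdata-section-elements="property" causes the text content of each property element to be written inside a CDATA section.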

2.

Understanding CDATA

Jeni Tennison


>I have this Javascript in my xsl :
>
><script language="JavaScript">
><![CDATA[function image(){
>         window.open('  NOT YET DEFINED ');
>}
>]]>  
></script>
>
>I would like to know how to make the method "WINDOW.OPEN" work with a 
>variable that would get the information which is stored in the IMAGE 
>element of my xml, so putting the image in the opening window. I will have 
>340 different xml pages, one for each expression of the encyclopedia.

From what I gather, you are generating this script element as part of the output HTML from a stylesheet. Just as with any other output that you generate within an XSL page, you can insert values of particular XPath expressions using xsl:value-of. So if you want the value of the @href attribute of the IMAGE element, you can use:

<xsl:value-of select="/ROOT/SEE_PICTURE/IMAGE/@href" />

I'm sure you're aware of this, so I guess that the problem you're having is inserting this into the JavaScript. If you were to simply insert it into your existing script element like:

<script language="JavaScript">
  <![CDATA[
  function image() {
    window.open('<xsl:value-of select="/ROOT/SEE_PICTURE/IMAGE/@href" />');
  }
  ]]>
</script>

then it would be within a CDATA section, and would thus not be interpreted as XSLT (or XML for that matter).

Within the XSLT stylesheet, the CDATA section is purely a convenience that saves you from having to escape every '<' and '&' yourself. The CDATA section in your XSLT stylesheet does not translate into a CDATA section in your output. So your script element translates to:

<script language="JavaScript">
  function image() {
    window.open('  NOT YET DEFINED ');
  }
</script>

as there are no peculiar characters to be escaped within it. Given that, you could simply do:

<script language="JavaScript">
  function image() {
    window.open('<xsl:value-of select="/ROOT/SEE_PICTURE/IMAGE/@href" />');
  }
</script>

and the value of the IMAGE's @href will be inserted as an argument. If there *are* characters that need to be escaped in your JavaScript, then there's no problem with ending the CDATA section before the piece of XSLT and starting a new one afterwards:

<script language="JavaScript">
  <![CDATA[
  function image() {
    window.open(']]><xsl:value-of
                      select="/ROOT/SEE_PICTURE/IMAGE/@href" /><![CDATA[');
  }
  ]]>
</script>
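One way to check this is to run both script elements through an ordinary XML parser and see what it reports. The sketch below uses Python's standard xml.etree.ElementTree as a stand-in for the parser that sits in front of an XSLT processor. The fragments are shortened versions of the ones above, with an xmlns:xsl declaration added so that each one parses on its own. The helper name describe is hypothetical.

```python
import xml.etree.ElementTree as ET

XSL_NS = "http://www.w3.org/1999/XSL/Transform"

# Whole call inside one CDATA section: the xsl:value-of is just characters.
INSIDE = ('<script xmlns:xsl="' + XSL_NS + '">'
          + "<![CDATA[window.open('<xsl:value-of select=\"@href\"/>');]]>"
          + "</script>")

# CDATA section closed before the instruction and reopened after it.
SPLIT = ('<script xmlns:xsl="' + XSL_NS + '">'
         + "<![CDATA[window.open(']]>"
         + '<xsl:value-of select="@href"/>'
         + "<![CDATA[');]]>"
         + "</script>")


def describe(fragment):
    """Return the script element's leading text and the tags of its child elements."""
    root = ET.fromstring(fragment)
    return root.text, [child.tag for child in root]


if __name__ == "__main__":
    for name, fragment in (("inside", INSIDE), ("split", SPLIT)):
        text, children = describe(fragment)
        print(name, repr(text), children)
```

In the first version the instruction is only part of the script's text. In the split version the parser reports a real xsl:value-of element between two pieces of text, and the XSLT processor needs that element before it can execute the instruction.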

Jeni then carries on answering another question...

CDATA sections are designed to take the trauma out of escaping text that contains lots of characters needing escapes (like < and &). Putting <![CDATA[...]]> around a section essentially says "do not parse this section; everything in it should be interpreted as characters".

So, in the example:

<![CDATA[
  <img src="global.gif" alt="Go around the world" />
]]>

is exactly the same as:

  &lt;img src="global.gif" alt="Go around the world" />

The XML parser sees a string, not a tag. The XSLT processor therefore sees a string, not a tag, and processes it as a string, not a tag, which means that it outputs:

  &lt;img src="global.gif" alt="Go around the world" />

which your browser displays as literal text rather than as an image.

The goal you're aiming for is copying something that you have in your XML source directly into your HTML output. The xsl:copy-of element is designed precisely for this purpose. xsl:copy-of will give an exact copy of whatever you select, including attributes and content.

So, you can just have XML like:

<page>
  We offer the cheapest air fares to Bombay.
  <img src="global.gif" alt="Go around the world" />
</page>

And then a template that says 'when you come across an img element, just copy it':

<xsl:template match="img">
  <xsl:copy-of select="." />
</xsl:template>

If you have lots of HTML that you want to copy straight over from your XML source to your HTML output, the cleanest approach is to define an 'html' namespace and mark all the HTML elements as belonging to this namespace:

<page xmlns:html="http://www.w3.org/1999/xhtml">
  We offer the cheapest air fares to Bombay.
  <html:img src="global.gif" alt="Go around the world" />
</page>

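To copy these HTML elements whenever you come across them within your page, have a template that matches them (html:*, meaning any element in the html namespace) and copies them. The stylesheet must declare the same namespace, e.g. xmlns:html="http://www.w3.org/1999/xhtml" on xsl:stylesheet, for the html: prefix to work in the match pattern: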

<xsl:template match="html:*">
  <xsl:copy-of select="." />
</xsl:template>
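The same idea can be sketched outside XSLT. What follows is a rough Python analog that uses the standard xml.etree.ElementTree and copy modules, not an XSLT processor. It walks the page and deep-copies every element in the XHTML namespace into an output tree. The function name copy_html and the div wrapper are illustrative assumptions. Unlike a real stylesheet, which would handle other children with further templates, this sketch simply ignores non-XHTML children.

```python
import copy
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

SOURCE = """<page xmlns:html="http://www.w3.org/1999/xhtml">
  We offer the cheapest air fares to Bombay.
  <html:img src="global.gif" alt="Go around the world" />
</page>"""


def copy_html(page):
    """Build an output <div> from a <page>: keep the page's leading text and
    deep-copy every child element in the XHTML namespace (a loose analog of
    a template matching html:* that does <xsl:copy-of select="."/>)."""
    out = ET.Element("div")
    out.text = page.text
    for child in page:
        if child.tag.startswith("{" + XHTML_NS + "}"):
            # deepcopy keeps the attributes and content (and the tail text)
            out.append(copy.deepcopy(child))
    return out


if __name__ == "__main__":
    ET.register_namespace("html", XHTML_NS)
    result = copy_html(ET.fromstring(SOURCE))
    print(ET.tostring(result, encoding="unicode"))
```

The img comes through as an element with its attributes intact, not as the escaped string &lt;img ... /> seen in the CDATA version.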

3.

CDATA Help

Mike Brown



> Can someone please explain to me why the following :
> 
>     <![CDATA[<BR/>]]>
> 
> ... gets converted to the following when output is set to html :
> 
>     &lt;BR/>
	

CDATA sections in an XML document serve no other purpose than to say unambiguously "this is all text, not markup". In practice, all they do is keep you from having to escape the beginning-of-markup characters '<' and '&' and, on occasion, the end-of-markup '>'.

You are under the mistaken impression that in a CDATA section '<' and '&' mean something different from what '&lt;' and '&amp;' mean outside a CDATA section, but they do not; an XML parser treats them the same.

To further clarify, this XML:

	<p>hello<BR/>world</p>

implies an XPath/XSLT node tree like this:

	element 'p'
	  |___text 'hello'
	  |___element 'BR'
	  |___text 'world'

While this XML:

	<p>hello&lt;BR/&gt;world</p>

or this XML:

	<p><![CDATA[hello<BR/>world]]></p>

are logically equivalent, implying a node tree like this:

	element 'p'
	  |___text 'hello<BR/>world'

If you have that in your result tree and you emit it as XML, the serializer will make start and end tags as appropriate for the nodes, embedding and quoting attribute nodes in the tags as appropriate. The last fragment above would very likely be emitted as <p>hello&lt;BR/&gt;world</p>, although it wouldn't be wrong to emit it as <p><![CDATA[hello<BR/>world]]></p>. You certainly wouldn't want to get output of <p>hello<BR/>world</p> because we've already established that this means something completely different.
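Any XML parser will confirm the node trees above. This sketch assumes Python's xml.etree.ElementTree. In that library, an element's text attribute holds its leading text node, and iterating over the element yields its child elements. The name shape is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

ELEMENT = "<p>hello<BR/>world</p>"
ESCAPED = "<p>hello&lt;BR/&gt;world</p>"
CDATA = "<p><![CDATA[hello<BR/>world]]></p>"


def shape(xml_text):
    """Summarise the parsed tree: the p element's leading text and its child element tags."""
    p = ET.fromstring(xml_text)
    return p.text, [child.tag for child in p]


if __name__ == "__main__":
    for source in (ELEMENT, ESCAPED, CDATA):
        tree = ET.fromstring(source)
        print(shape(source), ET.tostring(tree, encoding="unicode"))
```

Only the first source produces a BR child. The other two give the same single text node. When that node is written back out as XML, the serializer escapes it instead of reproducing the CDATA section.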

HTML output is very similar, the only real difference in this case being that empty elements will be represented by what looks like a start tag only, like <BR> instead of <BR/> ... but that's assuming you've got a BR *element* in your result tree.

So you are hereby challenged to get a BR element into the result tree, so it will be serialized as an actual BR tag by the HTML outputter. Three ways to do it:

1. <BR/> literal result element in the stylesheet
2. <xsl:element name="BR"/> instruction in the stylesheet
3. <xsl:copy-of select="/path/to/an/empty/BR/element/in/source/tree"/>
   instruction in the stylesheet
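The next sketch shows the difference between a BR element and the characters <BR/>. It uses Python's ElementTree HTML serializer (method="html") in place of an XSLT processor's HTML outputter, so small details may differ. For example, ElementTree also escapes > as &gt;, whereas the processor in the question left it alone. The two builder functions are hypothetical stand-ins for the stylesheet options above.

```python
import xml.etree.ElementTree as ET


def with_br_element():
    """Result tree containing a real BR element (like a <BR/> literal result element)."""
    p = ET.Element("p")
    p.text = "hello"
    br = ET.SubElement(p, "BR")
    br.tail = "world"
    return p


def with_br_text():
    """Result tree containing only the characters '<BR/>' (what the CDATA section gives you)."""
    p = ET.Element("p")
    p.text = "hello<BR/>world"
    return p


if __name__ == "__main__":
    for tree in (with_br_element(), with_br_text()):
        print(ET.tostring(tree, encoding="unicode", method="html"))
```

The element is written as an HTML empty tag, <BR>, with no end tag. The text is escaped, just as the questioner saw.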

All well and good, but you said you wanted <BR/>, which is *not HTML*. Why then are you relying on the HTML output method? Tsk. Only one way then, the wrong way:

<xsl:value-of select="'&lt;BR/&gt;'" disable-output-escaping="yes"/>

It is despised because people who rely on it tend to produce malformed documents, and because people who rely on it think in terms of tags-in-tags-out, when they should be thinking about the information set implied by an XML document, the trees based on that information, and the automatic derivation of output from the new trees constructed based on the stylesheet contents...

Hey, you asked for verbosity.

4.

Can I test for a CDATA section?

Mike Kay

No, text in a CDATA section looks exactly the same to the XSLT processor as text outside a CDATA section. For example,

<![CDATA[xxx]]> looks exactly like xxx
<![CDATA[<>]]> looks exactly like &lt;&gt;

This is because they are intended to be equivalent ways of writing the same thing.

5.

Copying CData sections from the source to destination documents

David Carlisle


> So, is there a way that I can specify that a node and its descendants
> should be copied 'as is'?
	

No.

CDATA sections (like entity references) are not considered to be part of the document tree in an XPath processor. They are just an authoring convenience. Not seeing CDATA markup is like not being able to tell if " or ' were used around the attribute values in the original document.

<![CDATA[&&&&&&]]>

produces an _identical_ input tree to

&amp;&amp;&amp;&amp;&amp;&amp;

so there is no way that XSL can distinguish these.

What you can do is request that certain elements are always output using CDATA sections.

Mike Kay adds:

You can easily copy it to an output file that is equivalent, but not to one that is lexically identical. The process of parsing the source XML to produce the XSLT tree loses lexical details such as entity boundaries, CDATA sections, order of attributes, and whitespace within tags. See XSLT Programmer's Reference, page 63.
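A quick way to see which lexical details are lost is to parse a document and write it straight back out. This sketch uses Python's xml.etree.ElementTree as an illustrative parser and serializer, not an XSLT processor. The loss happens at parse time in the same way. round_trip is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Same information, written with single quotes, extra spaces inside the tag,
# a CDATA section and a character reference (&#x26; is '&').
SOURCE = "<a  b = '2'><![CDATA[1 < 2]]> &#x26; more</a>"


def round_trip(xml_text):
    """Parse and re-serialize: the result is equivalent, not lexically identical."""
    return ET.tostring(ET.fromstring(xml_text), encoding="unicode")


if __name__ == "__main__":
    print(round_trip(SOURCE))
```

The quote style, the spacing inside the tag, the CDATA section and the character reference all disappear. What comes out is just one of the many equivalent ways of writing the same tree.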

Does it matter whether the output uses character references or CDATA? They are just different ways of expressing the same information.

6.

Displaying document( ) output within CDATA

Adam Turoff



> Any suggestions on how to insert code from an external
> file into a CDATA section would be appreciated.

If you want to emit a

<![CDATA[ ... ]]>

section on output, you probably want to look at

<xsl:output cdata-section-elements=""/>.

This will make the child text content of an element appear within a CDATA section. You can hack CDATA sections by hand to appear in the middle of a stream of text nodes, but it involves hacking around the output escaping of < and >.

In your case, if you want to show examples as-is, as they are found in an external XML file (or XML fragment), you probably want to use

	  <xsl:copy-of select="document()"/>

Here's a stylesheet fragment that may help you get going.

<xsl:output method="xml" cdata-section-elements="example"/>

<xsl:template match="example">
<example>
<xsl:copy-of select="document(@href)"/>
</example>
</xsl:template>

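That will copy the document specified by

<example href=""/>

in your source into an example element in your output. Any text node that ends up as a direct child of that element is written inside a CDATA section:

<example><![CDATA[ ... ]]></example>

Bear in mind that cdata-section-elements only affects text nodes. xsl:copy-of copies the elements of the external document as elements, and they are serialized as ordinary markup, not inside the CDATA section. To get the whole file into a CDATA section it has to arrive as text, for example via the XSLT 2.0 unparsed-text() function.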

7.

CDATA sections in the output

Tom Passin


 
  > I need to transform an XML file into
  > a different XML file. One element needs to be defined as
  > CDATA in the second XML file.
  >
  > right now I have
  >
  >    <xsl:template match="Action">
  >       <solution><xsl:apply-templates select="text()|i|b" />
  >       </solution>
  >    </xsl:template>
  >
  > which produces an output like
  >
  >    <solution>this is a <i>nice</i> test</solution>
  >
  > but now i need the output to be a CDATA section, as i don't want the
  > <i></i> part being parsed further, eg the output must look like
  >
  >    <solution><![CDATA[this is a <i>nice</i> test]]></solution>
  >

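This is a job for the cdata-section-elements attribute of xsl:output. You supply a whitespace-separated list of element names, and text nodes that are children of those elements are written inside CDATA sections. Note that only text nodes are affected: if i is in your result tree as an element, it is still serialized as markup. To get output exactly like the example above, <i>nice</i> has to reach the solution element as text rather than as an element.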
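Here is a rough emulation of what a serializer does for cdata-section-elements, written against Python's xml.dom.minidom. It is a sketch under assumptions, not a description of how any particular XSLT processor works, and wrap_text_in_cdata and cdata_chunks are hypothetical names. The code rewrites the text children of the named elements as CDATA sections. Any text containing ]]> is split, because that sequence would end a section early.

```python
from xml.dom import minidom


def cdata_chunks(data):
    """Split text so that no chunk contains ']]>', which would end a CDATA
    section early. Concatenating the chunks gives back the original text."""
    pieces = data.split("]]>")
    last = len(pieces) - 1
    return [(">" if i > 0 else "") + piece + ("]]" if i < last else "")
            for i, piece in enumerate(pieces)]


def wrap_text_in_cdata(doc, element_names):
    """Rewrite a DOM in place so that every text child of the named elements
    becomes one or more CDATA sections. Element children are not touched,
    so they are still serialized as ordinary markup."""
    for name in element_names:
        for elem in doc.getElementsByTagName(name):
            elem.normalize()  # merge adjacent text nodes first
            for node in list(elem.childNodes):
                if node.nodeType == node.TEXT_NODE:
                    for chunk in cdata_chunks(node.data):
                        elem.insertBefore(doc.createCDATASection(chunk), node)
                    elem.removeChild(node)
    return doc


if __name__ == "__main__":
    doc = minidom.parseString("<solution>this is a <i>nice</i> test</solution>")
    wrap_text_in_cdata(doc, ["solution"])
    print(doc.documentElement.toxml())
```

Applied to the solution example with the list ["solution"], this produces <solution><![CDATA[this is a ]]><i>nice</i><![CDATA[ test]]></solution>. The i element is an element, so it stays as markup between two CDATA sections.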

8.

Showing source code program listings

Mike Brown


> can anybody give me a hint, how I can store code-listings 
> (java, c++, ....)
> in my xml-file and put it out as html with XSLT?

> It's just for showing some code-chunks on the website.

> I tried to use <![CDATA[  ... code in here ]]> 
> but this doesn't work properly.
> I even need do recognize LF to format the output.

You can use CDATA sections in the XML, that's fine. Or you can replace all the "<" with "&lt;" and "&" with "&amp;", which is all a CDATA section saves you from having to do...

<code><![CDATA[

   text and <tags> & stuff

]]></code>

is the same as

<code>

   text and &lt;tags> &amp; stuff

</code>

Make sure you have <xsl:output method="html"/>, and copy the parsed text through...

<xsl:template match="code">
  <pre>
    <xsl:value-of select="."/>
  </pre>
</xsl:template>

Your HTML output will have

<pre>

    text and &lt;tags> &amp; stuff

</pre>

The fact that it's in a <pre> will cause whitespace to be preserved when it is rendered by the HTML user agent. The "&lt;" and "&amp;" are how the HTML serializer in the XSLT processor decided to output the "<" and "&" in the parsed data, in order to conform to HTML syntax. It will be parsed by the HTML browser just like it would in XML, so you'll get "<" in the rendering, don't worry. Fine-tune the rendering with CSS. No need to get fancy with replacing linefeeds with <br>s since <pre> does all the work for you.
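The whole round trip can be sketched with Python's xml.etree.ElementTree standing in for the XSLT processor. That substitution is an assumption, and one visible difference is that ElementTree's HTML serializer also escapes > as &gt;, where the output above leaves it alone. code_to_pre is a hypothetical name.

```python
import xml.etree.ElementTree as ET

SOURCE = "<code><![CDATA[\n   text and <tags> & stuff\n]]></code>"


def code_to_pre(xml_text):
    """Copy the parsed text of <code> into an HTML <pre> and serialize it as HTML."""
    code = ET.fromstring(xml_text)
    pre = ET.Element("pre")
    # code has no child elements, so its text is its whole string value,
    # i.e. what <xsl:value-of select="."/> would produce here
    pre.text = code.text
    return ET.tostring(pre, encoding="unicode", method="html")


if __name__ == "__main__":
    print(code_to_pre(SOURCE))
```

The line breaks and indentation survive untouched inside the pre, and the markup characters come out escaped so the browser renders them literally.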

Michael Kay offers:

Hold it in the XML file in a CDATA section, and output it in an HTML pre element.

9.

CDATA, How to preserve boundaries after XSL transformation

David Carlisle

Your requirements are incompatible with using XSLT.

To XSLT,

a <![CDATA[ 1 < 2 ]]> b

is _identical_ input to

a 1 &lt; 2 b

CDATA is just a syntactic alternative to using references, and it is not recorded in the input tree. In the same way, <a b="2"/> and <a b = '2' /> produce identical input, and you cannot preserve whitespace and quote styles inside tags either.

10.

How can I pass CDATA sections through to the output of a transform

Andrew Welch


Bottom line, you can't. 

But by capturing the parser's input events and modifying the source, the lexical information (CDATA sections, entity references and so on) can be passed through to the output for further processing. See these for the details:

Converting cdata sections into markup
Preserving entity references
Preserving the doctype declaration
Marking up comments

Quite versatile.
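The linked articles are not reproduced here. The following is only a minimal sketch of the general idea, built on Python's built-in expat binding, and it is not claimed to match the approach in those articles. A preprocessing pass listens for the parser's CDATA start and end events and turns each CDATA section into a marker element, which an XSLT stylesheet can then match. The marker name cdata and the function cdata_to_markup are hypothetical, so pick a name (or namespace) that cannot clash with your own vocabulary.

```python
import xml.parsers.expat
from xml.sax.saxutils import escape, quoteattr


def cdata_to_markup(xml_text, marker="cdata"):
    """Re-emit a document with every CDATA section wrapped in a <marker>
    element, so that a later XSLT pass can see where the sections were.

    Sketch only: comments, processing instructions and the DOCTYPE are
    dropped, and empty elements come out as start/end tag pairs."""
    out = []
    parser = xml.parsers.expat.ParserCreate()

    def start(name, attrs):
        attributes = "".join(" %s=%s" % (k, quoteattr(v)) for k, v in attrs.items())
        out.append("<%s%s>" % (name, attributes))

    def end(name):
        out.append("</%s>" % name)

    def chars(data):
        out.append(escape(data))

    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = chars
    # These events mark the CDATA boundaries that the XPath tree never sees.
    parser.StartCdataSectionHandler = lambda: out.append("<%s>" % marker)
    parser.EndCdataSectionHandler = lambda: out.append("</%s>" % marker)
    parser.Parse(xml_text, True)
    return "".join(out)


if __name__ == "__main__":
    print(cdata_to_markup("<code>a <![CDATA[1 < 2]]> b</code>"))
```

A stylesheet run over the result can then treat cdata elements specially, for example by turning them back into CDATA sections with cdata-section-elements, or simply by knowing where they were.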