XSLT, generate DTD, declaration

Doctype

1. How do I generate a reference to a DTD
2. How to copy the DOCTYPE value
3. Can I specify a DOCTYPE in my stylesheet
4. Is there an XSLT DTD
5. DOCTYPE in output
6. Internal DTD-subset and CDATA-section
7. Testing if current document equal to xxx.xml
8. Match on an element when Doctype is present

1.

How do I generate a reference to a DTD

Hakan Pettersson


<xsl:text disable-output-escaping="yes">
	<![CDATA[
		<!DOCTYPE wml PUBLIC "-//WAPFORUM//DTD WML 1.1//EN" 
"http://www.wapforum.org/DTD/wml_1.1.xml">
	]]>
</xsl:text>
    

Mike Brown notes:

As Mike Kay pointed out, I didn't notice that later in the Output part of the spec, it says that using disable-output-escaping lifts any well-formedness restrictions.

This is just one of those things where the spec is carefully worded for clarity from the "here are the results of using this instruction" point of view, with few or no examples given for "if you're trying to achieve a certain result, here's how to use this instruction to do it"...

I think more informative examples would help, but the wording of the normative sections is probably fine.

Matthew Bentley offers



I am using XSLT for just that very purpose - and I test the starting element
to find out which doctype I want to insert - all you have to do is output
CDATA:

<!-- MATCH ROOT NODE, GENERATE DOCTYPE BASED ON STARTING ELEMENT: -->
<xsl:template match="/">
	<xsl:choose>
		<xsl:when test="bva.grp">
			<xsl:text
disable-output-escaping="yes"><![CDATA[<!DOCTYPE VALUE-ADD.GROUP PUBLIC
"-//Brooker's//DTD Brooker's Legislation Value-Add Group//EN">]]></xsl:text>
		</xsl:when>	
		<xsl:when test="act">
			<xsl:text
disable-output-escaping="yes"><![CDATA[<!DOCTYPE ACT PUBLIC
"-//Brooker's//DTD Brooker's Act//EN">]]></xsl:text>
		</xsl:when>	
		... etc

	</xsl:choose>
	<xsl:apply-templates />
</xsl:template>

And obviously you can put entity references inside the CDATA section as
well, although in this particular case I haven't needed to.
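
When the public and system identifiers are already known when the stylesheet is written, the doctype-public and doctype-system attributes of xsl:output (the same mechanism used in questions 5 and 6 below) produce the declaration without disable-output-escaping. A minimal sketch for the WML case from the first answer; the template and the empty card element are placeholders for illustration, not part of the original posts:

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- The serializer writes the DOCTYPE declaration itself, using the
       name of the first element in the result as the document type name -->
  <xsl:output method="xml"
      doctype-public="-//WAPFORUM//DTD WML 1.1//EN"
      doctype-system="http://www.wapforum.org/DTD/wml_1.1.xml"/>

  <!-- Placeholder template: emits a wml root element -->
  <xsl:template match="/">
    <wml>
      <card/>
    </wml>
  </xsl:template>

</xsl:stylesheet>
```

This only covers a DOCTYPE that is fixed per stylesheet. Choosing between several declarations at run time, as in the stylesheet above, still needs the disable-output-escaping approach in XSLT 1.0.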


2.

How to copy the DOCTYPE value

Steve Muench


If you preprocess a document with:

  <!DOCTYPE xxx SYSTEM "yyy">
  <xxx>
    <foo/>
  </xxx>

into:

  <!DOCTYPE xxx SYSTEM "yyy">
  <!-- DOCTYPE xxx SYSTEM "yyy" -->
  <xxx>
    <foo/>
  </xxx>

This doesn't alter the validity of the document in any way, but does add a "comment item" into the document's infoset that XSLT/XPath can address.

Then you can use an XPath expression like:

  //comment()[contains(.,'DOCTYPE')][1]

to refer to the first comment containing DOCTYPE and then use a combination of normalize-space(), substring-after(), and substring-before() to get out the URI for the DTD of the document.

Since you cannot set the doctype-system="" property of <xsl:output> dynamically, you'd have to then resort to a use of

<xsl:value-of disable-output-escaping="yes"/>

and concat() to literally print the <!DOCTYPE into the result tree.

Given the preprocessed source document above, the following XSLT transform produces the output:

  <!DOCTYPE xxx SYSTEM "yyy">
  <xxx>
    <foo/>
  </xxx>



<xsl:transform version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output indent="yes"/>

  <xsl:template match="/">
    <!--
     | Output the Doctype in the result based on
     | the DOCTYPE comment we preprocessed into the document
     +-->

    <!-- For convenience, get a literal quote sign in a variable -->
    <xsl:variable name="q">"</xsl:variable>

    <!-- Get the DOCTYPE comment in a variable -->
    <xsl:variable name="d"
  select="//comment()[contains(.,'DOCTYPE')][1]"/>

    <!-- Get the "uri" part of the doctype comment -->
    <xsl:variable name="e"
  select="substring-after(normalize-space($d),
	'SYSTEM ')"/>

    <!-- Strip off the quotes from the "uri" -->
    <xsl:variable name="f"
    select="substring-before(substring-after($e,$q),$q)"/>

    <!-- Output the <!DOCTYPE -->
    <xsl:value-of disable-output-escaping="yes"
          select="concat('&lt;!DOCTYPE ',name(/*[1]),
                  ' SYSTEM ',$q,$f,$q,'&gt;&#10;')"/>
    <xsl:apply-templates 
         select="@*|*|processing-instruction()|comment()"/>
  </xsl:template>

  <!--
   | Identity Transformation. XT doesn't seem to support the
   | more terse "@*|node()" at present, so this is the long form.
   +-->
  <xsl:template 
	match="@*|*|processing-instruction()|comment()">
    <xsl:copy>
      <xsl:apply-templates
select="@*|*|processing-instruction()|comment()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Suppress printing our little trick in the output -->
  <xsl:template match="//comment()
	[contains(.,'DOCTYPE')][1]"/>

</xsl:transform>
    

Mike Brown cautions:

Anyone using this should note that this will only work if the string '-->' does not occur in the internal DTD subset. The following would throw it, for example:

 <!DOCTYPE xxx SYSTEM "yyy" [
 <!-- a comment in the internal subset -->
 <!ENTITY foo "bar"> ]>

            

3.

Can I specify a DOCTYPE in my stylesheet

David Carlisle


Yes:

<!DOCTYPE xsl:stylesheet [
<!ENTITY API  "this" >
]>

....

    <xsl:template match="/index">
    <general>
        <title>&API; Index</title>

but then &API; will be expanded by the XML parser as it parses the stylesheet, so the XSLT engine will see <title>this Index</title>. In that case there may or may not be any point in having the entity; you could have just written "this Index" in place.

You can probably more usefully share the DTD with your original document, if that already has a definition of the entity:

<!DOCTYPE xsl:stylesheet SYSTEM "..\..\docs\dtds\general.dtd" >

This only works if your xsl system uses a non validating parser that does read external entities. (They are not required to be read according to the xml spec.)

Note that both of these solutions put `this' into your output. If you really want &API; in the output, then you don't need a DOCTYPE in your stylesheet at all; just use <title><xsl:text disable-output-escaping="yes">&amp;API; Index</xsl:text></title>

Note however this last solution only works if you know the output tree is going to be linearised into a file and then reparsed as XML. If instead the result tree is being stuffed straight to a renderer or other XML application as an XML tree, then the receiver will get the characters & A P I ; not an entity reference.

4.

Is there an XSLT DTD

Linda van den Brink

There's an appendix to the XSLT spec, "DTD Fragment for XSLT Stylesheets (Non-Normative)", at W3C.

David Carlisle adds

> Thank you but it doesn't help as it is not a complete DTD.

No, you have to read the instructions on how to complete it for your stylesheet.

It is _impossible_ to write a DTD that covers every XSL stylesheet, as they may include arbitrary elements from the target DTD. So you have to define the result-elements entity to list any result elements that may appear inside an xsl element, and then add all possible result elements to the DTD. Normally it isn't worth the bother; no one uses validating parsers to read XSL stylesheets, do they?

John Simpson adds

If you're transforming to HTML, then there's one "valid XSLT DTD" with one definition of result-elements. If you're transforming to MathML, there's a completely different result-elements. In fact, there are as many definitions of result-elements as there are possible XML vocabularies in the universe. Hence there can *be* no result-elements, and that's why the appendix to the XSLT Rec is both a fragment and non-normative.

5.

DOCTYPE in output

David Carlisle


   <!DOCTYPE DLmeta SYSTEM "http://www.dlmeta.de/dlmeta/2000/DLmeta.dtd" 
	   [<!ENTITY % LocalInclude SYSTEM
   "http://www.dlmeta.de/local/2000/ariadne/ariadne_local.dtd">
	   %LocalInclude;
	   ]>
	

How would I declare this in my style sheet?

As others have commented, you can't directly, but do you _have_ to use a local subset?

The above is equivalent to

    <!DOCTYPE DLmeta SYSTEM "local-DLmeta.dtd" >

where local-DLmeta.dtd is

<!ENTITY % LocalInclude SYSTEM
   "http://www.dlmeta.de/local/2000/ariadne/ariadne_local.dtd">
%LocalInclude;
<!ENTITY % main SYSTEM "http://www.dlmeta.de/dlmeta/2000/DLmeta.dtd" >
%main;

and you can produce the doctype in this form using the standard

<xsl:output doctype-system="local-DLmeta.dtd"/>

6.

Internal DTD-subset and CDATA-section

Jeni Tennison


> what could I write in my XSLT to output the following
> as part of the result:
>
> <!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 20001102//EN"
> "http://www.w3.org/TR/2000/CR-SVG-20001102/DTD/svg-20001102.dtd"
> [
> <!ENTITY fast-slow "0 0  .5 1">
> <!ENTITY slow-fast ".5 0  1 1">
> ]>
> <svg
> xmlns="http://www.w3.org/Graphics/SVG/SVG-19990812.dtd"
> xmlns:xlink="http://www.w3.org/1999/xlink" viewBox="0
> 0 800 600">
> <style type="text/css">
> <![CDATA[
> .balls {font: 30pt arial}
> ]]></style>

You can set the DOCTYPE declaration and the fact that you want to use a CDATA section (although as David C. said, there's no point in having one with the example as shown) using the xsl:output element:

<xsl:output
  doctype-public="-//W3C//DTD SVG 20001102//EN"
  doctype-system="http://www.w3.org/TR/2000/CR-SVG-20001102/DTD/svg-20001102.dtd"
  cdata-section-elements="style" />

However, this doesn't allow you to define an internal subset in the way that you have. Using pure XSLT, you have to do this using disable-output-escaping:

  <xsl:text disable-output-escaping="yes"><![CDATA[
    <!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 20001102//EN"
      "http://www.w3.org/TR/2000/CR-SVG-20001102/DTD/svg-20001102.dtd"
    [
    <!ENTITY fast-slow "0 0  .5 1">
    <!ENTITY slow-fast ".5 0  1 1">
    ]>
  ]]></xsl:text>

(Saxon has some support for creating internal DTD subsets.)

There's not much point in defining entities unless you're going to use them, so presumably you'll also want to create entity references within your output. Again, you have to disable output escaping to ensure that the entities are used:

  <xsl:text
	 disable-output-escaping="yes">&amp;fast-slow;</xsl:text>

(Or you can use saxon:entity-ref.)

Note that you cannot use disable-output-escaping to put the entity reference in as an attribute value. There is no way of doing that in XSLT.

Having said that, you should be careful using disable-output-escaping. You cannot guarantee that a processor will understand it or use it. Generally, you should not care about using entities in your output - you should just generate the text that you want.

To create the namespace declarations, you just have to have the svg element created somewhere where the namespace declarations are in scope. For example:

  <svg xmlns="http://www.w3.org/Graphics/SVG/SVG-19990812.dtd"
       xmlns:xlink="http://www.w3.org/1999/xlink"
       viewBox="0 0 800 600">
    ...
  </svg>

7.

Testing if current document equal to xxx.xml

David Carlisle


<xsl:if test="generate-id(/) = generate-id(document('doc.xml'))">
  Yes it is
</xsl:if>
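
One detail worth checking when using this test: with a single string argument, document() resolves a relative URI against the location of the stylesheet, not of the source document. If doc.xml is meant to sit next to the source document, a second argument can supply the base. A sketch under that assumption; it also relies on the processor treating documents identified by the same URI as the same document, and the surrounding template and the "No" branch are illustrative only:

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <xsl:choose>
      <!-- The second argument (/) makes 'doc.xml' resolve relative to
           the source document instead of the stylesheet -->
      <xsl:when test="generate-id(/) = generate-id(document('doc.xml', /))">
        <xsl:text>Yes it is</xsl:text>
      </xsl:when>
      <xsl:otherwise>
        <xsl:text>No it is not</xsl:text>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>
```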

8.

Match on an element when Doctype is present

David Carlisle



> I'm having problems navigating to the html element and other elements
> when I leave the Doctype and namespace at the top of the xhtml doc. 

This is a FAQ.

To match in the xhtml namespace (which is defaulted by the xhtml dtd even if you don't make it explicit) you need to declare the xhtml namespace in your stylesheet with something like xmlns:h="http://www.w3.org/1999/xhtml" then use h:html rather than html etc.
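
A minimal sketch of that advice. The prefix h is arbitrary, and pulling out the page title is just an illustration of a path through namespaced elements:

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:h="http://www.w3.org/1999/xhtml">
  <xsl:output method="text"/>

  <!-- h:html matches the html element in the XHTML namespace;
       an unprefixed "html" would only match an element in no namespace -->
  <xsl:template match="/h:html">
    <xsl:value-of select="h:head/h:title"/>
  </xsl:template>
</xsl:stylesheet>
```

Every step of a path needs the prefix (h:head/h:title, not head/title), because XPath 1.0 never applies a default namespace to unprefixed names.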