XSLT, id idref links

ID and IDREF

1. Question about linking using ID/IDREF
2. How to use ID without a DTD.
3. id resolution

1.

Question about linking using ID/IDREF

Jeni Tennison


> The text is defined with <refint> using an attribute of "refid" to
> link to the attribute of "id". I read earlier that I needed to use
> the XSLT style in conjunction with a DTD. I used XMLSPY to
> auto-generate a DTD.

It's true that if you use the id() function then you have to have a DTD to indicate that the attribute you want to use as an id is an ID attribute. However, you can have the same functionality using the key() function instead. So in your case, you could do:

<xsl:key name="tasks" match="task" use="@id" />

and then do:

<xsl:template match="para/refint">
  <font color="blue">
      <a href="#{@refid}">
         <xsl:value-of select="key('tasks', @refid)"/>
         <xsl:value-of select="."/>
      </a>
  </font>
</xsl:template>

Note that in the above I've used an attribute value template rather than an xsl:attribute element to create the href attribute.
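
For comparison, here is a sketch of the same template written with xsl:attribute instead of the attribute value template. Both forms should produce the same href; the key name 'tasks' is taken from the xsl:key declaration above.

<xsl:template match="para/refint">
  <font color="blue">
    <a>
      <!-- long form of href="#{@refid}" -->
      <xsl:attribute name="href">
        <xsl:text>#</xsl:text>
        <xsl:value-of select="@refid"/>
      </xsl:attribute>
      <xsl:value-of select="key('tasks', @refid)"/>
      <xsl:value-of select="."/>
    </a>
  </font>
</xsl:template>

The attribute value template is simply the shorter way to write this when the value is a mix of literal text and one or two expressions.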

> Once I got my coding in place, I tried to view this data in IE5.5
> but it doesn't seem to work. When I place my cursor over the link,
> in the info-line at the bottom of IE5.5 I see,
> file:///c:/bjdir/amm/chap07.xml#T07-10-00-500-801-A.

When IE gets given the above URL, it goes off and finds chap07.xml. It then looks at that file to try to find an xml-stylesheet processing instruction to indicate an XSLT stylesheet that it should apply.

If it doesn't find one, then it displays the XML in the 'tree view' that IE uses, and it jumps to the relevant element within that XML, so it'll jump to the task with the ID that you give after the #.

If it *does* find an XSLT stylesheet to apply, on the other hand, then it will apply it first, and then try to find the anchor (or id) *within the resulting HTML* and jump to that point of the HTML document.

So, if you're transforming the XML to HTML, then you also need to have something that creates an anchor or an id-ed element within the HTML for the tasks that you want to jump to, something like:

<xsl:template match="task">
   <h2 id="{@id}"><xsl:apply-templates select="title" /></h2>
   ...
</xsl:template>

What IE *doesn't* do is retrieve the XML file, and then *only* display the relevant element, and it also doesn't only transform the relevant element. IE only uses the fragment to jump to a point in whatever it would normally display for the entire document.
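
Putting the two halves together, a minimal sketch of a complete stylesheet might look like the following. The element names task, title, para and refint and the attributes id and refid come from the question. The html/body wrapper, the assumption that each task holds its title and para elements as direct children, and the choice of the task title as the link text are illustrative only.

<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="html"/>

  <!-- index every task by its id attribute; no DTD is needed for key() -->
  <xsl:key name="tasks" match="task" use="@id"/>

  <!-- assumed wrapper around whatever the document contains -->
  <xsl:template match="/">
    <html>
      <body>
        <xsl:apply-templates/>
      </body>
    </html>
  </xsl:template>

  <!-- link target: the heading carries the task's id into the HTML -->
  <xsl:template match="task">
    <h2 id="{@id}"><xsl:apply-templates select="title"/></h2>
    <xsl:apply-templates select="para"/>
  </xsl:template>

  <xsl:template match="para">
    <p><xsl:apply-templates/></p>
  </xsl:template>

  <!-- link source: points at the heading generated for the task -->
  <xsl:template match="para/refint">
    <a href="#{@refid}">
      <xsl:value-of select="key('tasks', @refid)/title"/>
    </a>
  </xsl:template>

</xsl:stylesheet>

Because both the href and the id are generated from the same attribute values, the fragment after the # has something to land on in the resulting HTML.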

2.

How to use ID without a DTD.

G Ken Holman

But you do not need to have an *entire* DTD.

You can have an internal declaration subset informing the processor of the ID type of a given attribute ... this is done a *lot* with XSLT because of the dependency of some document types on ID typed attributes for precisely what you have cited.

Here are two examples I illustrate in my training material:

 - an element is considered to have a unique identifier if it has an
   attribute node that is considered an attribute of XML type ID;
   this is the case only when the source file has a document model
   whose declaration for the attribute is of type ID
   - in a well-formed document without a complete DTD it is
     sufficient to add only an attribute list declaration for the
     attribute in question, as in these examples:

     <!DOCTYPE prodsummary [
     <!ATTLIST prod id ID #REQUIRED>
     ]>

     and

     <!DOCTYPE custsummary [
     <!ATTLIST cust custNbr ID #REQUIRED>
     ]>

   - note that there is no special meaning conferred on
     attributes named "id", only on attributes of type "ID"

> it seems to be almost impossible in any practical
> sense to compose a DTD for a document that derives its names from
> multiple namespaces.

For a *complete* DTD, I agree ... but it is not well known that a well-formed XML instance can include significant declarations.

> So how is this conflict being resolved in the real world?

By declaring which attributes in a well-formed instance are to be regarded as being of type ID so the XSLT processor will know which attribute specifications to use when assigning a unique identifier to an element node.
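
As a sketch of how this fits together, here is a well-formed instance that carries only the one declaration in its internal subset. The prodsummary and prod names come from the example above; the order element, its ref attribute and the text content are hypothetical, added only to give id() something to resolve.

<?xml version="1.0"?>
<!DOCTYPE prodsummary [
  <!ATTLIST prod id ID #REQUIRED>
]>
<prodsummary>
  <prod id="p1">Widget</prod>
  <prod id="p2">Gadget</prod>
  <!-- hypothetical reference to a product -->
  <order ref="p2"/>
</prodsummary>

A template along these lines could then follow the reference with id(), with no key declaration:

<xsl:template match="order">
  <!-- id() finds the prod whose ID-typed attribute equals @ref -->
  <xsl:value-of select="id(@ref)"/>
</xsl:template>

With the instance above this should select the second prod element. If the ATTLIST declaration is removed, id() is expected to return an empty node-set, because the attribute is then no longer of type ID.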

3.

id resolution

W. Eliot Kimber


> Not having used xpointer to date, I don't know of the implications of it
> and how hard it would be to solve the warning problem (maybe id() can
> "throw an exception" somehow and xpointer() can discard it while the
> normal use would display it or somesuch - I would have to look at the
> sources to say that) but my guess is that it would be a simple thing
> for you to do.

When we were formulating XML, there was a lot of discussion of whether we should codify "id" and "idref" as the attributes for ids and references to them--the argument for was that it would avoid exactly the problem that we see with the id() function (and, by extension, bare names interpreted by the XPointer rules). The argument against was that it intruded on the document author's name space, a cardinal sin. I honestly don't remember which way I voted, but I probably voted against codifying "id" because I was, at the time, firmly steeped in the sanctity of the author's right to name.

In hindsight, I think it was a mistake not to have codified "id" and "idref". The effect of the wide non-use of DTDs is that you are pretty much forced to either use "id" as the name of your ID attributes and use the "//*[@id='$name']" form of XPath or you must build into your style sheets complete knowledge of the identifying name spaces used for addressing.

At the end of the day identification and referencing are application-level issues. A strong argument can be made that SGML should have never defined an ID mechanism at all--there are too many requirements it didn't meet and it only makes parsing and validation harder. [How many of you old SGML hands out there worked with or designed systems that defined all the ID attributes as "NAME" or "CDATA" because the business rules required repository-wide IDs or scoped ID uniqueness below the document level, or because you needed to solve the cross-document addressing requirements that SGML alone didn't satisfy?]

If the Web has taught us anything it's that a few well-chosen and ubiquitous naming conventions can take you a long way. While I still believe strongly in the author's right to name, I no longer believe it is a right that outweighs all other considerations. I think that the problems raised by the ID attribute are a case in point--how much poorer would we be if "id" only and always meant "ID", if that meant that the id() function always worked as expected?

I have recently gone through similar but distinct exercises: implementing the XTM mergeMap element (in a Java processor [isogen.org]) and implementing as-complete-as-possible XPointer support using XSLT and EXSLT extensions [this site]. In the first case I map bare names in XPointers to "//*[@id='$name']" because the XTM spec defines the name of the ID attribute as "id", so users of my mergeMap resolver would expect bare names to *always* work, even though a strict implementation of XPointer would have them fail when there is no DTD (because bare names in XPointer map to id(), not //*[@id='$name']). Most, if not all, of the sample topic maps I've seen use bare names for in-document references, even when those documents do not have IDs--technically those documents should fail because XTM specifies XPointer and bare-name XPointers map to id(). But it would be what I call "Simon says" behavior to not go ahead and do what all the authors of those topic maps expect. [If my mergeMap resolver refused to resolve bare names in the absence of a DTD it would be essentially saying "I didn't resolve those links because you didn't say 'Simon Says'"--an unpardonable implementation sin in my book.]
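
The kind of fallback described here can be sketched in XSLT 1.0 as a named template. This is an illustration only, not the Java mergeMap code; the template name resolve-bare-name and its name parameter are hypothetical.

<xsl:template name="resolve-bare-name">
  <xsl:param name="name"/>
  <!-- strict reading: only attributes declared as type ID count -->
  <xsl:variable name="by-id" select="id($name)"/>
  <xsl:choose>
    <xsl:when test="$by-id">
      <xsl:copy-of select="$by-id"/>
    </xsl:when>
    <xsl:otherwise>
      <!-- lenient fallback: any attribute that is merely named "id" -->
      <xsl:copy-of select="//*[@id = $name]"/>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>

Note that the parameter is referenced as $name without quotes inside the predicate; written as '$name' it would be compared as a literal string. Both lookups search the document that contains the context node.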

However, in my XPointer implementation, I cannot provide that natural fallback behavior. That means that lots of authors will have to rewrite their pointers from bare names to "//*[@id='name']" in order to make their documents work--that seems wrong and avoidable to me.

But at the same time, the XPointer spec had little choice: it would be wrong for that spec to impose the name "id" on users (XML could do it because it's defining the markup language and individual document types can do it, but an after-the-fact support mechanism like XPointer cannot) and codifying even a standard fallback behavior would potentially lead to subtle, non-intuitive failures (or lack of failure when failure is expected) when 'id' was not in fact an ID attribute in a particular document.

I think that the only answer to this problem is for the community of XML users to agree on a convention for IDs and pointers and for it to be codified in XML 2.0 or some such. I would be perfectly happy if "id" always meant "ID" and "href" always meant "URI-based pointer". I think the long-term benefit would far outweigh any cost in constrained naming choices. There will always be specifications like HyTime that provide for complete renaming of everything if you need to step up to that level of flexibility. I think the current situation represents the worst compromise between ease of use and generality--it doesn't help in the case where you need a simple, consistent solution that is guaranteed to always work and it doesn't provide the flexibility you need because the cost of using it (requiring DTDs with instances) is too high.

XLink exacerbates the problem by fixing the names of the referencing attributes (xlink:href), eliminating choice that might be useful in some cases (for example, in transparently mapping XHTML attributes to XLink semantics). I understand why the XLink committee was hesitant or unwilling to step up to a more general name mapping scheme--it leads to quite a bit of unavoidable complexity, but there are use cases for which it is the only viable solution. [And I remind this audience that the HyTime standard, ISO/IEC 10744:1997, still exists as one solution to the requirement to be able to do unconstrained name remapping if you need it--nobody buys a Swatch Car to haul furniture, but nobody buys a moving truck to commute day-to-day either.]