Document()

1. document() use
2. Selection, or joins
3. Using Collections
4. Collections

1.

document() use

Mike Kay


> But I hope that XSLT 2 solves these problems. It would really 
> be nice if one could pass the context into key() and id().

Yes, XSLT 2.0 allows you to write

select="document('a.xml')/id('I0001')"

or

select="document('a.xml')/key('k', 'I0001')"

In fact it allows the operands of "/" to be any expressions that return node-sequences.

You can use this today with Saxon.
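
As a rough sketch of how this fits into a complete stylesheet, the following assumes that a.xml contains item elements carrying a code attribute. The element name, the attribute and the key definition are hypothetical, chosen only for illustration; they are not from the original post.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical key: indexes item elements by their code attribute -->
  <xsl:key name="k" match="item" use="@code"/>

  <xsl:template match="/">
    <found>
      <!-- The right-hand operand of "/" is a function call, evaluated
           with the document node of a.xml as its context node, so the
           key lookup searches a.xml rather than the main input -->
      <xsl:copy-of select="document('a.xml')/key('k', 'I0001')"/>
    </found>
  </xsl:template>

</xsl:stylesheet>
```

XSLT 2.0 also defines a three-argument form, key('k', 'I0001', document('a.xml')), which restricts the search to the subtree rooted at the third argument and should give the same result here.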

2.

Selection, or joins

Mike Kay





> I have a main document,
> form is:
>
>          <r id="d0e170156">
>               <bibno>079732</bibno>
>               <cat-num>10400</cat-num>
>               <TBcategory>General fiction</TBcategory>
>               <id>079732*RNIB*</id>
>               <num-cassettes>1</num-cassettes>
>               <play-time>08:47</play-time>
>               <rdr-gender>F</rdr-gender>
>               <rearch-date/>
>          </r>
>
> And an external file, form is:
>
>  <bk>
>       <n>2</n>
>       <t>THE CAR MAKERS</t>
>    </bk>
>    <bk>
>       <n>3</n>
>       <t>WYATT'S HURRICANE</t>
>    </bk>
>    <bk>
>       <n>4</n>
>       <t>THE RELUCTANT WIDOW</t>
>    </bk>
>    <bk>
>
>
> I need to loop through each of the main input document
> records (r elements)
>  and if r/cat-num is any one of the values of the external file, /bk/n
> then I need to copy the r element and contents through to the output.
>
> Can xslt2.0 help with this form of filtering?

The XSLT 1.0 solution is

<xsl:copy-of select="r[cat-num = document('ext.xml')/bk/n]"/>

and this continues to work in XSLT 2.0.
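
A minimal sketch of a complete stylesheet built around this expression follows. The post does not show the wrapper elements of either file, so the root element names records (main document) and books (external file) are assumptions, and the path to the n elements is adjusted to match them.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- n values from the external file; "books" is an assumed wrapper -->
  <xsl:variable name="wanted" select="document('ext.xml')/books/bk/n"/>

  <!-- "records" is an assumed root element of the main document -->
  <xsl:template match="/records">
    <records>
      <!-- "=" between two node-sets is true if any pair of string
           values matches, so this keeps each r whose cat-num equals
           at least one n -->
      <xsl:copy-of select="r[cat-num = $wanted]"/>
    </records>
  </xsl:template>

</xsl:stylesheet>
```

Holding the external nodes in a global variable avoids repeating the document() call in the predicate; the result should be the same as with the inline form.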

What XSLT 2.0 (or XPath 2.0) gives you is the ability to extend this to do a join that isn't a pure equijoin. For example if you want to do a case-blind comparison you can now write:

<xsl:copy-of select="r[some $c in document('ext.xml')/bk/n
                       satisfies upper-case($c) = upper-case(cat-num)]"/>

Joins other than equijoins weren't in general possible in XPath 1.0; you had to code them yourself using nested loops at the XSLT level.
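
For comparison, here is one way the hand-coded XSLT 1.0 version of the case-blind join might look. It is only a sketch: the root element names records and books are assumptions (the post does not show them), the variable names are illustrative, and the translate() call folds ASCII letters only.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:variable name="lower" select="'abcdefghijklmnopqrstuvwxyz'"/>
  <xsl:variable name="upper" select="'ABCDEFGHIJKLMNOPQRSTUVWXYZ'"/>

  <!-- "books" is an assumed wrapper element in the external file -->
  <xsl:variable name="wanted" select="document('ext.xml')/books/bk/n"/>

  <!-- "records" is an assumed root element of the main document -->
  <xsl:template match="/records">
    <records>
      <!-- Outer loop: one pass per record in the main document -->
      <xsl:for-each select="r">
        <!-- Bind the upper-cased cat-num so that the inner test can
             refer to it while "." ranges over the n elements -->
        <xsl:variable name="cat"
            select="translate(cat-num, $lower, $upper)"/>
        <!-- Inner scan: is there any n that matches once upper-cased? -->
        <xsl:if test="$wanted[translate(., $lower, $upper) = $cat]">
          <xsl:copy-of select="."/>
        </xsl:if>
      </xsl:for-each>
    </records>
  </xsl:template>

</xsl:stylesheet>
```

The variable binding is the part that needs XSLT: inside a single XPath 1.0 predicate there is no way to refer to both the current r and the current n at once.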

3.

Using Collections

Andrew Welch

You can process directories of XML using the collection() function, and keep memory usage constant by using the Saxon extension function saxon:discard-document().


<xsl:for-each select="for $x in
    collection('file:///c:/xmlDir?select=*.xml;recurse=yes;on-error=ignore')
    return saxon:discard-document($x)">
  <!-- process each document here -->
</xsl:for-each>

You have to be careful that Saxon doesn't optimize away the call to saxon:discard-document(). This basic outer xsl:for-each works well and has become my boilerplate code whenever I start a new report.

This technique allows you to do things that would otherwise not be feasible with XSLT, and would take longer in another language. For example, you can find, group and sort all the links in your collection of XML files. Coding the XSLT takes minutes, and running it takes time proportional to the size of your dataset, but the restriction of system memory has gone.
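
A sketch of such a link report might look like the following. It assumes that links are marked by href attributes and that the stylesheet is started at a named template called main; both are illustrative choices rather than part of the original post. Only string values are kept from each document, which is what should allow the documents themselves to be released, but whether that happens depends on the Saxon version and edition in use.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:saxon="http://saxon.sf.net/"
    exclude-result-prefixes="saxon">

  <!-- Hypothetical entry point, e.g. run with Saxon's -it:main option -->
  <xsl:template name="main">
    <links>
      <!-- Keep only the string value of each href (assumed link
           attribute), so no nodes from a document are retained -->
      <xsl:for-each-group
          select="for $x in
                    collection('file:///c:/xmlDir?select=*.xml;recurse=yes;on-error=ignore')
                  return saxon:discard-document($x)//@href/string(.)"
          group-by=".">
        <xsl:sort select="current-grouping-key()"/>
        <!-- One entry per distinct link, with its number of occurrences -->
        <link href="{current-grouping-key()}"
              count="{count(current-group())}"/>
      </xsl:for-each-group>
    </links>
  </xsl:template>

</xsl:stylesheet>
```

It is worth checking that your Saxon edition supports saxon:discard-document() before relying on this pattern.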

David Carlisle adds

> in what way do you use the collection() function?

collection() is good for collections where you don't know in advance which documents will be there. For example, Saxon lets you write collection('foo?select=*.xml') to pick up all the XML files in a directory. I would expect it is also likely to be what's used to map to XML databases and the like.

<xsl:variable name="files"
    select="collection('file:///sgml/?select=*.xml;on-error=warning')"/>
<doc>
  <xsl:copy-of select="$files/*"/>
</doc>



<xsl:for-each select="for $f in
    collection('file:///sgml/?on-error=warning;select=c*.xml') return $f">
  <file name="{document-uri(.)}">
    <xsl:copy-of select="."/>
  </file>
</xsl:for-each>

4.

Collections

Mike Kay


> I'm trying to load a group of XML files that are found in a local
> sub-directory where the XSLT file is located. What I'm doing is:
>
> <xsl:for-each select="collection('XML files')">
>    <xsl:copy-of select="."/>
> </xsl:for-each>
>
> But I'm getting an invalid URI error.

If the directory is called "c:\XML files", then you should use

collection('file:///c:/XML%20files')

(It might work without escaping the space as %20; I'm not sure.)

If you use a relative URI, then it will be taken as relative to the URI of the stylesheet.
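
For the layout described in the question, a relative form might look like the sketch below. It assumes the directory "XML files" sits next to the stylesheet and that Saxon is the processor; the ?select=*.xml filter is a Saxon convention rather than part of the XSLT specification, and the named template main and the all wrapper element are illustrative.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical entry point, e.g. run with Saxon's -it:main option -->
  <xsl:template name="main">
    <all>
      <!-- Relative URI, resolved against the base URI of the
           stylesheet; the space in the directory name is escaped
           as %20 -->
      <xsl:for-each select="collection('XML%20files?select=*.xml')">
        <!-- Copies the content of each document in the collection -->
        <xsl:copy-of select="."/>
      </xsl:for-each>
    </all>
  </xsl:template>

</xsl:stylesheet>
```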