Merge two files

1. Merge 2 files and sum corresponding elements - Cumbrian technique
2. Merging XML data from different files
3. Merging two documents conditionally
4. Merge only unique nodes

1.

Merge 2 files and sum corresponding elements - Cumbrian technique

David Carlisle

Problem.

Two XML files

file 1
<fa>
  <a name="a1">3</a>
  <a name="a2">5</a>
  <a name="a3">2</a>
  <a name="new">1</a>
</fa>

file 2
<fb>
  <a name="a1">1</a>

  <a name="a3">1</a>
  <a name="a4">2</a>
</fb>

required output
<res>
  <a name="a1">4</a>
  <a name="a2">5</a>
  <a name="a3">3</a>
  <a name="a4">2</a>
  <a name="new">1</a>
</res>

That is, I want to sum the corresponding elements into the output file. Can anyone suggest a method of ensuring that the output captures all the <a> elements from either file?

David offers this gem, the Cumbrian Technique.

<xsl:stylesheet
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="1.0">
  <xsl:output method="xml" indent="yes"/>
  <xsl:template match="/">
    <xsl:for-each select="document('dp-f1.xml')/fa/a |
        document('dp-f2.xml')/fb/a[not(@name=document('dp-f1.xml')/fa/a/@name)]">
      <xsl:sort select="@name"/>
      <a name="{@name}">
        <xsl:value-of select="sum(document('dp-f1.xml')/fa/a[@name=current()/@name] |
            document('dp-f2.xml')/fb/a[@name=current()/@name])"/>
      </a>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>

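The stylesheet reads both data files through document(), so the source document is never used; here the stylesheet is simply passed as its own source:

saxon dp-f.xsl dp-f.xsl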
<?xml version="1.0" encoding="utf-8" ?>
<a name="a1">4</a>
<a name="a2">5</a>
<a name="a3">3</a>
<a name="a4">2</a>
<a name="new">1</a>
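
The output above has no <res> wrapper, although the required output asks for one. A minimal sketch of a variant that adds the wrapper and loads each file only once follows. The variable names $f1 and $f2 are illustrative and not part of David's original; the file names are the ones used in his stylesheet.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">
  <xsl:output method="xml" indent="yes"/>

  <!-- $f1 and $f2 are illustrative names holding the <a> elements of each file -->
  <xsl:variable name="f1" select="document('dp-f1.xml')/fa/a"/>
  <xsl:variable name="f2" select="document('dp-f2.xml')/fb/a"/>

  <xsl:template match="/">
    <res>
      <!-- every <a> from file 1, plus those from file 2 whose name is not in file 1 -->
      <xsl:for-each select="$f1 | $f2[not(@name = $f1/@name)]">
        <xsl:sort select="@name"/>
        <a name="{@name}">
          <!-- sum all same-named <a> elements across both files -->
          <xsl:value-of select="sum(($f1 | $f2)[@name = current()/@name])"/>
        </a>
      </xsl:for-each>
    </res>
  </xsl:template>
</xsl:stylesheet>
```

As with the original, this assumes a name occurs at most once in file 1; a name repeated there would produce a repeated output element.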

2.

Merging XML data from different files

Jeni Tennison


> I have multiple XML files, and I am able to combine the primary and
> the secondary files according to an 'ID' in the first. There are
> extra 'IDs' in the secondary files that I wish to join as well - How
> do I go about this??
	

The elements in the secondary document(s) that have no equivalent in the primary document are those whose 'id' attribute (or whatever you've called your attribute) does not match the 'id' attribute of any element in the primary document.

So, let's say that you've set the primary and secondary document root nodes as $doc1 and $doc2 with:

<xsl:variable name="doc1" select="/" />
<xsl:variable name="doc2" select="document('secondary.xml')" />

You can get all the 'id' attributes in $doc1 with:

  $doc1//@id

It may be worth setting up a variable to hold that list:

  <xsl:variable name="doc1IDs" select="$doc1//@id" />

You can get all the elements in the secondary document with:

  $doc2//*

You can get those whose 'id' attribute equals the 'id' attribute of an element in $doc1 with:

  $doc2//*[@id = $doc1IDs]

So you can get those whose 'id' attribute *does not* equal the 'id' attribute of any element in $doc1 with:

  $doc2//*[not(@id = $doc1IDs)]

So to add them to your result tree, you can just copy those nodes:

  <xsl:copy-of select="$doc2//*[not(@id = $doc1IDs)]" />
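
One thing to watch: in XPath 1.0 a comparison against an empty node-set is false, so an element with no 'id' attribute at all also satisfies not(@id = $doc1IDs), and that includes the document element of the secondary file. A minimal sketch that guards against this by requiring the attribute to be present is given here. The <merged> wrapper is a hypothetical name, and copying $doc1/*/* assumes the primary records are children of its document element; adjust both to your own data.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">
  <xsl:output method="xml" indent="yes"/>

  <xsl:variable name="doc1" select="/"/>
  <xsl:variable name="doc2" select="document('secondary.xml')"/>
  <xsl:variable name="doc1IDs" select="$doc1//@id"/>

  <xsl:template match="/">
    <!-- <merged> is a hypothetical wrapper element -->
    <merged>
      <!-- assumption: the primary records are children of the document element -->
      <xsl:copy-of select="$doc1/*/*"/>
      <!-- [@id] keeps only elements that carry an id; the second predicate
           keeps those whose id matches no id in the primary document -->
      <xsl:copy-of select="$doc2//*[@id][not(@id = $doc1IDs)]"/>
    </merged>
  </xsl:template>
</xsl:stylesheet>
```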

If you have really big documents with lots of IDs, then it might be worth using keys rather than a path like the above (or using id() if you have a DTD and the relevant attribute is declared as an ID attribute). Keys make things a little harder because of the way they are scoped: key() only retrieves nodes from the document that holds the context node. The key would index all elements according to their 'id' attribute:

<xsl:key name="elements" match="*" use="@id" />

So to find the unmatched elements, you have to iterate over the elements in the secondary document (using xsl:for-each here for simplicity, but you could use xsl:apply-templates instead if you wanted):

  <xsl:for-each select="$doc2//*">
     ...
  </xsl:for-each>

You then need to record the element in the secondary document:

  <xsl:for-each select="$doc2//*">
     <xsl:variable name="el2" select="." />
     ...
  </xsl:for-each>

before changing the current node to the primary document:

  <xsl:for-each select="$doc2//*">
     <xsl:variable name="el2" select="." />
     <xsl:for-each select="$doc1">
        ...
     </xsl:for-each>
  </xsl:for-each>

so that you can retrieve nodes using the key you've set up, test whether there is one, and if not then copy (or whatever) the element in the secondary document:

  <xsl:for-each select="$doc2//*">
     <xsl:variable name="el2" select="." />
     <xsl:for-each select="$doc1">
        <xsl:if test="not(key('elements', $el2/@id))">
           <xsl:copy-of select="$el2" />
        </xsl:if>
     </xsl:for-each>
  </xsl:for-each>
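
Putting the pieces of the key-based approach together, a complete stylesheet might look like the sketch below. It emits only the unmatched secondary elements. The <unmatched> wrapper is a hypothetical name, and the [@id] predicate is an addition to the steps above: without it, elements that have no 'id' attribute would also pass the test, because key() returns nothing for them.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">
  <xsl:output method="xml" indent="yes"/>

  <!-- index every element by its id attribute -->
  <xsl:key name="elements" match="*" use="@id"/>

  <xsl:variable name="doc1" select="/"/>
  <xsl:variable name="doc2" select="document('secondary.xml')"/>

  <xsl:template match="/">
    <!-- <unmatched> is a hypothetical wrapper element -->
    <unmatched>
      <!-- [@id] is an added guard: skip elements that carry no id -->
      <xsl:for-each select="$doc2//*[@id]">
        <xsl:variable name="el2" select="."/>
        <!-- switch the context to the primary document so key() searches it -->
        <xsl:for-each select="$doc1">
          <xsl:if test="not(key('elements', $el2/@id))">
            <xsl:copy-of select="$el2"/>
          </xsl:if>
        </xsl:for-each>
      </xsl:for-each>
    </unmatched>
  </xsl:template>
</xsl:stylesheet>
```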

3.

Merging two documents conditionally

Michael Kay


> I have two documents, file A and file B.  I want to join them 
> on the id of
> the first, but only if a matching id is in the 2nd.  How do I do this?
> 
> File A              File B               Desired Output
> <id> A </id>        <id> A </id>         <id> A </id>
> <id> B </id>        <id> C </id>         <id> D </id>
> <id> D </id>        <id> D </id>
> 
	
<xsl:copy-of select="document('a.xml')//id[.=document('b.xml')//id]"/>
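
For context, the expression can be dropped into a complete stylesheet along these lines. This is a sketch only: the <result> wrapper and the $b-ids variable are hypothetical names, while a.xml and b.xml are the file names used in the expression above.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">
  <xsl:output method="xml" indent="yes"/>

  <!-- $b-ids is an illustrative name for all <id> elements in file B -->
  <xsl:variable name="b-ids" select="document('b.xml')//id"/>

  <xsl:template match="/">
    <!-- <result> is a hypothetical wrapper element -->
    <result>
      <!-- keep an <id> from file A only if some <id> in file B has the same string value -->
      <xsl:copy-of select="document('a.xml')//id[. = $b-ids]"/>
    </result>
  </xsl:template>
</xsl:stylesheet>
```

The comparison is on exact string values, so <id> A </id> matches only an <id> in file B with the same surrounding whitespace.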

4.

Merge only unique nodes

Oliver Becker

Oliver Becker's technique merges documents so that "equivalent nodes appear only once in the output"; see his web site.