Graphs, directed ...

1. Dealing with graphs

Dimitre Novatchev


> I'm looking at graph processing problems as a testbed for this, and
> came across a problem that I haven't been able to solve elegantly. The
> problem is to find "linker" vertices that link a pair of vertices from a
> pre-defined set. For example, if the graph vertices represent cities
> and edges represent flights between them, then given a list of cities,
> find all intermediate cities that you would stop in via a "connecting
> flight".
>
> For example, given the following simple graph:
>
> V1 -> V2 -> V3 -> V4
>         \<- V5 ->/
>
> (V5 points to both V2 and V4), and its XML serialization:
>
>  <graph>
>    <vertex id="V1"/>
>    <vertex id="V2" type="anchor"/>
>    <vertex id="V3"/>
>    <vertex id="V4" type="anchor"/>
>    <vertex id="V5"/>
>    <edge source="V1" target="V2"/>
>    <edge source="V2" target="V3"/>
>    <edge source="V3" target="V4"/>
>    <edge source="V5" target="V2"/>
>    <edge source="V5" target="V4"/>
> </graph>
>
> I would like to transform this into a second graph where all vertices
> that "link" two distinct anchor vertices are flagged as link nodes. In
> this case, there are two anchor vertices V2 and V4, and I can link
> them through V3 (V2 -> V3 -> V4) and V5 (V2 <- V5 -> V4). Note that
> linked vertices must be distinct, so traversing the V2 <- V1 -> V2
> path should not yield V1 as a link node. So I'd like to see something
> like this:
>
> <graph>
>    <vertex id="V1"/>
>    <vertex id="V2" type="anchor"/>
>    <vertex id="V3" linker="true"/>
>    <vertex id="V4" type="anchor"/>
>    <vertex id="V5" linker="true"/>
>    <edge source="V1" target="V2"/>
>    <edge source="V2" target="V3"/>
>    <edge source="V3" target="V4"/>
>    <edge source="V5" target="V2"/>
>    <edge source="V5" target="V4"/>
> </graph>
>
> It would be ideal to come up with a generalized solution that would
> let you use 1, 2, .. N intermediate linking nodes. 

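Here is a general solution; it is quite simple and straightforward. I use a recursive template based on the following rule:

  Paths(/.., X, Z) = Union( {X -> Xi} . Paths(X, Xi, Z) )

where the function Paths(Eset, X, Z) returns all paths from X to Z that do not include any vertices belonging to the exclusion set Eset. In the initial call Eset is /.., the empty node-set; each recursive call adds the current vertex X to it, so no vertex can occur twice in a path.

Here {X -> Xi} are all arcs from X, and the union is taken over all such Xi. (The kNeighbors keys in the code index every edge in both directions, so an arc can be followed from either end.)

Informally, {X -> Xi} . Paths(X, Xi, Z)

is the Cartesian product of the arc {X -> Xi} with the set of paths Paths(X, Xi, Z): the arc is prepended to each of those paths, giving a new set of paths.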
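
To make the recursion concrete, the rule can be sketched outside XSLT. The Python below is only an illustration under stated assumptions, not part of the stylesheet: the names `EDGES`, `neighbors` and `paths` are hypothetical, a path is represented as a list of vertex ids rather than a list of edges, and every arc is treated as traversable in both directions.

```python
# Arcs of the extended sample graph used in this answer
# (the question's graph plus one extra arc from V1 to V3).
EDGES = [("V1", "V2"), ("V1", "V3"), ("V2", "V3"),
         ("V3", "V4"), ("V5", "V2"), ("V5", "V4")]


def neighbors(x, edges):
    """Vertices joined to x by an arc in either direction, first-seen order."""
    seen = []
    for source, target in edges:
        other = target if source == x else source if target == x else None
        if other is not None and other not in seen:
            seen.append(other)
    return seen


def paths(excluded, x, z, edges):
    """Paths(Eset, X, Z): all paths from x to z that avoid `excluded`.

    Each path is a list of vertex ids. The recursive call adds x to the
    exclusion set, so no vertex can appear twice in a path.
    """
    result = []
    for xi in neighbors(x, edges):
        if xi in excluded:
            continue
        if xi == z:
            # The arc X -> Z is itself a complete path.
            result.append([x, z])
        else:
            # Prepend the arc X -> Xi to every path from Xi to Z.
            for tail in paths(excluded | {x}, xi, z, edges):
                result.append([x] + tail)
    return result


if __name__ == "__main__":
    for p in paths(frozenset(), "V2", "V4", EDGES):
        print(" - ".join(p))
```

Called with an empty exclusion set for V2 and V4, the sketch enumerates the paths through V1 and V3, through V3 alone, and through V5.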

The transformation is 71 lines long, but for readability I have split some XPath expressions over several lines, and almost half of those 71 lines are just closing tags.

Here's the code. This transformation:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 xmlns:exsl="http://exslt.org/common"
 exclude-result-prefixes="exsl"
 >
  <xsl:output omit-xml-declaration="yes" indent="yes"/>

  <xsl:key name="kNeighbors" match="vertex"
  use="../edge[@target = current()/@id]/@source"/>

  <xsl:key name="kNeighbors" match="vertex"
  use="../edge[@source = current()/@id]/@target"/>

  <xsl:template match="/">
    <xsl:call-template name="getPaths">
      <xsl:with-param name="pNode1"
       select="/*/vertex[@type='anchor'][1]"/>
      <xsl:with-param name="pNode2"
       select="/*/vertex[@type='anchor'][2]"/>
    </xsl:call-template>
  </xsl:template>

  <xsl:template name="getPaths">
    <xsl:param name="pNode1" select="/.."/>
    <xsl:param name="pNode2" select="/.."/>
    <xsl:param name="pExcluded" select="/.."/>

    <xsl:for-each select=
           "key('kNeighbors', $pNode1/@id)
                       [not(@id = $pExcluded/@id)]">
      <xsl:choose>
        <xsl:when test="@id = $pNode2/@id">
          <path>
            <xsl:copy-of
             select="/*/edge[$pNode1/@id = @*
                           and
                             $pNode2/@id = @*
                            ]"/>
          </path>
        </xsl:when>
        <xsl:otherwise>
          <xsl:variable name="vrtfTail">
            <xsl:call-template name="getPaths">
              <xsl:with-param name="pNode1"
                              select="."/>
              <xsl:with-param name="pNode2"
                              select="$pNode2"/>
              <xsl:with-param name="pExcluded"
                        select="$pExcluded | $pNode1"/>
            </xsl:call-template>
          </xsl:variable>

          <xsl:variable name="vTail"
           select="exsl:node-set($vrtfTail)/*"/>

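           <!-- Emit one path per tail path, not one merged path -->
           <xsl:variable name="vEdge"
            select="/*/edge[$pNode1/@id = @*
                          and
                            current()/@id = @*]"/>
           <xsl:for-each select="$vTail">
             <path>
               <xsl:copy-of select="$vEdge"/>
               <xsl:copy-of select="*"/>
             </path>
           </xsl:for-each>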
        </xsl:otherwise>
      </xsl:choose>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>

when applied on this source.xml:

<graph>
  <vertex id="V1"/>
  <vertex id="V2" type="anchor"/>
  <vertex id="V3"/>
  <vertex id="V4" type="anchor"/>
  <vertex id="V5"/>
  <edge source="V1" target="V2"/>
  <edge source="V1" target="V3"/>
  <edge source="V2" target="V3"/>
  <edge source="V3" target="V4"/>
  <edge source="V5" target="V2"/>
  <edge source="V5" target="V4"/>
</graph>

produces the wanted result:

<path>
   <edge source="V1" target="V2"/>
   <edge source="V1" target="V3"/>
   <edge source="V3" target="V4"/>
</path>
<path>
   <edge source="V2" target="V3"/>
   <edge source="V3" target="V4"/>
</path>
<path>
   <edge source="V5" target="V2"/>
   <edge source="V5" target="V4"/>
</path>

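These are the three distinct paths from V2 to V4 in the graph described by the XML above, each visiting any vertex at most once. Within each path the edges are copied as they appear in the source, so an edge traversed against its direction (for example from V2 to V1) still appears as source="V1" target="V2".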

The first path:


     V2 -> V1 -> V3 -> V4

is three edges long.

The other two are two edges long each. (Note that I added a new arc from V1 to V3 to your original graph structure in order to make it more "general".)

I hope this helps with this specific problem, and also answers positively your questions about the expressiveness of XPath/XSLT and the suitability of XSLT as a tool for solving this type of problem.

Sjoerd Visscher adds

> It would be ideal to come up with a generalized solution that would
>  let you use 1, 2, .. N intermediate linking nodes. I've been able to
> get this working with nested loops, but it isn't particularly
>  declarative or speedy, and is certainly more verbose than I'd like,
>  so I'm wondering if anyone here has insights into how to do this
>  elegantly and in XSLT/XPath style. For example, is it possible to
>  write a single XPath expression that will select <vertex>
>  elements that obey the above criteria? If not, does anyone have any
>  suggestions for how to code this effectively and efficiently with
>  XSLT?

The following XSL transformation does what you want:

<?xml version="1.0"?>
<xsl:transform version="1.0" 
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:key name="a" match="/graph/vertex[@type='anchor']" use="@id" />
<xsl:key name="e1" match="/graph/edge/@target" use="../@source" />
<xsl:key name="e2" match="/graph/edge/@source" use="../@target" />

<!-- identity template -->
<xsl:template match="@* | node()">
  <xsl:copy>
    <xsl:apply-templates select="@* | node()" />
  </xsl:copy>
</xsl:template>

<xsl:template match="vertex">
  <xsl:copy>
    <!-- copy attributes first: attributes must precede any child nodes -->
    <xsl:apply-templates select="@*" />
    <!-- more than one edge to an anchor vertex? -->
    <xsl:if test="count(key('a',key('e1',@id)|key('e2',@id)))&gt;1">
      <xsl:attribute name="linker">true</xsl:attribute>
    </xsl:if>
    <xsl:apply-templates select="node()" />
  </xsl:copy>
</xsl:template>

</xsl:transform>

I've used the key function here both as a look-up and as a filter shortcut. The neat thing about the key function is that it returns a set of distinct nodes. For example, if you had two edges from V1 to V2, the expression key('e1', @id) would return two nodes, but when these are passed through the key function again to find the anchor vertices, there is only one result: the V2 vertex.
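
The same look-up-then-deduplicate idea can be sketched in Python, where a set plays the role of the second key() call. This is an illustration only: `EDGES`, `ANCHORS` and `linkers` are hypothetical names, and the data is the graph from the question.

```python
# The graph from the question: five arcs and two anchor vertices.
EDGES = [("V1", "V2"), ("V2", "V3"), ("V3", "V4"),
         ("V5", "V2"), ("V5", "V4")]
ANCHORS = {"V2", "V4"}


def linkers(vertices, edges, anchors):
    """Vertices with an arc (in either direction) to more than one anchor."""
    flagged = []
    for v in vertices:
        # Like key('e1', @id) | key('e2', @id): the far end of every arc
        # touching v. Parallel arcs produce duplicate entries here.
        adjacent = ([t for s, t in edges if s == v] +
                    [s for s, t in edges if t == v])
        # Building a set collapses duplicates, much as the outer key('a', ...)
        # call returns each anchor vertex only once.
        if len(set(adjacent) & anchors) > 1:
            flagged.append(v)
    return flagged


if __name__ == "__main__":
    print(linkers(["V1", "V2", "V3", "V4", "V5"], EDGES, ANCHORS))
```

Adding a second V1 to V2 arc to `EDGES` does not flag V1, because both arcs lead to the same anchor.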

Extending this to a generalized solution is still hard though...