net.sourceforge.olduvai.treejuxtaposer
Class Tree2Tree

java.lang.Object
  extended by net.sourceforge.olduvai.treejuxtaposer.Tree2Tree

 class Tree2Tree
extends java.lang.Object

Tree2Tree does the precomputation for each pair of trees. The two precomputation tasks are computing the best corresponding node for each node and building the range query data structures.

Version:
2.1
Author:
Tamara Munzner, Serdar Tasiran, Li Zhang, Yunhong Zhou
See Also:
Tree, RangeList

Nested Class Summary
private  class Tree2Tree.NodeScorePair
          Container class for returning best corresponding node+score pair.
private  class Tree2Tree.TmpD
          Attachment to a node that is needed as temporary data structure when computing best corresponding nodes.
 
Field Summary
private  java.util.Hashtable A2B
          Subtree hashtable for A->B.
private  java.util.Hashtable B2A
          Subtree hashtable for B->A.
private  java.util.Vector bestA2B
          Vector for A->B that stores the best corresponding nodes for different levels.
private  java.util.Vector bestB2A
          Vector for B->A that stores the best corresponding nodes for different levels.
private  float epsilon
          Minimum edge weight, if edge weights are used.
private  int SubtreeLeafCutoff
          Restriction on the maximum number of nodes to store for any subtree's forest of best corresponding nodes.
private  Tree treeA
          Arbitrary tree A for this comparison.
private  Tree treeB
          Arbitrary tree B for this comparison.
 
Constructor Summary
Tree2Tree(Tree t1, Tree t2, int edgeweightLevels)
          Initializes the two vectors of best corresponding nodes, bestA2B and bestB2A.
 
Method Summary
private  void addNodeToForest(TreeNode node, java.util.ArrayList array, java.util.Hashtable hash)
          Adds the node to the hashtable, indexed by its key (as an Integer).
private  Tree2Tree.NodeScorePair computeBestMatch(TreeNode sourceNode, Tree sourceTree, Tree targetTree, float edgeCoefficient, Tree2Tree.TmpD[] tmpData)
          Compute the best match for sourceNode from sourceTree in the targetTree.
private  void computeForest(java.util.Hashtable X2Y, Tree treeX, Tree treeY, AccordionTreeDrawer atdY, int cutoff)
          Compute the forest of marked nodes in treeY, for subtrees under every node in treeX.
protected  TreeNode getBestCorrNode(Tree source, TreeNode n, Tree other, int el)
          Computes the node in Tree "other" whose set of descendant leaves best matches that of TreeNode n in Tree "source". The best match is the node n' maximizing the score | S(n) Intersection S(n') | / | S(n) Union S(n') |, where S(n) is the set of leaves that are descendants of node n.
protected  float getBestCorrNodeScore(Tree source, TreeNode n, Tree other, int el)
          Identify input trees as treeA or treeB and look up the score of the best corresponding node (BCN) of the input node from A in tree B.
private  Tree2Tree.NodeScorePair getBestNodeScorePair(TreeNode sourceNode, Tree sourceTree, Tree targetTree, Tree2Tree.TmpD[] tmpData)
          Find the best corresponding node for a given node sourceNode:sourceTree, in the target tree targetTree.
protected  java.util.ArrayList getCorrRange(Tree source, TreeNode n, Tree other, int el)
          Identify input trees as treeA or treeB and look up the best matching nodes of the input node from A in tree B.
private  void onewayTreeCompare(Tree t1, Tree t2, java.util.Vector v12, int edgeweightLevels)
          For each node on Tree t1, computes the best matching node in Tree t2 and stores it in Vector v12.
private  java.util.ArrayList reduceNodeListToCutoff(java.util.ArrayList node, int cutoff)
          Return an ArrayList of a reduced number of elements from ArrayList based on the number of leaves; bigger subtrees are kept, ties broken by taking any subtree with maximum number of leaves not already in forest.
private  void removeNodeFromForest(TreeNode node, java.util.ArrayList array, java.util.Hashtable hash)
          Removes the node from the hashtable, indexed by its key (as an Integer).
protected  void subtree2Forest(AccordionTreeDrawer atdA, AccordionTreeDrawer atdB, int eL)
          Preprocessing: calculate and store forests that correspond to subtrees.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

treeA

private Tree treeA
Arbitrary tree A for this comparison.


treeB

private Tree treeB
Arbitrary tree B for this comparison.


A2B

private java.util.Hashtable A2B
Subtree hashtable for A->B. Each node (table index) stores reference to an array of nodes that are marked if the subtree under the index node is marked. The number of items is limited to SubtreeLeafCutoff.


B2A

private java.util.Hashtable B2A
Subtree hashtable for B->A. Each node (table index) stores reference to an array of nodes that are marked if the subtree under the index node is marked. The number of items is limited to SubtreeLeafCutoff.


bestA2B

private java.util.Vector bestA2B
Vector for A->B that stores the best corresponding nodes for different levels. Each element of the vector is a hashmap that is indexed by nodes.


bestB2A

private java.util.Vector bestB2A
Vector for B->A that stores the best corresponding nodes for different levels. Each element of the vector is a hashmap that is indexed by nodes.


epsilon

private final float epsilon
Minimum edge weight, if edge weights are used. Edge weights of 0 cause problems, so this should be set to a value greater than 0.

See Also:
Constant Field Values

SubtreeLeafCutoff

private final int SubtreeLeafCutoff
Restriction on the maximum number of nodes to store for any subtree's forest of best corresponding nodes. This does not have to be big: if a subtree is marked, only the top nodes in the forest hierarchy need to be detected, since children of the marked subtree should contain more information about what is marked. Storing all marks for all subtrees would require quadratic storage.

See Also:
Constant Field Values
Constructor Detail

Tree2Tree

public Tree2Tree(Tree t1,
                 Tree t2,
                 int edgeweightLevels)
Initializes the two vectors of best corresponding nodes, bestA2B and bestB2A.

Parameters:
t1 - Tree A/1 of this comparison object.
t2 - Tree B/2 of this comparison object.
edgeweightLevels - The number of edge weight levels used in the comparisons. Defaults to 1 if not specified.
Method Detail

addNodeToForest

private void addNodeToForest(TreeNode node,
                             java.util.ArrayList array,
                             java.util.Hashtable hash)
Adds the node to the hashtable, indexed by its key (as an Integer).

Parameters:
node - Node to insert.
array - Array of integers, storing list of keys found in the hashtable.
hash - Lookup table for nodes, indexed by their key.

removeNodeFromForest

private void removeNodeFromForest(TreeNode node,
                                  java.util.ArrayList array,
                                  java.util.Hashtable hash)
Removes the node from the hashtable, indexed by its key (as an Integer).

Parameters:
node - Node to remove.
array - Array of integers, storing list of keys found in the hashtable.
hash - Lookup table for nodes, indexed by their key.
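
As a rough illustration of the bookkeeping that addNodeToForest and removeNodeFromForest describe, the following sketch keeps a list of keys and a hashtable in step with each other. ForestNode, its key field, and the handling of duplicate or missing keys are assumptions made for this example; they are not taken from the TreeJuxtaposer source.

```java
import java.util.ArrayList;
import java.util.Hashtable;

/** Hypothetical stand-in for TreeNode; only the integer key matters here. */
final class ForestNode {
    final int key;

    ForestNode(int key) {
        this.key = key;
    }
}

final class ForestBookkeepingSketch {
    /** Stores the node under its key and records the key in the list. */
    static void addNodeToForest(ForestNode node, ArrayList<Integer> keys,
                                Hashtable<Integer, ForestNode> hash) {
        Integer key = Integer.valueOf(node.key);
        // Assumption: a key that is already present is not listed twice.
        if (hash.put(key, node) == null) {
            keys.add(key);
        }
    }

    /** Removes the node stored under its key and drops the key from the list. */
    static void removeNodeFromForest(ForestNode node, ArrayList<Integer> keys,
                                     Hashtable<Integer, ForestNode> hash) {
        Integer key = Integer.valueOf(node.key);
        if (hash.remove(key) != null) {
            // key is an Integer object, so this calls remove(Object), not remove(int index).
            keys.remove(key);
        }
    }
}
```

Keeping the key list beside the hashtable gives ordered iteration over the forest while lookups by key stay constant time on average.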

reduceNodeListToCutoff

private java.util.ArrayList reduceNodeListToCutoff(java.util.ArrayList node,
                                                   int cutoff)
Return an ArrayList of a reduced number of elements from ArrayList based on the number of leaves; bigger subtrees are kept, ties broken by taking any subtree with maximum number of leaves not already in forest. This function allows us to store top-level nodes of large trees without having to keep references to all nodes that we would have to mark. Descendant nodes in the marked subtree will also be checked, so the reduced list does not have to be exhaustive.

Parameters:
node - List of nodes that would be marked (forest) if a subtree is marked (referenced by marking the root of this subtree).
cutoff - Limit on number of nodes stored for each subtree. Currently uses SubtreeLeafCutoff defined value.
Returns:
A reduced list of nodes from the original set, which is not as large as the cutoff value.
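
One simple way to picture the reduction is to sort the candidate nodes by leaf count and keep the largest ones. The sketch below does exactly that. SubtreeRef and its fields are hypothetical, the "keep at most cutoff entries" bound is an assumption, and ties are broken by input order rather than by the forest-membership rule described above.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical stand-in for a TreeNode: a label plus the number of leaves below it. */
final class SubtreeRef {
    final String label;
    final int numLeaves;

    SubtreeRef(String label, int numLeaves) {
        this.label = label;
        this.numLeaves = numLeaves;
    }
}

final class CutoffSketch {
    /** Returns a new list holding at most cutoff entries, largest subtrees first. */
    static ArrayList<SubtreeRef> reduceToCutoff(List<SubtreeRef> nodes, int cutoff) {
        ArrayList<SubtreeRef> sorted = new ArrayList<SubtreeRef>(nodes);
        // List.sort is stable, so subtrees with equal leaf counts keep their input order.
        sorted.sort(Comparator.comparingInt((SubtreeRef s) -> s.numLeaves).reversed());
        int keep = Math.max(0, Math.min(cutoff, sorted.size()));
        return new ArrayList<SubtreeRef>(sorted.subList(0, keep));
    }
}
```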

computeForest

private void computeForest(java.util.Hashtable X2Y,
                           Tree treeX,
                           Tree treeY,
                           AccordionTreeDrawer atdY,
                           int cutoff)
Compute the forest of marked nodes in treeY, for subtrees under every node in treeX. This precomputation avoids the on-the-fly tree traversals for complex BCN matchings (BCN is not 1-to-1, so each node that ties the best score is marked). The forest is reduced to the cutoff value (constant SubtreeLeafCutoff) to prevent storage of each node for large subtrees; results for descendants fill in missing values as needed.

Parameters:
X2Y - BCN forest table for treeX to treeY (input empty, filled by this function).
treeX - First tree, each node in this tree is processed into a forest.
treeY - Second tree, nodes are located that match under subtree of node in treeX, and stored in the BCN table.
atdY - Drawer for treeY.
cutoff - Cutoff value for forest size, currently set to SubtreeLeafCutoff.

subtree2Forest

protected void subtree2Forest(AccordionTreeDrawer atdA,
                              AccordionTreeDrawer atdB,
                              int eL)
Preprocessing: calculate and store forests that correspond to subtrees. A2B/B2A: hash tables that map nodes (roots of subtrees) from A/B to forests of nodes (subtrees) in B/A. All BCN work is done in computeForest, which this function calls in both directions (A->B and B->A).

Parameters:
atdA - Drawer for tree A.
atdB - Drawer for tree B.
eL - number of edge weight levels to compute (not used).

getBestCorrNode

protected TreeNode getBestCorrNode(Tree source,
                                   TreeNode n,
                                   Tree other,
                                   int el)
Computes the node in Tree "other" whose set of descendant leaves best matches that of TreeNode n in Tree "source". The best match is the node n' maximizing the score | S(n) Intersection S(n') | / | S(n) Union S(n') |, where S(n) is the set of leaves that are descendants of node n.

Parameters:
source - Source tree that contains the node being looked up in the other tree.
n - Node being looked up in the target tree.
other - Target tree to find the best node for the input node.
el - Edge length weight to use for lookup.
See Also:
Tree, TreeNode, Tree2Tree.NodeScorePair
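
The score above is the ratio of shared leaves to all leaves under either node. A minimal sketch of that ratio follows, assuming leaves are identified by String names and that two empty leaf sets score 0. Both choices, and the class name LeafSetScoreSketch, are assumptions made for illustration.

```java
import java.util.HashSet;
import java.util.Set;

final class LeafSetScoreSketch {
    /** Returns |S(n) Intersection S(n')| / |S(n) Union S(n')| for two leaf-name sets. */
    static float score(Set<String> leavesN, Set<String> leavesNPrime) {
        Set<String> intersection = new HashSet<String>(leavesN);
        intersection.retainAll(leavesNPrime);
        // |A Union B| = |A| + |B| - |A Intersection B|
        int unionSize = leavesN.size() + leavesNPrime.size() - intersection.size();
        if (unionSize == 0) {
            return 0f; // assumption: two empty leaf sets are treated as no match
        }
        return (float) intersection.size() / unionSize;
    }
}
```

For example, the leaf sets {a, b, c} and {b, c, d} share two of four distinct leaves, so the score is 0.5. Identical sets score 1 and disjoint sets score 0.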

getBestCorrNodeScore

protected float getBestCorrNodeScore(Tree source,
                                     TreeNode n,
                                     Tree other,
                                     int el)
Identify input trees as treeA or treeB and look up the score of the best corresponding node (BCN) of the input node from A in tree B.

Parameters:
source - Tree that contains the target node.
n - The target node.
other - Second tree, for identifying the appropriate hashmap (either bestA2B or bestB2A).
el - The edge weight length to use for lookups.
Returns:
The score of the BCN of n in B.

getCorrRange

protected java.util.ArrayList getCorrRange(Tree source,
                                           TreeNode n,
                                           Tree other,
                                           int el)
Identify input trees as treeA or treeB and look up the best matching nodes of the input node from A in tree B.

Parameters:
source - Tree that contains the target node.
n - The target node.
other - Second tree, for identifying the appropriate hashmap (either bestA2B or bestB2A).
el - The edge weight length to use for lookups.
Returns:
The list of nodes in the second tree that best match the input node. Multiple matches are possible, since the BCN relation is not 1-to-1.
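
The "identify input trees as treeA or treeB" step shared by the lookup methods can be sketched as a choice between two per-level tables. In the sketch below, nodes are plain Strings, trees are opaque Objects compared by reference, and null is returned for an unknown tree pair or level. All of these are assumptions for illustration and may differ from the actual implementation.

```java
import java.util.HashMap;
import java.util.Vector;

final class DirectionLookupSketch {
    private final Object treeA;
    private final Object treeB;
    // One map per edge weight level, mirroring the bestA2B / bestB2A field descriptions.
    final Vector<HashMap<String, String>> bestA2B = new Vector<HashMap<String, String>>();
    final Vector<HashMap<String, String>> bestB2A = new Vector<HashMap<String, String>>();

    DirectionLookupSketch(Object treeA, Object treeB, int edgeweightLevels) {
        this.treeA = treeA;
        this.treeB = treeB;
        for (int i = 0; i < edgeweightLevels; i++) {
            bestA2B.add(new HashMap<String, String>());
            bestB2A.add(new HashMap<String, String>());
        }
    }

    /** Looks up the stored match for node n of source in other at level el. */
    String lookup(Object source, String n, Object other, int el) {
        Vector<HashMap<String, String>> table;
        if (source == treeA && other == treeB) {
            table = bestA2B;
        } else if (source == treeB && other == treeA) {
            table = bestB2A;
        } else {
            return null; // assumption: unknown tree pairs yield no match
        }
        if (el < 0 || el >= table.size()) {
            return null; // assumption: out-of-range levels yield no match
        }
        return table.get(el).get(n);
    }
}
```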

onewayTreeCompare

private void onewayTreeCompare(Tree t1,
                               Tree t2,
                               java.util.Vector v12,
                               int edgeweightLevels)
For each node on Tree t1, computes the best matching node in Tree t2 and stores it in Vector v12.

Parameters:
t1 - The first tree
t2 - The second tree
v12 - Vector to store the hashmaps (t1:t2 and t2:t1) for matching node pairs for T1 vs T2. Each edge level has a hashmap of node-node pairs in the vector.
edgeweightLevels - Number of edge weight levels to store
See Also:
Tree, computeBestMatch(TreeNode, Tree, Tree, float, net.sourceforge.olduvai.treejuxtaposer.Tree2Tree.TmpD[]), Tree2Tree.NodeScorePair

getBestNodeScorePair

private Tree2Tree.NodeScorePair getBestNodeScorePair(TreeNode sourceNode,
                                                     Tree sourceTree,
                                                     Tree targetTree,
                                                     Tree2Tree.TmpD[] tmpData)
Find the best corresponding node for a given node sourceNode:sourceTree, in the target tree targetTree. This is the level 0 processing, and it also preprocesses the array used to compute higher edge levels in computeBestMatch(TreeNode, Tree, Tree, float, net.sourceforge.olduvai.treejuxtaposer.Tree2Tree.TmpD[]). How to compute the best corresponding node for each node: node B is the best corresponding node of node A if it maximizes

    | L(A) n L(B) |
    ---------------
    | L(A) U L(B) |

where L(A) and L(B) represent the sets of leaves underneath node A and node B respectively. For the description of the algorithm, see Li Zhang, On Matching Nodes in Two Trees.

Parameters:
sourceNode - Node of interest in sourceTree, get the corresponding Tree2Tree.NodeScorePair in targetTree.
sourceTree - Tree that has sourceNode.
targetTree - Tree in which to look up a corresponding node, with respect to sourceNode.
tmpData - Array initialized by level 0 processing (getBestNodeScorePair(TreeNode, Tree, Tree, net.sourceforge.olduvai.treejuxtaposer.Tree2Tree.TmpD[])), used to compute best nodes.
Returns:
A node and its score, as a Tree2Tree.NodeScorePair, that best corresponds to sourceNode.
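
For intuition only, the best corresponding node can also be found by brute force: score every candidate node in the target tree with the intersection-over-union ratio and keep the highest. The sketch below is not the algorithm from the cited paper and makes no claim about how this class computes the result. It assumes the leaf set under each target node is already available in a map, and the names BruteForceBcnSketch and Pair are hypothetical.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class BruteForceBcnSketch {
    /** Hypothetical analogue of NodeScorePair: a node name and its score. */
    static final class Pair {
        final String node;
        final float score;

        Pair(String node, float score) {
            this.node = node;
            this.score = score;
        }
    }

    /**
     * Scans every target node and returns the first one with the highest
     * intersection-over-union score, or null if there are no target nodes.
     */
    static Pair bestMatch(Set<String> sourceLeaves, Map<String, Set<String>> targetLeafSets) {
        Pair best = null;
        for (Map.Entry<String, Set<String>> entry : targetLeafSets.entrySet()) {
            Set<String> shared = new HashSet<String>(sourceLeaves);
            shared.retainAll(entry.getValue());
            int unionSize = sourceLeaves.size() + entry.getValue().size() - shared.size();
            float score = unionSize == 0 ? 0f : (float) shared.size() / unionSize;
            // Strict comparison: on ties the earliest candidate in iteration order wins.
            if (best == null || score > best.score) {
                best = new Pair(entry.getKey(), score);
            }
        }
        return best;
    }
}
```

This scan compares one source node against every target node, so running it for all source nodes costs at least the product of the two tree sizes. It is useful mainly as a reference for checking results on small trees.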

computeBestMatch

private Tree2Tree.NodeScorePair computeBestMatch(TreeNode sourceNode,
                                                 Tree sourceTree,
                                                 Tree targetTree,
                                                 float edgeCoefficient,
                                                 Tree2Tree.TmpD[] tmpData)
Compute the best match for sourceNode from sourceTree in the targetTree. Only for tree comparisons with more than one edge weight level: this function is for levels 1 and higher; the default level 0 is done by getBestNodeScorePair(TreeNode, Tree, Tree, net.sourceforge.olduvai.treejuxtaposer.Tree2Tree.TmpD[]), which also initializes the arrays used in this function.

Parameters:
sourceNode - Node of interest in sourceTree, get the corresponding Tree2Tree.NodeScorePair in targetTree.
sourceTree - Tree that has sourceNode.
targetTree - Tree in which to look up a corresponding node, with respect to sourceNode.
edgeCoefficient - Edge level coefficient. Non-zero as level 0 is done in getBestNodeScorePair(TreeNode, Tree, Tree, net.sourceforge.olduvai.treejuxtaposer.Tree2Tree.TmpD[]).
tmpData - Array initialized by level 0 processing (getBestNodeScorePair(TreeNode, Tree, Tree, net.sourceforge.olduvai.treejuxtaposer.Tree2Tree.TmpD[])), used to compute best nodes.
Returns:
A node and its score, as a Tree2Tree.NodeScorePair, that best corresponds to sourceNode.