Cheap and Secure Web Hosting Provider : See Now

# Complexity of testing membership in a disjoint set

, ,
Problem Detail:

I have a disjoint set data structure (sometimes known as a union-find data structure) where I store a value in each "node". I want to look up a node by value. How can I do this?

The representations normally discussed in literature, both linked list and forests, deal with operations being given nodes (of the linked list or forest), not values contained therein, which avoids any difficulty caused by having to find the node that contains a given value. From what I can see, the question of whether a given disjoint set contains in any subset a given value, or finding the node that contains a given value if one exists, is scarcely touched by the literature.

However, given values, this question needs to be addressed for an implementation of what the CLRS Introduction to Algorithms calls `Make-Set(x)` if one wishes to not trust the user of the API to always know what is and is not in the set, as it has a prerequisite that `x` does not already exist in the set.

With both representations, I see no way better than \$O(n)\$ to check membership, as one must always check all nodes for equality. Obviously one could maintain a separate mapping of values to nodes for the set, with \$O(log n)\$ lookup given an ordered value type, but this means duplicating the whole set in memory. Is there any way to represent a disjoint set while being able to test set membership in less than \$O(n)\$ time without duplicating the data?

There's an easy way to find the node that contains a given value. You have two data structures: a union-find data structure (that keeps track of which nodes are in the same equivalence class), and a hashtable (that maps from a value to the node(s) that contain that value).

The MakeSet operation adds a new set to the union-find data structure and adds a new entry to the hashtable. The Union operation and Find operations use only the union-find data structure. The "look up a node by value" operation uses the hashtable.

The running time of the MakeSet, Union, and Find operations is as in the underlying union-find data structure. (MakeSet incurs an extra \$O(1)\$ (expected, amortized) time to insert into the hashtable, but this doesn't change the overall running time of MakeSet.) The running time of looking up a node by value is \$O(1)\$ (expected, amortized) time, if you have a good hash function and if each value is only found in a single node.

This doesn't require duplicating the data, if you use pointers in lieu of duplicating: each entry in the hashtable can have a pointer to the node, and each node has a pointer to the value, so values and nodes aren't duplicated. Of course, this does increase the memory usage (by at most a constant factor); nothing is free in life.