Friday, July 24, 2009

Splay trees with duplicate items

I've implemented a splay tree, and I'd like it to store duplicate items, which are key-value mappings, as follows: if a key to be inserted already exists, then the new binding shadows the old one. If a key is to be removed, then the old binding is revealed. In other words, the inserting and removing the same key reveals the value in LIFO, or stack, order.
The most obvious solution is to have each node be a pointer to a singly linked list, with the cons/uncons operation trivially obeying stack order. But we could embed the singly linked list in a simple binary search tree if we relaxed the criteria a bit, for example making the left child <= parent as opposed to left child < parent. This effectively makes duplicate nodes a chain of singly linked list down the left path. The idea is that we always reach the parent first, which is the most recently inserted item. And if the parent is removed, it reveals the second next item of the same key.
The complication with Splay tree is that it could rotate the tree, which breaks the chain of duplicate nodes. Sometimes it is as bad as splitting the chain to left and right children to a node != the key of the chain, invalidating the binary search tree invariant. I figured out that, in the zig-zig and zig-zag cases, if the grand-parent node has the same key as the parent node, then it is possible to come up with special case rotation that preserve the chain. However, the terminal case actually needs to descend down the whole chain of duplicate notes. At this moment I gave up.
Also, another problem with embedding singly linked list in a tree is that it will increase the depth of the tree, hurting the overall running time. In this case, it seems that the simple approach, making each node a pointer to a singly linked list, is better. It only took me 5 minutes to change the implementation this way and it worked.
Update (9/15): Wikipedia claims that identical keys in a splay tree are preserved in its order like stable sort. It suffices to find the leftmost or rightmost node of the identical key. I haven't tried this.
Update (9/16): Here is the Wikipedia edit that added the aforementioned claim about stable sort. However, I don't think retrieving duplicate item is as simple as finding the leftmost or rightmost node. Even if insertion is stable, the duplicate nodes may not be continuous (e.g. the balanced binary search tree for 1, 2, 3, 3, 3, 4, 5, where the 3's are separated by nodes 2 and 4). Looking for the "largest" duplicate complicates the search by a great deal because you will have to probe either the minimal of the right child, or the maximal of the left child, and may have to back-track. Curiously, Splaysort did not mention how they deal with duplicates.

1 comment:

Hal Hildebrand said...

Thanks for this post. The wikipedia article is very misleading. At the very least they should require a link to something that actually demonstrates this is possible, not simply declare that it is possible.