Tuesday, July 14, 2009

Hash table using a variety of things for bucket

A hash table stores key-value mapping of data in an array of buckets. The bucket number for a particular key-value pair is decided by a hash of the key modulo the number of buckets. The idea is that if the average number of key-value pairs in buckets is low overall, then we can achieve constant lookup time. One way to organize the bucket is by using a singly linked list. This has the side effect that, if the buckets are overloaded, then lookup time approaches the time of the singly linked list, which is O(n).

An obvious improvement is to make the bucket a binary search tree, perhaps a splay tree (so that item being looked up more often is quicker to find). This cuts down the worst case time of overloading to O(logn), while achieving average case lookup time of O(1).

We can nest hash tables, so a bucket can be implemented by another hash table. In this case, if the load of buckets are small, then we can let two or more buckets initially share a hash table. As buckets grow in size, they split. The size of nested hash tables must be relatively prime, or the inner tables would have high collision rate and won't be effective. The hashing scheme can be organized in the fashion of radix sort where the outer table use one hash function on the most significant bits, and the inner table use another hash function on the least significant bits.

No comments: