|
|
So far, this is text book material prior art.
In terms of cache efficiency, separate chaining has the disadvantage that list nodes may be scattered, so accessing the list can cause many cache misses. Open addressing is more cache friendly because the key and values are stored in a contiguous array. The question is: can we do away without resizing like separate chaining, but enjoy the cache efficiency of open addressing?
Yes we can. Here I introduce the recursively extensible open addressing hash table.
A recursively extensible open addressing hash table keeps the key/value in the table, but also reserves a pointer to an extension subtable per entry. To resolve conflict, it would first probe within the same table for an available slot but limited to k tries with a small k. If the search fails, then it would descend to the extension subtable at the original hashed index, creating the subtable only if necessary. It would use an orthogonal hash function to compute the index within this subtable, repeat the probing, and descend if necessary.
An orthogonal hash function is linearly independent (in the linear algebra sense) to the hash functions used in parent tables. An easy way to compute the orthogonal hash is to consider a universal hash function producing a 32-bit unsigned integer output. The first level hash table has 256 entries, and the index is based on the value in the 8 least significant bits. The second level tables also have 256 entries, with index drawn from the next 8 bits, and so on for the third and fourth levels.
As for the probing policy, we could use any policy like any open addressing hash table would, but linear probing has the best cache locality, and it is sufficient to achieve high load factor without compromising lookup performance. That is because the subtables all enjoy O(1) access as well.