Friday, May 7, 2010

Multi-threaded allocator locking

Suppose we have an allocator that employs a headerless object layout strategy, like Streamflow. The free(p) operation needs a way to tell if p is managed by any of the subheaps, and if so, which subheap manages p. In essence,
void free(void *p) {
  SubHeap *h = GetSubHeap(p);
  if (h)
    h->Free(p);
}
We need a way to quickly lookup the subheap that manages a pointer. The mechanism used in Streamflow is to have a byte vector, each one corresponding to a page in memory denoting whether the subheap is present, and if so, which kind (I imagine subheaps would be in power of two number of pages, naturally aligned to its size). This mechanism is attributed to the Big Bag of Pages (BIBOP) due to Guy Steele, Jr.

On a 32-bit machine with 4KB page size, the byte vector would be 1MB. On a 64-bit machine, it will be a whopping 4096TB unless we use much bigger page size, or that the table would have to be sparse, such as using a multi-level hash table, or using some kind of a binary search tree. Both impose challenge in scalability in a multi-threaded setting because concurrent lookup and update are non-trivial.

Inspired by how Translate and Lookaside Buffer (TLB) works, I postulates that most of the lookup could be localized to the thread's own lookaside table. It is a linearly probed hash table, keyed by the lookup address, with a small probe distance (say, 4). The idea is that if we can't find the entry in four iterations of a search, then we might as well appeal to the global data structure. Hash table insertion may overwrite existing data. The policy to balance probe distance, eviction policy, and performance would be subject to further research.

So we will have GetSubHeap(p) implemented like this:
SubHeap *GetSubHeap(void *p) {
  SubHeap *h;
  if (h = LocalSubHeapLookup(p))
    return h;
  if (h = GlobalSubHeapLookup(p)) {
    LocalSubHeapInsert(p, h);
    return h;
  }
  return NULL;
}
The LocalSubHeapLookup() and LocalSubHeapInsert() operate on thread local data structure, so they do not need any synchronization. GlobalSubHeapLookup(), on the other hand, must acquire a read lock.

What might be of special interest is GlobalSubHeapRemove(). What if a cached entry of subheap is being removed? Ideally, the removal would only happen if the subheap is empty, which means the program should no longer have references to any objects in the subheap. The entry in the lookaside table would be left unused until it is evicted. However, many programs have double-free problems. This would cause free(p') to find a subheap from p' that is already removed, and call SubHeap::Free(p'). Therefore, the precise problem depends on what the function does. Another problem is if the subheap is removed, but another subheap is constructed on that address. The double-free problem would corrupt the subheap, again, depending on how resilient SubHeap::Free(p') handles corrupted pointers (the subheap might have a different object layout this time). Even if we keep the thread-local lookaside table up to date with the global mapping, there may simply be no good way to handle double-free gracefully.

No comments: