Friday, February 19, 2010

sbrk() is not thread safe

The sbrk() system call is a legacy relic of Unix systems before the era of virtual memory, for the purpose that a process may expand or shrink its heap memory in LIFO order. A typical programmer would never use this system call, but it is used to implement malloc() and free(). Modern OSes provide mmap() and munmap() to allocate and free virtual memory, but they still support sbrk() that is also backed by virtual memory. Many memory allocators, most notably the Doug Lea allocator, still prefer sbrk() when possible. When asked the question "why use sbrk() when there is mmap()," the response is often "why not?" We shall settle the account by answering why it is bad to use sbrk().

Out of the allocators that use sbrk(), there are two kinds: ones that recognize the non-contiguity of the sbrk() heap, and ones that do not. The sbrk() heap could become discontinuous because a mmap()'d region or a shared object obstructs the growth of the heap. We'll not pay too much attention to this here, but to focus on the other reason, namely that another part of the program could use sbrk() to request more memory, bypassing malloc(). It would be the result of a performance-critical part of the program implementing its own allocation strategy.

Suppose we write an allocator that, being a good citizen, would try to return memory back to the operating system whenever possible. Since sbrk() can only grow or shrink memory in LIFO order, it would only call sbrk() when it wants to free the last free memory block in the heap that it manages. Obviously, if we allow another part of the program to bypass our heap management and use sbrk() directly, we need to be careful that we can release our free memory block only when it is the top of the heap.

The situation becomes more complicated in multi-threaded programs. A good allocator would use locks and scalable data structure to manage the heap, but it eventually has to acquire or release memory using sbrk(). We can assume that sbrk() system call is atomic, but we will show that sbrk() is still not thread-safe.

Let's say sbrk(0) indicates that the top of the heap currently lies at the end of our free block that we want to release, so the next step would be to call sbrk() with a negative number to shrink the heap. But before we get a chance to do that, we're preempted by another thread that calls sbrk() to expand the heap. Once the other thread is done, we proceed to shrink the heap by the amount we previously determined. We violated the LIFO order, and ended up shrinking the memory the other thread requested.

The problem is that atomicity of sbrk() does not guarantee the consistency of the state of the heap. There is, however, a way to augment the system call with another argument that will make it thread-safe. The sbrk() assumes the following prototype:
void *sbrk(intptr_t increment);
We can regard sbrk() as a function that adds an integer by an increment atomically. Inspired by how compare-and-swap works, we shall modify the function as follows:
void *sbrk_safe(intptr_t increment, void *expect_top);
The new function will only cause the heap to shrink or grow if the expect_top pointer, as seen by the caller of sbrk_safe(), is still the same as the actual heap top. If not, an error value is returned; the caller should refresh its own snapshot of the heap state, and then decide if it wants to try again. This allows each caller of sbrk_safe() to maintain transactional integrity.

We showed that sbrk() in its current form is not thread-safe even if it is atomic. We showed that it is possible to fix by proposing a change of the function definition. However, the ultimate verdict is that it is time to stop using sbrk() altogether. We should all be using mmap() instead.

Update (Feb 22): apparently phkmalloc uses sbrk() as its primary way of requesting memory, and its successor jemalloc still allows the use of sbrk() through MALLOC_DSS option. Jemalloc tries to protect sbrk() with a mutex, acknowledging that it does not protect against third-party use of sbrk().

3 comments:

Unknown said...

Hello,

Found your blog as a top hit for running Chrome on RHEL5, and poked around from there. Most of Amazon runs on RHEL, it would be nice to have Chrome on it, nicer if it would run out of the box.

Interesting blog.

What's your employment situation? We'd love to interview you at Amazon.com.

Likai Liu said...

I have to admit and apologize for the fact the instruction about running Chrome on RHEL5 is a bit terse. I have hacked together a script to automate building all the dependencies from source. Let me know if you want to help me polish it, and perhaps make it useful for all the people running RHEL. You can find my contact information at http://cs.likai.org

john said...

Thank you.