Sunday, August 5, 2012

A critique of non-blocking deque due to Arora, Blumofe, and Plaxton

Arora, Blumofe and Plaxton presented a non-blocking implementation of work-stealing deque in the paper Thread scheduling for multiprogram multiprocessors (SPAA 1998). Before that, Blumofe worked on the randomized work stealing algorithm for Cilk, with Charles Leiserson.

Here I wish to make a few notes on the non-blocking deque implementation in the SPAA 1998 paper. A deque is described by the following preamble.
// An abstract, opaque type describing a unit of work.
struct Thread;

// A finite but arbitrary large capacity.
#define DEQ_SIZE ...

// An array for storing items in the deque.
Thread *deq[DEQ_SIZE];

// A top pointer that is time-stamped to avoid ABA problem.
struct Age {
  size_t top;
  size_t tag;

volatile Age age;

// Assume there is a compare and swap for struct Age.  Most
// likely this is going to use a double word compare and swap.
// If a machine does not support that, adjust the struct Age
// above to divide a machine word into two bit fields, and use
// single-word compare and swap.
// Modifies newAge to the value of oldAge if the operation
// succeeds.
void cas(volatile Age& mem, Age oldAge, Age& newAge);

// Bottom pointer is manipulated solely by the worker who
// owns the deque, never by thief.
volatile size_t bot;
All of the code below is copied verbatim from the paper; it is missing necessary type annotation, which must be added before the code could compile, but adding type annotation is straightforward.

The owner puts work into the deque using pushBottom() as defined below.
void pushBottom(Thread *thr)
  localBot = bot;
  deq[localBot] = thr;
  bot = localBot;
Here we note that the pushBottom() function doesn't actually respect the capacity of the deque. The deque might overflow without notice. Bad.

A thief steals work using popTop() as defined below. The difference between "bottom" and "top" only serves to illustrate that thief is stealing from opposite end of the deque than the owner. This allows the owner to go on its own merry ways most of the times, until contention with the thief happens when the queue is nearly empty.
Thread *popTop()
  oldAge = age;
  localBot = bot;
  if (localBot <=
    return NULL;
  thr = deq[];
  newAge = oldAge;;
  cas(age, oldAge, newAge);
  if (oldAge == newAge)
    return thr;
  return ABORT;
The assumption is that if the thief manages to increment age, then it is assumed to claim the item at deq[]. If the increment fails due to contention with the owner (queue becomes empty) or with a fellow thief (queue is not empty yet), the current thief gives up. In the latter case, an alternative implementation could decide to retry from the beginning rather than abort. This causes a thief to potentially wait for an unbounded time, but system-wise at least one thief is making progress. The owner removes an item from the deque using popBottom() as follows.
Thread *popBottom()
  localBot = bot;
  if (localBot == 0)  // #a
    return NULL;
  bot = localBot;  // #c
  thr = deq[localBot];
  oldAge = age;
  if (localBot >  // #b
    return thr;
  bot = 0; = 0;
  newAge.tag = oldAge.tag + 1;
  if (localBot == {
    cas(age, oldAge, newAge);
    if (oldAge == newAge)
      return thr;
  age = newAge;
  return NULL;
The deque is empty when either bot is 0 [#a] or if the bottom meets the top [#b]. There is a race condition between the owner and thief. Suppose at the beginning, bot == + 2 (this is necessary for the conditon at [#b] to hold). The owner got momentarily preempted by the operating system right before [#c], and meanwhile two thieves come along and both succeed. The owner wakes up and happily returns the "thr" value that the last thief already took away. This causes a task to be scheduled twice.

Another peculiar fact about the deque is that, suppose we impose an upper bound for bot so that pushBottom() will not go out of bound. The deque might contain far fewer items than its capacity allows, if both bot and top are very close to the upper bound but the deque is not empty. This means that the capacity of deque is only reclaimed when the deque is completely empty. Again, not a very good design in practice.