Thursday, September 4, 2008

Hard to find bug

A while back, I implemented an array based queue. For the purpose of lock-free wait-free operations, the two indices for front and rear are strictly increasing, and the corresponding index into the array is computed by taking the modulo of array length. It is designed that way so when two threads modify the front or rear pointers, they can detect when they're in each other's way (using compare and swap) and help along the other thread to make progress, rather than just spinning and waiting.

The problem is this. In a practical computer, these front and rear pointers are represented in finite integers (most likely 32-bit), and they will wrap around after a long time. If the length does not divide 232, then 232 modulo array length should be non-zero, while the wrap around makes the pointers 0, and 0 modulo length is zero. This causes discontinuity in array access, and it only happens after manipulating the queue for a very long time.

The fix is to force array length to be a power of 2, which divides 232, so 232 modulo length of the array is 0 just like the integer wrap-around makes it so.

No comments: