## Thursday, June 10, 2010

### Finding n Consecutive Bits of Ones in a Bit Vector in O(lg n) Time

We want to find out whether a bit vector X of length m has n consecutive bits of ones, as well as all the positions where we can find it.

For example, the following bit vector:
(msb) 11111111 01111111 00111111 00011111 (lsb)
Has 5 consecutive ones starting at bit 0, 6 consecutive ones at bit 8, 7 consecutive ones at bit 16, and 8 consecutive ones at bit 24. In addition, we count 4 consecutive ones at bit 1, 3 consecutive ones at bit 2, and so on. The overlaps are also considered. Therefore, if we were to find all positions with 6 consecutive ones, we could generate a positional bit mask like this:
(msb) 00000111 00000011 00000001 00000000 (lsb)
The result indicates that we can find 6 consecutive ones in the original bit vector if we start at bits 8, 16, 17, 24, 25, 26.

Of course, if we can only access one bit at a time, the best running time we can do is O(mn). Most computers provide bitwise operators to compute logical AND and OR on the matching bits of two bit vectors simultaneously, as well as bit shift operators. This, as we show, will dramatically reduce our running time.

Our first attempt is to simplify the problem of finding two consecutive ones, that is, bit i should be set if both bit i and bit i + 1 are ones. This is easy. We shift X to the right by one bit, and AND it against the original unshifted X. In C, this is written as X & (X >> 1). We can generalize the first attempt to find n consecutive ones by doing X & (X >> 1) & (X >> 2) & ... & (X >> (n - 1)). This naive algorithm takes O(n) time.

The part that gets really interesting is when we observe that 4 consecutive ones can be detected by 2 consecutive ones followed by another 2 consecutive ones. In the original example,
(msb) 11111111 01111111 00111111 00011111 (lsb) [n = 1] Y0 = X
(msb) 01111111 00111111 00011111 00001111 (lsb) [n = 2] Y1 = Y0 & (Y0 >> 1)
(msb) 00011111 00001111 00000111 00000011 (lsb) [n = 4] Y2 = Y1 & (Y1 >> 2)
(msb) 00000001 00000000 00000000 00000000 (lsb) [n = 8] Y3 = Y3 & (Y2 >> 4)

The reason why this works is that in computing Y2, we use the fact that Y1 has gaps of 2 consecutive zeroes to shorten our span of ones, and in Y3 uses the fact that Y2 has gaps of 4 consecutive zeroes, and so on. This is another type of SWAR algorithm.

We will keep doing this until we reach a power of 2 that is <= n. Then the last step we just apply a shift to truncate off the remainder of the bits.

In the original example, suppose we want to find n = 7 consecutive ones.
(msb) 11111111 01111111 00111111 00011111 (lsb) [n = 1] Y0 = X
(msb) 01111111 00111111 00011111 00001111 (lsb) [n = 2] Y1 = Y0 & (Y0 >> 1)
(msb) 00011111 00001111 00000111 00000011 (lsb) [n = 4] Y2 = Y1 & (Y1 >> 2)
(msb) 00000011 00000001 00000000 00000000 (lsb) [n = 7] R = Y2 & (Y2 >> 3)

The last step R, instead of shifting by 4, we only need to shift by 3.

The final algorithm is this:
typedef unsigned int bitvec_t;

bitvec_t consecutive_mask(bitvec_t x, unsigned int n) {
unsigned int stop = prev_pow2(n);
for (unsigned int i = 1; i < stop; i <<= 1)
x &= (x >> i);
if (stop < n)
x &= (x >> (n - stop));
return x;
}

The for-loop runs in O(lg n) time. We still need to show that prev_pow2(), which computes the power of two that is <= n, also runs efficiently. Again, this can be done using a SWAR algorithm.
unsigned int prev_pow2(unsigned int n) {
return (fold_bits(n) >> 1) + 1;
}

where fold_bits() is the SWAR algorithm defined as follows:
unsigned int fold_bits(unsigned int x) {
x |= x >> 1;
x |= x >> 2;   /* last for 4-bit */
x |= x >> 4;   /* last for 8-bit */
x |= x >> 8;   /* last for 16-bit */
x |= x >> 16;  /* last for 32-bit */
x |= x >> 32;  /* last for 64-bit */
return x;
}

The remark "last for k-bit" indicates that if x only has k bits, then that line is the last operation for the fold_bits algorithm. Since our n only has 5 or 6 bits, we only need to do up to "last for 8-bit." It turns out that this part is also O(lg n).

Here we presented the problem of finding the positions of all consecutive bits of ones in a bit vector, and an algorithm to do it in O(lg n) + O(lg n) = O(lg n) time. An application for this algorithm is dynamic storage allocation. Suppose I have each bit in the bit vector to indicate the availability of a storage cell, then this algorithm allows me to find places where I can reserve n consecutive storage cells in O(lg n) time.