0x55555555
, 0x33333333
, 0x0f0f0f0f
, etc. are used to mask out bits in specific locations. These bit masks can be generated by dividing the maximum integer 0xffffffff
(or simply ~0u
) against Fermat numbers 3, 5, 17, 257, 65537, etc.
Hex | Binary | Expression |
0x5555... | 0101010101010101... | ~0u / 3 |
0x3333... | 0011001100110011... | ~0u / 5 |
0x0f0f... | 0000111100001111... | ~0u / 17 |
0x00ff... | 0000000011111111... | ~0u / 257 |
and so on... |
It's not a coincidence that integers with all one bits set can be divided cleanly by a Fermat number, and this works with 8-bit, 16-bit, 32-bit, 64-bit, and beyond. To understand why, a k-bit all-one integer is 2k−1. In particular, k=2n in our case. We can factor 22n−1 using the formula a2−b2=(a+b)(a−b) we learned in grade school, here using the special case b=1, so a2−1=(a+1)(a−1).
22n−1=(22n−1+1)(22n−1−1)=(22n−1+1)(22n−2+1)(22n−2−1)=(22n−1+1)(22n−2+1)(22n−3+1)(22n−3−1)=…
Conveniently, the last term 220−1 is just 1, so we can write the product in terms of Fi=22i+1 for the i-th Fermat number,
22n−1=∏i=0…n−122i+1=∏i=0…n−1Fi
For example,
k | 2k−1 | Expression |
2 | 3 | 3 |
4 | 15 | 3×5 |
8 | 255 | 3×5×17 |
16 | 65535 | 3×5×17×257 |
32 | 4294967295 | 3×5×17×257×65537 |
and so on... |
So how are Fermat numbers related to bit mask construction? It turns out that if you write 3, 5, 17, 257, 65537 in binary, you get
11
, 101
, 1 0001
, 1 0000 0001
, and 1 0000 0000 0000 0000 0000 0000 0000 0001
respectively. If you multiply 101
(5) and 1 0001
(17), you get alternating patterns of 0101 0101
(85). If you multiply this by 11
(3), you get an 8-bit all-one integer 1111 1111
(255). Conversely, you can take a pattern "out of circulation" by dividing the all-one integer by a Fermat number which is one of its factors.
Appendix: Counting the number of one bits using C++ template meta-programming
We start with the straightforward bit vector algorithm:x = ((x >> 1) & 0x55555555) + (x & 0x55555555); x = ((x >> 2) & 0x33333333) + (x & 0x33333333); x = ((x >> 4) & 0x0f0f0f0f) + (x & 0x0f0f0f0f); // etc.which can be represented by the following for-loop:
for (int n = 1; n < sizeof(Tp) * 8; n *= 2) Tp mask = ~0 / ((1 << n) + 1)); x = ((x >> n) & mask) + (x & mask); }In the template meta-programming version, the template
__unroll_shift_mask_add
essentially unrolls into the for-loop above, but in a manner that is agnostic to the actual integer type.
template<unsigned int n, typename Tp> struct __unroll_shift_mask_add { static Tp compute(Tp x) { Tp y = __unroll_shift_mask_add<n / 2, Tp>::compute(x); Tp mask = ~Tp(0) / ((Tp(1) << n) + 1); return ((y >> n) & mask) + (y & mask); } }; template<typename Tp> struct __unroll_shift_mask_add<0u, Tp> { static Tp compute(Tp x) { return x; } }; template<typename Tp> Tp one_bits(Tp x) { return __unroll_shift_mask_add<sizeof(Tp) * 8 / 2, Tp>::compute(x); }