Monday, September 3, 2012

Mistake in my RC4 implementation

Jim Wilcoxson approached me two days ago regarding the skewed distribution I observed in RC4. He sent me a snippet from SQLite's implementation of RC4 and showed that it did not suffer the non-uniform result I saw.

After digging out my old code, I found two mistakes I made. One in the key scheduling algorithm:
  void initialize(const char *begin, const char *end) throw() {
    for (size_t i = 0; i < 256; ++i)
      s_[i] = i;

    const char *key = begin;
    uint8_t j = 0;

    for (size_t i = 0; i < 256; ++i, ++key) {
      if (key == end) key = begin;
      j = j + s_[i] + *key;
      std::swap(s_[i], s_[j]);
    }
  }
In the original code, I had forgotten to ++key, so the key scheduling is always done on the first byte of the key only. The addition for the fix is highlighted in yellow.

Another mistake is an index out of bounds error in the generation of bytes.
  uint8_t next() throw() {
    i_ = i_ + 1;
    j_ = j_ + s_[i_];
    std::swap(s_[i_], s_[j_]);
    return s_[s_[i_] + s_[j_]];
  }
There is an array index out of bounds error in the return line. The reason is that in C/C++ compiler, the intermediate result of an expression, if it's an int, does not inherit the int size of the sub-expressions, but instead is word sized. Since both s_[i_] and s_[j_] are (0...255), half of the times the result is in (256...511) which accesses some memory beyond s_[]. This explains why I get a skew where most byte value tallies are halved.

The fix is to replace the return line as follows:
    return s_[(uint8_t) (s_[i_] + s_[j_])];
Where an explicit cast to uint8_t solves the index out of bounds problem.

The suggestion I gave, to add ^ j, only obscures the problem.

Fortunately my code has never been used in a production system. It was written quickly to give me a PRNG for a memory usage simulator I wrote last year for testing a memory allocator where the PRNG (1) must not use memory allocation itself, (2) must not use locking, (3) is reasonably fast and easy to implement. The implementation satisfied all my original requirements but was not RC4.

No comments: