Friday, September 14, 2012

Design of a Logging System

Here is design of a logging system that provides a convenient way to trace the execution of a long running program with arbitrary format strings and data, serialized to a compact binary file.

It needs to store the following information per emitted data record:
  • executable image magic which identifies the logdef table used.
  • __FILE__ (via logdef table)
  • __LINE__
  • backtrace() (an array of machine words, looked up via logdef table)
  • pthread_self()
  • log level
  • current time from gettimeofday()
  • format string (printf formatted, via logdef table)
  • format string arguments (all types accepted by printf, no logdef lookup)
The log will contain the following record types:
  • Preamble: stores the full path to the executable image and its content fingerprint (size + {md5 or sha1}) so that corrupted logdefs can be reconstructed. Also stores the magic derived from the fingerprint.
  • String logdef: a "log definition" for magic × uintptr_t -> string, for __FILE__ and format string lookup.
  • Backtrace logdef: a "log definition" for magic × uintptr_t -> {dli_fname: string, dli_fbase: uintptr_t, dli_sname: string, dli_saddr: uintptr_t} as returned by dladdr(), for backtrace lookup.
  • Data record, as described above.
Log files may be arbitrarily concatenated. In that case, the logdef table id which is the same as the executable image fingerprint magic will allow each data record to be interpreted using a different set of logdefs. The preambles may be corrupted also. If both the preamble and logdefs are corrupted, a data record may have missing interpretations.

Logdefs are emitted lazily. The logging system in process maintains a smaller lookup table that tracks whether a logdef has already been emitted. The logdef should immediately precede the data record that references it.

The log serialization format is based on msgpack. A log file consists of frames. Each frame is the sequence:
  • 0xc1 | crc32 | 0xc1 | raw_record
Note that 0xc1 is reserved in msgpack, the value between nil (0xc0) and boolean false (0xc2).

The crc32 is the CRC32 for the bytes inside raw_record, as msgpack object uint32_t (0xce).  A raw_record is a msgpack raw object, either fixraw (0xa0—0xbf), raw16 (0xda) or raw32 (0xdb) that frames a record. This structure, along with the reserved 0xc1, can be used to re-sync to frames in case the log file is corrupted.

Each record is simply two msgpack objects concatenated.
  • tag | payload
Tag is a simple positive fixnum indicating the record type, preamble (0), string logdef (1), backtrace logdef (2), data (127). The payload depends on the record type. Payloads are maps (fixmap, map16, map32) with field_tag to field_value mapping. Fields are specific to record type, but the payload being a map allows a field to be optional, in a similar vein to protocol buffer messages.

Thursday, September 6, 2012

Apple Mac OS X keyboard viewer got stuck maximized

My mom likes pushing buttons, but today she pushed the wrong one. She often uses keyboard viewer to assist her with input methods because she couldn't remember the key mappings. Once the keyboard viewer windows is shown, you can click on the green button to maximize the keyboard viewer, and clicking on the green button again should restore its original size. However, if you drag the keyboard viewer in its maximized state just slightly off the screen, then the green button no longer restores the keyboard viewer back to its original size. The maximized keyboard viewer looks like this and occupies half of the screen.
This happens on Mac OS X Lion (10.7). Not sure if this happens with other versions also.

The solution is to first close the keyboard viewer, then use Terminal (under Applications/Utilities) and enter the following command at the prompt:
defaults delete
The keyboard viewer should be restored to its original state next time you open it.

Another issue my mom was having was that when she tried to enter Chinese, certain characters cause programs to crash (also known as 地雷字, di lei zi, meaning “landmine characters”). It turns out the culprit is the "Use symbol and text substitution" option. It can be turned off under Preferences, Language & Text, Text. Text substitution erroneously matches some input sequences prematurely from the input method and causes invalid characters to be entered into the program.

Monday, September 3, 2012

Mistake in my RC4 implementation

Jim Wilcoxson approached me two days ago regarding the skewed distribution I observed in RC4. He sent me a snippet from SQLite's implementation of RC4 and showed that it did not suffer the non-uniform result I saw.

After digging out my old code, I found two mistakes I made. One in the key scheduling algorithm:
  void initialize(const char *begin, const char *end) throw() {
    for (size_t i = 0; i < 256; ++i)
      s_[i] = i;

    const char *key = begin;
    uint8_t j = 0;

    for (size_t i = 0; i < 256; ++i, ++key) {
      if (key == end) key = begin;
      j = j + s_[i] + *key;
      std::swap(s_[i], s_[j]);
In the original code, I had forgotten to ++key, so the key scheduling is always done on the first byte of the key only. The addition for the fix is highlighted in yellow.

Another mistake is an index out of bounds error in the generation of bytes.
  uint8_t next() throw() {
    i_ = i_ + 1;
    j_ = j_ + s_[i_];
    std::swap(s_[i_], s_[j_]);
    return s_[s_[i_] + s_[j_]];
There is an array index out of bounds error in the return line. The reason is that in C/C++ compiler, the intermediate result of an expression, if it's an int, does not inherit the int size of the sub-expressions, but instead is word sized. Since both s_[i_] and s_[j_] are (0...255), half of the times the result is in (256...511) which accesses some memory beyond s_[]. This explains why I get a skew where most byte value tallies are halved.

The fix is to replace the return line as follows:
    return s_[(uint8_t) (s_[i_] + s_[j_])];
Where an explicit cast to uint8_t solves the index out of bounds problem.

The suggestion I gave, to add ^ j, only obscures the problem.

Fortunately my code has never been used in a production system. It was written quickly to give me a PRNG for a memory usage simulator I wrote last year for testing a memory allocator where the PRNG (1) must not use memory allocation itself, (2) must not use locking, (3) is reasonably fast and easy to implement. The implementation satisfied all my original requirements but was not RC4.