Tuesday, January 19, 2010

Firefox Memory Usage

It's been a while since pavlov last visited optimizing memory usage in Firefox, an effort that resulted in the switch to jemalloc. Since then, I decided to take a closer look at memory allocation. Using dtrace to attach to a running Firefox process, I traced the size argument of each call to malloc() and calloc(). The trace was collected over a day of Firefox usage, resulting in a 1.7GB trace file. The histogram derived from the trace looks like this:


The x axis is the object size, and the y axis is the total number of requests of that size. This is admittedly not very interesting. All it says is that there are a lot of small-sized allocations, and that the sizes are pretty spread out. There are occasionally very large requests.

I decided to aggregate object sizes into size classes in the same way jemalloc and dlmalloc do it: there is one class for each size <= 512 bytes in 8 byte increments, and sizes above that are grouped to their power-of-2 ceilings. That is, there is a bin for each of the sizes 0, 8, 16, 24, 32, ..., 504, 512, and then bins for the classes 513--1024, 1025--2048, 2049--4096, and so on. The resulting aggregated histogram, plotted with the x axis on a log scale, looks like this.
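The binning rule above is easy to state in code. Here is a minimal Python sketch (the function name is mine, not from either allocator's source):

```python
def size_class(size):
    """Map a request size to its bin: 8-byte classes up to 512 bytes,
    then power-of-2 ceilings (1024, 2048, 4096, ...) beyond that."""
    if size <= 512:
        # Round up to the next multiple of 8: 0, 8, 16, ..., 504, 512.
        return (size + 7) // 8 * 8
    # Round up to the next power of 2.
    return 1 << (size - 1).bit_length()
```

For example, a 13-byte request falls in the 16-byte bin, while a 1040-byte request lands in the 1025--2048 class.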


It appears that both allocators would probably perform well for most sizes, since sizes 512 bytes and under are managed using free lists with O(1) insertion and removal. However, the 1024--2048 class may be a bit of a concern. Here is the linearly scaled histogram zoomed in to this range:


Although it can't be seen easily from the figure, there are two significant peaks at exactly 1040 and 1047 bytes.

Given this information, is jemalloc a good choice? Probably not. That's because jemalloc rounds object sizes up to the next power of 2 for objects between 512 bytes and 1MB, so objects of 1040 and 1047 bytes each have 2048 bytes allocated for them. That is a large amount of internal fragmentation.
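To make the fragmentation concrete, here is a small Python model of the rounding behavior described above (the helper name is mine, and only the 512-byte-to-1MB range is modeled):

```python
def jemalloc_rounded_size(size):
    """Model of the power-of-2 rounding for requests between 512 bytes
    and 1MB; other size ranges are not modeled here."""
    if 512 < size <= 1 << 20:
        return 1 << (size - 1).bit_length()
    return size

for req in (1040, 1047):
    alloc = jemalloc_rounded_size(req)
    waste = alloc - req
    print(req, alloc, waste)  # both requests round up to 2048 bytes
```

A 1040-byte request wastes 1008 bytes, nearly half of the 2048-byte allocation.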

The Doug Lea allocator dlmalloc, on the other hand, does not round sizes up like that. The power of 2 binning for large objects is only for the purpose of locating the object size class. Each size class is a doubly linked list of objects of the same size, with O(1) insertion and removal.

However, this seems to contradict another finding by pavlov, namely that jemalloc gives better fragmentation results than the default glibc malloc, which is based on dlmalloc. The explanation is that dlmalloc can be subject to external fragmentation when it has a mix of small and large objects on the heap. Jemalloc's use of an array-based segregated heap for small objects and a buddy-system-like heap for large objects trades external fragmentation for internal fragmentation.

I have a hunch that combining the dlmalloc and jemalloc strategies might yield further improvement: for small sizes, use an array-based segregated heap like jemalloc; for large sizes, use a coalesceable heap like dlmalloc. This avoids mixing small and large objects on the same heap, so external fragmentation could be reduced with no impact on internal fragmentation.
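A toy dispatch for this hybrid idea might look like the following sketch (all names and the 512-byte cutoff are illustrative stand-ins, not a real implementation):

```python
SMALL_MAX = 512  # illustrative cutoff between the two heaps

def small_heap_alloc(size):
    # Stand-in for a jemalloc-style array-based segregated heap:
    # round up to the 8-byte size class.
    return ("small", (size + 7) // 8 * 8)

def large_heap_alloc(size):
    # Stand-in for a dlmalloc-style coalescing heap: no power-of-2
    # rounding, just 8-byte alignment.
    return ("large", (size + 7) // 8 * 8)

def hybrid_alloc(size):
    """Keep small and large objects on separate heaps, so the two
    populations never fragment each other's pages."""
    if size <= SMALL_MAX:
        return small_heap_alloc(size)
    return large_heap_alloc(size)
```

Under this scheme a 1040-byte request would occupy about 1040 bytes instead of 2048.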

Sunday, January 3, 2010

Solving the Gaiji Problem Portably

Unicode codifies many Chinese ideographic characters, but ideographic characters are really doodles in nature, and compositions can be fairly arbitrary. Over the years, some uncommon character variants have been used in place names or personal names without ever receiving a Unicode code point. These are called Gaiji (in Japanese) or Zaozi (in Chinese). Government and other agencies need a system to deal with them.

Traditionally, one would compose a custom glyph and allocate a code point for it from the Private Use Area (PUA) in Unicode, but everyone assigns code points to glyphs and meanings differently. Even if each agency employs a centralized repository to keep track of agency-wide Private Use Area code points (they would have to), it would still have problems exchanging information with another agency or an outside party, who cannot make sense of the Private Use Area.

Most ideographic characters can be decomposed, and Gaiji are no exception. The decomposed glyphlets are actual existing characters. Take some Unicode-codified characters for example (they were formerly Gaiji in the BIG-5 encoding but now have assignments in Unicode): 堃 decomposes to 方方土, and 喆 decomposes to 吉吉.

For the purpose of representing Gaiji in Unicode, I advocate the following solution based on certain Unicode control characters.
  • Since Gaiji does not have Unicode representation, it should use the U+FFFD code point (replacement character) to signify this fact.
  • However, the replacement character can be annotated with Ruby Text U+FFF9 (interlinear annotation anchor), U+FFFA (interlinear annotation separator), and U+FFFB (interlinear annotation terminator) to describe how the gaiji can be decomposed.
A typical Gaiji sequence would start with U+FFF9, U+FFFD, U+FFFA, optionally followed by an ideographic description character (IDC), then the sequence of decomposed glyphs, and end with U+FFFB. Essentially, we're marking the Gaiji decomposition as ruby text. Software that doesn't understand this scheme can still scan and index the decomposed glyphs. And as long as there is a standard way to normalize decompositions, the Gaiji Unicode sequence is interchangeable. The IDC provides a hint for the renderer to display the Gaiji graphically using existing fonts.
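The sequence just described can be sketched in a few lines of Python (the function name is mine; which IDC applies to a given glyph is up to the encoder):

```python
ANCHOR, SEPARATOR, TERMINATOR = "\uFFF9", "\uFFFA", "\uFFFB"
REPLACEMENT = "\uFFFD"

def encode_gaiji(glyphlets, idc=""):
    """Build the proposed sequence: anchor, replacement character,
    separator, optional ideographic description character, the
    decomposed glyphlets, then the terminator."""
    return ANCHOR + REPLACEMENT + SEPARATOR + idc + glyphlets + TERMINATOR

# For example, 喆 could be carried as encode_gaiji("吉吉"), and 堃 as
# encode_gaiji("方方土", idc="\u2FF1") -- treating U+2FF1 (above-to-below)
# as an illustrative IDC choice for this glyph.
```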