Life of a Computer Scientist: September 2009

Tuesday, September 22, 2009

Google Chrome (Chromium) for RHEL (CentOS) 5

For a long time, I've kept a Windows machine for Remote Desktop access that I use just so that I can use Google Chrome. The reason is that editing Google Sites pages, for example, drags Firefox 3.5 and Opera 10 to a slow crawl. And Firefox 3.5 on Linux (since beta2) crashes my X11 server because the video driver (xorg-intel) is old and buggy. But I've long wished to abandon Windows altogether. I share the Windows machine with another graduate student, and the Windows Server 2003 license for that computer restricts two remote desktop connections. This is surely absurd for me as a Linux user because I've along accustomed to unlimited number of connections (subject only to resource availability as opposed to artificial license restriction). Granted, I could have used VNC, but it's not as responsive.

It is currently not my best interest to figure out how to compile Chrome from scratch (they appear to be using some gclient custom build tools). However, the precompiled binary snapshots require more recent versions of FreeType, Gtk+, and NSS (network security service, as part of Mozilla) than that available on BU Linux (which is a RHEL/CentOS 5 derivative).

Compiling Gtk+ is a nightmare because of the sheer number of dependencies. Here I'm listing the packages I need. The listing format is "target: dependencies"

gtk+: atk glib cairo pango jasper fontconfig
pango: cairo freetype glib
cairo: libpng freetype pixman fontconfig
fontconfig: freetype
atk: glib
glib:
libpng:
freetype:
pixman:
jasper:

The order listed above is in reverse topological order. I just put them in a Makefile and let make figure out the order for me. Fortunately, these packages are standard GNU autoconf/automake based. You will have to set export variables as follows:

PREFIX := ... # you pick a convenient location
export CFLAGS := ... # you pick the compilation options, e.g. -march=i686 -O3
export CPPFLAGS := -I$(PREFIX)/include
export LDFLAGS := -L$(PREFIX)/lib
export PKG_CONFIG_LIBDIR := $(PREFIX)/lib/pkgconfig
export PATH := $(PREFIX)/bin:$(PATH)
export LD_LIBRARY_PATH := $(PREFIX)/lib

You will have to run configure with --prefix=$(PREFIX), and then run make and make install inside your Makefile.

I also wrote a script to essentially do the following. This will be left as an exercise to the reader.

Given a target name like gtk+, locate the source package tar.gz and unpack it.
Make a new object files directory and run the configure script there as the current working directory. If the build fails, you can just remove the object directory to get a clean slate, instead of needing to unpack the source files again.
Do the traditional make and make install.
Touch the target as a file, so make doesn't have to repeat building and installing a target if it already succeeded.

Each rule in the Makefile would look like this:

# Assuming the build script is called build.sh in the current directory.
BUILD := ./build.sh

gtk+: export LDFLAGS := -lm  # for jasper
gtk+: atk glib cairo pango jasper fontconfig
$(BUILD) $@

On the other hand, NSS has its own build instruction, so I did it manually. The resulting nss shared objects must be symlinked in a particular way:

Add ".1d" suffix: libnss3, libnssutil3, libsmime3, libssl3.
Add ".0d" suffix: libplds4, libplc4, libnspr4.

I imagine that's Ubuntu/Debian convention.

I also compiled my own gcc 4.3/4.4, and without a recent gcc and libstdc++.so.6, you will get a dynamic linking error with the symbol GLIBCXX_X_Y_Z.

Tuesday, September 15, 2009

Benchmark Using High Resolution Clock

POSIX 1993 real time extension defines an interface to obtain clock tick down to nanosecond precision, using clock_gettime(). One way to implement this is to use Time Stamp Counter (TSC) that comes with the CPU, which is incremented by one per CPU clock cycle. On a modern gigahertz CPU, the TSC increments several times per nanosecond, so it can provide sub-nanosecond resolution.

Then, there comes CPU with power saving mode, which reduces the CPU frequency when underutilized. Some older CPUs increment TSC by one per cycle regardless of the frequency, so each TSC increment is not uniform (though still monotonic). For these CPUs, we can get a reliable count of instruction cycles, but we cannot use the TSC to track wall-clock time.

Later CPUs increment TSC by one according to the maximum CPU frequency, so the TSC increment takes uniform interval. However, when benchmarking code, the TSC no longer measures actual clock cycle. Code that runs in reduced CPU frequency will take more TSC counts to execute. A machine to be used for benchmarking purpose should turn off power saving mode. In particular, a laptop (having more stringent power requirement) is probably not well-suited for benchmarking.

There is also the issue of multi-processor machines. Each CPU may have slightly different frequency, and they're started at slightly different times at boot up. The TSC of each CPUs cannot be easily correlated. Even if a benchmark involves only a single process, the operating system might still migrate the process to another CPU, causing the TSC readout to be non-deterministic. On a multi-process machine, benchmark should be run with CPU affinity fixed on one CPU, using sched_setaffinity(). CPU affinity is desired also for cache and TLB reasons.

I do not currently know of a reliable way to benchmark multi-threaded program down to nanosecond precision. Here are some problems I could think of.

sched_setaffinity() works on a pid_t (i.e. the whole process). In general, pthread_t and pid_t are not interchangeable.
If we want to use one CPU as the master clock source, the intercommunication and synchronization overhead will be non-negligible. It should be later verified, but I'm guessing that the resolution would be ~100s nanoseconds.

Saturday, September 12, 2009

Compiling LyX with gcc 4.4

I recently upgraded to gcc 4.4.1, and today I found that I couldn't compile LyX without fixing the source code. The reason is because gcc 4.4 has a stricter interpretation of the C standard when it comes to preprocessor syntax. The #if ... #elif ... #endif block is now either all processed or all skipped. In other words, #elif does not behave like #else #if ... #endif anymore because the conditional expression of #elif no longer enjoys deferred computation. This is documented in gcc bug #36453.

LyX failed to compile because it distributed and included boost/mpl as part of the source code. A patch for boost/mpl is readily available. I downloaded the most recent patch (0001-boost.mpl-gcc-4.4-fixes.patch), saved it under $(SRCDIR)/boost (suppose $(SRCDIR) is the source directory for LyX, e.g. lyx-1.6.4), and ran from $(SRCDIR)/boost the command: patch -p3 < 0001-boost.mpl-gcc-4.4-fixes.patch

LyX should now compile correctly under gcc 4.4

Sunday, September 6, 2009

Abuse of read/write lock

A more fine-grained alternative to mutex lock is the read/write lock (rwlock). Mutex disallows two threads to access the same variable at the same time regardless whether the access is a read or write. With rwlock, the basic principle is simple: two readers can read from a variable (memory location) simultaneously without stepping on each other's toes. However, two writers cannot both write to the same variable, nor can one writer and one reader try to access the variable at the same time; both lead to inconsistent results.

There is one useful way to correctly abuse rwlock. Before we introduce the idea, let's talk a bit about lock-free stack implementation using singly linked list. Using compare and swap instruction, stack push can be implemented in the lock-free (and wait-free) manner. The compare and swap instruction provides a sufficient one-word transactional memory mechanism to allow changes made by another thread to be detected. This works because the node to be pushed to the stack is supposedly private. But stack pop suffers the complication of the ABA problem because the node to be popped is shared, so while thread 1 is in the middle of popping node A—let's say its successor is X at the moment—thread 2 pops node A, pushes node B, and pushes node A back. Now, A's successor is B instead of X. Thread 1 may mistakenly put reference to X as the top of the stack when it finishes popping node A.

With the singly linked list stack with push and pop implemented using compare and swap, we have the following access granting relationship. Two threads can safely push at the same time (guarded by compare and swap), but no two threads may pop at the same time, nor one thread to push and another thread to pop at the same time. This is exactly like a rwlock!

For this particular data structure, we could use rwlock to protect access. It is not exactly lock-free wait-free anymore, but maybe there are cases where the compromise, which is much easier to implement, is acceptable and useful.

Thursday, September 3, 2009

Migration to Gmail

Two days ago I successfully migrated two e-mail accounts to Gmail, one university e-mail and one department e-mail. Both of them are Unix based, and the IMAP server stores all mails except the INBOX in my home directory. Here is what I did.

Create a new Gmail account. It will hold both the university and department e-mails.
In my home directory, create a .forward file that contains the new Gmail e-mail address, and remove (or rename) the .procmailrc file.
Enable IMAP in Gmail.
Migrate old e-mails (I have about one year's worth kept in the mailbox) using imapsync. The university account runs UW imapd, and the department runs Dovecot. Here are the specific arguments to imapsync (besides login credentials) and why I needed them.

--folder INBOX (to specify the inbox folder)
--folderrec mail (this is the directory where all the mails are stored in Pine)
--prefix1 mail/ (mailboxes with this prefix are stripped by imapsync)
--regextrans2 's,Sent Messages,[Gmail]/Sent Mail,g' (I had used Apple Mail with these accounts too, and Apple Mail saves the sent e-mail in the Sent Messages folder, so I configured pine to do the same; now I want these to go under Gmail's Sent Mail folder)
--regextrans2 's,spams,[Gmail]/Spam,g' (I kept a separate spams folder; could have removed this folder before the migration).
I should have done this too: --regextrans2 's,INBOX,label-name,g' (the INBOX of each account goes under a different label). I simply did this manually after the fact which was a lot of work.

imapsync prior to 1.288 wants to download the list of all folders regardless whether you need them or not. This means the imap server will traverse all the files in my home directory, and this takes a lot of time. Version 1.288 solves this problem by only enumerating all folders when you use the --include option. I temporarily changed line 915 in imapsync as follows:
```
-my @all_source_folders = sort $from->folders();
+my @all_source_folders = sort $from->folders($prefix1);
```
When adding a new From e-mail in the Gmail account, they offer to setup an SMTP server. This eliminates the "From: yourname@gmail.com on behalf of youroldemail@example.com" message shown in Outlook.
I configured pine to access Gmail over IMAP as follows. I edited .pinerc to change the following settings:

inbox-path={imap.gmail.com:993/ssl/novalidate-cert/user=yourname@gmail.com}label-name (For each old e-mail I migrated, I tagged them with a dedicated label, so the inbox for Pine would show what was there before)
folder-collections=Main {imap.gmail.com:993/ssl/novalidate-cert/user=yourname@gmail.com}[] (This is so I can see e-mails under other labels)
default-fcc={imap.gmail.com:993/ssl/novalidate-cert/user=yourname@gmail.com}[Gmail]/Sent Mail
postponed-folder={imap.gmail.com:993/ssl/novalidate-cert/user=yourname@gmail.com}[Gmail]/Drafts

Lastly, I created filters to tag and archive incoming e-mails and then used the multiple inbox Gmail lab feature to be able to see e-mail coming in from both accounts at once.