Life of a Computer Scientist: April 2007

Friday, April 20, 2007

Comparing Javascript Inheritance in Firefox, IE7 and Opera

When I started writing Javascript programs just a few months ago, I got excited in how you could augment an existing class by modifying its prototype. For example, you could add a functional-style map (which is what Google's map-reduce is based on) to Array by modifying its prototype as follows:

Array.prototype.map = function (f)
{
var len = this.length;
var arr = new Array(len);
for (var i = 0; i < len; i++)
arr[i] = f(this[i]);
return arr;
}

Now, for any array, you could call the map method, passing in a mapping function as an argument, and the result would be a new array where each element is transformed from the original array by the mapping function. For example:

var xa = [1, 2, 3, 4];
var ya = xa.map( function(n) { return n * n; } );

would result in ya = [1, 4, 9, 16].

I also became fascinated by a technique to dynamically making an existing object inherit from another object, by copying the other object's properties over. For example, I don't need to hard-code event handlers in HTML. I could wrap these event handlers in a constructor function along with all local, instance-specific states. I lookup the HTML element whose behavior I want to override, and inject my event handlers to it. I could also reuse that event handling class for many elements in the same document without worrying if I would pollute global variables. This achieves good abstraction and code reuse.

Since all objects are inherited from Object, I could write a function Object.prototype.inherit and make that sort of dynamic inheritance available to all objects. Right?

Not so fast. It turns out that in Internet Explorer 6 and 7, not all objects are instances of Object. Most notable ones are document, window, any values representing DOM node or a node list, and any instances of ActiveXObject. As a result, they do not inherit changes made to Object.prototype. Furthermore, these objects don't have a constructor. All objects have a "constructor" property, but these objects are an exception.

Are there any exceptions like that in Firefox? There is. If you try to obtain a DOM node list, for example, by document.getElementsByTagName("A"), you would get an object whose constructor is HTMLCollection. Firefox admits such object to be an instance of HTMLCollection class, but the object doesn't inherit changes made to HTMLCollection.prototype (which is actually read-only). It does inherit changes to Object.prototype, however.

Such inconsistencies are probably due to the fact these objects are implemented natively (in C) and wrapped for Javascript.

Some browsers are more consistent for native objects than others. All classes (constructor functions) are instances of Function because that is the way Javascript is designed. In Firefox, IE and Opera, all primitive classes such as Number, String, RegExp, even Function itself are instances of Function. The situation is different for DOM classes.

Document object is an instance of HTMLDocument (Firefox, Opera), and HTMLDocument is an instance of Function (Opera only).
A DOM node list object returned by document.getElementsByTagName is an instance of NodeList (Opera, this is the correct W3C DOM behavior) or HTMLCollection (Firefox, compatible but incorrect), and NodeList is an instance of Function (Opera).
XMLHttpRequest is an instance of Function (Opera, IE7).
ActiveXObject is an instance of Function (IE6, IE7).
IE doesn't define DOM classes such as HTMLDocument, HTMLElement, NodeList, etc.

As one can see, Opera is very consistent in inheritance even for native objects.

Here is what you can count on when it comes to prototype inheritance. In Firefox and Opera, all objects inherit from Object. In IE, objects that interface with the browser or other native components do not inherit from Object.

In the beginning of this article, we saw how prototype inheritance can be used to augment existing classes. If this feature is implemented consistently, then this is a powerful way to extend Javascript. Imagine how we can work-around browser incompatibility by implementing the missing functions in Javascript. For example, DOM2 events can be simulated by setting appropriate HTML element attributes in DOM1. To an extreme, it is possible to emulate DOM in ancient browsers as long as we have document.write (though it is not practical to do).

It is not surprising that IE has the most broken Javascript implementation, considering that IE is historically the dominant browser. Because prototype inheritance is not possible for DOM, browser, and other native objects, IE is the least flexible in terms of language extension and work-around. By contrast, both Firefox and Opera have better prototype inheritance, and it is easier to extend or adapt these browsers.

Tuesday, April 17, 2007

Problems with Generics in Java

I was substituting labs for a introductory programming class taught in Java a few months ago, and I thought it would be nice to teach Generics to the students because it illustrates safe, reusable coding practice. I usually program in ML where polymorphic functions are written and used almost everywhere. However, I soon ran into various dark corners of type variables in Java. Such complications are not present in ML due to ML's absence of subtyping and mutable values.

The complications of Generics in Java are well-documented in this paper, Parametric polymorphism for Java: is there any hope in sight?, and this IBM article, Java theory and practice: Generics gotchas, but I'm summarizing the problems a bit differently, in a way you can see how bad assumptions break the system.

Invariance It is almost always assumed that if T extends T', then Class<T> extends Class<T'>; this is called covariance. This is true for read-only (immutable) collections, but not true if you're allowed to insert elements. The original collection assuming elements have type T now contains elements of type T'. For Java, subtyping of generic classes really is invariant.

Static type erasure In order to maintain backward compatibility such that the newer generic collection classes can be used by older non-generic code, Java treats class Class<T extends S> the same as Class whose definition has T replaced by S throughout (if T does not extend anything, then Java assumes T extends Object). The idea is to reduce parametric polymorphism to subtyping polymorphism. It works if Class<T> extends Class<S> holds, but since we already know subtyping should be invariant in Java, this is unsound.

Dynamic type erasure Again, for the sake of backward compatibility, existing "compiled-once" bytecode must be able to use compiled generic classes, including being able to reflect on it. Reflection is a language device to get information about meta-properties of an object such as its class and its method names. Because older code has no concept of generics, Class<T> is just Class in the run-time. After erasure, Class<T> cannot reflect on T, so it doesn't know T's constructor (cannot construct T) nor can it construct an array of T. In order to work around this, the construction site of T must be given a reflection of class T.

Another consequence is that you lose type information when you serialize objects. After writing Collection<T> to a disk file, you no longer know what T is after reading it back. If you want to manually keep track of this information, your serialization must be given a witness of T in the same manner as constructing T and an array of T.

Solution? What if we provide backward compatibility by keep the old non-generic collection classes, and put the new generic-based collection classes in a new package? The same goes for reflection API, where an old version reflects only on the type erased part of a generic class, but a new reflection API can inquire about type variables.

There may still be complication when old code tries to construct an instantiation of generic class using reflection, but I think this contains the problem much better.

Saturday, April 14, 2007

Presentation Technique

Recently, I attended a regional conference on programming languages, and I was giving a talk there about my own research. Although it was the first time I gave talk at a public venue, that I am by no means experienced, I had thought of some techniques that I want to share. The most important issue is actually timing, and many of the concerns revolve around that.

The novelty of my ideas really falls in the inverse pyramid organization of slides, but I feel that common-sense presentation techniques should be considered before that for effectiveness, so I would mention the common-sense first.

Pace yourself The amount of time you're given for a talk limits how much material you can cover. After finishing all the slides, you should rehearse at your own leisure. This allows you to determine the natural pace of each slide. You should not rush, and you might even want to slow down a bit.

Remain focused It is likely that you go overtime in the first rehearsal, but the solution is not to speed up; you must decide what to prune. Consider what is the most important message you want to deliver, and only keep the slides which support that message. It is also possible that you may not have enough slides to support your thesis, which is only apparent after rehearsing to someone and hearing back from them in my case.

If you choose to speed up rather than prioritize, you would find that your talk still go overtime and out of focus.

Inverse pyramid scheme After pruning, you might be concerned that you pruned too much, which makes you appear unprepared. The inverse pyramid scheme is a well-known journalist approach that states the bottom-line about a new report first (when, where, what) and continues to the less important facts (why, how). The scheme is devised so an article can be truncated easily by a newspaper editor to fit the page. It can be applied to a technical talks as well.

I organize my slides as a logical tree, but I move the leaf slides to an overflow section. The official talk ends at a slide that says "thank you for your attention," but the overflow slides follow. The idea is that I would not talk about them but refer to them if people ask questions.

My overflow slides contain the hairy technical details that take some time to digest. If you have detailed slides that you don't want to spend too much time on them, they are probably good candidates for overflows. However, if the slides are integral to your talk, then you should just break them up instead.

Conclusion I used these techniques for the talk, and at the end I feel I was well-prepared and well-timed. The inverse pyramid scheme allowed me to employ an "on-demand" approach in showing some slides, and it appeared effective and on-point. However, I think this approach is most effective when used after the common-sense to maintain pace and focus.

Thursday, April 5, 2007

Compiling AbiWord from Source on Mac OS X 10.3

The latest AbiWord at the time of writing is 2.4.6, but for Mac OS X, only the binary for 2.4.5 is available for download. I've decided to compile one. However, the instructions aren't very clear, and the build system appears to be a hack, so I've decided to write about it.

First of all, AbiWord relies on a number of Fink libraries, so make sure they are installed.

fink install \
libgsf glib2 libxml2 gettext libiconv bzip2 libwpd-0.8

Alternatively, you may use apt-get to avoid compiling these packages from scratch if you know what you're doing.

The official way to build AbiWord is to go to ${SRCDIR}/pbx and read the instructions there and proceed. However, before running build_all.sh, I had to adjust a couple of things.

First,

# current working directory is ${SRCDIR}/pbx

ln -s ../{popt,libpng,wv} .

so build_all.sh can find them.

Notice that for fribidi, you cannot use the source code supplied in ${SRCDIR}/fribidi, which is for building on Windows only. Prepare it by the following steps:

Download and unpack a fresh copy of fribidi. You should get a directory called fribidi-${version}, where ${version} is some version of fribidi you downloaded.
Make a symbolic link from fribidi to fribidi-${version} so build_all.sh can find it.
Run configure script there to generate config.h
Make a symbolic link from wcwidth.c and wcwidth.i to fribidi_wcwidth.c and fribidi_wcwidth.i respectively.
Open fribidipbx/fribidipbx2.pbproj, then add fribidi/fribidi_types.i and fribidi/fribidi_char_sets.i to the project. The paths should be "relative to project," and make sure their roles are "public" under Targets/fribidipbx/Headers and the file types are "sourcecode.c.h".

AbiWord requires Enchant, which is not included from the source tar.gz for some reason. Get it using the commands:

# current working directory is ${SRCDIR}/pbx

export CVSROOT=':pserver:anoncvs@anoncvs.abisource.com:/cvsroot'
cvs login    # password is 'anoncvs'
cvs export -renchant-1-3-0 enchant
cvs export -rrelease-1-2-1 enchantpbx
unset CVSROOT

Open enchantpbx/enchantpbx.xcode with Xcode and add enchant/src/pwl.c to the target "enchant framework." Now enchant should build.

The last phase is to prepare the source of abi/, using the commands:

# current working directory is still ${SRCDIR}/pbx

wget http://www.abisource.com/.../link-grammar-4.2.4.tar.gz
tar zxvf link-grammar-4.2.4
ln -s link-grammar-4.2.4 link-grammar
(cd .. && ln -s pbx/link-grammar-4.2.4 link-grammar)
perl -pi -e 's,DICTIONARY_DIR=\\"\\",DICTIONARY_DIR=\\\\\\".\\\\\\",' \
link-grammar/link-grammar.xcode/project.pbxproj

ln -s ../{abi,abipbx,abidistfiles,abiword-plugins} .
(cd abi/src/wp/ap/xp/ToolbarIcons/ && make)

perl -pi -e 's/hdiutil create -megabytes 30/hdiutil create -megabytes 40/' \
build_all.sh

Finally, one should be able to build AbiWord using the command ./build_all.sh all. However, the resulting AbiWord won't run if your glib2 depends on libgettext3 (libintl.3.dylib) instead of gettext (libintl.1.dylib). This, of course, depends on the versions of Fink packages you use. If this is a problem for you, you can either try to add libintl.3.dylib to abipbx/abipbx2.pbproj (add that to preparelibs.sh as well), or just copy /sw/lib/libintl.3.dylib to AbiWord.app/Content/Frameworks.

It is great to have AbiWord on Mac OS X, but users should not jump through so many hoops to compile it from source. The source tree needs a good clean-up, and Xcode project files need to be made consistent with the latest versions of fribidi and enchant. They should bundle fribidi (for building Mac OS X framework), enchant, and link-grammar. They should definitely fix the build_all.sh script, so the symbolic links are no longer necessary.

Furthermore, I think it's a bad idea to depend on compiled Fink binaries. This has at least two problems.

Fink package versions vary wildly from machine to machine. Fink itself ensures only that minimal dependencies are met, and will happily use more recent packages if available. The situation of gettext vs. libgettext3 is just one of the many examples.
One usually expects that Fink packages are compiled from source and run on the same machine. It is common to find packages that assume MACOSX_DEPLOYMENT_TARGET to be the same version as the host machine. If you compile AbiWord on 10.3, you can't run it on 10.2.

I hope they fix these problems. Please let me know if these instructions work for you.

Wednesday, April 4, 2007

Google Apps Appliance

A fellow student sent some e-mail attachment of technical report updates to a professor a few weeks ago, but he's still waiting for the update to be posted. I suggest that it is too much work for the busy professor to fish for the attachment from inbox, so he should send them again. I also wonder, wouldn't it be wonderful to be able to search our department's e-mail like Gmail does it?

It looks like Google Search Appliances shouldn't just be for searching an enterprise's Intranet web. What about e-mails and calendar items? Google now offers an array of applications—Gmail, Calendar, Chat, and Docs & Spreadsheets—which are ripe candidates for business solutions. They currently offer these services on a domain level as Google Apps for Domains. What about Google Apps Appliances? It looks like this is not a new idea; many people have speculated about it. Obviously, Apps Appliances are expected to provide search functionality like the Search Appliance.

There are some merits for Apps Appliances over Google Apps for Domains.

Privacy Companies enjoy greater privacy of data, as their e-mails, calendars, chat logs, and documents never leave the site. This is the same argument for Google Search Appliances.

Migration and Integration An admin should be able to integrate Google Apps with existing backend: NFS, SMB for storage of documents; Exchange, Notes for e-mail and calendar; IMAP gateway for e-mail. However, I can already see that Exchange and Notes integration will be hairy.

Google Apps for Domains currently can use IMAP gateway. Other forms of integration are not possible due to corporate Intranet security.

I think backend integration difficulty depends on how much integration is desired. It is difficult to interact with Exchange and Notes in real-time because these protocols must be reverse engineered. Efficiency is another consideration; these backends might not be designed to process massive amount of queries like the Google File System is designed to handle, and search performance (which is a hallmark of all Google Apps) will be severely degraded if that is the case.

It is easier to import and export data especially if everyone speak the same language (iCal for calendar, Unix SVR4 mbox for e-mail, and HTTP for document transfers). It is important that import and export should not lose fidelity of data, and the whole process should be automated.

Google Calendar allows import and export of calendars in iCal format, but I think each user has to do it one by one. This is clearly not feasible for a whole domain of users.

New Product—Filesystem Appliance Each Apps or Search Appliance will happily serve its own filesystem, but storage space can run out. When this happens, companies should be able to purchase storage solutions from Google and enjoy scalability and redundancy that Google File System provides. This is another business opportunity.

At this point, however, I want to stress that it is a bad idea if Apps or Search locks in to Filesystem Appliance by lack of network filesystem support such as NFS or SMB. This is because: (1) lock-in is evil by virtue; (2) companies who invested heavily on their storage solution won't be happy; and (3) a domain specific Filesystem Appliance cannot compete with general purpose network storage systems like Network Appliance products.

For now, without the Apps Appliances, I think the best way for graduate students to help busy professors manage e-mail attachments is by sending them again. It is as if graduate students aren't busy... Right?