Friday, May 29, 2009

Revival of IBM ROM BIOS

Hypervisors (or virtual machine managers) seem to be a hot topic these days. A hypervisor is a thin protection layer that arbitrates resources among different operating systems (OSes), which makes it possible to run multiple OSes on one machine concurrently. Some of the claimed advantages are:
  • Better protection in case of system compromise. If an attacker gains access to one OS, the other OSes are not affected. Furthermore, with a hypervisor that supports snapshotting, a compromised system's image can be restored to a previously known good state.
  • Consolidation of hardware and flexible control. An OS image can move from machine to machine. This can be used to keep some machines more fully utilized, so that the remaining underutilized machines can be powered off to save energy.
To operating system researchers, a hypervisor provides a much easier development environment. Virtual machines like Qemu provide a gdb stub so the kernel can be debugged in gdb. However, an OS can only attract developers if it is in a functional state, and a significant barrier to getting an OS into a functional state seems to be writing device drivers. A typical OS has to deal with:
  • Power management (ACPI)
  • Disk controllers (Parallel ATA, AHCI)
  • Video display (vendor specific, but most have VBE BIOS for basic non-accelerated 2D graphics)
  • Networking (vendor specific)
  • Sound card (also vendor specific), and
  • USB or Firewire peripherals (OHCI and UHCI).
It may have to support legacy PS/2 keyboard and mouse devices as well as a serial port. A virtual machine hypervisor, on the other hand, implements an anti-device driver, which receives I/O requests from the guest OS and acts upon them as a real device would.
A hypervisor is sometimes designed to run on top of a host OS, in which case its anti-device drivers must translate raw I/O requests back into the high-level I/O interface that the host OS provides. This translation is necessary for running an unmodified OS that already ships with a wealth of device drivers. But it is wasted effort for operating system researchers who want to get an OS up and running quickly. They do not have the resources to hire developers to write device drivers, and their own time is better spent on something other than reading device specifications.
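To make the anti-device driver idea concrete, here is a minimal sketch in C of an emulated serial port. The port numbers are the conventional PC COM1 registers, but every name here (emu_serial, emu_serial_out, emu_serial_in) is made up for illustration and does not come from any real hypervisor. The guest thinks it is poking a UART; the hypervisor turns each raw port write into an append on a host-side buffer, which stands in for whatever high-level stream the host OS offers.

```c
#include <stddef.h>
#include <stdint.h>

/* Conventional PC COM1 register addresses. */
#define COM1_DATA_PORT 0x3F8u  /* transmit/receive data register */
#define COM1_LSR_PORT  0x3FDu  /* line status register (base + 5) */
#define LSR_THR_EMPTY  0x20u   /* "transmit holding register empty" bit */

/* Hypothetical device state: the buffer stands in for a host-level
 * stream such as a file, a pty, or a terminal window. */
struct emu_serial {
    char   out[256];
    size_t out_len;
};

/* Called when the guest executes OUT to `port`.
 * Returns 1 if this device claimed the access, 0 otherwise. */
static int emu_serial_out(struct emu_serial *dev, uint16_t port, uint8_t value)
{
    if (port != COM1_DATA_PORT)
        return 0;
    if (dev->out_len < sizeof dev->out)      /* silently drop on overflow */
        dev->out[dev->out_len++] = (char)value;
    return 1;
}

/* Called when the guest executes IN from `port`.
 * The emulated transmitter is always ready, so the guest never spins. */
static int emu_serial_in(const struct emu_serial *dev, uint16_t port,
                         uint8_t *value)
{
    (void)dev;
    if (port != COM1_LSR_PORT)
        return 0;
    *value = LSR_THR_EMPTY;
    return 1;
}
```

Even for a device this simple, the hypervisor has to know the register layout and the status bits. That is exactly the work a researcher's OS has to mirror on the other side.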
When Sony introduced the PlayStation 3, one of its more interesting capabilities was the ability to run Linux. Linux doesn't have direct access to most of the devices on the PlayStation 3 (most notably the RSX video hardware). However, the hypervisor on the PlayStation 3 provides a hypervisor call API that allows restricted frame buffer, hard drive, and USB access. Most of the Linux device drivers for the PlayStation 3 are straightforward wrappers around hypervisor calls. These hypervisor calls are undocumented, but their functionality can be deduced from how each call is used in the open source Linux kernel.
Back in the MS-DOS days, the IBM ROM BIOS provided rudimentary drivers that software could reach through BIOS calls:
  • A jiffy counter in the BIOS data area (at 0x46C; the area itself starts at 0x400), updated by the IRQ0/Int 8h timer handler.
  • A keyboard driver handles key-presses on IRQ1/Int 9h by putting keystrokes in a simple ring buffer, which is read using the keyboard service software interrupt Int 16h.
  • Floppy and fixed disk (hard disk) controller drivers, handling IRQ6 and IRQ5 respectively on the PC/XT (the PC/AT moved the fixed disk to IRQ14). The BIOS also provides a synchronous, blocking disk I/O service via Int 13h.
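The keyboard path in the list above is a classic single-producer, single-consumer ring buffer. Here is a rough sketch in C, not the actual BIOS code: kbd_put plays the role of the Int 9h interrupt handler and kbd_get plays the role of the Int 16h read service. The struct and function names are invented for illustration. The 16-slot size and the scan-code-plus-ASCII word format follow the PC BIOS convention; keeping one slot empty is how head == tail can unambiguously mean "empty".

```c
#include <stdint.h>

#define KBD_SLOTS 16u  /* 16 word-sized slots; one is always left unused */

struct kbd_ring {
    uint16_t slot[KBD_SLOTS];  /* scan code in high byte, ASCII in low byte */
    unsigned head;             /* next slot to read  */
    unsigned tail;             /* next slot to write */
};

/* Producer side (the interrupt handler's job).
 * Returns 1 on success, 0 if the buffer is full and the key is dropped. */
static int kbd_put(struct kbd_ring *r, uint16_t key)
{
    unsigned next = (r->tail + 1u) % KBD_SLOTS;
    if (next == r->head)
        return 0;
    r->slot[r->tail] = key;
    r->tail = next;
    return 1;
}

/* Consumer side (the software interrupt service's job).
 * Returns 1 and stores the oldest keystroke, or 0 if nothing is pending. */
static int kbd_get(struct kbd_ring *r, uint16_t *key)
{
    if (r->head == r->tail)
        return 0;
    *key = r->slot[r->head];
    r->head = (r->head + 1u) % KBD_SLOTS;
    return 1;
}
```

With this layout the buffer holds at most 15 pending keystrokes, which is why a busy DOS program would start dropping keys (and beeping) after a short burst of typing.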
There is also a video service via Int 10h, but it is overridden by most graphics cards to support VBE. VBE is still used nowadays by the Intel X.Org driver to query video modes.
MS-DOS provided a basic file system, character I/O devices (serial port, printer, and console), and memory and process management. Applications could still call the BIOS directly if they wanted to. Computer viruses were particularly interested in intercepting BIOS calls to find opportunities to infect disks. Games, however, had to bring their own video and sound drivers because that hardware was not as standardized.
The Sony PlayStation 3 hypervisor calls look very much like BIOS calls.
The PlayStation 3 hypervisor does not run multiple OSes at the same time, so it does not arbitrate resource allocation. Instead, its purpose is to restrict access to the hardware, so that the hardware interface can remain proprietary.
A general purpose hypervisor can also provide hypervisor calls. This allows better integration of the guest OS with the host. For example, a user on the host OS could resize the virtual machine window, and the guest OS would be notified of the new size. The hypervisor could also provide OpenGL support for 3D acceleration. Many commercial virtual machine products accomplish this by supplying a proprietary guest-addition driver that bypasses hardware emulation and uses an undocumented hypervisor API.
Now, it would be nice to have a standardized hypervisor API, so operating system researchers could put together a fully functional OS quickly to gain support, focusing on the architecture rather than on device drivers. Such an experimental OS could also run on hardware with a non-multitasking hypervisor (like the one on the Sony PlayStation 3) that provides abstract hardware access. In essence, it would be extremely convenient for operating system research and development if we had something like the ROM BIOS of the old days.
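To give a feel for what such an API might look like, here is a toy sketch in C. Everything in it is hypothetical: the call numbers, the error codes, and the hv_call entry point are invented for illustration and do not correspond to the PlayStation 3 hypervisor or to any existing product. The point is only the shape: one entry point, a call number, and a couple of arguments, much like loading registers and issuing a software interrupt to the BIOS.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical call numbers and status codes. */
enum { HV_CONSOLE_WRITE = 1, HV_DISK_READ = 2 };
enum { HV_OK = 0, HV_ENOSYS = -1, HV_EINVAL = -2 };

#define HV_SECTOR_SIZE 512u

/* Stand-in for the hypervisor's side: an in-memory disk and console. */
struct hv_backend {
    uint8_t disk[4 * HV_SECTOR_SIZE];
    char    console[128];
    size_t  console_len;
};

/* Single entry point, analogous to a BIOS software interrupt.
 *   HV_CONSOLE_WRITE: buf = bytes to print,            arg = byte count
 *   HV_DISK_READ:     buf = HV_SECTOR_SIZE-byte buffer, arg = sector number */
static long hv_call(struct hv_backend *hv, int nr, void *buf, size_t arg)
{
    switch (nr) {
    case HV_CONSOLE_WRITE:
        if (arg > sizeof hv->console - hv->console_len)
            return HV_EINVAL;
        memcpy(hv->console + hv->console_len, buf, arg);
        hv->console_len += arg;
        return HV_OK;
    case HV_DISK_READ:
        if (arg >= sizeof hv->disk / HV_SECTOR_SIZE)
            return HV_EINVAL;
        memcpy(buf, hv->disk + arg * HV_SECTOR_SIZE, HV_SECTOR_SIZE);
        return HV_OK;
    default:
        return HV_ENOSYS;  /* unknown call number */
    }
}
```

A guest "disk driver" against this interface is a single call with a sector number, with no controller registers, DMA setup, or interrupt handling in sight. That is the kind of shortcut the post is wishing for.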

Friday, May 22, 2009

Interview Question

Yet another interview question that I came up with. This one tests the interviewee's ability to read and understand basic specifications.
You have a C program written to run on Linux that uses strndup(3), size-bounded string duplication. The function returns a copy of a source string, up to at most n characters. The copy is allocated using malloc(), and the copy is always zero-terminated even when the source string may not be. When you try to compile the program on Mac OS X, you find out that its Standard C library doesn't have strndup().
After browsing around, you find the functions strlcpy(3) and strlcat(3), which perform size-bounded string copying and concatenation. Can you use them along with malloc() to implement strndup()? If so, show your work. If not, explain why not.
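For reference, the strndup() behavior described above can be pinned down with a small sketch that uses only memchr(), memcpy(), and malloc(). It deliberately avoids strlcpy() and strlcat(), so it does not settle the question. The name my_strndup is a placeholder chosen to avoid clashing with a libc that already provides strndup().

```c
#include <stdlib.h>
#include <string.h>

/* Reference semantics only: copy at most n bytes of src, stopping early
 * at a terminator if one appears within the first n bytes. */
static char *my_strndup(const char *src, size_t n)
{
    /* memchr never reads past src + n, so src need not be terminated. */
    const char *end = memchr(src, '\0', n);
    size_t len = end ? (size_t)(end - src) : n;

    char *copy = malloc(len + 1);
    if (copy == NULL)
        return NULL;
    memcpy(copy, src, len);
    copy[len] = '\0';  /* the copy is always terminated */
    return copy;
}
```

A good answer compares what this sketch is allowed to read from src with what the strlcpy(3) manual page says about its source argument.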

Monday, May 18, 2009

autoheader, automake, autoconf

Some notes about these wonderful tools.
  • When incorporating and distributing third-party source code, autoconf can recursively run the third-party configure script. Specify the subdirectory using AC_CONFIG_SUBDIRS([third_party/src/...]). Additional configure arguments for the recursive invocation can be appended to $ac_configure_args right before AC_OUTPUT.
  • A modernized configure.ac uses AC_INIT([name], [version], [bug-report]); the optional third argument is a bug-report address. To add automake capability, just use AM_INIT_AUTOMAKE without arguments.
  • Autoheader generates config.h.in from configure.ac. The configure script, which autoconf generates from the same configure.ac, then converts config.h.in to config.h. The config.h file contains C preprocessor macros that affect compilation behavior depending on the outcome of the configure script. Autoheader requires either that each AC_DEFINE provide a description (third argument) or that a matching AH_TEMPLATE([VAR_NAME], [desc]) exist in configure.ac.
  • To check for packages using pkg-config, use the PKG_CHECK_MODULES(VARIABLE-PREFIX, MODULES, [ACTION-IF-FOUND], [ACTION-IF-NOT-FOUND]) macro (pkg.m4 seems to be the only source of documentation). If [ACTION-IF-NOT-FOUND] is blank, the default is to fail and abort the configure script. Supply [true] to make the package optional (i.e. if the package is not present, continue gracefully).
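On the C side, the macros that end up in config.h are consumed with ordinary preprocessor conditionals. The sketch below assumes a configure.ac that uses AC_CONFIG_HEADERS([config.h]) and AC_CHECK_FUNCS([strndup]), which would define HAVE_STRNDUP when the function is found. Neither macro call appears in the notes above, and strndup_provider is a made-up helper used only to make the effect visible.

```c
/* Automake passes -DHAVE_CONFIG_H when a config header is in use. */
#ifdef HAVE_CONFIG_H
#include "config.h"  /* produced by configure from config.h.in */
#endif

/* Reports which code path the configure result selected. */
static const char *strndup_provider(void)
{
#ifdef HAVE_STRNDUP
    return "libc";      /* configure found strndup() in the C library */
#else
    return "fallback";  /* the project must supply its own replacement */
#endif
}
```

Compiled outside of a configured build tree, neither macro is defined, so the fallback branch is taken.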