Tuesday, June 30, 2009

Kernel page tables

My office mate has been working on an operating system kernel, and apparently page table manipulation is very error prone. Here are some important things that are easy to forget:
  • Memory-mapped I/O pages, such as the physical address range used by the I/O APIC, must never be freed into the general page pool. If one is, another program could be handed that page by the allocator and would then write to the device's physical address as if it were RAM.
  • If two address spaces share a page (mmap with MAP_SHARED, or shmat), the page needs to be reference counted.
  • If two processes share the same page table (via the clone system call), the page table itself needs to be reference counted as well (a minimal sketch of this discipline follows the list).
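Here is a minimal sketch of that refcounting discipline in C, using GCC's __sync atomic builtins. The page descriptor and the free_page_to_pool() hook are hypothetical names invented for illustration; the same pattern applies to page tables shared via clone.
#include <stdbool.h>

/* Hypothetical page descriptor; the field and function names are made up. */
struct page {
    int  refcount;   /* number of address spaces mapping this page */
    bool is_mmio;    /* memory-mapped I/O page (e.g. the I/O APIC), never pooled */
};

extern void free_page_to_pool(struct page *p);   /* hypothetical allocator hook */

/* Called whenever another address space maps the page (MAP_SHARED, shmat). */
static void page_get(struct page *p)
{
    __sync_add_and_fetch(&p->refcount, 1);
}

/* Called when a mapping goes away.  Only the last unmapping may free the
 * page, and memory I/O pages must never go back into the general pool. */
static void page_put(struct page *p)
{
    if (__sync_sub_and_fetch(&p->refcount, 1) == 0 && !p->is_mmio)
        free_page_to_pool(p);
}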
Since he's too engaged in debugging, I'm writing these down for him in case he forgets. I said to him, maybe the reason people haven't made ground-breaking progress in redesigning operating system architecture is that the moment you walk in, you get lost in the woods and can no longer see the forest.

Wednesday, June 17, 2009

Super-Sized Del.icio.us, or Down-Sized Wayback Machine

The problem with URL bookmarks is that they can always become dead links. Here are some of the common reasons:
  • It takes people and resources to keep web servers running; when the company or institution that maintains the web site goes away, the site goes with it.
  • The page disappears because the webmaster no longer deems it worth keeping. This often happens when a web site is revamped: some pages simply disappear, while others have their URLs changed to fit the new site structure.
  • Pages can be moved around as part of a minor reorganization effort.
  • A content agreement mandates that the page only be accessible for a certain period of time.
Furthermore, even if the page still exists, it may have been edited, and the information you wanted to bookmark may no longer be there. The Internet Archive Wayback Machine solves part of this problem by crawling a given web site repeatedly over time and letting you retrieve the version captured at an earlier date. However, it has its own problems:
  • It only does best-effort crawling. In particular, it may take up to 6 months for the URL you submitted to show up.
  • It is a web crawler, so it voluntarily observes robots.txt.
  • The web page it archives is often missing images and other constituent material.
  • It has to rewrite hyperlinks in the web page.
  • It won't work with Javascript and XMLHttpRequest (AJAX).
Furthermore, the Wayback Machine often crawls redundantly and unpredictably. It may attempt to crawl other parts of the website that you're not interested in, omitting the part you actually want. It may crawl the URL when you don't need it, and not crawl when you do.
Here I propose a service that takes a complete snapshot of a web page in the manner of a bookmark. This would be an immensely helpful research tool for historians and for anyone else who needs to preserve a small artifact of a web page they're interested in.
Web browsers have long been able to cache web pages on disk so that static content doesn't have to be redownloaded every time a page is rendered. The idea is to build a more sophisticated caching feature on top of a web browser.
  • All HTTP connections are memoized by the snapshot, so Javascript and even some Flash content will work (a rough sketch of the memoization hook follows this list).
  • Support multiple snapshots of the same URL taken at different times, and display them side by side for comparison. This is not something a proxy server can do.
  • Snapshots are searchable.
  • Schedule regular snapshots or prefetch web pages.
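Here is a rough sketch, in C, of what the memoization hook mentioned above might look like. All of the types and function names (snapshot_lookup, snapshot_store, network_fetch) are hypothetical; a real implementation would hang this off the browser engine's resource loader.
#include <stddef.h>

/* Hypothetical types and helpers; illustrative only. */
struct response { const char *headers; const char *body; size_t body_len; };
struct snapshot;   /* everything recorded for one (URL, capture time) pair */

extern struct response *snapshot_lookup(struct snapshot *s,
                                        const char *method, const char *url);
extern void snapshot_store(struct snapshot *s, const char *method,
                           const char *url, struct response *r);
extern struct response *network_fetch(const char *method, const char *url);

/* Every HTTP request issued by the embedded browser goes through this hook.
 * While capturing, responses are fetched from the network and recorded into
 * the snapshot; while replaying, they are served from the snapshot only, so
 * scripts issuing XMLHttpRequests see exactly what they saw at capture time. */
struct response *snapshot_fetch(struct snapshot *s, int replaying,
                                const char *method, const char *url)
{
    struct response *r = snapshot_lookup(s, method, url);
    if (r != NULL || replaying)     /* replay never touches the network */
        return r;                   /* may be NULL if the resource wasn't captured */
    r = network_fetch(method, url);
    if (r != NULL)
        snapshot_store(s, method, url, r);
    return r;
}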
Possible advanced features:
  • Store snapshots on an external server. Pro: snapshots of the same web page taken by two different people can be consolidated if they are identical. Pro: snapshots can be made publicly accessible. Pro: high availability, because customers don't have to manage their own storage. Pro: high mobility, because snapshots live in "the cloud." Pro: centralized search and indexing. Con: running the storage infrastructure costs money, so the service will not be free. Con: not likely to be supported by ad revenue.
  • Allow public access to the snapshots. Pro: any web site can link to a snapshot to avoid the external dead-link problem. Con: will run into copyright issues when the original site owner decides the snapshot is costing him profit.
Technologies that can be leveraged:
  • Qt for the cross-platform user interface. Doesn't require a commercial license; the LGPL is fine.
  • WebKit (available under the LGPL) or Presto (commercial proprietary license) as the layout engine for rendering web pages.

Saturday, June 6, 2009

Multiresolution NLE video codec

Highly compressed video codecs such as H.264 are often too CPU-intensive for real-time editing, and their inter-frame compression makes frame-by-frame editing difficult. A typical workflow is to transcode such footage to an intermediate codec that uses lower-complexity, intra-frame-only compression, so that non-linear video editing software can manipulate the video frame by frame in real time. Intermediate codecs typically compress at a much higher bitrate to make up for the lower complexity (though they are still more efficient than uncompressed video); as a result, they require higher disk bandwidth.
To work on computers with slow disks (e.g. laptops), it is sometimes desirable to edit with reduced-resolution video; when the edit is finalized, rendering is done using the full-resolution footage. However, some parts of the workflow may call for full resolution (e.g. compositing) while others only need half or quarter resolution (e.g. timeline editing). One would have to transcode into the intermediate codec at multiple resolutions, which wastes disk space and is a headache to manage.
Two popular intermediate codecs are Apple ProRes 422 and Avid DNxHD, and neither of them tackles this issue. Here is my idea. A plausible construction of an intermediate codec is simply to JPEG-encode the video frame by frame, so I'll use that to illustrate. We can encode a frame progressively as follows.
  • Given a video frame, produce ¼×¼ and ½×½ resolution images.
  • Encode the ¼×¼ image using JPEG, decode it, and scale it up by 2x (i.e. to ½×½ of the original size). Make a difference image from this and the ½×½ downscaled image.
  • Encode the ½×½ difference image using JPEG. Decode it, add it back to the upscaled ¼×¼ layer to reconstruct the ½×½ image, scale that up by 2x (i.e. to the original size), and make a difference image from this and the original full-size image.
  • Encode the original-sized difference image using JPEG.
This allows quarter-resolution, half-resolution, and full-resolution reconstruction of the video stream. Data chunks in the stream can be arranged so that decoding a lower-resolution picture doesn't require reading the higher-resolution data.
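Here is a sketch of the layered encoder in C. The image and codec helpers (downscale_half, upscale_double, subtract, add, jpeg_encode, jpeg_decode) are hypothetical names, not a real library; note that the full-resolution prediction is built from the reconstructed half-resolution layer, exactly the way a decoder would rebuild it.
/* Hypothetical image and codec helpers; illustrative names only. */
struct image;
struct bitstream;

extern struct image     *downscale_half(const struct image *src);  /* 1/2 x 1/2 */
extern struct image     *upscale_double(const struct image *src);  /* 2x in each dimension */
extern struct image     *subtract(const struct image *a, const struct image *b);
extern struct image     *add(const struct image *a, const struct image *b);
extern struct bitstream *jpeg_encode(const struct image *img, int quality);
extern struct image     *jpeg_decode(const struct bitstream *bs);

/* Encode one frame as three chunks: a quarter-resolution base layer plus two
 * difference layers.  A decoder reads only as many chunks as its target
 * resolution requires (out[0] for 1/4, out[0..1] for 1/2, all three for full).
 * Frees and error handling are omitted. */
void encode_frame(const struct image *full, int quality, struct bitstream *out[3])
{
    struct image *half    = downscale_half(full);
    struct image *quarter = downscale_half(half);

    /* Layer 0: quarter resolution, plain JPEG. */
    out[0] = jpeg_encode(quarter, quality);

    /* Layer 1: the difference between the half-res image and the upscaled
     * decode of layer 0, i.e. what layer 0 got wrong. */
    struct image *pred_half = upscale_double(jpeg_decode(out[0]));
    out[1] = jpeg_encode(subtract(half, pred_half), quality);

    /* Layer 2: reconstruct the half-res image the way the decoder will
     * (prediction plus decoded difference), upscale it, and encode the
     * remaining difference against the full-resolution original. */
    struct image *recon_half = add(pred_half, jpeg_decode(out[1]));
    struct image *pred_full  = upscale_double(recon_half);
    out[2] = jpeg_encode(subtract(full, pred_full), quality);
}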
Some issues that may not work well:
  • JPEG compression artifacts are accentuated by the layered encoding scheme.
  • Total decoding time for full resolution is tripled (the bottleneck being memory bandwidth rather than the ALU).

Update (6/23/2010): apparently JPEG "progressive encoding," which allows an image to be loaded with gradually finer detail (not the same as progressive vs. interlaced frame formats), may pave the way for how this can be done. The video container would need to become aware of progressive encoding. The wavelet encoding in JPEG2000 also has a similar progressive scheme.

Thursday, June 4, 2009

Multiple definition of... extern inline

When I tried to compile gnutls-2.6.6, I got a lot of these:
.libs/gnutls_compress.o: In function `__strcspn_c1':
/usr/include/bits/string2.h:972: multiple definition of `__strcspn_c1'
.libs/gnutls_record.o:/usr/include/bits/string2.h:972: first defined here
.libs/gnutls_compress.o: In function `__strcspn_c2':
/usr/include/bits/string2.h:983: multiple definition of `__strcspn_c2'
.libs/gnutls_record.o:/usr/include/bits/string2.h:983: first defined here
It turns out that gnutls enforces the ISO C99 standard on all of its source files, but many glibc header files use extern inline. GCC changed its extern inline semantics in GCC 4.3 to be ISO C99 compliant: under the C99 semantics, every translation unit that includes such a header emits an external definition of each extern inline function, which is what causes the multiple-definition errors at link time.
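A minimal header reproduces the behavior; the file and function names here are made up, but the pattern is the same one bits/string2.h relies on.
/* twice.h, included from more than one .c file */
#ifndef TWICE_H
#define TWICE_H

/* GNU89 semantics (the pre-4.3 default, or -fgnu89-inline): "extern inline"
 * is inline-only and never emits an out-of-line symbol, so any number of
 * translation units may include this header.
 *
 * C99 semantics (-std=c99 or -std=gnu99 with GCC >= 4.3): this is an
 * external definition, so every .c file that includes the header emits its
 * own global `twice' symbol, and linking two of them together fails with
 * "multiple definition of `twice'". */
extern inline int twice(int x) { return 2 * x; }

#endif
Compiling two .c files that each include this header and call twice() with gcc -std=c99 -O2 reproduces the error above.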
The fix I chose is to add the -fgnu89-inline option to GCC throughout the build, which can be accomplished by invoking the configure script like this:
./configure [config_flags...] CFLAGS=-fgnu89-inline
And this solves the problem.

Tuesday, June 2, 2009

Editing PDF Files Using Open Source Software

Here is a workflow that allows fairly fancy editing of PDF files using only open source software. I've only used it to add overlays, so your mileage may vary.
  1. Use pdf2svg to convert the PDF to SVG. Since a PDF may contain multiple pages, each page must be converted to its own SVG file, but you only need to convert the pages you will edit. For pages you are not editing, keep the original PDF file; we will use it later.
  2. The SVG files from (1) can be edited natively in Inkscape. Use "Save as..." and save the result as PDF via Cairo (*.pdf). I did not convert text into paths.
  3. Follow the suggestions in How to concatenate PDFs without pain; I used the last approach, a pdfpages .tex file, which allows me to assemble pages from several PDF files. This requires pdfLaTeX, which comes with a TeX distribution, e.g. TeX Live. The .tex file must be processed with pdflatex and not other variants of the latex command.
The SVG output from pdf2svg is a bit clunky; I'm not sure whether that's because fonts are embedded. The PDF output saved via Cairo is also a bit clunky, but the save is quick. Finally, pdfLaTeX runs fairly quickly and produces an optimized file comparable in size to the original.

ARM processor notes

The first thing I learned about the ARM processor is that glibc's ARM port doesn't use atomic machine instructions; it relies on the Linux scheduler granting atomic uniprocessor access in order to implement a somewhat-atomic compare-and-swap operation. However, the code does mention that the newer ARMv6 specification has ldrex/strex (load-exclusive and store-exclusive) instructions.
Some notes about the ARM lineage from the ARMv6 whitepaper: the ARM10 processor implements the ARMv5 specification (1996-), which is uniprocessor only. ARMv6 (2000-) adds shared-memory multiprocessor support and is implemented by the ARM11 family of processors (2002-).
The ldrex and strex instructions are no different from load-link/store-conditional, a primitive form of hardware transactional memory. They can be used to implement compare-and-swap.
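For instance, a compare-and-swap can be sketched with GCC inline assembly roughly as follows (ARM-mode assembly for ARMv6 or later; this is only an illustration and omits the memory barriers and the clearing of the exclusive monitor on the failure path that a production version would want).
/* Returns the value observed in *ptr; the swap succeeded iff the return
 * value equals `expected'.  Illustrative sketch only. */
static inline int compare_and_swap(volatile int *ptr, int expected, int desired)
{
    int observed, failed;

    do {
        __asm__ __volatile__(
            "ldrex   %0, [%2]\n\t"   /* observed = *ptr, start exclusive access */
            "mov     %1, #0\n\t"     /* assume no store attempt will be needed  */
            "teq     %0, %3\n\t"     /* does the value match what we expected?  */
            "strexeq %1, %4, [%2]"   /* if so, try the store; %1 = 1 on failure */
            : "=&r" (observed), "=&r" (failed)
            : "r" (ptr), "r" (expected), "r" (desired)
            : "cc", "memory");
    } while (failed);               /* the exclusive reservation was lost: retry */

    return observed;
}
The retry loop only spins when another processor (or an interrupt) breaks the exclusive reservation between the ldrex and the strex, which is exactly the load-link/store-conditional pattern.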
Although glibc doesn't use ldrex/strex, the Linux kernel apparently does.
I am interested in the compare-and-swap implementation on ARM because it is a fundamental building block for scalable shared-memory multiprocessor programming, used by software to implement non-blocking algorithms. If ARM wants to invade the server space by offering low-energy data centers, it will have to support this well.