Saturday, June 6, 2009

Multiresolution NLE video codec

Highly compressed video codecs such as H.264 are often too CPU-intensive for real-time editing, and their inter-frame compression makes frame-by-frame editing difficult. A typical workflow is to transcode such video to an intermediate codec that uses lower-complexity, intra-frame-only compression, so that non-linear editing (NLE) software can manipulate the video frame by frame in real time. Intermediate codecs typically compress at a much higher bitrate to make up for the lower complexity (though the result is still more compact than uncompressed video); as a result, they require higher disk bandwidth.
In order to work on computers with slow disks (e.g. laptops), it is sometimes desirable to edit with reduced-resolution video. When editing is finished, the final render is done from the full-resolution footage. However, some parts of the workflow (e.g. compositing) may call for full resolution, while others (e.g. timeline editing) need only half or quarter resolution. One would then have to transcode into the intermediate codec at multiple resolutions, which wastes disk space and is a headache to manage.
Two popular intermediate codecs are Apple ProRes 422 and Avid DNxHD, and neither of them tackles this issue. Here is my idea. A plausible construction of an intermediate codec is to just JPEG encode video frame by frame, so I'll use that to illustrate. We can encode a frame progressively as follows.
  • Given a video frame, produce ¼×¼ and ½×½ resolution images.
  • Encode ¼×¼ image using JPEG, decode it, and scale it up by 2x (i.e. to ½×½ the original size). Make a difference image from this one and the original ½×½ image.
  • Encode the ½×½ difference image using JPEG, decode it, add it back to the scaled-up ¼×¼ image to reconstruct the ½×½ image, and scale that up by 2x (i.e. to the original size). Make a difference image from this one and the original image.
  • Encode the original-sized difference image using JPEG.
This allows quarter-resolution, half-resolution, and full-resolution reconstruction of the video stream. Data chunks in the stream can be arranged so that if we want to decode a lower-resolution picture, we don't have to read the higher-resolution data.
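The steps above can be sketched in a few lines of Python. This is only an illustration under several assumptions: a uniform quantizer (`lossy_encode` / `lossy_decode`, with a made-up step `Q`) stands in for a real JPEG encoder and decoder, frames are grayscale lists of rows whose width and height are multiples of 4, scaling is plain 2x2 averaging and pixel replication, and the half-resolution layer is reconstructed (prediction plus decoded difference) before being scaled up. All function names are hypothetical.

```python
Q = 8  # hypothetical quantization step; stands in for a JPEG quality setting

def lossy_encode(img):
    """Stand-in for JPEG encoding: quantize every sample."""
    return [[round(p / Q) for p in row] for row in img]

def lossy_decode(coded):
    """Stand-in for JPEG decoding: undo the quantization scale."""
    return [[c * Q for c in row] for row in coded]

def downscale2(img):
    """Halve width and height by averaging each 2x2 block."""
    h, w = len(img), len(img[0])
    return [[(img[y][x] + img[y][x + 1] + img[y + 1][x] + img[y + 1][x + 1]) / 4
             for x in range(0, w, 2)] for y in range(0, h, 2)]

def upscale2(img):
    """Double width and height by pixel replication."""
    out = []
    for row in img:
        wide = [p for p in row for _ in (0, 1)]
        out.append(wide)
        out.append(list(wide))
    return out

def add(a, b):
    return [[pa + pb for pa, pb in zip(ra, rb)] for ra, rb in zip(a, b)]

def subtract(a, b):
    return [[pa - pb for pa, pb in zip(ra, rb)] for ra, rb in zip(a, b)]

def encode_frame(full):
    """Return (quarter layer, half difference layer, full difference layer)."""
    half = downscale2(full)
    quarter = downscale2(half)
    layer_q = lossy_encode(quarter)
    # Predict the half-size image from what the decoder will actually see.
    pred_half = upscale2(lossy_decode(layer_q))
    layer_h = lossy_encode(subtract(half, pred_half))
    recon_half = add(pred_half, lossy_decode(layer_h))
    pred_full = upscale2(recon_half)
    layer_f = lossy_encode(subtract(full, pred_full))
    return layer_q, layer_h, layer_f

def decode_frame(layers, level):
    """level 0 = quarter, 1 = half, 2 = full; only layers[:level + 1] are used."""
    img = lossy_decode(layers[0])
    for layer in layers[1:level + 1]:
        img = add(upscale2(img), lossy_decode(layer))
    return img

# Usage: an 8x8 synthetic test frame, decoded at each resolution.
frame = [[(7 * x + 13 * y) % 256 for x in range(8)] for y in range(8)]
layers = encode_frame(frame)
quarter_img = decode_frame(layers, 0)
half_img = decode_frame(layers, 1)
full_img = decode_frame(layers, 2)
```

Because each difference image is taken against the decoded lower layer rather than the pristine one, the error at any resolution comes only from the last layer decoded in this sketch. A real JPEG stage would behave less tidily than a plain quantizer.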
Some issues that may keep this from working well:
  • JPEG causes compression artifacts that are accentuated by the layered encoding scheme.
  • Total decoding time for full resolution is tripled (bottleneck being memory bandwidth rather than ALU).

Update (6/23/2010): apparently JPEG "progressive encoding," which allows an image to be loaded with gradually finer details (not the same as progressive vs. interlaced frame format), may pave the way for this. The video container would need to become aware of progressive encoding. Wavelet encoding in JPEG 2000 would also offer a similar progressive encoding scheme.
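
As a rough idea of what a layer-aware container could look like, here is a minimal sketch in Python. The layout is entirely hypothetical (it is not how any existing container works): each frame starts with a small header giving the byte size of each layer, stored coarsest first, so a reader that only wants a lower resolution can seek past the rest. The layer payloads are treated as opaque bytes.

```python
import io
import struct

# Hypothetical per-frame header: three little-endian uint32 layer sizes
# (quarter, half, full), followed by the three layer payloads in that order.
HEADER = struct.Struct("<3I")

def write_frame(stream, layers):
    """Write one frame; layers is a sequence of three bytes objects."""
    stream.write(HEADER.pack(*(len(blob) for blob in layers)))
    for blob in layers:
        stream.write(blob)

def read_frame(stream, level):
    """Read layers 0..level of the next frame and skip the remaining ones."""
    sizes = HEADER.unpack(stream.read(HEADER.size))
    wanted = [stream.read(n) for n in sizes[:level + 1]]
    # Seek over the finer layers instead of reading them.
    stream.seek(sum(sizes[level + 1:]), io.SEEK_CUR)
    return wanted

# Usage: two frames written to an in-memory stream, then read back at
# quarter resolution (level 0) and full resolution (level 2).
container = io.BytesIO()
write_frame(container, (b"q0", b"half0", b"full-frame-0"))
write_frame(container, (b"q1", b"half1", b"full-frame-1"))
container.seek(0)
first = read_frame(container, 0)
second = read_frame(container, 2)
```

Interleaving the layers per frame still costs a seek for every frame; storing each resolution's layers in its own contiguous track might suit slow disks better, at the price of a more complicated index.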
