Life of a Computer Scientist: “Hybrid Log Float” 14-bit to 10-bit Pixel Transfer Function

Here I propose a Hybrid Log Float as a transfer function that maps from a 14-bit linear pixel to 10-bit non-linear pixel in a way that is inspired by floating point numbers. Unlike 16-bit half float, we don't have enough bits for both an exponent and the mantissa. We also don't need to represent fractions or negative numbers. Having a custom integer bit format allows us to store more significant figures while preserving the widest dynamic range from the input. Expressing the transfer function as an integer bit format allows the encoding and decoding between linear and non-linear colorspaces to be fast and bit-accurate without using a lookup table.

First, some background information. In this past year, I've had the pleasure of shooting some videos (here, here, and here) with the Panasonic GH5s, which is a fine camera. When it was introduced in January, it was the first compact interchangeable lens camera that could internally record DCI 4K in 4:2:2 10-bit H.264. Such recording ability was only later superseded by the Blackmagic Design Pocket Cinema Camera 4K (BMPCC) released in September, which records in 12-bit CinemaDNG RAW or 4:2:2 10-bit ProRes. These two cameras have very similar sensors (if not the same one) that output 14 bits per pixel. The GH5s will generate 14-bit RW2 files when taking still pictures (e.g. time lapse), but the BMPCC will not.

The sensor's 14-bit per pixel output means that the maximum theoretical dynamic range is 14 stops. That's because each stop is twice or half the amount of light, so the number of bits per sample corresponds to the number of stops. This is corroborated by the assessment of the GH5s done by Alan Roberts in EBU Tech 3335 S29. He measured 14.6 stops for Hybrid Log Gamma and 13.5 for V-Log L. Hybrid Log Gamma and V-Log L are two different picture profiles that map the 14-bit sensor output to the 10-bit codec. The measured dynamic range exceeds the theoretical somewhat, but it's within measurement error. Also, quantum uncertainty means that light levels below the sensor's sensitivity could sometimes be picked up as temporal noise. But as a rule of thumb, we can think of the number of bits as the number of stops.

It is interesting how dynamic range can be affected by the way the codec downsamples 14-bit sensor output to 10-bit. The most naive approach is to simply truncate the least significant 4 bits, but that means you lose 16 shades in the shadow and end up with only 10 stops of dynamic range. This is pretty harsh if someone wants to recover shadow from an underexposed picture. This is probably what the Like709 picture profile does on the GH5s (it simulates the BT.709 color space), since it only measures 10.3 stops.
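As a minimal sketch of the naive truncation described above (the function names are hypothetical and not taken from any camera firmware), dropping the low 4 bits collapses the 16 darkest input levels into a single output code:

```python
def truncate_14_to_10(x: int) -> int:
    """Naive reduction: discard the 4 least significant bits."""
    if not 0 <= x < (1 << 14):
        raise ValueError("expected a 14-bit sample")
    return x >> 4


def expand_10_to_14(y: int) -> int:
    """Inverse mapping; the discarded low bits come back as zeros."""
    if not 0 <= y < (1 << 10):
        raise ValueError("expected a 10-bit code")
    return y << 4


# Count how many distinct codes the 16 darkest input levels map to.
shadow_codes = {truncate_14_to_10(x) for x in range(16)}
print(len(shadow_codes))
```

Every output code covers exactly 16 input levels regardless of brightness, which is why the shadows suffer the most in relative terms.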

The Hybrid Log Gamma (HLG) and V-Log L picture profiles take advantage of the way human eyes perceive intensities of light non-linearly, so they downsample 14-bit sensor output (which is the linear measurement of light) to 10-bit for the codec using transfer functions that preserve the most perceptible shades. However, the picture profiles are designed for different purposes. HLG is calibrated for consumer TVs that render HDR in the BT.2100 colorspace, which is hard to color grade (changing the look and feel of a picture by manipulating colors) because the colorspace is non-linear. V-Log L has a flatter picture profile which makes color grading easier, but it's still better to convert it to a linear colorspace for more accurate color grading. A non-linear colorspace can be converted back to linear using a lookup table (LUT).

Then two things occurred to me.

First, cameras should really provide a 16-bit codec. If the sensor data are only 14 bits, normalize the sample by zero-padding the least significant bits. The additional bits aren't a significant storage overhead (probably negligible after either lossless or lossy compression), and they make processing on a computer a lot more efficient since the samples are naturally aligned to 16-bit short integers, which are well suited for SIMD instructions. Unfortunately, the only 16-bit capable codec today is HEVC, and it's not widely implemented. Both ProRes RAW and CineForm are currently 12-bit, even though their coding principles (DCT for ProRes, Wavelet for CineForm) are bit-agnostic, so it's just a matter of someone implementing it. I suspect that a 16-bit codec will be much easier to implement. It would avoid this bit shifting voodoo to unpack 14-bit integers.
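The zero-padding mentioned above amounts to a single shift. This is only an illustrative sketch with hypothetical function names, assuming the 14-bit sample is placed in the most significant bits of a 16-bit word:

```python
def pad_14_to_16(x: int) -> int:
    """Left-align a 14-bit sample in 16 bits; the two low bits are zero."""
    if not 0 <= x < (1 << 14):
        raise ValueError("expected a 14-bit sample")
    return x << 2


def unpad_16_to_14(v: int) -> int:
    """Recover the original 14-bit sample from the padded 16-bit word."""
    if not 0 <= v < (1 << 16):
        raise ValueError("expected a 16-bit word")
    return v >> 2
```

The round trip is lossless, and each sample sits on a 16-bit boundary, so no cross-byte unpacking is needed.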

Second, since a 16-bit capable codec is a bit out of reach, it's beneficial to have a transfer function that is bit-accurate (to minimize rounding errors) but still takes into account the way humans perceive light intensities: we tend to notice minute light variances in darkness, but we are less sensitive to small variances when the environment is bright.

The result is the Hybrid Log Float transfer function, which encodes a linear value to a logarithmic scale and decodes back to the same linear colorspace with a predictable allocation of significant figures. The idea is to encode the exponent as the number of leading one bits in unary, and the rest of the bits can be used for storing significant figures, in a way that's inspired by the UTF-8 encoding. By contrast, the A-Law and µ-Law companding algorithms count the exponent in binary rather than in unary, the same way Minifloat works (but with a different number of bits allocated to exponent and mantissa).

To illustrate how unary exponent works, let's consider 14-bit to 12-bit reduction first.

It's broken down into four spans. The first span (a) has 11 bits of significant figures, and is encoded verbatim as linear. The second span (b) has 10 bits. The last two spans (c) and (d) each have 9 bits, and are actually at ½ log(x) = log(√x) scale.
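One way to read the four spans as an integer bit format is sketched below. The exact prefix layout is an assumption on my part reconstructed from the span sizes (prefix 0 for (a), 10 for (b), 110 for (c), 111 for (d)), and the function names are hypothetical:

```python
def encode_14_to_12(x: int) -> int:
    """Encode a 14-bit linear sample into 12 bits using four spans."""
    if not 0 <= x < (1 << 14):
        raise ValueError("expected a 14-bit sample")
    if x < 2048:                           # (a) prefix 0, 11 bits verbatim
        return x
    if x < 4096:                           # (b) prefix 10, top 10 of 11 bits
        return 0x800 | ((x - 2048) >> 1)
    if x < 8192:                           # (c) prefix 110, top 9 of 12 bits
        return 0xC00 | ((x - 4096) >> 3)
    return 0xE00 | ((x - 8192) >> 4)       # (d) prefix 111, top 9 of 13 bits


def decode_12_to_14(y: int) -> int:
    """Decode back to linear; discarded low bits are returned as zeros."""
    if not 0 <= y < (1 << 12):
        raise ValueError("expected a 12-bit code")
    if y < 0x800:
        return y
    if y < 0xC00:
        return 2048 + ((y - 0x800) << 1)
    if y < 0xE00:
        return 4096 + ((y - 0xC00) << 3)
    return 8192 + ((y - 0xE00) << 4)
```

Under this assumed layout every 12-bit code is used exactly once, and encoding a decoded value returns the same code, with no lookup table involved.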

Here is the 14-bit to 10-bit reduction using the same approach.

The spans are (a) 9-bit linear, (b) 8-bit log, (c) 7-bit 1/2 log, (d) 6-bit 1/4 log, (e) 5-bit 1/8 log, (f) 5-bit 1/8 log. Notice that the two highlight tiers have only 5 bits, or 32 shades, which may not be enough to recover highlights. Since this encoding allocates more bits for the shadows, it prefers the picture to be slightly under-exposed.
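The unary exponent can be sketched in a few lines. This is my reading of the span list above rather than a reference implementation: I assume the prefixes are 0, 10, 110, 1110, 11110 and 11111 for spans (a) through (f), and the function names are hypothetical.

```python
def encode_hlf_v1(x: int) -> int:
    """Encode a 14-bit linear sample into 10 bits with a unary exponent."""
    if not 0 <= x < (1 << 14):
        raise ValueError("expected a 14-bit sample")
    if x < 512:                        # span (a): 9 bits stored verbatim
        return x
    e = x.bit_length() - 10            # 0..4 selects span (b)..(f)
    frac_bits = 9 + e                  # bits below the leading one
    kept = max(8 - e, 5)               # 8, 7, 6, 5, 5 significant bits
    frac = x - (1 << frac_bits)        # strip the implicit leading one
    base = 1024 - (1 << (9 - e))       # codes 512, 768, 896, 960, 992
    return base + (frac >> (frac_bits - kept))


def decode_hlf_v1(y: int) -> int:
    """Decode by counting leading one bits (at most 5) as the exponent."""
    if not 0 <= y < (1 << 10):
        raise ValueError("expected a 10-bit code")
    ones = 0
    while ones < 5 and (y >> (9 - ones)) & 1:
        ones += 1
    if ones == 0:                      # span (a): verbatim
        return y
    e = ones - 1
    kept = max(8 - e, 5)
    frac = y & ((1 << kept) - 1)       # the stored significant bits
    return (1 << (9 + e)) | (frac << (9 + e - kept))
```

The decoder returns the lowest linear value of each bucket, so re-encoding a decoded value always yields the same 10-bit code.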

Here is an alternative 14-bit to 10-bit encoding that reserves more bits for the midtones and highlights, so it has better color accuracy for highlight recovery. The trick is to pre-allocate bits for counting the exponent in binary rather than unary at first, but revert to a unary exponent afterwards.

The difference is that spans (b), (c), and (d) now have 7 bits each, or 128 shades, at 1/2 log scale, using three bits to count the exponent. The upper two tiers (e) and (f) have 6 bits each, or 64 shades, at 1/4 log(x) scale.
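A table-driven sketch of this second variant follows. The table is an assumption derived from the span sizes above: span (a) stays 9-bit linear so the code counts add up to 1024, the three-bit prefixes are 100, 101 and 110, and the two upper tiers use 1110 and 1111. The names `SPANS_V2`, `encode_hlf_v2` and `decode_hlf_v2` are hypothetical.

```python
# (first input value, first output code, right shift) for spans (a)..(f)
SPANS_V2 = [
    (0,    0,   0),   # (a) prefix 0,    9 bits verbatim
    (512,  512, 2),   # (b) prefix 100,  7 of 9 bits
    (1024, 640, 3),   # (c) prefix 101,  7 of 10 bits
    (2048, 768, 4),   # (d) prefix 110,  7 of 11 bits
    (4096, 896, 6),   # (e) prefix 1110, 6 of 12 bits
    (8192, 960, 7),   # (f) prefix 1111, 6 of 13 bits
]


def encode_hlf_v2(x: int) -> int:
    """Encode a 14-bit linear sample into 10 bits (second variant)."""
    if not 0 <= x < (1 << 14):
        raise ValueError("expected a 14-bit sample")
    for in_lo, out_lo, shift in reversed(SPANS_V2):
        if x >= in_lo:
            return out_lo + ((x - in_lo) >> shift)
    raise AssertionError("unreachable: span (a) starts at 0")


def decode_hlf_v2(y: int) -> int:
    """Decode to the lowest linear value of the matching bucket."""
    if not 0 <= y < (1 << 10):
        raise ValueError("expected a 10-bit code")
    for in_lo, out_lo, shift in reversed(SPANS_V2):
        if y >= out_lo:
            return in_lo + ((y - out_lo) << shift)
    raise AssertionError("unreachable: span (a) starts at 0")
```

The six-row table here is only a description of the bit layout, not a per-value lookup table; the same mapping could be written with shifts and masks alone.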

Here is a plot comparing:

Two variants of Hybrid Log Float (float v1, float v2)
Log curve normalized to the output range (73*log2(x))
Gamma correction curve at γ = 2.4 normalized to the input range (x ** (1/2.4) * 18) and the output range (x ** (1/2.4) * 57).
Linear normalized to the input range (x / 16) and the output range (x).

In the plot, the x axis is the input range, and the y axis is the output. A steeper slope indicates better color resolution. Clipping happens when the curve goes off the chart.

It may seem that log provides too much value resolution for shadow, and gamma 2.4 seems to provide just the right amount. But if we zoom into the input range 0-256, we can see that both log and gamma use much wider output range than the input range, resulting in wasted bits and more difficult noise reduction since the noise is amplified. Both Hybrid Log Float variants follow the same slope as linear x (normalized to output) at this scale, so we don't have the noise amplification problem.
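To put numbers on the shadow region, here is a small sketch that evaluates the log and gamma curves from the plot legend next to the verbatim mapping that both Hybrid Log Float variants use below input level 512. The function names are hypothetical, and treating log of zero as zero is my own assumption.

```python
import math


def log_curve(x: int) -> float:
    """Log curve normalized to the output range: 73 * log2(x)."""
    return 73 * math.log2(x) if x >= 1 else 0.0


def gamma_curve(x: int) -> float:
    """Gamma 2.4 curve mapping 14-bit input to 10-bit output."""
    return x ** (1 / 2.4) * 18


def hlf_shadow(x: int) -> int:
    """Both Hybrid Log Float variants store inputs below 512 verbatim."""
    if not 0 <= x < 512:
        raise ValueError("only valid for the linear span (a)")
    return x


for x in (1, 4, 16, 64, 256):
    print(f"in={x:3d}  log={log_curve(x):6.1f}  "
          f"gamma={gamma_curve(x):6.1f}  float={hlf_shadow(x):3d}")
```

For instance, the log curve has already reached output code 584 by input level 256, and the gamma curve is at roughly 57 by input level 16, whereas the verbatim span spends exactly one output code per input level.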

Overall, Hybrid Log Float has the following benefits:

It does not waste output bits in the deep shadow, so it doesn't have the noise amplification problem that log or gamma may suffer.
Good midtone resolution.
It resolves highlights much as log does, for greater dynamic range.
The alternative second variant of Hybrid Log Float provides better midtone and highlight resolution than the first variant.

Most importantly, Hybrid Log Float provides bit-accurate and efficient decoding back to the linear colorspace, so color grading can be done as accurately as possible.

Last but not least, while I would like to see cameras record to Hybrid Log Float when using a legacy 10-bit codec, ultimately I would like to see native 16-bit codecs so that post-processing does not have to waste precious machine cycles realigning the awkward 10-bit integers.


Tuesday, December 25, 2018
