Saturday, March 16, 2024

Deep Dive into MQA-CD Encoding

A few weeks ago, I saw this video by Techmoan introducing the MQA-CD. An MQA-CD is an audio CD that can be played back in a regular CD player, which is limited to 16-bit samples at 44.1 kHz. However, when played back through an MQA decoder, it promises better sound quality: 24-bit samples at 192 kHz.

Before we dig into the MQA marketing material, we need to understand that MQA is an encoding scheme that can exist outside of a CD, e.g. in audio delivered over the radio or the Internet. Some of the non-CD transports are assumed to carry 24-bit samples at 48 kHz or higher. The MQA-CD transport, however, is limited to 16-bit samples at 44.1 kHz because the CD is its physical medium.

At first glance, MQA violates the Nyquist–Shannon sampling theorem, which says that a signal containing frequencies up to B Hz can only be represented unambiguously if it is sampled at more than 2B samples per second. A 44.1 kHz stream therefore tops out at about 22 kHz of audio bandwidth, whereas a 192 kHz stream can carry up to 96 kHz. However, we can give MQA some leeway by allowing for lossy encoding, even though some MQA marketing material claims that the encoding is lossless.
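To make the sampling limit concrete, here is a small Python sketch of aliasing. It is only an illustration of the theorem and has nothing to do with how MQA works internally; the function names and the 30 kHz test tone are my own choices. A tone above half the sample rate produces exactly the same samples as a lower-frequency tone, so a plain 44.1 kHz stream cannot tell the two apart.

```python
import math

def alias_frequency(f_hz, sample_rate_hz):
    """Frequency in [0, sample_rate/2] that yields the same samples as f_hz."""
    return abs(f_hz - round(f_hz / sample_rate_hz) * sample_rate_hz)

def sample_cosine(f_hz, sample_rate_hz, count):
    """Sample a unit-amplitude cosine at f_hz, `count` samples long."""
    return [math.cos(2 * math.pi * f_hz * n / sample_rate_hz) for n in range(count)]

CD_RATE_HZ = 44_100
tone_hz = 30_000                                   # above the 22.05 kHz limit
folded_hz = alias_frequency(tone_hz, CD_RATE_HZ)   # where the tone lands instead

original = sample_cosine(tone_hz, CD_RATE_HZ, 100)
folded = sample_cosine(folded_hz, CD_RATE_HZ, 100)

# The two sample sequences are indistinguishable (up to rounding error).
same = all(math.isclose(a, b, abs_tol=1e-9) for a, b in zip(original, folded))
print(folded_hz, same)
```

So any content above 22.05 kHz has to travel some other way than as ordinary samples, which is where the lossy side channel comes in.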

In a lossy scheme, we can steal some of the least significant bits from each sample to pass through a data stream that, like MP3, employs psychoacoustic coding. The least significant bits sound like the noise floor when listened to without the decoder, and the psychoacoustic coding allows us to put more detail into the noise more economically. Basically, the data stream contains instructions for synthesizing only the sounds humans can hear, so we use less data than if we had to encode the full Nyquist–Shannon spectrum. Furthermore, the data stream only needs to contain the delta, which is the sound not already present in the non-stolen bits.
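Here is a minimal Python sketch of the bit-stealing idea. This is my own speculation, not MQA's documented method: the number of stolen bits (`STOLEN_BITS = 4`), the function names, and the choice to treat each sample as a raw unsigned 16-bit word are all assumptions made for illustration.

```python
STOLEN_BITS = 4                   # assumed for illustration; MQA does not publish this
MASK = (1 << STOLEN_BITS) - 1     # 0b1111, selects the stolen low bits
WORD = 0xFFFF                     # a sample is treated as a raw 16-bit word

def embed(samples, payload):
    """Overwrite the low STOLEN_BITS of each sample with one payload value."""
    return [(s & ~MASK & WORD) | (p & MASK) for s, p in zip(samples, payload)]

def extract(samples):
    """Recover the payload values hidden in the low bits."""
    return [s & MASK for s in samples]

def audible_part(samples):
    """What remains of the audio once the stolen bits are masked off."""
    return [s & ~MASK & WORD for s in samples]

pcm = [0x1234, 0xFFFF, 0x0000]    # made-up sample words
data = [0xA, 0x0, 0xF]            # made-up payload, 4 bits per sample

mixed = embed(pcm, data)
print([hex(s) for s in mixed])    # what a regular CD player would play
print(extract(mixed))             # what a decoder would read back
```

A regular CD player plays `mixed` as is, so the payload shows up as low-level noise: each sample differs from the original by at most `MASK`. A decoder would call `extract` to recover the data stream and then use it to rebuild the missing detail on top of `audible_part`.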

The question about MQA-CD is how many bits it steals.

Music Origami, according to MQA

The MQA website links to Bob Talks, a blog by the MQA inventor, which discusses the CD encoding in some technical detail, but it is a little confusing:

If the original source is 44.1kHz/24b or if the sample rate is 88.2, 176.4, 352,8 kHz, or DSD, then a standard MQA file will be 44.1 kHz/24b. The file contains the information for decoding, ‘unfolding’, and rendering.

This 24b MQA file is structured so that, if in distribution it encounters a ‘16-bit bottle-neck’ (e.g. in a wireless or automotive application), then the information in the top 16 bits is arranged to maximise the downstream sound quality and still permits unfolding and rendering. See [2]

[2] MQA-CD: Origami and the Last Mile 

So reference [2] should contain some information about how the 24-bit file is truncated to 16 bits. Here are some mentions:

The Green signal is completely removed by MQA decoders; but it is there so that we can hear more of the music when playback is limited to a 16-bit stream.

Sometimes we might want to listen to MQA music on equipment that doesn’t support 24 bits – maybe only 16? Rather than throw away all the buried information, MQA carries a small data channel (shown in Green) which can contain the ‘B’ estimates, enabling significantly improved playback quality on, e.g. a CD, over ‘Airplay’, in-car, to certain WiFi speakers and similar scenarios.

But it is also confusing because it shows the “Green signal” at -120 dB. We know that the CD dynamic range is 96 dB, so 16-bit samples should not be able to represent a signal at -120 dB. 24-bit samples have a dynamic range of 144 dB. However, the signal charts on the page show a floor of -168 dB, with some information placed below -144 dB, which would require 28-bit samples.

As a side note, the CD dynamic range of 96 dB follows from the 16-bit sample depth: \( 20 \times \log_{10}{2^{16}} \approx 96 \). As a rule of thumb, each bit in the sample represents about 6 dB of dynamic range.
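The same formula can be run in both directions to sanity check the numbers quoted so far. This is a small Python sketch; the function names are mine, and it only applies the rule of thumb above, without accounting for dithering or noise shaping.

```python
import math

def dynamic_range_db(bits):
    """Dynamic range of linear PCM with the given sample depth."""
    return 20 * math.log10(2 ** bits)

def bits_needed(level_db):
    """Smallest sample depth whose dynamic range reaches level_db."""
    return math.ceil(level_db / dynamic_range_db(1))

for bits in (16, 24):
    print(f"{bits}-bit: {dynamic_range_db(bits):.1f} dB")

for level in (96, 120, 144, 168):
    print(f"-{level} dB needs {bits_needed(level)} bits")
```

By this measure a signal at -120 dB needs 20 bits and a floor at -168 dB needs 28 bits, both of which are more than a CD's 16.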

Another page, Deeper Look: MQA 16b and Provenance in the Last Mile, also states that:

If we look at the block diagram above, we can see there are three components to the MQA data, broadly described as: i) top 16 bits, ii) MQA signalling and iii) bottom 8 bits

The block diagram clearly shows that the encoding results in a 24-bit master file, but it still does not explain how that is reduced to an MQA-CD, which is bottlenecked to 16-bit samples.

Is Bit Stealing Plausible?

Since MQA still does not explain how the 24-bit master is reduced to the 16-bit transport depth of a CD, we are left to speculate about the bit-stealing idea from earlier.

If we allow stealing 4 bits per sample, then we get a data rate of \( 2 \textit{ channels} \times 4 \textit{ bits per sample} \times 44100 \textit{ Hz} \approx 353 \textit{ kbps} \). This is pretty generous compared with high-quality AAC, which is typically 256 kbps. The dynamic range before decoding is reduced from 96 dB to 72 dB, which is still comparable to a very high-quality magnetic tape.
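To see how the trade-off changes with the number of stolen bits, here is a short Python sketch of the same arithmetic. The constants and function names are mine, and the whole bit-stealing scheme remains my speculation rather than anything MQA documents. Note that with 1 kbps = 1000 bits per second, the convention AAC bitrates use, 4 stolen bits work out to 352.8 kbps; dividing by 1024 instead gives about 344.5.

```python
import math

CHANNELS = 2
SAMPLE_RATE_HZ = 44_100
CD_BITS = 16

def side_channel_kbps(stolen_bits):
    """Payload rate of the stolen bits, in kbps (1 kbps = 1000 bits per second)."""
    return CHANNELS * stolen_bits * SAMPLE_RATE_HZ / 1000

def remaining_range_db(stolen_bits):
    """Dynamic range left in the bits that are not stolen."""
    return 20 * math.log10(2 ** (CD_BITS - stolen_bits))

for stolen in range(1, 9):
    print(f"{stolen} bits: {side_channel_kbps(stolen):6.1f} kbps, "
          f"{remaining_range_db(stolen):5.1f} dB left")
```

Stealing 3 bits would already reach 264.6 kbps, slightly above the typical 256 kbps AAC rate, while leaving about 78 dB of dynamic range for players without a decoder.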

So I would say it is plausible, but the MQA marketing material does not settle whether this is how they did it.

Furthermore, I don’t see the point of MQA’s “Music Origami” that folds 24-bit 192 kHz audio into 24-bit 48 kHz. If the transport is already capable of lossless 24-bit data, it must be a digital transport that is not a CD, which means there is no requirement to maintain backwards compatibility with a Red Book CD player. We can just use the whole stream to transport encoded audio, e.g. AAC or FLAC. Even some later CD players from the 2000s can play MP3 files from a data CD or a USB drive. That was all possible before MQA launched in 2014.

Which is why Techmoan says that even if you believe MQA delivers higher quality audio, it is a format that came a little too late.