Life of a Computer Scientist: Redundancy of audio packets over lossy network

I wanted to know what people currently do to stream audio packets over a network with packet loss. There are several options for streaming audio, and they vary in their effectiveness against packet loss.

AirTunes (a.k.a. RAOP) is the protocol used between iTunes and AirPort Express, based on RTSP (for establishing the session) and RTP (for transmitting data). My impression is that you can stream MPEG-4 AAC and Apple Lossless Audio Codec to AirPort Express, though it may be possible to stream uncompressed audio too. Note that an audio codec itself introduces algorithmic delay, since compression is performed by grouping audio data over a time window. It is not clear how the protocol deals with packet loss, since RTP itself does not define how to deal with it.
Netjack connects a jack server to a slave. It can stream CELT or raw audio over UDP. CELT is a low-delay audio codec with packet loss concealment. However, a tradeoff of the low delay is that it cannot resolve low-frequency pitch (~100 Hz fundamental), and has to rely on a long-term predictor. I'm not sure how this affects audio quality in practice. The packet transport does not seem to attempt to recover from packet loss, but instead relies on the CELT codec for packet loss concealment. The slave's audio clock is synchronized to the jack server.
Netjack2 is really a distributed version of jack, using remote computing nodes to apply computationally intensive audio filtering, instead of playing back sound. They implemented a discovery protocol based on multicast, and audio is transmitted over UDP, with no packet loss handling. When using an audio adapter for streaming audio to a slave sound card device, audio resampling is applied to compensate for clock drift.
Jacktrip, developed at Stanford CCRMA, connects two jack servers over a network. There doesn't seem to be any time synchronization. Data is transmitted over UDP, and it is possible to transmit redundant data in a rolling window in order to recover from data loss.
There is a proposal from Kyushu University to use Reed-Solomon codes to transmit data over UDP.

I'm particularly interested in techniques that can deal with packet loss on a wireless network, which has the following characteristics:

Wireless packet radio implements low-level retransmission of lost or corrupt packets. At the higher layers, this is perceived as high packet delay.
When packet loss does occur at the higher layers, it usually happens in bursts of packets. I estimate that a radio gap of up to 3 ms is possible under normal conditions.
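
To get a feel for what a 3 ms gap means in packets, here is a back-of-the-envelope sketch. The sample rate and frames-per-packet values are my own assumptions for illustration, not figures taken from any of the protocols above, and the model ignores the low-level retransmission entirely: it only counts how many evenly spaced packets could have their send time fall inside the gap.

```python
import math

def worst_case_packets_in_gap(gap_ms, sample_rate_hz, frames_per_packet):
    """Worst-case count of evenly spaced packets sent during a radio gap."""
    # One packet is sent every time frames_per_packet samples are ready.
    packet_interval_ms = 1000.0 * frames_per_packet / sample_rate_hz
    # A window of length gap_ms can contain at most floor(gap/interval) + 1
    # send instants, depending on how the window lines up with them.
    return math.floor(gap_ms / packet_interval_ms) + 1

# Assumed settings: 48 kHz audio and three hypothetical packet sizes.
burst_32 = worst_case_packets_in_gap(3.0, 48000, 32)
burst_64 = worst_case_packets_in_gap(3.0, 48000, 64)
burst_128 = worst_case_packets_in_gap(3.0, 48000, 128)
print(burst_32, burst_64, burst_128)  # 5 3 2
```

Under these assumptions, the smaller the packet (and so the lower the latency), the more consecutive packets a single gap can take out.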

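Given these characteristics, the rolling window redundancy used by Jacktrip would not be effective: the redundant copies travel in neighboring packets, so a streak of lost packets longer than the window takes out the originals and their copies together, and that data is not recoverable under this scheme.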
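
A toy simulation makes the difference between isolated and burst loss visible. This is my simplified model of rolling-window redundancy (each packet carries the current frame plus the previous few), not Jacktrip's actual wire format; the function names and the window size of 2 are made up for the example.

```python
def send_with_redundancy(frames, copies):
    """Packet i carries frame i plus the previous (copies - 1) frames."""
    packets = []
    for i in range(len(frames)):
        start = max(0, i - copies + 1)
        packets.append({j: frames[j] for j in range(start, i + 1)})
    return packets

def receive(packets, lost):
    """Return the frame indices seen when the packets in `lost` are dropped."""
    recovered = set()
    for i, packet in enumerate(packets):
        if i not in lost:
            recovered.update(packet.keys())
    return recovered

frames = [f"frame{i}" for i in range(10)]
packets = send_with_redundancy(frames, copies=2)
all_frames = set(range(len(frames)))

# Isolated loss: packet 4 is dropped, but packet 5 also carries frame 4.
missing_isolated = sorted(all_frames - receive(packets, lost={4}))
# Burst loss: packets 4, 5 and 6 are dropped; packet 7 carries only frames 6 and 7.
missing_burst = sorted(all_frames - receive(packets, lost={4, 5, 6}))

print(missing_isolated)  # []
print(missing_burst)     # [4, 5]
```

In this model a burst is only survivable if the window is longer than the burst, and every extra frame in the window multiplies the bandwidth used.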

When using Reed-Solomon with byte-sized symbols, a code block can be at most 255 bytes (including parity). If we naively transmit the 255 bytes over UDP as a single packet, the whole block could be dropped at once, leaving nothing to recover from. But we could transpose the blocks so that, when transmitting n blocks over m UDP packets, each UDP packet contains a 1/m slice of each of the n blocks. Losing one packet then costs each block only about 255/m bytes, which the parity can repair if there is enough of it. This seems to be the approach taken by the Kyushu University proposal. Interleaving of this kind is also the idea behind Cross-Interleaved Reed-Solomon Coding (CIRC), used on audio CDs, although CIRC additionally chains two Reed-Solomon codes.
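
The transposition can be sketched as follows. This only shows the interleaving step, using placeholder bytes instead of real Reed-Solomon codewords (no encoder or decoder is included), and the function names and the choice of n = 4, m = 5 are my own. It is not taken from the Kyushu University proposal.

```python
def interleave(blocks, m):
    """Spread equal-length blocks over m packets; packet p holds slice p of every block."""
    block_len = len(blocks[0])
    assert all(len(b) == block_len for b in blocks)
    slice_len = -(-block_len // m)  # ceiling division
    packets = []
    for p in range(m):
        lo, hi = p * slice_len, min((p + 1) * slice_len, block_len)
        packets.append([bytes(b[lo:hi]) for b in blocks])
    return packets

def deinterleave(packets, n, block_len):
    """Rebuild n blocks; bytes from lost packets (None) are marked as erasures (None)."""
    slice_len = -(-block_len // len(packets))
    blocks = [[None] * block_len for _ in range(n)]
    for p, packet in enumerate(packets):
        if packet is None:
            continue  # this packet never arrived
        lo = p * slice_len
        for b, chunk in enumerate(packet):
            blocks[b][lo:lo + len(chunk)] = list(chunk)
    return blocks

n, m, block_len = 4, 5, 255
# Placeholder data standing in for n Reed-Solomon codewords of 255 bytes each.
blocks = [bytes((i + b) % 256 for i in range(block_len)) for b in range(n)]
packets = interleave(blocks, m)
packets[2] = None  # simulate one lost UDP packet
rebuilt = deinterleave(packets, n, block_len)
erasures_per_block = [row.count(None) for row in rebuilt]
print(erasures_per_block)  # [51, 51, 51, 51]
```

Since the receiver knows which packet is missing, the damaged positions are erasures, and a Reed-Solomon code can fill in as many erasures as it has parity symbols. In this example each block would need at least 51 parity bytes to survive one lost packet, and proportionally more to survive a burst.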

Saturday, February 26, 2011
