But beyond the operating system, computer audio latency is also limited by the sampling rate. For example, when recording stereo audio at 44100Hz with 16-bit samples, filling a 1024-byte buffer takes about 6ms. For sub-millisecond latency, the buffer would need to shrink to 128 bytes, which is about the size of a memory cache line. This translates to about 32 samples per channel, a very small number for audio effects processing. For example, computing a Fourier transform on a block size this small is not very interesting. This means that in order to achieve a zero-latency computer audio effect, the algorithm has to be designed to process a stream of samples, one at a time.
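Spelling out the buffer arithmetic above (16-bit stereo means 4 bytes per frame):

\[ \frac{1024\ \text{bytes}}{4\ \text{bytes/frame} \times 44100\ \text{frames/s}} \approx 5.8\ \text{ms} \qquad\qquad \frac{128\ \text{bytes}}{4\ \text{bytes/frame} \times 44100\ \text{frames/s}} \approx 0.73\ \text{ms} \]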
Here I'm just noting two algorithms that are streamable: the low-pass and the high-pass filter. Both have a “discrete-time realization” (taken from Wikipedia) as follows:
- Low-pass filter: \[ y_i = \alpha x_i + (1 - \alpha) y_{i-1} \qquad \text{where} \qquad \alpha \triangleq \frac{\Delta_T}{RC + \Delta_T} \]
- High-pass filter: \[ y_i = \alpha y_{i-1} + \alpha (x_{i} - x_{i-1}) \qquad \text{where} \qquad \alpha \triangleq \frac{RC}{RC + \Delta_T} \]
In both cases, the cutoff frequency is \( f_c = \frac{1}{2\pi RC} \), and \( \Delta_T \) is the duration of time between samples (the reciprocal of the sampling rate).
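Both recurrences translate directly into per-sample C functions that keep only the previous sample(s) as state, so each output sample is available as soon as its input sample arrives. A minimal sketch (the cutoff frequency and sample rate in the demo are values I picked for illustration):

```c
#include <stdio.h>

#define PI 3.14159265358979f

/* One-pole low-pass: y_i = alpha * x_i + (1 - alpha) * y_{i-1}. */
typedef struct { float alpha, y_prev; } lowpass;

/* One-pole high-pass: y_i = alpha * y_{i-1} + alpha * (x_i - x_{i-1}). */
typedef struct { float alpha, x_prev, y_prev; } highpass;

static lowpass lowpass_init(float fc, float sample_rate) {
    float dt = 1.0f / sample_rate;        /* Delta_T, time between samples */
    float rc = 1.0f / (2.0f * PI * fc);   /* RC from fc = 1 / (2*pi*RC)    */
    lowpass f = { dt / (rc + dt), 0.0f }; /* alpha = Delta_T / (RC + Delta_T) */
    return f;
}

static highpass highpass_init(float fc, float sample_rate) {
    float dt = 1.0f / sample_rate;
    float rc = 1.0f / (2.0f * PI * fc);
    highpass f = { rc / (rc + dt), 0.0f, 0.0f }; /* alpha = RC / (RC + Delta_T) */
    return f;
}

/* Each step consumes one input sample and produces one output sample,
   so the algorithmic latency is a single sample, independent of buffer size. */
static float lowpass_step(lowpass *f, float x) {
    f->y_prev = f->alpha * x + (1.0f - f->alpha) * f->y_prev;
    return f->y_prev;
}

static float highpass_step(highpass *f, float x) {
    f->y_prev = f->alpha * (f->y_prev + x - f->x_prev);
    f->x_prev = x;
    return f->y_prev;
}

int main(void) {
    /* Toy demo: run an impulse through a 1 kHz low-pass at 44100 Hz. */
    lowpass lp = lowpass_init(1000.0f, 44100.0f);
    for (int i = 0; i < 5; i++) {
        float x = (i == 0) ? 1.0f : 0.0f;
        printf("y[%d] = %f\n", i, lowpass_step(&lp, x));
    }
    return 0;
}
```

Note how neither filter buffers ahead: the only state carried between calls is \( y_{i-1} \) (and \( x_{i-1} \) for the high-pass), which is exactly what makes them streamable.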