Saturday, April 21, 2012

Zero-Latency Computer Audio Effect Notes

When it comes to audio processing, it is believed that the human ear can detect at least 13ms of delay. In a computer audio setting, a large delay can be attributed to the underlying operating system. For example, it could be that the OS or driver does not handle interrupts (making input data available) fast enough, not scheduling threads to process the data soon enough, and making too many copies of audio data which delays the input as well as output. On Windows, the latency could be as high as 100ms. On Linux with preemptible and low-latency kernel, latency lower than 1ms is achievable. (These numbers are pulled from memory, and if any reader cares to point me to some latency measurement results—more recent the better—that would be very welcome).

But beyond the operating system, computer audio latency is also limited by the sampling rate as well. For example, when recording stereo audio at 44100Hz with 16-bit samples, filling a 1024 byte buffer takes 6ms. A buffer size for sub-millisecond latency would need to be 128 bytes, which is about the size of a memory cache line. This translates to about 32 samples. A very small number for audio effects processing. For example, computing Fourier Transform on a block size this small is not very interesting. This means that in order to achieve zero-latency computer audio effect, the algorithm has to be designed as a stream of samples.

Here I'm just noting two algorithms that are streamable: low-pass and high-pass filters. Both algorithms have a “discrete-time realization” (taken from Wikipedia) as follows:

  • Low-pass filter: \[ y_i = \alpha x_i + (1 - \alpha) y_{i-1} \qquad \text{where} \qquad \alpha \triangleq \frac{\Delta_T}{RC + \Delta_T} \]
  • High-pass filter: \[ y_i = \alpha y_{i-1} + \alpha (x_{i} - x_{i-1}) \qquad \text{where} \qquad \alpha \triangleq \frac{RC}{RC + \Delta_T} \]
In both cases, the cutoff frequency is \( f_c = \frac{1}{2\pi RC} \), and \( \Delta_T \) is the duration of time between samples (the reciprocal of the sampling rate).

Saturday, April 14, 2012

A proposal for remote procedure call IDL

This evening I decided to give some thought about a good RPC design, with a simple semantics and efficient wire format. There is a lot of prior art, e.g. SUN RPC, SMB, CORBA, SOAP, XMLRPC, and I don't want to go into the detail of any one of them. The XML based wire format is not efficient. The ASN.1 or XDR based wire format are fine, but JSON is simpler. It also has an efficient wire format called MessagePack. However, the RPC specification that came with MessagePack isn't satisfactory. One prominent missing feature is the streaming RPC. Its IDL is also tied to the underlying data representation. What I'm looking for is a simply typed way to describe a JSON value. The overall RPC semantics should be like simply-typed JavaScript.

Here is an informal proposal for the type of a simply-typed JSON value.

key-type := number | boolean | string | enum-type
key := number-value | boolean-value | string-value | enum-value
opt-key := key | optional key

type := key-type | object
      | [type]   // array
      | (type1, type2, ..., typen)   // tuple
      |    // associative map
        {opt-key1: type1, opt-key2: type2, ..., opt-keyn: typen}
      | nullable type

The type describes a JSON object. It can be:
  • One of the primitive types "number," "boolean," or "string," or it could be simply "object" for any JSON object without more refined type description.
  • The array syntax is for describing an array where all items are of the same type.
  • The tuple is for describing a sequence of objects of various types, even though in JSON it would be represented as an array as well.
  • The associative map is a key-value mapping from a concrete value of the key to a type. This is the refined description of a complex JSON object. The concrete key values can be a number, a boolean, or a string. The key values can be optionally prefixed by "optional" to indicate that the key could be missing.
  • A type with the "nullable" qualifier which indicates that the value could be null.
Furthermore, for the ease of defining constants or symbolic names of associative map, we allow the user to define enums which map a symbolic name to a number. The enum type's name could be used wherever a type is expected, and the enum symbols could be used whenever a key is expected.

An interface is a collection of function calls with the signature:

typearg -> typeret throws type1, type2, ..., typen

where typearg is the type of the function argument, typeret is the type of the return value, and the list of types after "throws" are the exception types that could be raised by the function.

What is unique about this proposal is the unified treatment of bidirectional and streaming RPC. Typically, streaming RPC is a way for the server to send data in multiple responses, asynchronously back to the client. Client can always stream data to the server by making a call to server as many times as necessary. Here the streaming RPC is generalized to the concept of a co-interface. Whenever server needs to make a part of the response available, it would invoke one of the functions in the co-interface. A co-interface is supposed to be implemented by the client for receiving calls from the server, in order to connect to the server. Hence, the complete service description consists of a server-side interface and a client-side co-interface, which is how bidirectional RPC can be made.

Update: June 1, 2012. I've decided to get rid of exception (throw) and instead use cointerface for signaling alternative continuation.