Saturday, April 13, 2013

Notes about Interprocess Communication

Motivation

There are many distributed systems and IPC designs, but forget about them for a moment. Let's design an IPC mechanism from scratch. The basic idea is to take a procedure call like this:
char buf[64];
int len = snprintf(buf, sizeof(buf), "The answer is %d", 42);
and transparently move snprintf() into a separate process. One reason to do that is that we may have a buggy snprintf() implementation that crashes 1 out of 40 times, and we don't want it to crash our program. Once snprintf() runs in a separate process, we are also free to retry the call when that buggy code crashes. Process isolation means the buggy code cannot corrupt our memory: we are now fault isolated.

Why do we want fault isolation? Suppose I program in ATS and have proved that my program can never crash. But I still need third-party libraries that are only as reliable as (fill in a witty expletive here), and linking them in makes my program no more reliable than they are.

Message passing

Before we can call snprintf() in an isolated process, the runtime system needs to answer a few questions.
  • Discovery: where do I find a process that implements snprintf()?
  • Message passing:
    • How do I request that snprintf() be called?
    • How do I retrieve the result from snprintf() when it is done?
There are several approaches to discovery:
  • Start the process yourself as a child process and act as its babysitter, restarting it whenever it crashes.
  • Share the process with other processes and let a system-wide babysitter handle crashes.
    • The shared process might even be on a different computer across the network, in which case you use a name service to find it.
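A minimal sketch of the child-process babysitter, assuming POSIX fork() and waitpid(). To keep it short, it forks a fresh child per call instead of supervising a long-lived server. babysit(), worker_fn, and flaky_worker() are hypothetical names; flaky_worker() stands in for the buggy snprintf() by crashing on its first attempt.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical worker signature: runs in the child, returns an exit code (0-255). */
typedef int (*worker_fn)(int attempt);

/* Run `work` in a child process and restart it each time the child is killed
 * by a signal (e.g. SIGSEGV or SIGABRT), for at most `max_attempts` attempts.
 * Returns the child's exit status, or -1 if every attempt crashed or a
 * system call failed. */
int babysit(worker_fn work, int max_attempts)
{
    for (int attempt = 0; attempt < max_attempts; attempt++) {
        pid_t pid = fork();
        if (pid < 0)
            return -1;                  /* could not create the child */
        if (pid == 0)
            _exit(work(attempt));       /* child: a crash here cannot touch our memory */

        int status;
        if (waitpid(pid, &status, 0) < 0)
            return -1;
        if (WIFEXITED(status))
            return WEXITSTATUS(status); /* the child finished normally */
        /* Otherwise the child died from a signal: loop and start a fresh one. */
    }
    return -1;
}

/* Stand-in for buggy code: crashes on the first attempt, succeeds afterwards. */
int flaky_worker(int attempt)
{
    if (attempt == 0)
        abort();
    return 42;
}
```

Here babysit(flaky_worker, 3) survives the first crash and returns 42, while babysit(flaky_worker, 1) gives up and returns -1. A system-wide babysitter would apply the same restart loop to a shared server process instead.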
For message passing, you first need a transport and then a wire format. For the transport, you would probably just open a socket: local processes can use a Unix domain socket, and remote processes can use a TCP/IP socket. You could also use UDP if you are willing to handle packet loss yourself, or TLS if you want authentication and encryption. The transport may also provide information such as the remote address and credentials, which the program uses for authorization, i.e. to decide whether to accept or reject a message.
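For local transports, authorization can use the credentials the kernel attaches to a Unix domain socket. The sketch below is Linux-specific (SO_PEERCRED is not portable; BSDs and macOS offer getpeereid() instead), and the same-user policy in authorize_same_user() is a hypothetical example rather than a recommendation.

```c
#define _GNU_SOURCE          /* exposes struct ucred in glibc's <sys/socket.h> */
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Linux-specific: ask the kernel which user is on the other end of a Unix socket. */
int peer_uid(int fd, uid_t *uid)
{
    struct ucred cred;
    socklen_t len = sizeof(cred);
    if (getsockopt(fd, SOL_SOCKET, SO_PEERCRED, &cred, &len) < 0)
        return -1;
    *uid = cred.uid;
    return 0;
}

/* Hypothetical policy: accept messages only from processes run by the same user. */
int authorize_same_user(int fd)
{
    uid_t uid;
    return peer_uid(fd, &uid) == 0 && uid == getuid();
}
```

A server would call authorize_same_user() right after accept() and close the connection if it returns 0. Over TCP/IP there are no kernel credentials, which is one reason to use TLS there.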

The wire format consists of framing plus the serialization of the message. The frame tells the receiver how many bytes to expect. Framing may be unnecessary over a datagram socket, whose underlying protocol already preserves message boundaries, or when the message serialization itself has an unambiguous length. Otherwise, it suffices to send a message length followed by that many message bytes. You can encode the length as a varint to keep it flexible; if you use a fixed-size integer instead, you are also fixing the maximum message size at this point. The message itself is a byte sequence that serializes the details of the message.
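A sketch of the varint length prefix, assuming the common base-128 scheme (the one Protocol Buffers uses): seven bits per byte, least significant group first, with the high bit set on every byte except the last. varint_encode() and varint_decode() are illustrative names.

```c
#include <stddef.h>
#include <stdint.h>

/* Write `len` as a base-128 varint. Returns the number of bytes written (at most 10). */
size_t varint_encode(uint64_t len, uint8_t *out)
{
    size_t n = 0;
    while (len >= 0x80) {
        out[n++] = (uint8_t)(len & 0x7f) | 0x80;  /* more bytes follow */
        len >>= 7;
    }
    out[n++] = (uint8_t)len;                      /* final byte: high bit clear */
    return n;
}

/* Parse a varint from `in` (at most `avail` bytes). Returns the number of bytes
 * consumed, or 0 if the input is truncated or longer than 10 bytes. */
size_t varint_decode(const uint8_t *in, size_t avail, uint64_t *len)
{
    uint64_t value = 0;
    for (size_t i = 0; i < avail && i < 10; i++) {
        value |= (uint64_t)(in[i] & 0x7f) << (7 * i);
        if (!(in[i] & 0x80)) {
            *len = value;
            return i + 1;
        }
    }
    return 0;
}
```

A sender writes the encoded length and then the message bytes; a receiver keeps reading until varint_decode() succeeds, then reads exactly that many more bytes. Small messages pay only one or two bytes of framing (300 encodes as 0xAC 0x02).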

For the purpose of IPC, the request message encodes the function name and arguments, and the response message encodes the return value. In the snprintf() case, the request message could be just ["snprintf", address_of(buf), 64, "The answer is %d", 42], and the response message might be [16].
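One way to serialize such a request, as a sketch under an assumed, made-up tagged format: each value is a one-byte tag followed by its payload, 'i' for a 64-bit big-endian integer and 's' for a 32-bit big-endian length followed by the bytes. msg_buf, put_int(), and put_str() are hypothetical helpers, not an existing library.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Output buffer for one message; `overflow` is set once a write does not fit. */
typedef struct {
    uint8_t *data;
    size_t cap, len;
    int overflow;
} msg_buf;

static void put_bytes(msg_buf *m, const void *p, size_t n)
{
    if (m->overflow || m->cap - m->len < n) {
        m->overflow = 1;
        return;
    }
    memcpy(m->data + m->len, p, n);
    m->len += n;
}

/* Tag 'i' + 8 bytes, most significant byte first. */
void put_int(msg_buf *m, int64_t v)
{
    uint8_t b[9] = { 'i' };
    uint64_t u = (uint64_t)v;
    for (int i = 0; i < 8; i++)
        b[1 + i] = (uint8_t)(u >> (56 - 8 * i));
    put_bytes(m, b, sizeof(b));
}

/* Tag 's' + 4-byte big-endian length + the string bytes (no NUL). */
void put_str(msg_buf *m, const char *s)
{
    uint32_t n = (uint32_t)strlen(s);
    uint8_t hdr[5] = { 's', (uint8_t)(n >> 24), (uint8_t)(n >> 16),
                       (uint8_t)(n >> 8), (uint8_t)n };
    put_bytes(m, hdr, sizeof(hdr));
    put_bytes(m, s, n);
}
```

Encoding the snprintf() request is then a sequence of put_str() and put_int() calls, and the response [16] is a single put_int(). The address_of(buf) argument is the awkward one: a raw pointer means nothing in another address space, which is the problem the next section takes up.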

Out of band communication

Here is an interesting problem. If snprintf() now runs in a separate, memory-isolated process, how can it modify our buffer? Some IPC designs force snprintf() to return the resulting string as part of the response message. That works well enough when the result is small. But if the result consists mostly of a large binary array, e.g. a pixel buffer, the overhead of encoding it, copying it over the transport, and decoding it would be too high. In that case, it is better to use message passing as the "control channel" and an optimized out-of-band mechanism to deliver large binary objects.

For example, if both processes are on the same machine, the out-of-band communication could take place over shared memory. If they are on different machines, they could use remote DMA (RDMA).
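A minimal sketch of the control-channel/out-of-band split for two processes on one machine, assuming POSIX mmap() with MAP_ANONYMOUS, so the region is shared only with children forked after it is mapped. The formatted string is written directly into the shared region, and only snprintf()'s int result crosses the pipe. shm_alloc() and remote_format_answer() are hypothetical names.

```c
#define _GNU_SOURCE         /* for MAP_ANONYMOUS on glibc */
#include <stdio.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Allocate memory that stays shared with any child forked afterwards. */
char *shm_alloc(size_t n)
{
    void *p = mmap(NULL, n, PROT_READ | PROT_WRITE, MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Format in a child process. The bytes travel out of band through `shared`;
 * only the int result crosses the pipe (the control channel).
 * Returns snprintf's result, or -1 on failure. */
int remote_format_answer(char *shared, size_t n, int answer)
{
    int ctl[2];
    if (pipe(ctl) < 0)
        return -1;
    pid_t pid = fork();
    if (pid < 0) {
        close(ctl[0]);
        close(ctl[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child: the "snprintf server". */
        close(ctl[0]);
        int len = snprintf(shared, n, "The answer is %d", answer);
        ssize_t w = write(ctl[1], &len, sizeof(len));
        _exit(w == (ssize_t)sizeof(len) ? 0 : 1);
    }
    close(ctl[1]);
    int len = -1;
    if (read(ctl[0], &len, sizeof(len)) != (ssize_t)sizeof(len))
        len = -1;               /* the child died before replying */
    close(ctl[0]);
    waitpid(pid, NULL, 0);
    return len;
}
```

Unrelated processes cannot rely on fork() inheritance; they would need a named region, for instance one created with shm_open(), and would send its name or descriptor over the control channel.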

In the case of snprintf(), we might have to modify the program like this:
blob_t *buf = allocate_blob(64);
int len = snprintf(buf, sizeof_blob(buf), "The answer is %d", 42);
/* do something with buf */
free_blob(buf);
The "blob" indirection can be inserted by the compiler transparently.
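A possible implementation of the hypothetical blob API above, assuming a POSIX MAP_SHARED mapping with a size header in front of the payload. blob_data() is an extra accessor not in the original example, added so ordinary C code can reach the bytes.

```c
#define _GNU_SOURCE         /* for MAP_ANONYMOUS on glibc */
#include <stddef.h>
#include <sys/mman.h>

/* A size header followed by the payload, in one shared mapping. */
typedef struct {
    size_t size;
    char data[];            /* flexible array member: the usable bytes */
} blob_t;

blob_t *allocate_blob(size_t size)
{
    void *p = mmap(NULL, sizeof(blob_t) + size, PROT_READ | PROT_WRITE,
                   MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;
    blob_t *b = p;          /* anonymous mappings start zero-filled */
    b->size = size;
    return b;
}

size_t sizeof_blob(const blob_t *b) { return b->size; }

char *blob_data(blob_t *b) { return b->data; }

void free_blob(blob_t *b) { munmap(b, sizeof(blob_t) + b->size); }
```

Because the mapping is MAP_SHARED, a child forked after allocate_blob() writes straight into the caller's blob; sharing with an already-running server process would need a named region instead.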

Parallelism

IPC calls can be annotated with Cilk-like spawn and sync so that several of them run in parallel. This pays off because a program waiting for results from a different process is effectively I/O bound: it sits idle, so overlapping several outstanding calls hides that waiting time.
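A rough sketch of spawn and sync over processes, assuming POSIX fork() and pipe() rather than actual Cilk keywords: spawn_format() starts a call and returns immediately, and sync_format() blocks until that call's result arrives. call_handle, spawn_format(), and sync_format() are illustrative names.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* An outstanding call: the worker's pid and the read end of its result pipe. */
typedef struct {
    pid_t pid;
    int fd;
} call_handle;

/* "spawn": start formatting in a separate process and return at once. */
int spawn_format(int answer, call_handle *h)
{
    int p[2];
    if (pipe(p) < 0)
        return -1;
    pid_t pid = fork();
    if (pid < 0) {
        close(p[0]);
        close(p[1]);
        return -1;
    }
    if (pid == 0) {
        close(p[0]);
        char buf[64];
        int len = snprintf(buf, sizeof(buf), "The answer is %d", answer);
        ssize_t w = write(p[1], &len, sizeof(len));
        _exit(w == (ssize_t)sizeof(len) ? 0 : 1);
    }
    close(p[1]);
    h->pid = pid;
    h->fd = p[0];
    return 0;
}

/* "sync": block until the call's result arrives. Returns the length, or -1. */
int sync_format(call_handle *h)
{
    int len = -1;
    if (read(h->fd, &len, sizeof(len)) != (ssize_t)sizeof(len))
        len = -1;
    close(h->fd);
    waitpid(h->pid, NULL, 0);
    return len;
}
```

Spawning three calls before syncing any of them lets the three workers run at the same time, so the caller waits roughly as long as the slowest call rather than the sum of all three.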

Other optimizations

  • Amortize the cost of transport establishment by reusing existing connections across function calls.
  • Amortize the cost of out of band communication establishment across the creation and modification of blobs.
  • Channel compression.
  • Scheduling.
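A sketch of the first optimization, reusing connections: a small cache keyed by Unix domain socket path that connects only on the first request for each endpoint. conn_pool, POOL_SIZE, and pool_get() are hypothetical, and a real pool would also need to evict connections that have broken.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

#define POOL_SIZE 8

/* One open connection per endpoint path. */
typedef struct {
    char path[POOL_SIZE][sizeof(((struct sockaddr_un *)0)->sun_path)];
    int fd[POOL_SIZE];
    int count;
} conn_pool;

static int connect_unix(const char *path)
{
    struct sockaddr_un addr;
    if (strlen(path) >= sizeof(addr.sun_path))
        return -1;
    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strcpy(addr.sun_path, path);
    int fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Return a cached connection to `path`, connecting only on the first request. */
int pool_get(conn_pool *p, const char *path)
{
    for (int i = 0; i < p->count; i++)
        if (strcmp(p->path[i], path) == 0)
            return p->fd[i];
    int fd = connect_unix(path);
    if (fd < 0 || p->count == POOL_SIZE)
        return fd;          /* pool full: the caller owns this uncached fd */
    strcpy(p->path[p->count], path);
    p->fd[p->count++] = fd;
    return fd;
}
```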

Conclusion

This is just an overview of an interprocess communication mechanism designed from scratch. Distributed computing has been researched heavily. I took inspiration from some existing designs but used the motivating example as a guideline to eliminate designs that are irrelevant.
