Sunday, February 15, 2015

Extended Datagram Protocol

I need a datagram protocol as a basis for transporting RPC over a network. I'd like to call it Extended Datagram Protocol or XDP.

Here are the requirements:
  • Message size will be larger than MTU up to ~128KB. Most networks support Ethernet MTU of 1500 bytes, which means large packets need to be fragmented. IP fragmentation handles packet loss poorly. If a single fragment is lost, the whole packet will be dropped.
  • The fragments are sent over UDP. XDP will handle packet assembly with forward error correction. Each fragment will have at most 1280 bytes of payload. It is a nice even number and leaves handsome room for the headers.
  • The RPC layer will allow messages to arrive out of order, so the datagram protocol does not need to worry about packet ordering.
  • The datagram protocol does not need to implement message acknowledgement. It may be handled by the RPC layer, or even the application layer.
  • Encryption will be done at the RPC layer.
  • Session management will be done at the RPC layer.
I don't have a concrete design yet, but I think it is worth mentioning the rationale behind these requirements. There are a number of proposals that extend UDP such as Reliable User Datagram Protocol and Stream Control Transmission Protocol that don't fit the bill. In-order delivery of packets actually becomes a head-of-line blocker, such is what QUIC tries to avoid.

QUIC (Quick UDP Internet Connections) features packet level error correction, but it is trying to do a lot more such as flow control and encryption. I think flow control should be handled by a layer above (e.g. RPC layer) which can do flow control not only based on network condition but also server load. I don't mind that it handles encryption. However, crypto negotiation packets can easily be larger than the MTU, so I think it would be easier to implement encryption with proper large message support provided by a simpler datagram protocol. I'm not trying to argue that QUIC should be designed differently, only that I think my requirements are different and not well-addressed.

An important design philosophy I have is that we need to share protocol implementation as much as possible, perhaps even across layers. TLS basically invented its own RPC protocol, and it uses X.509 which uses the ASN.1 message serialization scheme that differs from TLS. If I were to layer my own RPC on top of TLS there would be three protocol implementations.

The reason Extended Datagram Protocol does not handle anything besides message fragmentation is that everything else could really be handled by the RPC layer that has a strongly-typed and unified message serialization/deserialization scheme. The only difficulty that message serialization has trouble dealing with is the framing. If we were to send a message over a stream based protocol like TCP, we would first send the message length, then the message body. The sender could do this in one socket write, but the receiver cannot do this in a single read. The receiver would have to first read the message length, then the message itself. The receiver could speculatively read extra bytes in one socket read, but it would need a way to pass the excess data to the next receiver, often complicating the design of the protocol stack.

No comments: