Thursday, January 11, 2018

C++, the cause and solution to life's pimpl problems


Can a programming language invite programmers to write anti-patterns that make them write more code that is also harder to follow? Certainly, it can.

PIMPL is a much-hyped C++ code pattern that separates a class's public declaration from its implementation details by wrapping a pointer to the implementation object in the public class. The pointer is the only data member of the public class, and this is all that the user of the public class sees.
// In xyzzy.h

class Xyzzy {
 public:
  Xyzzy();   // Constructor.
  ~Xyzzy();  // Destructor.

  void Foo();

 private:
  class Impl;  // Incomplete forward declaration to the implementation.
  Impl *pimpl_;  // Pointer to incomplete class is allowed.
};
The implementation would be written like this, hidden away from the user.
// In xyzzy.cc

class Xyzzy::Impl {
 public:  // Foo() must be public so that Xyzzy::Foo() can call it.
  ...
  void Foo() {
    // Actual code.
  }
  ...
  // All private members are declared in the Impl class.
};

Xyzzy::Xyzzy()  // Constructor.
  : pimpl_(new Impl) {}

Xyzzy::~Xyzzy() {  // Destructor.
  delete pimpl_;
}

void Xyzzy::Foo() {
  pimpl_->Foo();
}
There are some elaborate concerns like constness preservation, use of smart pointers, copying and moving, memory allocation, and generalizing the boilerplate to a template, but I want to take a step back and look at what PIMPL accomplishes, and why it is necessary.
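To make the smart pointer and move concerns a little more concrete, here is a minimal sketch of the same class using std::unique_ptr. It is only an illustration under a few assumptions of mine: Foo() returns an int and the Impl holds a hypothetical answer_ field so that the effect is observable, and the header and source parts are shown in one listing for brevity. The key detail is that the destructor and move operations are declared in the header but defined only where Impl is a complete type.

```cpp
#include <memory>
#include <utility>

// --- What would live in xyzzy.h ---
class Xyzzy {
 public:
  Xyzzy();
  ~Xyzzy();                            // Defined where Impl is complete.
  Xyzzy(Xyzzy&&) noexcept;             // Moving only transfers the pointer.
  Xyzzy& operator=(Xyzzy&&) noexcept;  // Copying is implicitly disabled.

  int Foo() const;  // Assumption: returns int so the result can be checked.

 private:
  class Impl;                    // Incomplete type in the header.
  std::unique_ptr<Impl> pimpl_;  // Owns the implementation object.
};

// --- What would live in xyzzy.cc ---
class Xyzzy::Impl {
 public:
  int Foo() const { return answer_; }

 private:
  int answer_ = 42;  // Hypothetical private state, for illustration only.
};

Xyzzy::Xyzzy() : pimpl_(std::make_unique<Impl>()) {}
Xyzzy::~Xyzzy() = default;  // unique_ptr can delete Impl here.
Xyzzy::Xyzzy(Xyzzy&&) noexcept = default;
Xyzzy& Xyzzy::operator=(Xyzzy&&) noexcept = default;

int Xyzzy::Foo() const { return pimpl_->Foo(); }
```

Note that a moved-from Xyzzy holds a null pointer in this sketch, so calling Foo() on it would be an error. That is exactly the kind of extra concern the pattern drags in.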

The main reason for using PIMPL is to reduce compilation dependencies, so that changes to the implementation class do not force users of the public interface to recompile their code. Recompilation is necessary when a change breaks binary compatibility. In C++, a change can break binary compatibility for banal reasons like adding a private member to a class.

When class members are added or removed, the size of the class changes. In C++, it is the caller's responsibility to prepare memory storage for the object, regardless of whether the object is stack or heap allocated. Without recompilation, the storage might be too small. Bigger storage is not a problem, but the compiler has no way of telling the old size. With PIMPL, the size of the public class is always just a single pointer to the implementation object, so it stays the same. It shifts the responsibility of storage preparation from the user to the implementation.
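The size argument can be checked directly. The following is a small sketch with names I made up for illustration (PlainV1, PlainV2 and Wrapper are hypothetical): two "versions" of a class with exposed members have different sizes, while a wrapper holding only a pointer to an incomplete type is the size of that pointer regardless of what the implementation contains.

```cpp
#include <cstddef>

// Version 1 of a class whose data members are visible in the header.
struct PlainV1 {
  int a;
};

// Version 2 adds one member, so callers must reserve more storage.
struct PlainV2 {
  int a;
  double b;
};

// A PIMPL-style wrapper: its only data member is a pointer to an
// incomplete type, so its size does not depend on Impl's contents.
struct Wrapper {
  struct Impl;  // Never completed here; the size is still known.
  Impl* pimpl_;
};

static_assert(sizeof(PlainV2) > sizeof(PlainV1),
              "adding a member grows the class");
static_assert(sizeof(Wrapper) == sizeof(Wrapper::Impl*),
              "the wrapper is exactly one pointer wide");
```

Code compiled against PlainV1 would reserve too little storage for a PlainV2 object, which is the breakage described above.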

Another type of change that warrants recompilation is when virtual methods are added or removed, which changes the ordering of function pointers in the vtable. Without recompilation, the old code would end up calling the wrong virtual method since the index changed. PIMPL does not alleviate this type of recompilation, and adding or removing even non-virtual methods on the public class still triggers recompilation under PIMPL, because the header that the user includes has changed.

One might ask: instead of PIMPL, why not use an abstract base class with a static constructor? The public interface would look something like this.
// In xyzzy.h
class Xyzzy {
 public:
  static Xyzzy *New();  // Static constructor.
  virtual ~Xyzzy() = default;  // Virtual destructor.
  virtual void Foo() = 0;      // Pure virtual; implemented by the subclass.

 protected:
  Xyzzy() = default;  // Not to be constructed directly by user.
};
And the implementation:
// In xyzzy.cc
class XyzzyImpl : public Xyzzy {
 public:
  ~XyzzyImpl() override {  // Destructors have no return type.
    ...
  }

  void Foo() override {
    ...
  }

  // Other members.
};

Xyzzy *Xyzzy::New() {
  return new XyzzyImpl;
}
The problem is that an abstract base class forces all methods to be virtual, and virtual method calls are more expensive because they are indirect calls through function pointers, which are harder for the CPU to predict. The responsibility to manage storage is shifted to the static constructor, so adding or removing members in the base class shouldn't affect binary compatibility. Even so, it still requires recompilation in practice, so members should be declared in the implementation class only.
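For completeness, here is a self-contained sketch of the abstract base class approach that compiles as a single listing. The deviations from the snippets above are my own assumptions: New() returns a std::unique_ptr instead of a raw pointer, Foo() returns an int, and the calls_ counter is a hypothetical private member that exists only so the behavior can be observed.

```cpp
#include <memory>

// --- What would live in xyzzy.h ---
class Xyzzy {
 public:
  static std::unique_ptr<Xyzzy> New();  // Static constructor.
  virtual ~Xyzzy() = default;           // Virtual destructor.
  virtual int Foo() = 0;                // Every method must be virtual.

 protected:
  Xyzzy() = default;  // Not to be constructed directly by user.
};

// --- What would live in xyzzy.cc ---
namespace {

class XyzzyImpl : public Xyzzy {
 public:
  int Foo() override { return ++calls_; }

 private:
  int calls_ = 0;  // Hypothetical state, declared only in the subclass.
};

}  // namespace

std::unique_ptr<Xyzzy> Xyzzy::New() {
  // The library, not the caller, decides how much storage to allocate.
  return std::make_unique<XyzzyImpl>();
}
```

The caller only ever holds a pointer to Xyzzy and never needs to know the size of XyzzyImpl, at the price of an indirect call for every method.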

The common theme here is that any change to a class in C++ requires recompilation, period. That's because the compiler can't tell what changed, so it can't tell whether the change affects binary compatibility. Contrast this with C, where the user of a struct never has to know its size, and no virtual methods are needed.
// In xyzzy.h

struct xyzzy;  // Incomplete type.

struct xyzzy *new_xyzzy();              // Constructor.
void delete_xyzzy(struct xyzzy *this);  // Destructor.
void xyzzy_foo(struct xyzzy *this);     // Method foo().
Although most build systems would still recompile the translation unit that includes xyzzy.h to be safe, recompilation is not strictly necessary if the changes are binary compatible. This is why an executable compiled with an older header can still be dynamically linked with a newer version of the shared library without recompilation, and it would still work.
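Here is a rough sketch of what the library side of that header could look like. To stay consistent with the other listings I wrote it in C++ with extern "C" linkage, which forces a few assumptions: the parameter is renamed from this to self because this is a keyword in C++, xyzzy_foo() returns an int so the effect is observable, and the calls field is a hypothetical member. A plain C implementation would use malloc() and free() instead of new and delete.

```cpp
// --- What the caller sees (xyzzy.h) ---
extern "C" {
struct xyzzy;  // Incomplete type; the caller never learns its size.

struct xyzzy *new_xyzzy();              // Constructor.
void delete_xyzzy(struct xyzzy *self);  // Destructor.
int xyzzy_foo(struct xyzzy *self);      // Method foo().
}

// --- Library side, hidden from the caller ---
struct xyzzy {
  int calls;  // Hypothetical field; fields can be added or removed freely.
};

extern "C" struct xyzzy *new_xyzzy() {
  return new xyzzy{0};  // The library prepares the storage.
}

extern "C" void delete_xyzzy(struct xyzzy *self) {
  delete self;
}

extern "C" int xyzzy_foo(struct xyzzy *self) {
  return ++self->calls;
}
```

Because the caller only passes the pointer around, adding a field to struct xyzzy changes nothing in the caller's compiled code.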

In the end, I think any effort to reduce recompilation for C++ code is futile. C++ is inherently designed for whole-program compilation. There are other reasons, like how template instantiation requires the implementation source code to be available. But any class would require the user to know its size, and one has to go to great lengths to hide that from the user in order to ensure binary compatibility.

Clearly, PIMPL is an anti-pattern that is motivated by the way C++ is designed. It's a problem that C never has to worry about.
