Saturday, April 2, 2011

Factories and rvalue reference

One of the benefits of rvalue references is that they provide perfect forwarding for libraries. One example is the factory. A factory is taken here in a loose sense to mean any function (including a constructor) that has to construct another object using a non-default constructor, taking the initializer from the function's argument. An example factory is shown below.
template<class T>
T *factory(const T& x = T()) {
  return new T(x);
}
Also, the constructor of std::pair is a factory.

A typical use of the factory is to pass in a temporary object as the initializer. For example,
std::complex<double> *p = factory(std::complex<double>(1.0, -1.0));
One issue that factories face when the initializer holds a resource is what to do with that resource. The resource could be a memory location, a file in the filesystem, or anything else whose ownership must be explicit, because sharing or leaking it would be an unwanted consequence.
  • Sharing the resource is out of the question, since this would violate the ownership invariant.
  • Stealing the resource may be desirable but is not possible, since x is of type const T&, which means you cannot modify x.
  • Copying the resource could be expensive, for example if x contains a string or a vector with a lot of data.
Part of this issue could be mitigated by the copy constructor of T itself. It could be the case that T only wants to copy some defining characteristics from the initializer and not the data. For example, if T represents a file, then only the file permissions are copied but not file content; or if T is a string buffer, then only the open mode is copied but not the content of the buffer.
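
To make the string buffer case concrete, here is a minimal sketch. StringBuffer, its Mode enum and copy_keeps_mode_only are all hypothetical names made up for illustration; they are not part of any real library.

```cpp
#include <cassert>
#include <string>

// Hypothetical string buffer; every name here is illustrative only.
class StringBuffer {
 public:
  enum Mode { ReadOnly, ReadWrite };

  explicit StringBuffer(Mode m) : mode_(m) {}

  // Copy only the defining characteristic (the open mode), not the content.
  StringBuffer(const StringBuffer& other) : mode_(other.mode_), content_() {}

  void append(const std::string& s) { content_ += s; }
  Mode mode() const { return mode_; }
  const std::string& content() const { return content_; }

 private:
  Mode mode_;
  std::string content_;
};

template<class T>
T *factory(const T& x) {
  return new T(x);
}

// Returns true if the object made by factory() has the mode but not the data.
bool copy_keeps_mode_only() {
  StringBuffer src(StringBuffer::ReadWrite);
  src.append("lots of data");
  StringBuffer *p = factory(src);
  assert(p != 0);
  bool ok = p->mode() == StringBuffer::ReadWrite && p->content().empty();
  delete p;
  return ok;
}
```

With a copy constructor like this, the copying factory stays cheap no matter how much data the initializer holds, at the cost of a copy that is not a faithful duplicate.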

Note that std::stringstream itself cannot be copied: its base class std::ios_base makes the copy constructor inaccessible (deleted in C++0x), so the compiler cannot generate one. The examples below therefore use std::stringstream only as a stand-in for a hypothetical stream-like class that does not prohibit copying, so that the compiler automatically generates a copy constructor that performs a shallow copy of the underlying buffer. Such a shallow copy happens to work for a temporary initializer like this:
std::stringstream *p = factory(std::stringstream(...));
But if the initializer is an lvalue, then sharing of the underlying buffer would be undesirable.
std::stringstream ss(...);
std::stringstream *p = factory(ss);
Depending on what stringstream's copy constructor does, *p and ss might end up sharing the same buffer, and mutating ss or *p would make the internal states of the other object inconsistent.
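
The sharing hazard can be sketched with a hypothetical stream-like class. NaiveStream and copy_shares_buffer are invented for this illustration, and the buffer is deliberately non-owning so that the sketch does not double-delete anything.

```cpp
#include <cassert>
#include <string>

// Hypothetical stream-like class; not a real standard library type.
struct NaiveStream {
  std::string *buf;  // raw pointer: the implicit copy constructor copies only the pointer

  explicit NaiveStream(std::string *b) : buf(b) {}
  void write(const std::string& s) { *buf += s; }
};

template<class T>
T *factory(const T& x) {
  return new T(x);
}

// Returns true if the object made by factory() aliases the original's buffer.
bool copy_shares_buffer() {
  std::string storage;
  NaiveStream ss(&storage);
  NaiveStream *p = factory(ss);  // shallow copy through the implicit copy constructor
  assert(p != 0);
  p->write("abc");               // a write through *p ...
  bool shared = (ss.buf == p->buf) && (*ss.buf == "abc");  // ... is visible through ss
  delete p;
  return shared;
}
```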

With the introduction of rvalue references, we can now give a strong hint to the factory that the internal resources should be transferred.
template<class T>  // factory using copy semantics
T *factory(const T& x) {
  return new T(x);
}

template<class T>  // factory using move semantics
T *factory(T&& x) {
  return new T(std::forward<T>(x));
}
Note that without the std::forward<T>, a named rvalue reference is treated as an lvalue when used [N1377], and we would end up calling the non-const lvalue reference constructor (i.e. T(T& x)) or falling back to the copy constructor (i.e. the const lvalue reference constructor).
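
The effect of leaving out std::forward<T> can be observed with a small tracing type. Tracer, factory_no_forward, factory_forward and the two helper functions are hypothetical names used only for this sketch.

```cpp
#include <cassert>
#include <utility>

// Hypothetical type that counts how it was constructed.
struct Tracer {
  int copies;
  int moves;

  Tracer() : copies(0), moves(0) {}
  Tracer(const Tracer& o) : copies(o.copies + 1), moves(o.moves) {}
  Tracer(Tracer&& o) : copies(o.copies), moves(o.moves + 1) {}
};

template<class T>
T *factory_no_forward(T&& x) {
  return new T(x);  // x has a name, so it is an lvalue here: copy constructor
}

template<class T>
T *factory_forward(T&& x) {
  return new T(std::forward<T>(x));  // restores the rvalue: move constructor
}

int copies_without_forward() {
  Tracer *p = factory_no_forward(Tracer());
  assert(p != nullptr);
  int n = p->copies;
  delete p;
  return n;
}

int moves_with_forward() {
  Tracer *p = factory_forward(Tracer());
  assert(p != nullptr);
  int n = p->moves;
  delete p;
  return n;
}
```

Tracer has no T(T&) constructor, so the version without std::forward falls back to the copy constructor.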

Revisiting the factory usage scenarios above,
std::stringstream *p = factory(std::stringstream(...));
would use the move semantics rvalue reference factory, and
std::stringstream ss(...);
std::stringstream *p = factory(ss);
would use the copy semantics const T& factory.
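
Since std::stringstream cannot actually be copied, the overload selection is easier to check with a hypothetical resource holder. Buffer and the two helper functions below are assumptions of this sketch. Note that for an lvalue argument the T&& overload drops out here because T would deduce to a reference type, and the return type T* cannot be formed from it.

```cpp
#include <cassert>
#include <utility>

// Hypothetical resource holder standing in for a stream-like class.
struct Buffer {
  int *data;

  explicit Buffer(int v) : data(new int(v)) {}
  Buffer(const Buffer& o) : data(new int(*o.data)) {}      // deep copy
  Buffer(Buffer&& o) : data(o.data) { o.data = nullptr; }  // steal the resource
  ~Buffer() { delete data; }
};

template<class T>  // factory using copy semantics
T *factory(const T& x) {
  return new T(x);
}

template<class T>  // factory using move semantics
T *factory(T&& x) {
  return new T(std::forward<T>(x));
}

// An lvalue initializer keeps its resource; the new object gets its own copy.
bool lvalue_is_copied() {
  Buffer ss(42);
  Buffer *p = factory(ss);
  assert(p != nullptr);
  bool ok = ss.data != nullptr && p->data != ss.data && *p->data == 42;
  delete p;
  return ok;
}

// An rvalue initializer gives its resource away.
bool rvalue_is_moved() {
  Buffer ss(42);
  Buffer *p = factory(std::move(ss));
  assert(p != nullptr);
  bool ok = ss.data == nullptr && *p->data == 42;
  delete p;
  return ok;
}
```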

This is consistent with the N1377 move proposal, which anticipates that the most common overload pair will be,
void foo(const A& t);  // #1
void foo(A&& t);       // #2
Note that the factory examples in the rvalue reference proposal [N1690] do not have a const T& overload. This means that the factory will only steal resources because of the move semantics, and that factory(ss) above will steal the underlying buffer of ss and render it unusable.

The conclusion is that factory must overload both const T& and T&&.

This actually works nicely with existing libraries. In pre-C++0x code, the factory will always copy. A preprocessor macro can detect the availability of C++0x features and add the rvalue reference version to support move semantics in the factory.
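
One way such a guard might look is sketched below. HAS_RVALUE_REFS is a hypothetical macro name, and the test on __cplusplus is an assumption about how the library would detect the feature: some compilers need their own checks (GCC of this era defines __GXX_EXPERIMENTAL_CXX0X__ under -std=c++0x, and MSVC reports an old __cplusplus value unless told otherwise).

```cpp
#include <cassert>

// Hypothetical feature macro; real libraries usually add compiler-specific checks.
#if __cplusplus >= 201103L
#  define HAS_RVALUE_REFS 1
#else
#  define HAS_RVALUE_REFS 0
#endif

template<class T>  // always available: copy semantics
T *factory(const T& x) {
  return new T(x);
}

#if HAS_RVALUE_REFS
#include <utility>

template<class T>  // only with rvalue reference support: move semantics
T *factory(T&& x) {
  return new T(std::forward<T>(x));
}
#endif

// The call site is the same whether or not the second overload exists.
int made_from_temporary() {
  int *p = factory(7);
  assert(p != 0);
  int v = *p;
  delete p;
  return v;
}
```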

One last word about regular functions with rvalue reference arguments. N1377, discussing the binding of temporaries to references, mentions that a value can be implicitly converted to a temporary, which is why binding a temporary of T to T& is disallowed.
void incr(int& rr) {rr++;}
 
void g()
{
    double ss = 1;
    incr(ss);
}
Here, incr(ss) would implicitly convert ss to an int, and the increment would be applied to the temporary int instead of ss. C++ properly disallows this. An rvalue reference, however, does bind to such a temporary, so it is a good idea to make the constructor explicit to disallow the implicit conversion.
class Foo {
 public:
  explicit Foo(Bar b);  // regular constructor
};
Otherwise, a function expecting Foo&& for its argument but given a Bar object would implicitly convert the Bar to a Foo and operate on the temporary Foo. Furthermore, if Foo also has an rvalue reference constructor taking Bar&& and the Bar is passed as an rvalue (for example through std::move), then the temporary Foo would steal all internal resources of the Bar, get passed to the function, and self-destruct. In other words, the implicit conversion from Bar to Foo would destroy the Bar.
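
The scenario can be sketched as follows. Foo, Bar, consume and bar_is_gutted are hypothetical, and the Foo constructor is deliberately left non-explicit to show the hazard. Under the final C++0x rules an lvalue Bar does not bind to Bar&&, so the sketch passes the Bar through std::move.

```cpp
#include <cassert>
#include <utility>

// Hypothetical resource holder.
struct Bar {
  int *res;

  explicit Bar(int v) : res(new int(v)) {}
  Bar(Bar&& o) : res(o.res) { o.res = nullptr; }
  ~Bar() { delete res; }
};

// Hypothetical class with a non-explicit converting constructor from Bar&&.
struct Foo {
  int *res;

  Foo(Bar&& b) : res(b.res) { b.res = nullptr; }  // steals from b
  Foo(Foo&& o) : res(o.res) { o.res = nullptr; }
  ~Foo() { delete res; }
};

// A regular function that expects a Foo&&.
int consume(Foo&& f) {
  return *f.res;
}

// Returns true if calling consume() with a Bar leaves that Bar empty.
bool bar_is_gutted() {
  Bar b(5);
  int seen = consume(std::move(b));  // implicit Bar -> temporary Foo conversion
  assert(seen == 5);
  return b.res == nullptr;           // the temporary Foo took the resource with it
}
```

Declaring the converting constructor explicit would make the consume(std::move(b)) call fail to compile, which forces the caller to spell out the conversion.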

This is likely to become a very common mistake as more people start using rvalue references. It would be wise to limit their use and never treat an rvalue reference like an lvalue reference, even if you declare the constructors explicit. In particular, a function taking an rvalue reference may or may not modify the caller's argument, so it should not be used as an "out" argument.
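
As a sketch of why an rvalue reference makes a poor "out" argument, consider a hypothetical variant of incr that takes int&& instead of int&. The function incr_through_conversion is likewise made up for this illustration.

```cpp
#include <cassert>

// Hypothetical rvalue reference variant of incr().
void incr(int&& rr) { rr++; }

double incr_through_conversion() {
  double ss = 1;
  incr(ss);   // compiles: ss is converted to a temporary int, which binds to int&&
  assert(ss == 1);
  return ss;  // the increment was applied to the temporary and is lost
}
```

Unlike the int& version, this call compiles without complaint, and the caller's variable is silently left unchanged.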
