Monday, October 18, 2010

C++ Overloading operator<< for std::ostream

The operator<< is used to stringify an object and "shift" the result to an output stream, such as std::cout for standard output. It is used like this:
#include <iostream>
int main() {
  std::cout << "hello world" << std::endl;
  return 0;
}
This of course only works for values of types that have an overload of operator<<. If you want to implement serialization for a new class, you will have to implement operator<<() as a non-member function, for example in the global scope; a member function in the class scope will not give you the usual notation. That is because the left-hand side of << is a std::ostream& object, not your object.

For example, this is right:
class Foo {...};

std::ostream& operator<<(std::ostream& os, const Foo& foo) {
  // stringify foo to output stream os.
  return os;
}
But this will not result in the intended notation:
class Foo {
 public:
  std::ostream& operator<<(std::ostream& os) {
    // stringify this object to output stream os.
    return os;
  }
};
This means that a statement like foo << std::cout will stream foo to std::cout, but the notation is now backwards. You can overload operator>> instead, but this notation has other issues we will not discuss here.
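To make the reversed notation concrete, here is a minimal sketch. The body of Foo and the helper stream_reversed() are hypothetical and exist only for illustration; a std::ostringstream stands in for std::cout so the result can be inspected.

```cpp
#include <cassert>
#include <iostream>
#include <sstream>
#include <string>

// Hypothetical class used only for illustration.
class Foo {
 public:
  // Member overload: the left operand is always *this, so the stream
  // has to appear on the right-hand side of <<.
  std::ostream& operator<<(std::ostream& os) const {
    os << "Foo";
    return os;
  }
};

// Streams a Foo using the reversed notation and returns the text produced.
std::string stream_reversed() {
  std::ostringstream out;
  Foo foo;
  foo << out;  // compiles, but reads backwards
  // out << foo;  // would not compile: no operator<<(std::ostream&, const Foo&)
  return out.str();
}
```

The member version also cannot be chained in the usual way, since the expression foo << out yields the stream rather than foo.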

Sometimes we do not want to overload operator<< globally for every class, or at least we don't want to pollute the global namespace with a different operator<< for each serializable class. We can avoid this by introducing an indirection layer with subtyping polymorphism.
class serializable {
 public:
  virtual ~serializable() {}
  virtual void serialize(std::ostream& os) const = 0;
};

std::ostream& operator<< (std::ostream& os, const serializable& s) {
  s.serialize(os);  // dispatch to the dynamic type
  return os;
}

class Foo : public serializable {
 public:
  virtual void serialize(std::ostream& os) const {
    os << "Foo" << std::endl;
  }
};
Then we can simply say:
int main() {
  Foo f;
  std::cout << f;
  return 0;
}
We implement exactly one global operator<< trampoline, which calls the serialize() method of the serializable abstract base class. Any subclass of serializable that implements serialize() can now be used with operator<< automatically.

If you don't want to use a virtual table, a similar technique using the Curiously Recurring Template Pattern also works.
template<class Derived>
class serializable {
 public:
  void serialize(std::ostream& os) const {
    // Statically dispatch to Derived::serialize(), which hides this one.
    static_cast<const Derived *>(this)->serialize(os);
  }
};

template<class Derived>
std::ostream& operator<< (std::ostream& os, const serializable<Derived>& s) {
  s.serialize(os);
  return os;
}

class Foo : public serializable<Foo> {
 public:
  void serialize(std::ostream& os) const {
    os << "Foo" << std::endl;
  }
};

We can define an unserializable class with operator>> analogously. This is left as an exercise for the reader.
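For readers who would rather skip the exercise, one possible shape of the input side is sketched below using the virtual-dispatch variant. The names deserializable, deserialize(), Word and read_word() are all assumptions made for this sketch, not something the technique prescribes.

```cpp
#include <cassert>
#include <iostream>
#include <sstream>
#include <string>

// Hypothetical counterpart to serializable; the name is an assumption.
class deserializable {
 public:
  virtual ~deserializable() {}
  virtual void deserialize(std::istream& is) = 0;
};

// The single global operator>> trampoline.
std::istream& operator>>(std::istream& is, deserializable& d) {
  d.deserialize(is);
  return is;
}

// Hypothetical subclass that reads one whitespace-delimited word.
class Word : public deserializable {
 public:
  virtual void deserialize(std::istream& is) { is >> text_; }
  const std::string& text() const { return text_; }

 private:
  std::string text_;
};

// Reads a Word from a string and returns what was parsed.
std::string read_word(const std::string& input) {
  std::istringstream in(input);
  Word w;
  in >> w;  // resolves to the trampoline, then to Word::deserialize()
  return w.text();
}
```

Unlike the output side, the object is taken by non-const reference and deserialize() is non-const, because reading modifies the object.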

Saturday, October 16, 2010

A rather unfortunate IP packet corruption

$ dig

; <<>> DiG 9.4.3-P3 <<>>
;; global options:  printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 48565
;; flags: qr aa rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0

; IN        A

;; ANSWER SECTION: 457319 IN CNAME wwwap`

;; Query time: 15 msec
;; WHEN: Sat Oct 16 10:55:53 2010
;; MSG SIZE  rcvd: 92
Barring the ridiculous time to expire 457319, apparently the CNAME record above should read The is used as a CNAME (after many indirections) to, which is used to track packages.

Friday, October 8, 2010

A test on gcc's ability to optimize

I have an implementation of some singly linked data structure which I wish to adapt for owner reference. An owner reference is like a "parent" pointer in the case of a binary search tree, or the "previous" pointer in the case of a doubly linked list, i.e. a back pointer. However, the owner reference does not point to a node.

To understand what the owner reference is, recall that I've written about tracking object ownership before, where we use a linear pointer to keep track of who owns an object. Each object has exactly one pointer that points to it, and whoever holds that pointer is the owner. The owner is responsible for disposing of the object.

The principal problem with a linear pointer is that, by definition, it cannot be used to implement a doubly linked data structure, since doubly linked implies that a node could have two incoming pointers. In a situation like that, only one pointer can be the owner. The other pointer must be a loan.

You can implement a doubly linked data structure with a linear pointer and a loan pointer, but we have to realize that the chief reason to have doubly linked data structure is so we can easily remove a node, knowing only its address, in O(1) time without needing to look up where it is in the data structure.

The owner reference in a node is a device to tell the address of the linear pointer that is the node's owner. With owner reference, it is possible to borrow an object and then steal ownership of it from the previous owner.
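As a rough sketch of why this gives O(1) removal, consider a singly linked list whose nodes carry an owner reference. The struct layout and the helpers push_front() and unlink() are assumptions made for this illustration.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical minimal node; field names are assumptions for this sketch.
struct node {
  node *next;    // linear (owning) pointer to the successor
  node **owner;  // address of the linear pointer that owns this node
};

// Pushes n to the front of the list rooted at head, keeping owner
// references consistent.
void push_front(node *& head, node *n) {
  n->next = head;
  if (n->next) n->next->owner = &n->next;
  head = n;
  n->owner = &head;
}

// Unlinks n in O(1) knowing only its address: the owner reference leads
// straight to the pointer that must be rewritten, whether that is the
// list head or a predecessor's next field.
void unlink(node *n) {
  assert(n && n->owner && *n->owner == n);
  *n->owner = n->next;
  if (n->next) n->next->owner = n->owner;
  n->next = NULL;
  n->owner = NULL;
}
```

Note that unlink() never needs to know whether the node was first in the list; the head pointer and a predecessor's next field are handled by the same two assignments.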

Using an owner reference is preferred for a binary tree node that requires in-place removal. If we used a parent node pointer, we would need additional logic to figure out whether the node we want to remove is the left or the right child of its parent. With an owner reference, we don't care; the owner could even be the root pointer. The owner reference leads us right to the pointer that we need to modify.

However, the owner reference is additional book-keeping that needs to be updated whenever another routine changes the structure, such as list reversal. In the case of a binary search tree, I would need to augment a splay tree I implemented before with owner references. Obviously, the owner reference changes whenever a linear pointer move occurs, and in C++ I could fortify the assignment operator of my linear pointer to take care of that for me. This would allow the same code that previously dealt only with singly linked data structures to become owner reference aware. However, the code could be doing extra work.

To illustrate, suppose we have a move function that supports owner reference update when doing assignment.
class node {
 public:
  node *next() const throw() { return next_; }
  node *& next() throw() { return next_; }

  node **owner() const throw() { return owner_; }
  node **& owner() throw() { return owner_; }

 private:
  node *next_;
  node **owner_;
};

void move(node *& to, node *& from) {
  to = from;
  to->owner() = &to;  // assumes from was not a null pointer
  from = 0;
}
The move assignment updates the pointers linearly and keeps the owner reference updated. Once you move ownership from one pointer to another, the original pointer would lose the pointer to the node, and that's why it's filled with a NULL pointer, or 0. The owner would be the new "to" pointer.

Now consider the swap function.
void swap_naive(node *& a, node *& b) {
  node *tmp;
  move(tmp, a);
  move(a, b);
  move(b, tmp);
}
Notice that we can achieve the same effect with much less work.
void swap_fast(node *& a, node *& b) {
  node *tmp = a;
  a = b;
  b = tmp;
  a->owner() = &a;
  b->owner() = &b;
}
The question I ask in the subject of this post is, if we write swap_naive, is GCC smart enough to emit code in the fashion of swap_fast? If it can, then I could merrily overload the assignment operator of an augmented linear_ptr<> class to update owner references. However, this means I will not be able to write swap_fast() directly. If not, that means I'll need to manually optimize away the redundant owner reference updates.
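Before comparing generated code, it is worth convincing ourselves that the two functions really are interchangeable. The sketch below is a small self-contained check; it uses a simplified node with public fields instead of accessors, and the helper swap_is_consistent() is hypothetical. It only covers the case where the two pointers are distinct and non-null.

```cpp
#include <cassert>

// Simplified stand-in for the node class: public fields instead of
// accessors (an assumption made to keep the sketch short).
struct node {
  node *next_;
  node **owner_;
};

void move(node *& to, node *& from) {
  to = from;
  to->owner_ = &to;
  from = 0;
}

void swap_naive(node *& a, node *& b) {
  node *tmp;
  move(tmp, a);
  move(a, b);
  move(b, tmp);
}

void swap_fast(node *& a, node *& b) {
  node *tmp = a;
  a = b;
  b = tmp;
  a->owner_ = &a;
  b->owner_ = &b;
}

// Sets up two distinct, non-null owning pointers, swaps them with the
// given function, and returns true if each pointer now holds the other's
// original node and each owner reference points at its new holder.
template <typename Swap>
bool swap_is_consistent(Swap do_swap) {
  node x = {0, 0}, y = {0, 0};
  node *p = &x, *q = &y;
  x.owner_ = &p;
  y.owner_ = &q;
  do_swap(p, q);
  return p == &y && q == &x && y.owner_ == &p && x.owner_ == &q;
}
```

Calling swap_is_consistent(swap_naive) and swap_is_consistent(swap_fast) should both return true, so any difference between the two is purely in the amount of work done.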

I compiled with g++ -S -O3 -fomit-frame-pointer using different gcc versions. Here are the numbers of essential instructions (not counting boilerplate context saving and return) for each function.
Compiler (output)             swap_naive()  swap_fast()
gcc-4.4.1 -march=i686         11 (9)        9 (8)
gcc-4.4.1 x86_64              7             7 (6)
apple gcc-4.2 -arch i686      15            10
apple gcc-4.2 -arch x86_64    11            8
apple gcc-4.2 -arch ppc       12            8
apple gcc-4.2 -arch ppc64     12            8
I included x86_64 and ppc/ppc64 because these architectures have more registers than i386, and are therefore able to keep more data in registers rather than spilling it to the stack frame.

It is definitely surprising that gcc-4.4.1 on x86_64 achieves the same number of instructions for both swap_naive() and swap_fast(), although you can see from the assembly listing that the naive version does assign zero to the next_ pointer. Closer inspection of the assembly of swap_fast() indicates that the movq (%rdi), %rdx instruction is redundant because we had movq %rdx, (%rdi) before. The compiler could have saved us one more instruction, but the optimizer misses it.

On the subject of missed optimizations, a closer inspection of the i686 assembly code from gcc-4.4.1 reveals that the subl $16, %esp and addl $16, %esp instructions are redundant. This means that, with proper optimization, we could bring the instruction count for swap_naive() down to 9 as well. Again, in swap_fast(), the movl (%eax), %ebx instruction is redundant; removing it brings the instruction count down to 8.

The compiler optimizer is indeed impressive, but not without flaws.

After this study, I determined that it is better not to blanket overload the assignment operator. I would modify singly linked routines to use move(), which is identical to simple assignment in singly linked data structures. However, for an augmented linked structure, it will be possible to manually optimize some of these routines further by not using move(). For example, I could leverage a singly linked list sort() to sort an augmented list with owner references, and only relink the owner references after sorting is done. This would not be possible had the assignment operator been overloaded.
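The sort-then-relink idea might look roughly like the following sketch. The node layout, the key field, and the functions sort_by_key() and relink_owners() are hypothetical; the insertion sort merely stands in for whatever singly linked sort is already available.

```cpp
#include <cassert>

// Hypothetical node with a payload; field names are assumptions.
struct node {
  int key;
  node *next;
  node **owner;
};

// Insertion sort that only touches next pointers and never updates owner.
// Stand-in for any existing singly linked sort routine.
void sort_by_key(node *& head) {
  node *sorted = 0;
  while (head) {
    node *n = head;
    head = n->next;
    node **p = &sorted;
    while (*p && (*p)->key <= n->key) p = &(*p)->next;
    n->next = *p;
    *p = n;
  }
  head = sorted;
}

// Single pass that repairs every owner reference after restructuring.
void relink_owners(node *& head) {
  for (node **p = &head; *p; p = &(*p)->next) (*p)->owner = p;
}
```

The owner references are stale while sort_by_key() runs, so nothing may rely on them until relink_owners() has finished; in exchange, the sort does no owner book-keeping at all and the repair costs one linear pass.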

Friday, October 1, 2010

Starting daemon process using screen

Here is a good way to start a daemon background process using screen.
screen -d -m -L command args...
This tells screen to create a new session (-m), log its output (-L), and detach immediately (-d); command args... is run inside that session. The log is written to screenlog.n in the current directory, where n is the "window" number inside the screen session. The logging is buffered and flushed periodically, and the log file can be rotated with logrotate.

The screen session supports process control, i.e. stopping a running daemon. To do that, the screen session must be started with a session name (-S sessionname), and it can be killed later with the name.
screen -S pingsession -d -m -L ping localhost
sleep 20
screen -r pingsession -X quit
The advantage is that the daemon process does not have to do anything to turn itself into a daemon or to support process control. A list of screened daemons can be shown using the screen -list command.