Thursday, June 17, 2010

Lack of Lexical Context Closure Is an Impediment to Abstraction

Lexical context (also called the environment) describes which identifiers, and of what types, need to be present for a fragment of a program to run. Closure allows a first-class function to refer to its lexical context even when it is called from a different lexical context elsewhere. This is useful for writing an algorithm as a higher-order function whose behavior can be customized by a function argument. It is widely understood in the functional programming community (LISP/Scheme, ML, Haskell) to be a very powerful abstraction tool.

It is only starting to be noticed by the C++ community. In C++, the std::for_each algorithm applies a function to the elements of a container between two iterators, but only in C++0x can you use it without much notational overhead.
// From Wikipedia: C++0x
std::vector<int> someList;
int total = 0;
std::for_each(someList.begin(), someList.end(), 
  [&total](int x) { total += x; }
);
std::cout << total;
The idea is that std::for_each describes a specific kind of for-loop, and its use makes the intent of the program more obvious. Note that the function argument (the lambda expression) is called inside std::for_each, which is a different lexical scope from the one where the function argument is defined.
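To make the two scopes concrete, here is a minimal sketch in the spirit of std::for_each; my_for_each is a made-up name, not the standard library implementation.
template<class Iter, class Function>
Function my_for_each(Iter first, Iter last, Function f) {
  for (; first != last; ++first)
    f(*first);  // f is called here, far away from where it was written,
                // yet its body can still reach the variables it closed over
  return f;
}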

When compiling to machine code, a typical way to implement closure is to enclose the environment in a struct, have each function take a pointer to the environment as an implicit argument, and represent the function value as a pair: a pointer to the code and a pointer to the environment. In plain C, it would look like this:
#include <stddef.h>  /* for size_t */

/* The captured lexical context, reified as a struct. */
typedef struct env {
  int total;
} env_t;

void anonymous_func(void *ctx, int x) {
  env_t *envp = (env_t *) ctx;
  envp->total += x;
}

/* A closure: a pointer to the code paired with a pointer to its context. */
typedef struct for_each_int_arg {
  void (*code)(void *, int);
  void *ctx;
} for_each_int_arg_t;

void for_each_int(int *first, int *last, for_each_int_arg_t f) {
  for ( ; first != last; first++)
    f.code(f.ctx, *first);
}

void sum(int *numbers, size_t n) {
  env_t env = { .total = 0 };
  for_each_int_arg_t func = {
    .code = anonymous_func,
    .ctx = (void *) &env,
  };

  for_each_int(numbers, numbers + n, func);
}
In fact, this technique is prevalent in C programs that take a callback in the form of a function pointer plus a context pointer. A profound implication for callbacks that do not take a context pointer is that the function argument to these higher-order functions can only refer to global or static variables as its lexical context. Unless such functions have no effect on external state (i.e. they are pure), this limitation makes it hard to encapsulate the state of the program into small, reusable, discrete units, and is therefore an obstacle to abstraction.
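To see the limitation in isolation, here is a minimal sketch assuming a hypothetical variant of for_each_int whose callback takes no context pointer; for_each_int_noctx, g_total, add_to_total, and sum_noctx are made-up names for illustration.
#include <stddef.h>

/* Hypothetical variant: the callback gets no context pointer. */
void for_each_int_noctx(int *first, int *last, void (*code)(int)) {
  for ( ; first != last; first++)
    code(*first);
}

/* Without a context pointer, the accumulator is forced to file scope. */
static int g_total;

void add_to_total(int x) { g_total += x; }

void sum_noctx(int *numbers, size_t n) {
  g_total = 0;
  for_each_int_noctx(numbers, numbers + n, add_to_total);
}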

Let's take a look at several C library calls and see what this implication means in practice.
  • qsort(), bsearch(): the compar argument does not take a context. That may be fine, because compar is supposed to be pure, i.e. to cause no side effects. However, neither qsort() nor bsearch() is very generic either; both assume the data is laid out entirely in a C array. A more general version might require the compar function to take a context, for example when operating on the contents of a huge file without reading them entirely into memory.
  • atexit(): the function argument does not take a context. If the exit routine is supposed to synchronize files, databases or remote processes, all of that state has to be global.
  • signal() and sigaction(): the signal handler does not take a context, so signal handling cannot be modularized. There are legitimate needs for modular signals, such as alarms and broken-pipe detection, and these have to resort to workarounds.
  • pthread_key_create(): the destructor of a thread-specific key takes the thread-specific value as the context.
  • pthread_cleanup_push(): the cleanup routine takes a context.
  • pthread_create(): the start routine takes a context.
We can see that the library calls whose callbacks do not take a context pointer are restricted in their utility.
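Here is a minimal sketch contrasting one call of each kind: signal(), whose handler receives no context, and pthread_create(), whose start routine does. App, on_alarm, and worker are made-up names for illustration.
#include <csignal>
#include <pthread.h>

// signal(): the handler receives no context pointer, so any state it
// touches has to be global or static.
static volatile std::sig_atomic_t got_alarm = 0;

extern "C" void on_alarm(int) { got_alarm = 1; }

// pthread_create(): the start routine takes a void * context, so the
// state can stay encapsulated in an object owned by the caller.
struct App { int counter; };

extern "C" void *worker(void *ctx) {
  App *app = static_cast<App *>(ctx);
  app->counter++;
  return 0;
}

int main() {
  std::signal(SIGALRM, on_alarm);

  App app = { 0 };
  pthread_t tid;
  pthread_create(&tid, 0, worker, &app);  // context passed explicitly
  pthread_join(tid, 0);
  return 0;
}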

In C++, a function object can compensate for the lack of a context pointer: a type parameter specifies a class that implements operator(), the function call operator, and the object carries its context through the implicit this pointer it receives when called. This is how std::for_each was used before C++ supported lexical context closure. You have to rewrite the one-line anonymous function of C++0x as several lines of function object, incurring notational overhead that is hard to justify for abstracting out a simple for-loop construct.
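For comparison, here is a minimal sketch of the same accumulation as the earlier C++0x example, written as a pre-C++0x function object; Accumulator is a made-up name.
#include <algorithm>
#include <iostream>
#include <vector>

// The one-line lambda becomes a class: the captured variable turns into a
// member, and operator() reaches it through the implicit this pointer.
struct Accumulator {
  explicit Accumulator(int &total) : total_(total) {}
  void operator()(int x) const { total_ += x; }
 private:
  int &total_;
};

int main() {
  std::vector<int> someList;
  someList.push_back(1);
  someList.push_back(2);
  int total = 0;
  std::for_each(someList.begin(), someList.end(), Accumulator(total));
  std::cout << total << std::endl;  // prints 3
  return 0;
}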

Even so, there are cases where the lack of lexical context closure in C++ can impede abstraction and code reuse.

A while ago, I implemented a Splay tree in which both the pointer type and the node type are abstracted out.
template<class Ptr /* implements pointer<? implements binary_tree_node> */>
struct splay_tree /* implements binary_search_tree */ {
  typedef Ptr ptr_type;

  template<class Compare /* implements unary_compare<Ptr> */>
  static ptr_type&
  find(Compare& cmp, ptr_type& root) throw() {
    ...
  }

  // More functions that operate on splay trees...
};
The idea is that whoever wants to use the Splay tree algorithm writes a node class that implements the binary_tree_node interface (providing accessors and mutators for the left and right subtree pointers). The template argument Ptr is the pointer type to the node. The find() function takes a comparable object, which can be either a key or another node, and uses it to find a node in the tree that compares equal to it. The template argument Compare is the type of the comparable object.
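For illustration, here is a minimal sketch of what a user might supply. Only the interface names appear above, so the accessor names and the comparison protocol below are assumptions; my_node and key_compare are made-up names.
// Assumed shape of a node satisfying binary_tree_node.
struct my_node /* implements binary_tree_node */ {
  int key;
  my_node *left_, *right_;

  my_node *&left() throw() { return left_; }
  my_node *&right() throw() { return right_; }
};

// Assumed shape of a comparable object satisfying unary_compare.
struct key_compare /* implements unary_compare<my_node *> */ {
  int key;
  explicit key_compare(int k) throw() : key(k) {}

  // strcmp()-style result: negative, zero, or positive.
  int compare(my_node *const &node) const throw() {
    return key - node->key;
  }
};

// Possible usage, given the splay_tree interface above:
//   key_compare cmp(42);
//   my_node *&found = splay_tree<my_node *>::find(cmp, root);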

Generalizing the pointer allows an alternative memory model to be used. In my case, I used the splay tree with linear_ptr<>, which performs ownership tracking, so I know the algorithm preserves resource ownership; it has been used successfully this way. Another alternative memory model is a pointer that is a subscript into a particular array. This allows binary tree nodes allocated from a small array (say, fewer than 256 entries) to use just one byte for the pointer.

However, the array-subscript pointer does not work in C++. We could try to write it like this:
template<typename Index, typename Tp, Tp *array>
class small_pointer {
 public:
  small_pointer(Index i = 0) throw() : i_(i) {}
  Tp& operator*() const throw() { return array[i_]; }
  Tp* operator->() const throw() { return &array[i_]; }
 private:
  Index i_;
};
A seasoned C++ programmer will correctly spot that array, a value of type Tp *, has to be a constant. It turns out that C++'s idea of a constant here means the pointer must be the address of a global or static variable. For example, you can't use small_pointer in a function like this.
void foo() {
  int numbers[10];
  for (int i = 0; i < 10; i++) {
    small_pointer<char, int, numbers> p(i);  // error!
    *p = i;
  }
}
The C++ compiler will tell you that numbers cannot appear in a constant expression. Yet if you think about it, after the code for operator*() is inlined, the statement *p = i becomes numbers[i] = i, which makes perfect sense. However, if the C++ compiler decides to emit operator*() as its own function, that function would not have access to the int array, because numbers is in another lexical scope. If C++ correctly supported lexical context closure, it could accept any identifier as a pointer-valued non-type template argument, and you would be able to instantiate the small_pointer template with a function-scope identifier as well as a class-scope one.
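By contrast, the same instantiation is accepted once the array has static storage duration and external linkage; global_numbers and bar are made-up names for this sketch.
int global_numbers[10];  // namespace scope: external linkage

void bar() {
  for (int i = 0; i < 10; i++) {
    small_pointer<char, int, global_numbers> p(i);  // OK
    *p = i;
  }
}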

Without this abstraction, I would have to re-implement the Splay tree for tree nodes that want to use a compact pointer model because the nodes are stored in an array. This need might be more common than you think, given that embedded devices like the Arduino are getting more popular these days. The difference between an O(n) and an O(lg n) operation can mean a lot to battery life, yet the O(n) algorithm is often chosen because of the prohibitive complexity of getting an O(lg n) data structure to work in a compact memory layout. I certainly do not look forward to re-implementing the Splay tree for each possible compact node layout. This is something the compiler should do for me.
