Friday, March 12, 2010

Use try-finally to prevent resource leak

This is not a tutorial, but a critique of a programming language feature known as try-finally.

Mom always tells you to clean up after yourself, or to put things back where they were when you're done with them. If you don't, your room becomes a mess and your stuff gets lost. A good programmer follows a similar rule: if you acquire ownership of an object, you have to release it. A program that doesn't follow the rule leaks resources here and there, and it consumes more and more of them without being able to reuse any. Sharing might be a virtue in human etiquette, but it makes resource tracking difficult, and poor tracking eventually leads to resource leaks, both in the real world and inside the computer.

The most prominent kind of resource leak is the memory leak. It is also the kind most visible to the user, because Task Manager (or top/ps on Unix) shows the amount of memory used per program. The symptom is that the program uses more and more memory over time, until the operating system can no longer feed its insatiable gluttony, at which point the program crashes in a nasty way. (Greedy humans will eventually crash in a similar way, a forewarning to people who want to work for Google in order to indulge in its unlimited supply of gourmet food.)

But even if a programmer fully intends to clean up after himself, some programming languages make it harder than others. For example, the following construct arises often in a non-trivial program:

thing_t *obj = acquire_thing(...);
...
if (...) {
  ...
  release_thing(obj);
  return ...;
} else {
  ...
  release_thing(obj);
}

We cannot factor release_thing(obj) out of the two branches of the if-then-else because the then-clause has control flow that escapes the current scope: the return statement. It could just as well be a continue or break statement, and the discussion would still apply. Fortunately, in this case, the code is manageable because:
  • The clean up only requires releasing one object.
  • There are only two conditional branches, then and else.
With anything more than that, the code starts to become unmanageable.
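To see how quickly this gets out of hand, here is a rough sketch with two objects and three exits, written as C-style C++ (C++11 assumed). Only the names thing_t, acquire_thing and release_thing come from the example above; their bodies, the live_things counter and the process function are made up for illustration.

```cpp
#include <cstdlib>

// Hypothetical resource type standing in for thing_t from the text.
struct thing_t { int id; };

// Hypothetical bookkeeping: things acquired but not yet released.
static int live_things = 0;

thing_t *acquire_thing(int id) {
  thing_t *t = static_cast<thing_t *>(std::malloc(sizeof(thing_t)));
  if (t == nullptr) return nullptr;
  t->id = id;
  ++live_things;
  return t;
}

void release_thing(thing_t *t) {
  if (t == nullptr) return;
  --live_things;
  std::free(t);
}

// Two resources and three exits: every exit has to release exactly the
// resources that are live at that point, so the release calls multiply.
int process(int mode) {
  thing_t *a = acquire_thing(1);
  if (a == nullptr) return -1;

  thing_t *b = acquire_thing(2);
  if (b == nullptr) {
    release_thing(a);  // only a is live here
    return -1;
  }

  if (mode == 0) {
    release_thing(b);
    release_thing(a);
    return 0;          // first early exit
  } else if (mode == 1) {
    release_thing(b);
    release_thing(a);
    return 1;          // second early exit
  }

  release_thing(b);
  release_thing(a);
  return 2;            // normal exit
}
```

Forgetting any one of the seven release_thing calls leaves live_things above zero after process returns, and nothing in the language points out which exit was missed.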

Some languages make the code easier to write by providing the try-finally construct, so we could rewrite our code like this:

thing_t *obj = acquire_thing(...);
...
try {
  if (...) {
    ...
    return ...;
  } else {
    ...
  }
} finally {
  release_thing(obj);
}

The finally block is executed whenever the control flow escapes the try block. This allows us to write the clean up code only once. You're not required to use exceptions in order to use try-finally.

If your language provides this syntax, the finally block is the only safe place to put clean-up code. This is a very useful facility for combating any type of resource leak, especially memory leaks. However, looking at the languages that feature exception handling, many of those with finally support are garbage collected, such as C#, Java, JavaScript, Python, and Ruby. In particular, C# and Java programs tend to put clean-up code in the object destructor (finalizer), but garbage collection prevents you from explicitly destroying an object at a precise moment in time. The clean-up code is delayed for an unspecified duration, until a collection cycle kicks in. Rather than using try-finally, you might as well leak the garbage and wait for its collection. This renders try-finally useless for C# and Java; they have no business implementing try-finally in their language.

The only languages with manual memory management and try-finally support are Delphi (Object Pascal) and Objective-C. Try-finally is absent from the C language, which is in dire need of the feature.

An interesting exception is C++, which implicitly calls an object's destructor when the control flow leaves the scope the object is declared in, a technique known as resource acquisition is initialization (RAII). If we have a pointer to an object allocated on the heap using operator new, we could use scoped_ptr<> to delete the object when the pointer leaves the scope. The standard library's auto_ptr<> also works; its copy and assignment semantics move object ownership from one pointer to another. Some people prefer to disallow copying and assignment outright, so they use scoped_ptr<> instead.
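The scope-based clean-up described here can be sketched roughly as follows. ScopedPtr below is a hypothetical, stripped-down stand-in written for this post, not the real scoped_ptr<>; Thing, use_thing and use_thing_and_throw are likewise invented so that the effect can be observed.

```cpp
#include <stdexcept>

// Hypothetical object type that counts how many instances are alive.
struct Thing {
  static int live;
  Thing() { ++live; }
  ~Thing() { --live; }
};
int Thing::live = 0;

// Minimal owning pointer in the spirit of scoped_ptr<>: it deletes the
// object in its destructor and disallows copying and assignment.
template <typename T>
class ScopedPtr {
 public:
  explicit ScopedPtr(T *p) : ptr_(p) {}
  ~ScopedPtr() { delete ptr_; }
  T *get() const { return ptr_; }
  T &operator*() const { return *ptr_; }
  T *operator->() const { return ptr_; }

 private:
  ScopedPtr(const ScopedPtr &);             // declared but never defined
  ScopedPtr &operator=(const ScopedPtr &);  // declared but never defined
  T *ptr_;
};

// Returns the number of live Things seen inside the scope. The return
// value is computed first, then obj's destructor deletes the Thing.
int use_thing(bool leave_early) {
  ScopedPtr<Thing> obj(new Thing);
  if (leave_early) {
    return Thing::live;  // early exit, no explicit delete needed
  }
  return Thing::live;    // normal exit, same clean-up
}

// Leaves the scope by throwing; the destructor runs during unwinding.
void use_thing_and_throw() {
  ScopedPtr<Thing> obj(new Thing);
  throw std::runtime_error("failure inside the scope");
}
```

In each case Thing::live is back to zero once the function has been left, whether by early return, normal return, or exception, and the clean-up code appears exactly once, in ~ScopedPtr.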

Although scoped_ptr<> and auto_ptr<> facilitate memory clean-up, they are not usable with other kinds of resources such as file descriptors and locks, which need other types of clean-up code. Likewise, any resource managed by a C library, such as a graphics context, needs its own clean-up code. This is why, in C++, it is fashionable to wrap C system or library calls as C++ objects. Anyone who wants to use C++ with a C library faces a humongous start-up effort!
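To give a feel for the wrapping involved, here is a small sketch for a single resource. The C-style API (gfx_context, gfx_create, gfx_destroy) is entirely hypothetical and simulated in place so the example compiles; a real library would need one such wrapper class for every resource type it hands out.

```cpp
// Hypothetical C-style library API, simulated here for illustration.
struct gfx_context { int width; int height; };

static int open_contexts = 0;  // contexts created but not yet destroyed

gfx_context *gfx_create(int width, int height) {
  gfx_context *c = new gfx_context;
  c->width = width;
  c->height = height;
  ++open_contexts;
  return c;
}

void gfx_destroy(gfx_context *c) {
  --open_contexts;
  delete c;
}

// C++ wrapper: acquire in the constructor, release in the destructor.
class GfxContext {
 public:
  GfxContext(int width, int height) : ctx_(gfx_create(width, height)) {}
  ~GfxContext() { gfx_destroy(ctx_); }
  gfx_context *get() const { return ctx_; }

 private:
  GfxContext(const GfxContext &);             // copying disallowed
  GfxContext &operator=(const GfxContext &);  // assignment disallowed
  gfx_context *ctx_;
};

// Hypothetical caller with an early return; gfx_destroy() runs on both paths.
int area_if_wide(int width, int height) {
  GfxContext ctx(width, height);
  if (ctx.get()->width <= ctx.get()->height) {
    return 0;  // early exit
  }
  return ctx.get()->width * ctx.get()->height;
}
```

The caller ends up tidy, but the wrapper class is pure boilerplate that exists only to attach one clean-up call to a scope.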

Finally (finally!), I conclude with two recommendations for the language design committee:
  • The C language needs try-finally, even if it does not need to support exception handling.
  • The C++ language also needs try-finally, even though it already has exception handling syntax. This would make interoperability with resources managed in the C language easier.
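For comparison, the closest library-level approximation I can think of is a scope guard, sketched below under the assumption that C++11 lambdas are available. Finally, make_finally, release_count and sum_digits are names invented for this sketch. It is not a language feature, and it still relies on a destructor behind the scenes.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <utility>

// Hypothetical scope guard: runs the stored callable when it is destroyed.
template <typename F>
class Finally {
 public:
  explicit Finally(F f) : f_(std::move(f)), active_(true) {}
  Finally(Finally &&other)
      : f_(std::move(other.f_)), active_(other.active_) {
    other.active_ = false;  // the moved-from guard must not fire
  }
  Finally(const Finally &) = delete;
  Finally &operator=(const Finally &) = delete;
  ~Finally() {
    if (active_) f_();
  }

 private:
  F f_;
  bool active_;
};

template <typename F>
Finally<F> make_finally(F f) {
  return Finally<F>(std::move(f));
}

static int release_count = 0;  // hypothetical counter of clean-ups performed

// Copies text into a malloc'ed buffer and sums its decimal digits.
// Returns -1 on allocation failure or on a non-digit character.
int sum_digits(const char *text) {
  std::size_t n = std::strlen(text);
  char *copy = static_cast<char *>(std::malloc(n + 1));
  if (copy == nullptr) return -1;

  // Plays the role of the finally block: written once, runs on every exit.
  auto cleanup = make_finally([&] {
    std::free(copy);
    ++release_count;
  });

  std::memcpy(copy, text, n + 1);
  int sum = 0;
  for (std::size_t i = 0; i < n; ++i) {
    if (copy[i] < '0' || copy[i] > '9') {
      return -1;  // early exit; cleanup still runs
    }
    sum += copy[i] - '0';
  }
  return sum;     // normal exit; cleanup runs here too
}
```

The clean-up is written once, next to the acquisition, without a dedicated wrapper class per resource type, though a built-in try-finally would say the same thing without the template machinery.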
