Monday, June 21, 2010

C++ STL std::vector capacity() Not A Substitute For size()

When you use the std::vector<> template in C++, one thing you notice is that the actual capacity(), the number of elements allocated for the internal array, may be different than size(), which is the number of items currently stored by the user. In general, size() <= capacity(). The capacity() can change when reserve() is called, and reserve() can be called by insert() or resize().

When accessing the vector using operator[], the index is not checked against valid bounds. The at() accessor defines valid bounds to be 0 <= i < size(), where i is the argument to at(), and the domain applies to operator[] as well. If you don't check for bounds explicitly, it should at least be an invariant of your code.

It is worthy to note that when size() <= i < capacity(), although your program technically does not access illegal memory (memory checker like Valgrind would not be able to detect it), the elements at those i's are strictly undefined, for the following reasons.
  • When the next resize() expands the size(), it will attempt to overwrite the elements in the internal array from the previous size() up to the new size().
  • insert(end(), ...) will overwrite the elements in the internal array after size().
  • Even if you avoid using resize() and never uses insert(), and rely solely on reserve() and operator[], there could still be a problem when reserve() causes reallocation of the internal array: only up to size() elements are copied from the original array to the new array.
In other words, the elements in the internal array from size() through capacity() are ignored by the vector and assumed to be non-existent. A symptom is that elements with values previously stored by operator[] gets erased to the default, or some other value specified for resize(). It is a bad idea to use capacity() for bounds checking. The proper bounds are defined by size().

No comments: