Friday, March 30, 2007

Unsafe String Concatenation in C

Many remote exploit vulnerability can be classified into two kinds: memory management problems and confusion of language domains. Memory issues are characteristic of programs written in a low-level language like C. Confusion of language domain is more characteristic of programs written in a high-level scripting language like Perl, PHP and Python. For networked applications that require performance, programmers tend to use a low-level language like C, so memory issues like buffer overrun has been a primary concern in this case.

Because buffer overrun has been Numero Uno cause of security issues, courses and programming books teaching C nowadays have (hopefully) delivered good practices of memory management. One example is to use string functions that limit buffer size. For example, use snprintf() instead of sprintf(), fgets() instead of gets(), strncpy() instead of strcpy(). These functions provide an argument to impose buffer usage limit in order to avoid overrun.

One may be inclined to also recommend strncat() in place of strncat() for string concatenation. However, this deserves closer scrutiny. This function has the following prototype:
char *
strncat(char * restrict s, const char * restrict append, size_t count);
It copies at most count characters from append to the end of s. However, if s is nearly filling up its buffer size, a count that is too large would still overflow the buffer that holds s. We can illustrate this using the following example:
char buf[16];
strcpy(buf, "hello world!");
strncat(buf, " adam!", sizeof(buf)); // wrong!
This code instructs strncat() to append at most 16 bytes, which is sizeof(buf), from " adam!" string. Obviously, strncat() ends up appending the whole string, and this overflows buffer by 3 characters because the string "hello world! adam!" is 18 characters in length plus the terminating NUL.

This is apparently a real problem that once plagued CUPS (printing system for Linux and Mac OS X) and Solaris ufsrestore (incremental file system restore).

Another easy to make mistake is the fact that strncpy() may not properly zero-terminate a string. For example:
char buf[4];
strncpy(buf, "love", sizeof(buf));
printf("buffer contains the string '%s'", buf);
The code would end up printing "buffer contains the string 'love...'" with some garbage. The reason for the garbage is that buffer is no longer properly NUL terminated. You're lucky if the garbage is relatively short. Trying to pass buf to a string processing function that expects NUL terminated string may cause that function to run forever, even for benign functions like strlen().

On the other hand, fgets() reads a string into buffer up to size - 1 characters and properly NUL terminates the string.

All in all, I think these C functions are examples of bad interface design. If someone is reworking the string functions in C, he should focus on making the interface consistent, namely:
  1. That strncat function, like everyone else, should accept destination buffer size limit, rather than limiting number of characters to append; and
  2. All string functions should result in valid, properly NUL terminated strings.
Furthermore, I actually encourage one to use strcat() and strcpy(), despite them being allegedly "unsafe." The rationale is that this forces you into the habit of always checking string length with strlen() and make sure you have enough buffer size to hold the resulting string before performing the operation.

There are additional gripes about strtok()/strsep()—they destroy the original string, which is a problem if the string is shared with another data structure or function. You should use strchr() or strcspn() to find the next occurrence of delimiter, and use strncat() to copy the delimited portion to an empty—but NUL terminated—string. The reason why you shouldn't use strncpy() is precisely that it does not NUL terminate your string.

You should consider strncat() as a safer substitute for strncpy() because strncat() at least guarantees NUL terminating the resulting string (string copying is a special case for concatenation). This also brings up an issue about how to initialize a string buffer. Instead of zeroing out every bytes in the buffer, you only need to zero the first byte.

We discussed a number of string processing functions and their interface irregularities, and we showed some workarounds for better security practice. This illustrates the amount of detailed attention one has to pay when writing C programs. I hope that students who first learn C would just pick up the good practice from the beginning.

No comments: