Friday 3 April 2009

Format string vulnerabilities

I was wondering what to write about in my next post when C (my friend "C", not the programming language) happened to read the previous one. He asked me why I had used printf("%s\n", "Hello World") to print the string instead of the simpler printf("Hello World\n"), and whether it makes any difference which version you use. The answer to the first question is "old habit" and the answer to the second is "not in this case". The reason behind the second answer is also where the habit comes from, and it is the topic of this post.

printf belongs to the class of functions that take a variable number of arguments. It is of the form printf(fmt_str, ...). The first argument, fmt_str, is called the control or format string. This string specifies the number and types of the arguments that follow it. printf scans the string from left to right, printing any character it encounters on the output device unless that character is a '%'. The '%' character signals that what follows is a specification for how the next argument in the list should be printed. printf uses this information to fetch the corresponding argument, prints it after suitable formatting, and then moves on to the next character.

While all that is well and good, the problem arises because the varargs mechanism that allows printf to accept an arbitrary number of arguments trusts the format string to specify the number of arguments and their types correctly. It does not do any checking itself. Malicious users can exploit this shortcoming to print data from the stack or other locations, and even to write arbitrary data to arbitrary locations (the %n conversion stores the number of characters written so far through a pointer argument). A combination of such operations can be used to overwrite the address of a legitimate function, or the return address on the stack, with the address of some malicious piece of code.

Format string vulnerabilities usually appear when printing unfiltered user-supplied strings. For example, if char* buffer contains a user-supplied string, using printf(buffer) rather than printf("%s", buffer) to print it may create a vulnerability. printf(buffer) treats buffer as a format string and interprets any format specifiers it may contain, whereas printf("%s", buffer) treats buffer as plain data and prints it verbatim.

To return to C's original question, printf("Hello World\n") is as safe as printf("%s\n", "Hello World") because the format string is a literal that is known beforehand and cannot be influenced by a user. But I have become so used to writing printf("%s", string) to print any string that I automatically reach for this form irrespective of where the data comes from.
