Avoiding Buffer Overruns with String Safety

Original post, October 11, 2004, 20:58:00

String handling is one of the most error-prone aspects of programming in C and C++. Errors in dealing with strings account for most of the buffer overruns that result in security problems. In many languages, a string is an elementary type, and several of the issues that cause problems in C and C++, such as buffer overruns and problems with illegal pointers, don't occur as easily in these other languages. Perhaps if C had been written with a string type, we might have fewer problems with strings.

Let's examine strings and take a look at three C library calls that can compromise the security of your code. Don't despair: I'll also introduce you to the Standard Template Library (STL) and explain how it can help you avoid some of these security vulnerabilities. As I pointed out last time, this column assumes that the reader has a basic familiarity with programming in C.

What's a String?

A string is a series of characters ending with a null ('\0') character that lets the program know where the string terminates. A Unicode string is a series of wide characters (WCHAR) that also terminates with a null character. At the lower levels (e.g., kernel level) of Windows 2000 (Win2K) and Windows NT, a UNICODE_STRING type often represents strings. This structure maintains information about the length of the string and the maximum size of the buffer. Dealing with kernel-level code is beyond the scope of this article, but you should be aware that this approach represents another way of string handling. Almost without exception, the C library calls that deal with single-byte characters have equivalents that properly deal with Unicode strings, and the same pitfalls apply to both single-byte and Unicode strings. Let's begin by examining some of the available library calls, starting with strcpy().
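To make the role of the terminating null concrete, here is a minimal sketch. The helper name storage_bytes() is made up for illustration; it only shows that a string always needs one more byte of storage than strlen() reports.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper (not a library call): the number of bytes a
// string occupies in memory, terminating '\0' included.
std::size_t storage_bytes(const char* s)
{
    return std::strlen(s) + 1;   // strlen() does not count the '\0'
}
```

For an array declared as char word[] = "overrun", strlen(word) is 7 while sizeof(word) is 8. That one-byte difference is the source of many off-by-one buffer overruns.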

Fun with strcpy()

The first library call, strcpy(), is defined as

char* strcpy(char* dest, const char* src);

A quick look at how C and C++ implement this function, and a little thought about what parameters aren't passed into it (notably, the size of the dest buffer), gives a good view of the problems that can occur. What happens if src or dest is null? Ker-boom: the function dereferences the null pointer and the program dies with an unhandled exception. What if the string that src points to is longer than the dest buffer can hold? You'll write past the end of the dest buffer, and if dest is a fixed-size buffer declared on the stack, like buf in the following example, you'll overwrite the stack:

//this is the wrong way
void foo(char* inp)
{
    char buf[25];

    strcpy(buf, inp);
    //do more processing of buf
}

Buffer overruns are the sort of thing an attacker loves to find in your code. Once inp fills up buf, it starts overwriting the stack and can usually cause your program to execute whatever code the attacker wants. Consider a related problem: What if inp isn’t null-terminated? Now, our not-very-bright strcpy() function takes everything in inp and stuffs it into buf, past the end of buf, and keeps going until it triggers an exception handler. For these reasons, many programmers ban strcpy() from their applications.

Fortunately, you can improve the situation and still use strcpy():

void bar(char* inp)
{
    char buf[25];

    //first check to see if inp is illegal; if you don't do this, the strlen
    //call below blows up
    if(inp == NULL)
    {
        assert(false);
        printf("Cannot process a null pointer!!!\n");
        return;
    }

    //use <, not <= - that way you have room for a termination character
    if(strlen(inp) < sizeof(buf))
    {
        strcpy(buf, inp);
        //do more processing
    }
    else
    {
        printf("Hey! That string is too long!\n");
    }
}

The first thing you need to do is determine whether inp is a legal string; if not, you need to throw an assert (if you're in a debug build) to let the programmer know that a problem exists in the calling function. You've just eliminated one gotcha. Next, check to see whether the string is too long for your buffer, and complain if it is. Since strlen() also blows up when passed a null pointer, the null check has to come before the length check. Note the use of the sizeof operator, which helps keep you from making mistakes if you later decide to change the size of buf, because it automatically takes any such change into account. As a last point, the inp string must be at least one character shorter than the buffer to leave room for the null character, which is why the comparison uses < rather than <=.

So, what can go wrong? The most likely problem you'll encounter is that inp really isn't null-terminated, and as a result, the strlen() call will blow up. Another problem is that the inp pointer might not be valid: checking for NULL is nice, but the pointer might still be illegal. For example, the pointer might point into kernel space, point too low into user space (<64KB), or be complete junk. If you do too much checking, your code will run slowly, so you have to make some compromises. Then again, as Steve Maguire points out in his book Writing Solid Code (ISBN: 1556155514), if you're running around dereferencing null pointers, execution speed is the least of your worries.

So, what does this code do right? If you get any obvious errors or enter a string that's too long, you’ll fail gracefully, note the error, and return execution to the caller. A more complete example would return unique errors to the caller, but I've simplified the code in this example.
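As a rough sketch of what "returning unique errors to the caller" might look like, the version below replaces the printf() calls with status codes. The enum, its values, and the function name checked_copy() are all hypothetical names chosen for this illustration; the destination size is passed in explicitly because sizeof only works on the array itself, not on a pointer to it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical status codes; the names are illustrative only.
enum CopyStatus { COPY_OK = 0, COPY_NULL_INPUT, COPY_TOO_LONG };

// Copies inp into dest (capacity dest_size bytes) only if the whole
// string fits, terminator included. dest is untouched on failure.
CopyStatus checked_copy(char* dest, std::size_t dest_size, const char* inp)
{
    // check the pointers first; strlen() blows up on a null pointer
    if (dest == NULL || inp == NULL)
        return COPY_NULL_INPUT;

    // use <, not <=, so there is room for the terminating '\0'
    if (std::strlen(inp) < dest_size)
    {
        std::strcpy(dest, inp);
        return COPY_OK;
    }
    return COPY_TOO_LONG;
}
```

A caller would declare char buf[25] and call checked_copy(buf, sizeof(buf), inp), then decide for itself how to report each failure. Like the original, this sketch still trusts inp to be null-terminated.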

Is strncpy() Better?

The second library call, strncpy(), is defined as

char *strncpy( char *dest, const char *src, size_t count );

On the surface, this one looks better than strcpy()—at least it wants to know how many characters you’d like to stuff into the buffer. However, when you take a closer look, you see that it still has problems. For example, strncpy() still doesn't address the problem of dest or src being null, and if you lie to it about the character count, things can get ugly fast. Let’s look at some code to illustrate its usage:

void baz(const char* inp)
{
    char buf[25];

    //always check the validity of your inputs
    if(inp == NULL)
    {
        assert(false);
        printf("Yuck! You're passing a null pointer!\n");
        return;
    }

    strncpy(buf, inp, sizeof(buf)-1);
    //you always have to remember to null terminate
    buf[sizeof(buf)-1] = '\0';
    //do more processing
    return;
}

On the face of things, this function looks better. You don’t have to determine whether inp is too long, and you won’t overwrite the buffer—you'll just write one less byte than the buffer can hold. It also has the advantage of dealing properly with the case where inp isn’t null-terminated. Some people will argue that you should always use this function and never use strcpy(), but strncpy() has a few catches.

First, you have an additional step of ensuring that your buffer is null-terminated. Many programmers don’t read the fine print and forget this important step. If inp is too long, the function won't write the string's terminating null character in the buffer. If the function does write the terminating null character, you've just wasted an instruction.

Second, you need to consider the return of this function—all it gives you is a pointer to the destination string, and it doesn't reserve a value for an error. Imagine you've decided that if inp is longer than what can fit into the buffer, inp is junk and you should return an error (which I recommend in most cases). Using strncpy(), you can't easily determine this error, although I've seen various tricks that work, such as

//do this first
buf[sizeof(buf)-1] = '\0';
//tell strncpy that it can write into the whole buffer
strncpy(buf, inp, sizeof(buf));

//if the string was too long, this will be overwritten
if(buf[sizeof(buf)-1] != '\0')
{
    printf("Inp string too long!\n");
    return;
}

With this modification, you can armor the string handler against everything except some fairly unusual pointer errors. If you think this function seems like a lot of work to get a few characters safely into a buffer, you’re absolutely right.
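If you use the sentinel trick in more than one place, it may be worth packaging it once. The following is a sketch only: copy_if_fits() is a made-up name, and restoring the terminator on failure is my own design choice, so the caller never ends up holding an unterminated buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper wrapping the sentinel trick.
// Returns true if all of inp, plus its terminator, fit into dest.
bool copy_if_fits(char* dest, std::size_t dest_size, const char* inp)
{
    if (dest == NULL || inp == NULL || dest_size == 0)
        return false;

    dest[dest_size - 1] = '\0';           // plant the sentinel first
    std::strncpy(dest, inp, dest_size);   // may overwrite the sentinel

    if (dest[dest_size - 1] != '\0')
    {
        // inp was too long: the last byte holds a real character
        dest[dest_size - 1] = '\0';       // leave dest terminated anyway
        return false;
    }
    return true;
}
```

With char buf[8], a 7-character input succeeds and an 8-character input fails, leaving buf holding the first 7 characters.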

_snprintf() to the Rescue

The third library call, _snprintf(), makes a lot of the code we've been examining easier to write and less error-prone. _snprintf() is defined as

int _snprintf( char *buffer, size_t count, const char *format [, argument] ... );

_snprintf() is also more versatile than the other two library calls, and you can do a lot of otherwise tricky string handling here. For example,

void foobar(const char* inp)
{
    char buf[25];

    //check for illegal inputs
    if(inp == NULL)
    {
        assert(false);
        printf("Yuck! You're passing a null pointer!\n");
        return;
    }

    if(_snprintf(buf, sizeof(buf)-1, "%s", inp) < 0)
    {
        printf("Input string too long!\n");
        return;
    }
    else
    {
        //always null terminate
        buf[sizeof(buf)-1] = '\0';
    }
    }
    //do more processing
    return;
}

Note that you still have to determine whether inp is a valid pointer. You also have to remember the case where inp exactly fills the count you pass: _snprintf() then writes no terminating null, so pass sizeof(buf)-1 rather than the entire size of buf, and terminate the buffer yourself. I find this code a lot easier to read and understand, a fact that other programmers who have to work on your code will appreciate.

However, none of this is free. _snprintf() is more versatile (e.g., you can use it to convert Unicode to and from single-byte), but it comes with more overhead. For example, if performance is extremely critical, such as in an embedded system, you might not want to use _snprintf(). Another problem with this library call is that it isn’t ANSI standard; as a result, the implementation varies between Windows-based and UNIX-based platforms. If portability is a concern, this problem can be sticky because not all UNIX (or Linux) systems offer this function, and those that do implement it in different ways. Some implementations return the number of bytes that you need in your buffer if an error occurs, and some implementations always null-terminate. If portability is a concern, verify how every OS you’ll support deals with this concern, and consider wrapping it. When you wrap a function, you create a function that behaves the same to the outside world, but hides the differences between OSs. For example,

//skeleton only: each branch formats into buffer and returns a result
int My_snprintf(char *buffer, size_t count, const char *format, ...)
{
#ifdef WIN32
    //do things the Windows way
#else
    //do things the UNIX way
#endif
}

When you compile this code under NT or UNIX, it works as it should, and the rest of the application doesn't have to repeat the #ifdef logic everywhere it needs to do the same thing. In this case, you'd create a My_snprintf(), which is actually quite difficult because of the variable number of arguments to _snprintf().
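One common way to cope with the variable arguments is to forward them through a va_list to the "v" variant of the formatting call. The sketch below makes several assumptions that you should verify for your own platforms: it tests the compiler-defined _WIN32 macro rather than WIN32, it assumes _vsnprintf() is available on the Windows side and a C99-style vsnprintf() everywhere else, and the return contract (-1 on any failure or truncation, buffer always terminated) is simply one I picked for the example.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstddef>
#include <cstdio>

// Sketch of a portable wrapper. Contract chosen for this example:
// returns the number of characters written (terminator excluded), or
// -1 if the arguments are unusable or the output was truncated.
// The buffer is always null-terminated when count > 0.
int My_snprintf(char* buffer, std::size_t count, const char* format, ...)
{
    if (buffer == NULL || format == NULL || count == 0)
        return -1;

    va_list args;
    va_start(args, format);
#ifdef _WIN32
    // assumed Windows behavior: negative return on truncation,
    // and no terminator written when the output fills the buffer
    int written = _vsnprintf(buffer, count, format, args);
#else
    // C99 behavior: returns the length the full output would have had
    int written = std::vsnprintf(buffer, count, format, args);
#endif
    va_end(args);

    buffer[count - 1] = '\0';   // terminate no matter what happened

    if (written < 0 || static_cast<std::size_t>(written) >= count)
        return -1;
    return written;
}
```

Because the wrapper terminates the buffer itself, callers can pass sizeof(buf) directly instead of remembering the sizeof(buf)-1 adjustment.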

STL and String

As it turns out, C++ and the STL are a great help because under the new ANSI C++ specification, string is a standard data type and many common jobs related to strings have well-implemented methods that help. Let’s look at the code:

void barbaz(const char* inp)
{
    std::string str;
   
    //check for illegal inputs
    if(inp == NULL)
    {
        assert(false);
        printf("Yuck! You're passing a null pointer!\n");
        return;
    }

    //this is easy!
    str = inp;
    //now check to see if it was too long, or had nothing in it
    if(str.length() > 25 || str.empty())
    {
        printf("Input string invalid\n");
        return;
    }
    //do more processing
}

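Although this code doesn't address the case where inp isn't terminated, replacing the assignment with a counted copy will, provided inp points to at least 26 readable characters:

str.assign(inp, 26);

Because assign() with a count copies exactly that many characters and doesn't stop at a null character, trim the result at the first null (for example, with str.resize(strlen(str.c_str()))) before checking its length.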

How your code handles long strings depends on whether you want to enforce a 25-character limit or just selected this limit because it was convenient.
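If you'd rather find the end of the input before copying anything, one option is to search for the terminator within a fixed window using memchr(). This is a sketch under stated assumptions: bounded_string() is a hypothetical helper, and it assumes inp either is null-terminated or points to at least max_len readable bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: builds a std::string from at most max_len
// characters of inp. Sets truncated to true when no terminator was
// found within the first max_len bytes.
std::string bounded_string(const char* inp, std::size_t max_len, bool& truncated)
{
    truncated = false;
    if (inp == NULL)
        return std::string();

    // memchr never examines more than max_len bytes
    const void* end = std::memchr(inp, '\0', max_len);
    if (end == NULL)
    {
        truncated = true;                  // no '\0' inside the window
        return std::string(inp, max_len);
    }
    return std::string(inp, static_cast<const char*>(end) - inp);
}
```

The truncated flag lets the caller choose between enforcing the limit (treat truncation as an error) and merely clipping the input.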

I’ve shown you some of the perils of string handling, and the compromises you encounter using three C calls and a portion of the STL. Improper string handling frequently results in security problems, and I hope this information will help you avoid letting your code become part of an attack on someone’s computer.
