When Visual Studio® 2005 ships, you'll notice that it includes a major upgrade to the Visual C++® Libraries. This upgrade resulted from a complete security review of the functions contained in the C Runtime (CRT) Library, Standard C++ Library (SCL), Active Template Library (ATL) and Microsoft® Foundation Classes (MFC). This extensive review mandated substantial changes that can improve the security and robustness of your apps.
Changes include deprecation of functions that are known to be risky, such as strcpy. New, safer functions have been added to supersede them. Many functions have gained extra error checking and validation. Extra debugging support has been added where it can help the most.
This article describes the Safe C and C++ Libraries available in Visual C++ 2005. I'll cover the architecture and design principles that guided the development of these libraries, and I have included specific examples of various ways to use them to write more secure code. I will also briefly discuss how other security-related libraries work. Finally, I'll present a migration guide to help you manage the transition of your code from earlier versions of Visual C++ to Visual C++ 2005.
The code samples here have been tested with a prerelease build of Visual Studio 2005. I encourage you to get a copy of the latest Beta and follow along with your own applications. That's probably the best way for you to get a handle on the important changes.
Guiding Principles to Safe Libraries
When the Visual C++ team was designing the Safe Libraries, we used several principles to guide design and development. Familiarity with these principles will help you understand the choices we made, but remember that there is no library, development process, document, or review that ensures completely secure code. The use of these libraries can help make your existing code more secure with relatively small changes, but nothing can stop you from doing something risky in your own code, or block you completely from misusing our code. Ultimately, even after you enable all of the new Safe Libraries, you still need to follow the other best practices at the heart of a secure development process, such as threat modeling, code review, and penetration testing. These topics are well covered in Writing Secure Code by David LeBlanc and Michael Howard (Microsoft Press®, 2002).
"Secure by default" is one of the Trustworthy Computing principles that helped guide this effort. We know that deprecating strcpy and generating a warning at each usage will cause significant migration cost. But we wanted to make sure that out of the box, you'd automatically get the safest behavior, even for upgraded projects. Of course, you can always disable the warnings until you are ready to deal with them.
In designing the libraries, we tried to reduce the number of paradigm shifts. There are as many opinions on the best and safest way to code as there are software developers. I've found that the safest way to write string manipulation code is to use a C++ string class like std::string or MFC/ATL's CString. The Visual C++ team knows that there is a lot of C-style code out there that is in a maintenance or incremental evolution phase and which might not bear the cost and risk of a rewrite to use C++ and a string class. As a result, we've provided Safe Libraries that work the way your code works today. This allows you to apply these libraries even to your older, less malleable code.
Some of the security work we did in the libraries for the Visual C++ 7.1 releases included full reviews and a large number of changes and fixes. A main objective, however, was to not break source compatibility, since we were only working on a point release. In order to move our safety work to the next level, the team had to take a different approach for the new release. While we still try to avoid breaking compatibility, when we found a problem that absolutely required a source compatibility break, we made that break. For example, there is no way for strcpy to be safe in C code without taking an extra parameter to indicate the size of the destination buffer. Every function that has been deprecated in this way is also marked in the documentation for easy reference. We couldn't provide you with the safest libraries without requiring some changes to your code, so with the new libraries we're doing that. And because the threats today are very different from those that existed way back when the C Standard Library was designed, some changes of emphasis are necessary.
Despite these reasons for changing your code, we recognize that there are many costs. Each time an upgrade of Visual Studio forces you to make modifications, you must pay for the work, face the risk of introducing new bugs, and perform testing to ensure continued quality. In most cases, we've been able to make improvements to the safety of the libraries without requiring you to make any change at all, and where changes are required, we've tried to make them easy to perform by designing the new functions to closely match the old ones and to use consistent patterns.
We considered it our responsibility to the development community to extend beyond our own product, so I was delighted to get a chance to work with the C standards committee on the proposals for the Safe Libraries. The committee has provided lots of useful suggestions and feedback to help us evolve the functions. We're hoping that we'll soon be able to issue the technical report on this subject from the C committee. Keep a look out for the current draft of this technical report at www.open-std.org/jtc1/sc22/wg14.
Safe Libraries Design
My team owns the Visual C++ Libraries, which include some of the newest code in the developer division (such as ATL Server) as well as some of the oldest code in the product (such as the CRT). When we looked at the code, we saw some large differences in coding standards, because common practice has improved in the last 20 years. One thing that stood out was that some of the older code was written at a time when every extra byte of code was precious, and thus lacked full validation of all parameters.
The newer code is littered with assertions and checks, and we've found that these really help debugging. Increasingly, we've focused on making sure that the retail code is just as robust as our debug code in the face of unexpected conditions. These checks aren't just for debugging any more; they also make the library code safer. A key part of the Safe Libraries initiative has been to add validation code to most of the library functions, and that validation affects both debug and retail code. For example, if you pass the libraries invalid flags, they will now assert and tell you about your bug.
A lesson we've learned from looking at lots of security attacks is that attackers often exploit the failure to perform appropriate error checking, or an application's "tolerance" for unexpected situations, such as an error return from a standard function. We took this into consideration when we designed the secure code generation (/GS) feature for Visual C++ 7.0. When code compiled with /GS detects a stack-based buffer overrun, the library code terminates it immediately with a minimum of extra code running in the process, thereby reducing the attack surface. We've reduced this surface even further for Visual C++ 2005 (for more information, see Stephen Toub's article "C++: Write Faster Code with the Modern Language Features of Visual C++ 2005" in the May 2004 issue of MSDN®Magazine). The problem is that any code that happens after the library code detects a failure may help an attacker exploit your process, so all that code has been kept small.
The very same logic was applied to the validation code. If you pass the library code a bad parameter, it will respond in debug by asserting and helping to debug your code. If you take a quick look at the call stack at the time of the failure you will be able to track back from the assert to the place in your code that caused the problem. In the retail build, the library code calls its invalid parameter handler, whose default action is to invoke Windows Error Reporting, which can capture a crash dump and invoke a just-in-time debugger as needed.
In this context, an invalid parameter is one that displays an obvious bug that you could have known about when you wrote the code. These often result from undefined behavior in the C standard. For example, if your code writes a string to a NULL pointer using sprintf, the library code will report an invalid parameter. However, if you ask the library code to allocate 10MB of memory using malloc, and the system only has 5MB of memory free, it will just return NULL as it always has. This is just an expected runtime error, not a programming bug.
One problem with this kind of runtime error is that not all of the functions provide a way to return an error directly to the caller. Some of the C functions rely on setting errno, and some don't even set that. When we added new functions, we tried to make sure that they had an error return path directly from the function.
Though aborting is often the safest thing to do in this scenario, you do have the ability to configure the response to this situation. C code can choose an alternative handler, which can, for example, result in the code returning a C-standard error code. Code that handles errors in portable ways across multiple platforms might choose this model. C++ code might instead choose to throw an exception from this handler.
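The three responses described above (abort, return an error code, throw) can be modeled in a few lines of portable C++. This is only a sketch of the pattern, not the CRT's implementation: set_handler, fill_zero, and the three handler functions are hypothetical names invented for illustration, and the real hook in Visual C++ 2005 has a different signature.

```cpp
#include <cassert>   // used by the usage example below
#include <cerrno>
#include <cstddef>
#include <cstdlib>
#include <stdexcept>

// Hypothetical stand-in for the library's invalid parameter hook.
using InvalidParamHandler = void (*)(const char* what);

// Default policy: stop the process immediately.
inline void abort_handler(const char*) { std::abort(); }

// C-style policy: do nothing, so the failing call returns an error code.
inline void return_error_handler(const char*) {}

// C++-style policy: turn the report into an exception.
inline void throwing_handler(const char* what) { throw std::invalid_argument(what); }

inline InvalidParamHandler& current_handler() {
    static InvalidParamHandler handler = abort_handler;
    return handler;
}

// Installs a new handler and returns the previous one.
inline InvalidParamHandler set_handler(InvalidParamHandler next) {
    InvalidParamHandler previous = current_handler();
    current_handler() = next;
    return previous;
}

// Toy library function: a null destination is a programming bug, not a
// runtime condition, so it is reported through the handler.
inline int fill_zero(char* dest, std::size_t size) {
    if (dest == nullptr) {
        current_handler()("fill_zero: dest is null");
        return EINVAL;  // reached only if the handler returns
    }
    for (std::size_t i = 0; i < size; ++i) dest[i] = '\0';
    return 0;
}
```

With return_error_handler installed, fill_zero(nullptr, 4) returns EINVAL; with throwing_handler installed, the same call throws std::invalid_argument; with the default, the process stops.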
Much of a developer's work seems to revolve around strings. This is one reason why buffer overruns are such a common cause of problems in code. Moreover, one of the largest sources of problems in the Visual C++ library code is the set of traditional string functions in the C Runtime Library.
Arguments about which kind of string buffer is better or safer will rage forever. The choice between null termination and length prefix has long been a part of the low-level language wars. Since we do not want to force major change, we need to live within the boundaries of existing practice in C. The C language and its libraries are intimately tied to the null-terminated string. Their design philosophy reflects the priorities and constraints of thirty years ago when performance and code size were the primary concerns. But to prevent buffer overruns in the libraries, we're going to have to treat those null-terminated strings with a little more rigor than the current library does.
The simplest rule is also the one that creates the most work for the developer and the library designer: never write to a string buffer without knowing how much space you have to write into. The design of strcpy makes the most basic mistake here. There is no way for strcpy to safely write to its destination because it does not know how much space the caller has allocated.
Following this simple rule requires the addition of many new functions, each of which is characterized by the addition of an integer parameter after the destination buffer: strcpy(dest, src) becomes strcpy_s(dest, destsize, src).
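To make the shape of that change concrete, here is a minimal portable sketch of a copy that knows its destination size. The name copy_bounded is hypothetical, and the sketch simplifies one thing on purpose: it reports failure through its return value, whereas the library functions described in this article route a string that does not fit through the invalid parameter handler.

```cpp
#include <cassert>   // used by the usage example below
#include <cerrno>
#include <cstddef>
#include <cstring>

// Hypothetical helper with the (dest, destsize, src) parameter order.
// Returns 0 on success, EINVAL for a null argument or zero size,
// and ERANGE if src plus its terminator does not fit.
inline int copy_bounded(char* dest, std::size_t destsize, const char* src) {
    if (dest == nullptr || destsize == 0) return EINVAL;
    if (src == nullptr) {
        dest[0] = '\0';
        return EINVAL;
    }
    std::size_t len = std::strlen(src);  // the source is trusted to be terminated
    if (len >= destsize) {               // no room for the text plus terminator
        dest[0] = '\0';                  // leave a valid empty string behind
        return ERANGE;
    }
    std::memcpy(dest, src, len + 1);     // the copy includes the terminator
    return 0;
}
```

Given char small[4], copy_bounded(small, sizeof small, "abc") succeeds, while copy_bounded(small, sizeof small, "abcd") returns ERANGE and leaves small empty instead of writing a fifth byte.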
You might wonder why the library code does not also need to pass in a size for the source parameter. This question leads to a key design point of the Safe Libraries. I've already mentioned that we had to stick with null-terminated strings. Remember that a null-terminated string that you are reading from already knows its own length. Passing in a second length value may create new truncation failure modes. Additionally, when we analyzed existing code, we found that many strings are stored in fixed size buffers that contain only the string itself. For these strings, adding an input size would not add any information. Finally, we found that passing a size around here caused large code churn in many applications, so we reached an unavoidable conclusion: functions must trust their input strings.
This isn't really as surprising as it first sounds. Remember that the library code also trusts that the length you pass as the destination size matches the size you allocated for the destination string. With native C and C++ code, you can always pass the wrong thing to a library and undermine the application.
We apply the same principle in the Safe C++ Libraries in that your input is trusted, but the library code insists that you detail your output. If you pass in a pair of input iterators, the libraries trust you to have not exceeded the bounds of your container. But if you pass an output iterator, it must be a checked iterator that knows how much space it has to output into. Traditional unchecked iterators cannot be used as they could write past the end of a buffer.
There is obviously one major case where you definitely don't want the library code to assume that an incoming string is in good shape—when it comes from an untrusted source. For example, imagine you are parsing a packet of data that came from another process or from across the network. In this case, a specially crafted packet might omit a terminator in the hope that this will cause some problem in your code. We added a special function (strnlen) to find the length of an untrusted string. Unlike the other new functions, you don't use this in place of strlen. Instead, you use it in specific cases when you are dealing with untrusted data.
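As a sketch of that usage, the helper below checks a fixed-size field from an untrusted packet before anything else treats it as a string. The helper name untrusted_length and the idea of a packet "field" are assumptions made for the example; strnlen itself is the function mentioned above and is also available on POSIX systems.

```cpp
#include <cassert>
#include <cstddef>
#include <string.h>  // strnlen

// Returns true and stores the length if the field holds a terminator
// somewhere within its field_size bytes; returns false if it does not.
inline bool untrusted_length(const char* field, std::size_t field_size,
                             std::size_t* length) {
    assert(field != nullptr && length != nullptr);
    std::size_t n = strnlen(field, field_size);  // never reads past field_size bytes
    if (n == field_size) return false;           // no terminator: reject the packet
    *length = n;
    return true;
}
```

Only after this check succeeds is it reasonable to hand the field to functions that trust their input strings.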
Strings are the largest and most complex challenge we face in our work, but the team addressed several issues at the same time. For example, in the library's file functions, all the new functions deal correctly with the Windows extra-long paths. The team changed all of them to default to an exclusive sharing mode, reducing the risk of an attack against temporary or intermediate files. We also switched all the global variables to be function-based, so that the libraries could report a problem if you ask for a global variable at a time during startup when it is not yet initialized. Additionally, we changed the code of all the lowest-level implementation functions to use only a small amount of stack space and then to fall back to using heap space. This should make them more robust in constrained environments.
The Library in Practice
Let's now take a look at the code in section A of Figure 1 and see how you might use the Safe Libraries to remediate it. This is just a very short piece of string manipulation based on the CRT, but the different possible changes serve to illustrate the kinds of choices you'll have to make as you modify your code.
Section A shows code with classic buffer overrun risks. The original developer probably thought this string would never exceed 10 characters, so surely 20 was enough. Compiling this code with Visual C++ 2005 will result in function deprecation warnings on each of the lines marked with a warning comment.
Section B shows a very basic fix to the same code. This fix simply reinforces the initial programmer's assumption. If src and "…" are really small enough to fit in dest, then everything will be fine. If not, the library will call the invalid parameter handler and abort the process rather than allow a buffer overrun. This sample uses _countof to determine the size of the buffer. The _countof macro is a new addition to the CRT that gets an array element count and, in C++ code only, uses template magic in order to avoid being applied to pointers.
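The "template magic" can be illustrated with a short portable sketch. The name array_count is hypothetical and this is not the CRT's definition of _countof; it only shows why a template that takes a reference to an array cannot be applied to a pointer.

```cpp
#include <cassert>   // used by the usage example below
#include <cstddef>

// Hypothetical stand-in for _countof: the parameter is a reference to an
// array of N elements, so N is deduced from the array's type.
template <typename T, std::size_t N>
constexpr std::size_t array_count(T (&)[N]) noexcept {
    return N;
}
```

Given wchar_t dest[20], array_count(dest) yields 20. Given wchar_t* p = dest, the call array_count(p) does not compile, whereas the traditional sizeof(p) / sizeof(p[0]) compiles silently and produces the wrong answer.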
Section C shows an alternative remediation where the developer simply wants to truncate the source string. It's important to understand the difference between wcscpy_s, which assumes you have allocated enough space and aborts if it cannot fit, and wcsncpy_s, which will truncate if it cannot fit into the available space.
Section D assumes that your program has a known absolute maximum size for whatever src represents (for example, a path that cannot exceed MAX_PATH). In this case, you can still use a static buffer on the stack and can return to wcscpy_s. As with Section B, it will abort if the string doesn't fit, but presumably the developer is very sure that nothing is ever longer than MAX_SRC_SIZE.
Section E shows a version of the function that can actually cope with an arbitrarily long string, at the expense of some heap space. This one also aborts in wcscpy_s if there isn't enough space, but that should only occur if you have a logic bug. Note that the code uses calloc, rather than malloc, as doing so avoids the possibility of integer overflow in the multiplication to calculate the size in bytes.
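The calloc point is easy to miss, so here is a minimal sketch of the heap-based approach. It is not the code from Figure 1; join_on_heap is a hypothetical name, and the sketch uses wmemcpy with explicit lengths rather than the library's wcscpy_s so that it stays portable.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cwchar>

// Returns a heap copy of src followed by suffix, or nullptr on failure.
// The caller releases the result with std::free.
inline wchar_t* join_on_heap(const wchar_t* src, const wchar_t* suffix) {
    assert(src != nullptr && suffix != nullptr);
    std::size_t src_len = std::wcslen(src);
    std::size_t suffix_len = std::wcslen(suffix);
    if (src_len > SIZE_MAX - suffix_len - 1) return nullptr;  // the sum would overflow
    std::size_t count = src_len + suffix_len + 1;             // +1 for the terminator
    // calloc receives the element count and the element size separately, so
    // the caller never computes count * sizeof(wchar_t) by hand.
    wchar_t* dest = static_cast<wchar_t*>(std::calloc(count, sizeof(wchar_t)));
    if (dest == nullptr) return nullptr;
    std::wmemcpy(dest, src, src_len);
    std::wmemcpy(dest + src_len, suffix, suffix_len);  // calloc already zeroed the terminator
    return dest;
}
```

With malloc(count * sizeof(wchar_t)), a huge count could wrap around to a small byte total and produce a buffer far smaller than the code believes it has.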
Section F shows what you could do if you have the time to really clean up your code. The design assumes that some of you don't; however, in some cases this kind of change can make sense.
The Safe C Library
When the team examined the existing CRT, we wanted to make some minor changes to its functions. Most of the changes are simple implementation upgrades, such as better parameter validation, improved handling of long filenames, and bounded stack usage. They generally won't require you to change your code, although sometimes you'll find that the increased validation illuminates problems already in your code.
There is, however, a much smaller set of functions where we had to add a parameter or change the behavior significantly (see Figure 2). Even though the list has over a hundred items, you'll notice that many are essentially duplicates. When we deprecate strcpy, we have to make corresponding changes to wcscpy and _mbscpy.
Each deprecated function is replaced by a safer function with an _s appended to the name. Functions with an underscore prefix are Microsoft or POSIX extensions to the C standard. Functions without an underscore are part of the C standard. We've worked with the C standards committee to develop a set of functions that supersede them.
Each of the old functions is marked deprecated. Deprecation is a relatively new Visual C++ compiler mechanism that allows the library code to warn you when you use non-recommended library features. The function's declaration is marked with __declspec(deprecated), and you see a warning each time you use the function.
We've used macros to group together the deprecations. All the insecure functions are marked _CRT_INSECURE_DEPRECATE in the headers. To prevent the deprecation of these functions, you can set a special macro at compile time—_CRT_SECURE_NO_DEPRECATE—and security deprecation warnings will stop.
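The macro grouping can be sketched in portable C++ using the standard [[deprecated]] attribute in place of __declspec(deprecated). Everything prefixed DEMO_ or demo_ below is hypothetical and stands in for the CRT's own macros and functions; the point is only to show one macro controlling a whole family of warnings.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical opt-out switch, playing the role of _CRT_SECURE_NO_DEPRECATE.
// Define it before this point and the warnings disappear.
#ifdef DEMO_SECURE_NO_DEPRECATE
#define DEMO_INSECURE_DEPRECATE(replacement)
#else
#define DEMO_INSECURE_DEPRECATE(replacement) \
    [[deprecated("This function may be unsafe. Consider using " #replacement " instead.")]]
#endif

// Safer replacement: the destination size follows the destination buffer.
inline bool demo_copy_s(char* dest, std::size_t destsize, const char* src) {
    assert(dest != nullptr && src != nullptr && destsize > 0);
    std::size_t len = std::strlen(src);
    if (len >= destsize) {
        dest[0] = '\0';
        return false;
    }
    std::memcpy(dest, src, len + 1);
    return true;
}

// The old entry point still works, but every call site gets a warning
// unless the opt-out macro is defined.
DEMO_INSECURE_DEPRECATE(demo_copy_s)
inline char* demo_copy(char* dest, const char* src) {
    char* out = dest;
    while ((*out++ = *src++) != '\0') {}
    return dest;
}
```

Because the attribute lives in a single macro, a project can silence the whole group with one definition on the compiler command line and then remove that definition when it is ready to deal with the warnings.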
These new functions were added to address specific problems when we couldn't fix the problem in the existing function. For most of them, the problem was the lack of a specified buffer size. Almost all string functions, and many other functions involving buffers in the C library, have this buffer size problem. The libraries now always accept the buffer size as the parameter directly after the output buffer.
Lack of a specified buffer size afflicts scanf particularly badly. For scanf_s, the parameters to the format string must include a buffer size after each buffer parameter to the scanf function.
Another common problem is functions that do not terminate their strings correctly. Both the standard strncpy and the CRT's _snprintf suffer from this problem when the output fills the buffer. All the new functions always terminate their output and require space for the string terminator within their buffer.
When using the library, we found we needed two models of string function. In most cases, if a string overflows a buffer, it means there is a programming bug, and the library aborts; strcpy_s has this behavior. However, in a smaller set of cases, truncation is the correct behavior. We added a new mode to strncpy_s to support this. When _TRUNCATE is passed as the last parameter, users will get as much of the source string as can fit in the destination while retaining a terminator, and STRUNCATE is returned if truncation occurred.
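The truncating model can be sketched as follows. The helper copy_truncate and the CopyResult type are hypothetical; they mirror the behavior described above (keep as much as fits, always terminate, report that truncation happened) without using the library's own strncpy_s.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical result type: truncated plays the role of STRUNCATE.
enum class CopyResult { ok, truncated };

// Copies as much of src as fits in dest while always leaving a terminator.
inline CopyResult copy_truncate(char* dest, std::size_t destsize, const char* src) {
    assert(dest != nullptr && src != nullptr && destsize > 0);
    std::size_t len = std::strlen(src);
    std::size_t n = (len < destsize) ? len : destsize - 1;  // reserve the last byte
    std::memcpy(dest, src, n);
    dest[n] = '\0';                                          // output is always terminated
    return (n == len) ? CopyResult::ok : CopyResult::truncated;
}
```

Copying "hello world" into a five-byte buffer leaves "hell" and reports truncation, so callers that care can distinguish a clean copy from a shortened one.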
In a few cases, a function did not need extra parameters but had a critical behavior change. For example, we changed fopen_s to default to opening files in exclusive mode instead of in shared mode. This is a much safer default, but we changed the function name to make sure you explicitly opt into the new behavior.
Figure 3 shows a smaller set of functions where we have added a new function even though its counterpart is not deprecated. In each of these cases, we added a function because there was a way to get extra safety in some situations, but we did not expect them to supersede other functions. For example, qsort_s adds a context variable which can be very useful if you are sorting data and wish to pass extra context information that is not stored within the objects being sorted themselves. In the past, developers often used static variables for this task, which creates the risk of threading and reentrancy issues. Similarly, memcpy_s adds an extra size parameter for the size of the destination buffer. This will often be the same as the size of the source buffer, but some developers find it easier to read if both sizes are provided.
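To show why a context parameter removes the need for static variables, here is a small portable sketch. It does not call qsort_s; sort_with_context, compare_ints, and SortOrder are hypothetical names, and the sort is delegated to std::sort so the example stays self-contained.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical comparator shape: the caller's context travels with each call.
using CompareWithContext = int (*)(void* context, const int* a, const int* b);

// Illustrative context: information that is not stored in the elements.
struct SortOrder {
    bool descending;
};

inline int compare_ints(void* context, const int* a, const int* b) {
    const SortOrder* order = static_cast<const SortOrder*>(context);
    int result = (*a > *b) - (*a < *b);
    return order->descending ? -result : result;
}

// Sorts data, handing the same context pointer to every comparison.
inline void sort_with_context(int* data, std::size_t count,
                              CompareWithContext compare, void* context) {
    assert(data != nullptr || count == 0);
    std::sort(data, data + count, [=](const int& a, const int& b) {
        return compare(context, &a, &b) < 0;
    });
}
```

Two threads can sort with different SortOrder objects at the same time, because nothing in compare_ints depends on shared static state.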
One major advantage of the new functions is that whenever they write to a buffer, they always know its size. As a result, we've added a new debugging feature to the CRT. Whenever you call a function (such as strcpy_s) in a _DEBUG build, the library code will always fill up the output buffer completely. This helps ensure that if you get the buffer size wrong, you'll see that error much more quickly. It also helps detect other subtle bugs such as the use of variables that are already destructed or out of scope.
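The idea behind the debug fill can be sketched like this. The helper copy_and_fill and the fill byte kDebugFill are assumptions made for the example (the article does not say which value the CRT writes), and NDEBUG stands in for the _DEBUG setting.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical fill byte, chosen only for illustration.
const unsigned char kDebugFill = 0xFE;

// Copies src into dest; in debug builds it also overwrites the unused tail.
inline bool copy_and_fill(char* dest, std::size_t destsize, const char* src) {
    assert(dest != nullptr && src != nullptr && destsize > 0);
    std::size_t len = std::strlen(src);
    if (len >= destsize) {
        dest[0] = '\0';
        return false;
    }
    std::memcpy(dest, src, len + 1);
#ifndef NDEBUG
    // Touch every byte the caller claimed to own. If destsize is larger than
    // the real buffer, this write lands outside it and the bug surfaces early.
    std::memset(dest + len + 1, kDebugFill, destsize - len - 1);
#endif
    return true;
}
```

A short string copied into a buffer whose size was overstated would otherwise appear to work until the day a longer string arrived; the fill makes the mistake visible on the first call.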
Though these changes are in the C library, we added some code specific to C++ to help reduce even further the cost of making your code safer. Figure 4 shows how this can work for you.
Section A shows another simple piece of C-style code. Our Safe Libraries will warn on all three lines because buffers are used without sizes. You will notice that the temporary buffer isn't really necessary for this function, but it helps me make my point.
Section B shows the simple remediation discussed already. It uses _countof with wcscpy_s to specify the output buffer size.
Section C shows an even simpler remediation, but this one only works if you are compiling your code for C++. You'll see that this code looks a lot more like the original code. Instead of adding parameters, we just changed the names of the first two calls. We use templates to deduce the buffer size in the first two calls. Because temp is a local fixed-size buffer, the templates can automatically deduce the buffer size and pass it in to wcscpy_s.
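The size deduction can be sketched with a pair of overloads. This is not the code from Figure 4 or from the CRT headers; copy_bounded is a hypothetical name, and the sketch returns error codes where the library would call its invalid parameter handler.

```cpp
#include <cassert>   // used by the usage example below
#include <cerrno>
#include <cstddef>
#include <cstring>

// Sized form: the caller states how much room dest has.
inline int copy_bounded(char* dest, std::size_t destsize, const char* src) {
    if (dest == nullptr || src == nullptr || destsize == 0) return EINVAL;
    std::size_t len = std::strlen(src);
    if (len >= destsize) {
        dest[0] = '\0';
        return ERANGE;
    }
    std::memcpy(dest, src, len + 1);
    return 0;
}

// Array form: N is deduced from the type of a fixed-size buffer, so the
// call site looks like the old two-argument call.
template <std::size_t N>
inline int copy_bounded(char (&dest)[N], const char* src) {
    return copy_bounded(dest, N, src);
}
```

For char temp[4], the call copy_bounded(temp, "abc") picks the array form and passes 4 automatically. For a char* obtained from the heap or from a function parameter, the two-argument call does not compile, which is exactly the case where you still have to supply the size yourself.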
Section D returns us to the original code, but we use #define _CRT_SECURE_CPP_OVERLOAD_STANDARD_NAMES as well. This setting is not enabled by default because it can cause problems with some unusual coding patterns. But setting it can dramatically reduce the work required. You can see that with this #define set you don't even get a warning on the first two calls. We've used a template to convert a call to wcscpy into a call to wcscpy_s whenever a fixed size destination is used. Of course, if the buffer was allocated on the heap or outside of the function, you'll still have to tell us the size, as happens on the last line.
Section E shows the minimal work you would have to do to fix the function if you use _CRT_SECURE_CPP_OVERLOAD_STANDARD_NAMES. You only have to change the case where a dynamic output buffer is used. All the other code compiles without change. We used this option when applying the Safe Libraries to the whole developer division codebase late last year, and it really reduced the work involved.
Safe C++ Library
The Standard C++ Library required a lot less work to make safer, because it was designed more recently and because the encapsulation inherent in C++ classes often provides the library developer with increased flexibility to make changes. The primary problem with the Standard C++ Library is that its designers did a great job of making it as efficient as traditional pointer-based iteration in C. This means that a simple iterator often knows nothing more about its context than does a pointer in the middle of an array.
Unfortunately, this means that an iterator also can't tell if it is participating in a buffer overrun. Since it doesn't know where its bounds are, it can't enforce them.
There is a very explicit tradeoff here between efficiency and bounds checking. In previous versions, the library code defaulted to the efficient iteration, but in the new version we have added a safer mode where iterators can be sure that they are being used correctly. Figure 5 shows an example of this.
Section A shows a simple function that outputs a well-known string. This code works fine, but you will get a warning at compile time telling you that _Copy_opt is deprecated. This is a warning that you used an unsized buffer as a destination.
Section B shows a fixed version of this code, where we've used the new checked iterators to wrap up the pointer with its size. We've been lucky to have a strong implementation of the Standard C++ Library with help from Dinkumware. While we were busy adding safer iterators to the libraries, and switching Dinkumware's code to use the Safe C Runtime functions, Dinkumware was also busy adding some new debug iterators functionality. This is only enabled in _DEBUG, but it can find a plethora of misuses, such as invalidated iterators or using an iterator from the wrong collection.
Figure 6 shows a program that will demonstrate both of these runtime checks, depending on whether it is compiled debug or release. When an error occurs at run time in the release build code, you have two rational choices in C++. You can abort, as you do in C, or throw an exception, as is normal in C++ code. Both options are available, and if you set _SECURE_SCL_THROWS to 1, you'll get an exception rather than the aborting behavior when you go past the end of your container.
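A checked output iterator is simple enough to sketch portably. The class below is hypothetical and is not the library's own checked iterator; it shows the essential idea of a pointer that remembers how much space it may write into, and it takes the throwing option rather than the aborting one.

```cpp
#include <algorithm>  // std::copy, used by the usage example below
#include <cassert>    // used by the usage example below
#include <cstddef>
#include <iterator>
#include <stdexcept>

// Hypothetical checked output iterator: a pointer plus its bounds.
template <typename T>
class CheckedOutput {
public:
    using iterator_category = std::output_iterator_tag;
    using value_type = void;
    using difference_type = std::ptrdiff_t;
    using pointer = void;
    using reference = void;

    CheckedOutput(T* begin, std::size_t count) : cur_(begin), end_(begin + count) {}

    // Every write goes through this check before it touches memory.
    T& operator*() const {
        if (cur_ == end_) throw std::out_of_range("write past end of buffer");
        return *cur_;
    }
    CheckedOutput& operator++() {
        if (cur_ != end_) ++cur_;  // never advance beyond one past the end
        return *this;
    }
    CheckedOutput operator++(int) {
        CheckedOutput old = *this;
        ++*this;
        return old;
    }

private:
    T* cur_;
    T* end_;
};
```

Passing CheckedOutput<char>(buffer, sizeof buffer) as the destination of std::copy lets the algorithm run unchanged; if the source is longer than the buffer, the copy stops with std::out_of_range instead of overwriting whatever follows the buffer.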
Other Safety Improvements in Visual Studio 2005
There are a few other safety improvements in the Safe Libraries that are worth mentioning here. I don't have space to go into detail on each one, but this should give you an indication of the work going on and serve as a starting point for you to learn more.
We've done lots of work in MFC and ATL, but because of their higher-level nature, they didn't require nearly as much change to have all their string buffers treated correctly. We did work to switch them over to use the Safe C Library functions instead of the older ones. In a few cases, an MFC or ATL function had an output buffer with no size, and as a result we deprecated the function and added a new overload with the extra parameter.
We've also made a significant change to the deployment and servicing model in this version. It is important that software developers ship updated copies of the libraries to customers with their applications. When we ship a Visual Studio service pack, we don't automatically update every customer machine with a new copy of the DLLs. However, in Visual C++ 2005 we have switched the libraries to install and run using the Windows side-by-side execution technology. Each executable built with Visual C++ needs a manifest so that it can find its copy of MSVCR80 and other library DLLs. Side-by-side allows DLLs to be installed centrally (in %systemroot%/WinSxS) or in an app-local directory.
Use of this technology has several benefits, including elimination of DLL Hell problems. However, a key benefit for safety is that, in an emergency, we can directly service the libraries on your machine via Windows Update. This central servicing affects all applications, including those that install the libraries app-local. Therefore, this would only be used as a last resort. We work very hard to ensure that the library code does not contain any problematic code. But in the case of a serious security problem in a redistributable component, we have a way to help customers directly.
Working on the Visual C++ team gives me the opportunity to influence code for the better. With the Safe Libraries, you and your teams can build much safer apps with minimal code change. They give you an opportunity to expend little effort to realize a significant increase in the safety and robustness of your application. See the sidebar "Migrating Your Code" for tips on migrating your code base to Visual C++ 2005.