ANSI and Unicode Character and String Data Types
I'm sure you're aware that the C language uses the char data type to represent an 8-bit ANSI character. By default, when you declare a literal string in your source code, the C compiler turns the string's characters into an array of 8-bit char data types:
// An 8-bit character
char c = 'A';

// An array of 99 8-bit characters and an 8-bit terminating zero.
char szBuffer[100] = "A String";
Microsoft's C/C++ compiler defines a built-in data type, wchar_t, which represents a 16-bit Unicode (UTF-16) character. Because earlier versions of Microsoft's compiler did not offer this built-in data type, the compiler defines this data type only when the /Zc:wchar_t compiler switch is specified. By default, when you create a C++ project in Microsoft Visual Studio, this compiler switch is specified. We recommend that you always specify this compiler switch, as it is better to work with Unicode characters by way of the built-in primitive type understood intrinsically by the compiler.
Note: Prior to the built-in compiler support, a C header file defined a wchar_t data type as follows:
typedef unsigned short wchar_t;
Here is how you declare a Unicode character and string:
// A 16-bit character
wchar_t c = L'A';

// An array of up to 99 16-bit characters and a 16-bit terminating zero.
wchar_t szBuffer[100] = L"A String";
An uppercase L before a literal string informs the compiler that the string should be compiled as a Unicode string. When the compiler places the string in the program's data section, it encodes each character using UTF-16, interspersing zero bytes between every ASCII character in this simple case.
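The zero-byte interspersing described above can be observed directly. The following is a minimal sketch, not code from WinNT.h: the helper names nonzero_bytes and is_zero_padded_ascii are hypothetical. It uses sizeof(wchar_t) rather than a hard-coded 2, because wchar_t is 16 bits with Microsoft's compiler but may be wider with other compilers.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: returns how many bytes of one wide character
   are nonzero in memory. */
static size_t nonzero_bytes(wchar_t ch)
{
    unsigned char bytes[sizeof(wchar_t)];
    size_t count = 0;
    size_t i;

    memcpy(bytes, &ch, sizeof ch);
    for (i = 0; i < sizeof ch; ++i) {
        if (bytes[i] != 0) {
            ++count;
        }
    }
    return count;
}

/* Hypothetical helper: returns 1 when every character in the wide string
   is an ASCII character stored as one nonzero byte padded with zero bytes,
   and 0 otherwise. */
static int is_zero_padded_ascii(const wchar_t *text)
{
    for (; *text != L'\0'; ++text) {
        if ((unsigned long)*text > 0x7F || nonzero_bytes(*text) != 1) {
            return 0;
        }
    }
    return 1;
}
```

Calling is_zero_padded_ascii(L"A String") returns 1: each element holds a single meaningful byte and the rest of the element is zero. With 16-bit wchar_t, that is one zero byte after every ASCII byte.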
The Windows team at Microsoft wants to define its own data types to isolate itself a little bit from the C language. And so, the Windows header file, WinNT.h, defines the following data types:
typedef char CHAR;     // An 8-bit character
typedef wchar_t WCHAR; // A 16-bit character

Furthermore, the WinNT.h header file defines a bunch of convenience data types for working with pointers to characters and pointers to strings:

// Pointer to 8-bit character(s)
typedef CHAR *PCHAR;
typedef CHAR *PSTR;
typedef CONST CHAR *PCSTR;

// Pointer to 16-bit character(s)
typedef WCHAR *PWCHAR;
typedef WCHAR *PWSTR;
typedef CONST WCHAR *PCWSTR;
Note: If you take a look at WinNT.h, you'll find the following definition:
typedef __nullterminated WCHAR *NWPSTR, *LPWSTR, *PWSTR;
The __nullterminated prefix is a header annotation that describes how types are expected to be used as function parameters and return values. In the Enterprise version of Visual Studio, you can set the Code Analysis option in the project properties. This adds the /analyze switch to the compiler's command line, which makes the compiler detect when your code calls functions in a way that breaks the semantics defined by the annotations. Notice that only Enterprise versions of the compiler support this /analyze switch. To keep the code more readable in this book, the header annotations are removed. You should read the "Header Annotations" documentation on MSDN at http://msdn2.microsoft.com/En-US/library/aa383701.aspx for more details about the header annotations language.
In your own source code, it doesn't matter which data type you use, but I'd recommend you try to be consistent to improve maintainability in your code. Personally, as a Windows programmer, I always use the Windows data types because the data types match up with the MSDN documentation, making things easier for everyone reading the code.
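As an illustration of using the Windows data types consistently, here is a minimal sketch of a function signature written with them. The typedefs below are stand-ins that mirror the WinNT.h names so the sketch compiles without <windows.h>. The CopyWide helper is hypothetical and is not a Windows API.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Stand-in typedefs mirroring the WinNT.h names (an assumption made so
   the sketch is self-contained; on Windows, <windows.h> supplies them). */
typedef wchar_t WCHAR;
typedef WCHAR *PWSTR;
typedef const WCHAR *PCWSTR;

/* Hypothetical helper: copies pszSource into pszDest, writing at most
   cchDest elements including the terminating zero. Returns the number
   of characters copied, not counting the terminating zero. */
static size_t CopyWide(PWSTR pszDest, size_t cchDest, PCWSTR pszSource)
{
    size_t i = 0;

    if (cchDest == 0) {
        return 0;
    }
    while (i + 1 < cchDest && pszSource[i] != L'\0') {
        pszDest[i] = pszSource[i];
        ++i;
    }
    pszDest[i] = L'\0';
    return i;
}
```

The signature shows at a glance that pszSource (PCWSTR) is read-only and pszDest (PWSTR) is written to, using the same type names the MSDN documentation uses for its own functions.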
It is possible to write your source code so that it can be compiled using ANSI or Unicode characters and strings. In the WinNT.h header file, the following types and macros are defined:
#ifdef UNICODE
typedef WCHAR TCHAR, *PTCHAR, *PTSTR;
typedef CONST WCHAR *PCTSTR;
#define __TEXT(quote) L##quote   // r_winnt
#else
typedef CHAR TCHAR, *PTCHAR, *PTSTR;
typedef CONST CHAR *PCTSTR;
#define __TEXT(quote) quote
#endif
#define TEXT(quote) __TEXT(quote)
These types and macros (plus a few less commonly used ones that I do not show here) are used to create source code that can be compiled using either ANSI or Unicode characters and strings. For example:
// If UNICODE defined, a 16-bit character; else an 8-bit character
TCHAR c = TEXT('A');

// If UNICODE defined, an array of 16-bit characters; else 8-bit characters
TCHAR szBuffer[100] = TEXT("A String");
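To see why TEXT is defined in terms of a second macro, consider this minimal sketch. It does not include <windows.h>. The names DEMO_UNICODE, DEMO_TCHAR, DEMO_TEXT_IMPL, and DEMO_TEXT are hypothetical stand-ins for UNICODE, TCHAR, __TEXT, and TEXT, chosen so the sketch compiles anywhere and does not collide with the real header.

```c
#include <assert.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical stand-ins for the WinNT.h definitions.
   Compile with -DDEMO_UNICODE (or /DDEMO_UNICODE) to select wide characters. */
#ifdef DEMO_UNICODE
typedef wchar_t DEMO_TCHAR;
#define DEMO_TEXT_IMPL(quote) L##quote
#else
typedef char DEMO_TCHAR;
#define DEMO_TEXT_IMPL(quote) quote
#endif

/* The extra level of indirection lets a macro argument such as GREETING
   expand to its string literal before L is pasted onto it. Pasting
   directly would produce the undefined token LGREETING. */
#define DEMO_TEXT(quote) DEMO_TEXT_IMPL(quote)

#define GREETING "A String"

/* Wide or narrow array, depending on DEMO_UNICODE. */
static const DEMO_TCHAR g_szBuffer[100] = DEMO_TEXT(GREETING);

/* Size in bytes of one character in the selected build. */
static size_t demo_char_size(void)
{
    return sizeof(DEMO_TCHAR);
}

/* Length in characters; works for either character width. */
static size_t demo_length(const DEMO_TCHAR *psz)
{
    size_t n = 0;
    while (psz[n] != 0) {
        ++n;
    }
    return n;
}
```

The same source builds both ways: demo_length(g_szBuffer) is 8 in either configuration, while demo_char_size() reports the width of whichever character type was selected.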