ANSI and Unicode Character and String Data Types

anbaixiu

Category: VS2008

I'm sure you're aware that the C language uses the char data type to represent an 8-bit ANSI character. By default, when you declare a literal string in your source code, the C compiler turns the string's characters into an array of 8-bit char data types:

// An 8-bit character
char c = 'A';

// An array of up to 99 8-bit characters and an 8-bit terminating zero.
char szBuffer[100] = "A String";
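As a quick check of the claims above, the following sketch reuses the two declarations from the text and adds a few helper functions (the helper names are hypothetical) that report the bit width of char, how many characters precede the terminating zero, and the array's total capacity. It assumes nothing beyond a hosted C compiler.

```c
#include <limits.h> /* CHAR_BIT */
#include <stddef.h> /* size_t */
#include <string.h> /* strlen */

/* The two declarations from the text. */
static char c = 'A';
static char szBuffer[100] = "A String";

/* Bits in one char: 8 with Microsoft's compiler and other mainstream compilers. */
int bits_per_char(void) { return CHAR_BIT; }

/* Characters stored before the terminating zero: "A String" has 8. */
size_t used_chars(void) { return strlen(szBuffer); }

/* Capacity of the array in chars: always 100, whatever it currently holds. */
size_t capacity_chars(void) { return sizeof szBuffer / sizeof szBuffer[0]; }

/* The literal's own terminator sits at index 8, and every element after it
   is zero-filled as well. Out-of-range indexes report 0. */
int is_terminator(size_t i) { return i < capacity_chars() && szBuffer[i] == '\0'; }

/* The single character c matches the first element of the string. */
int first_matches_c(void) { return szBuffer[0] == c; }
```

Note that sizeof reports the declared capacity (100) while strlen reports the characters actually in use (8); confusing the two is a classic source of buffer bugs.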

Microsoft's C/C++ compiler defines a built-in data type, wchar_t, which represents a 16-bit Unicode (UTF-16) character. Because earlier versions of Microsoft's compiler did not offer this built-in data type, the compiler defines this data type only when the /Zc:wchar_t compiler switch is specified. By default, when you create a C++ project in Microsoft Visual Studio, this compiler switch is specified. We recommend that you always specify this compiler switch, as it is better to work with Unicode characters by way of the built-in primitive type understood intrinsically by the compiler.

Note: Prior to built-in compiler support, a C header file defined the wchar_t data type as follows:

typedef unsigned short wchar_t;
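Because the width of wchar_t is the compiler's decision, it is worth checking rather than assuming. The sketch below reports sizeof(wchar_t), which is 2 bytes (one UTF-16 code unit) with Microsoft's compiler but typically 4 bytes with GCC and Clang on Linux. It also tests _NATIVE_WCHAR_T_DEFINED, a macro that Microsoft documents as predefined when /Zc:wchar_t is set. The function names are illustrative only.

```c
#include <stddef.h> /* size_t, and wchar_t for C code */

/* Bytes per wchar_t: 2 with MSVC on Windows, usually 4 with GCC/Clang on Linux. */
size_t wchar_bytes(void) { return sizeof(wchar_t); }

/* 1 only when MSVC reports wchar_t as a native type (/Zc:wchar_t is set);
   compilers that do not define this macro always get 0 here. */
int msvc_native_wchar(void) {
#ifdef _NATIVE_WCHAR_T_DEFINED
    return 1;
#else
    return 0;
#endif
}

/* True when one wchar_t is exactly one UTF-16 code unit, as on Windows. */
int wchar_is_utf16_unit(void) { return sizeof(wchar_t) == 2; }
```

Code that must exchange wide strings with other platforms should not assume a 2-byte wchar_t.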

Here is how you declare a Unicode character and string:

// A 16-bit character
wchar_t c = L'A';

// An array of up to 99 16-bit characters and a 16-bit terminating zero.
wchar_t szBuffer[100] = L"A String";

An uppercase L before a literal string informs the compiler that the string should be compiled as a Unicode string. When the compiler places the string in the program's data section, it encodes each character using UTF-16, which in this simple case intersperses zero bytes between the ASCII characters.
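To make the zero bytes visible, the sketch below serializes a UTF-16 string as little-endian bytes, the byte order Windows uses on x86 and x64. Since wchar_t is 4 bytes on many non-Windows compilers, it uses the C11 char16_t type and a u"" literal as a portable stand-in for wchar_t and L""; utf16le_bytes is a hypothetical helper, not a Windows API.

```c
#include <stddef.h> /* size_t */
#include <uchar.h>  /* char16_t (C11) */

/* Writes the UTF-16 code units of s to out as little-endian bytes.
   Stops at the terminating zero or when out is full; returns the number of
   bytes written (the terminator itself is not written). */
size_t utf16le_bytes(const char16_t *s, unsigned char *out, size_t cap) {
    size_t n = 0;
    for (; *s != 0 && n + 2 <= cap; ++s) {
        out[n++] = (unsigned char)(*s & 0xFF);        /* low byte first */
        out[n++] = (unsigned char)((*s >> 8) & 0xFF); /* then high byte */
    }
    return n;
}
```

For u"A String" this yields the 16 bytes 41 00 20 00 53 00 74 00 72 00 69 00 6E 00 67 00: every ASCII character is followed by a zero byte, because its high byte is zero.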

The Windows team at Microsoft wants to define its own data types to isolate itself a little bit from the C language. And so, the Windows header file, WinNT.h, defines the following data types:

typedef char CHAR; // An 8-bit character

typedef wchar_t WCHAR; // A 16-bit character

Furthermore, the WinNT.h header file defines a bunch of convenience data types for working with pointers to characters and pointers to strings:

// Pointer to 8-bit character(s)
typedef CHAR *PCHAR;
typedef CHAR *PSTR;
typedef CONST CHAR *PCSTR;

// Pointer to 16-bit character(s)
typedef WCHAR *PWCHAR;
typedef WCHAR *PWSTR;
typedef CONST WCHAR *PCWSTR;
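To see how these names read in a function signature, here is a small sketch that uses PWSTR for a buffer it writes and PCWSTR for a string it only reads. On Windows it picks up the real typedefs from <windows.h>; elsewhere it declares hypothetical stand-ins with the same names. Keep in mind that wchar_t, and therefore this stand-in WCHAR, is usually 32-bit outside Windows. copy_wide is an illustrative name, not a Windows API.

```c
#include <stddef.h> /* size_t */
#include <wchar.h>  /* wchar_t, wcscmp */

#ifdef _WIN32
#include <windows.h> /* real CHAR, WCHAR, PSTR, PCSTR, PWSTR, PCWSTR */
#else
/* Hypothetical stand-ins mirroring the WinNT.h typedefs shown above. */
typedef char CHAR;
typedef wchar_t WCHAR;
typedef CHAR *PSTR;
typedef const CHAR *PCSTR;
typedef WCHAR *PWSTR;
typedef const WCHAR *PCWSTR;
#endif

/* Copies src into dst, which holds cch characters including the terminator.
   Truncates if needed and always zero-terminates when cch > 0.
   Returns the number of characters copied (terminator not counted). */
size_t copy_wide(PWSTR dst, size_t cch, PCWSTR src) {
    size_t i = 0;
    if (cch == 0)
        return 0;
    for (; i + 1 < cch && src[i] != L'\0'; ++i)
        dst[i] = src[i];
    dst[i] = L'\0';
    return i;
}
```

The C in PCWSTR tells both the compiler and the reader that copy_wide never writes through src.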

Note: If you take a look at WinNT.h, you'll find the following definition:

typedef __nullterminated WCHAR *NWPSTR, *LPWSTR, *PWSTR;

The __nullterminated prefix is a header annotation that describes how types are expected to be used as function parameters and return values. In the Enterprise version of Visual Studio, you can set the Code Analysis option in the project properties. This adds the /analyze switch to the compiler's command line, and the compiler then detects when your code calls functions in a way that breaks the semantics defined by the annotations. Note that only Enterprise versions of the compiler support this /analyze switch. To keep the code in this book more readable, the header annotations have been removed. You should read the "Header Annotations" documentation on MSDN at http://msdn2.microsoft.com/En-US/library/aa383701.aspx for more details about the header annotation language.
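Newer versions of Microsoft's <sal.h> spell these annotations differently; for a parameter that must be a null-terminated input string, the annotation is _In_z_. The sketch below annotates a hypothetical count_chars function that way. On compilers other than MSVC it defines _In_z_ as an empty stand-in so the file still builds, and the annotation is then documentation only.

```c
#include <stddef.h> /* size_t */

#ifdef _MSC_VER
#include <sal.h> /* real SAL annotations, checked when /analyze is on */
#else
#define _In_z_   /* empty stand-in: no checking outside MSVC */
#endif

/* _In_z_: s must point to a readable string that ends in a zero character.
   Under /analyze, passing a buffer the analyzer cannot show is
   null-terminated may be reported as a warning. */
size_t count_chars(_In_z_ const char *s) {
    size_t n = 0;
    while (s[n] != '\0')
        ++n;
    return n;
}
```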

In your own source code, it doesn't matter which data type you use, but I'd recommend you try to be consistent to improve maintainability in your code. Personally, as a Windows programmer, I always use the Windows data types because the data types match up with the MSDN documentation, making things easier for everyone reading the code.

It is possible to write your source code so that it can be compiled using ANSI or Unicode characters and strings. In the WinNT.h header file, the following types and macros are defined:

#ifdef UNICODE

typedef WCHAR TCHAR, *PTCHAR, *PTSTR;

typedef CONST WCHAR *PCTSTR;

#define __TEXT(quote) L##quote // Unicode build: __TEXT("x") becomes L"x"

#else

typedef CHAR TCHAR, *PTCHAR, *PTSTR;

typedef CONST CHAR *PCTSTR;

#define __TEXT(quote) quote // ANSI build: the literal is left unchanged

#endif

#define TEXT(quote) __TEXT(quote)
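A likely reason for the two-level TEXT/__TEXT definition is argument pre-expansion. In TEXT, the parameter is not an operand of ##, so its argument is fully macro-expanded before __TEXT sees it, which lets TEXT prefix a literal hidden behind another macro. The sketch below reproduces the pattern under hypothetical names (MY_UNICODE, MY_TEXT, MY_TEXT_IMPL, GREETING) so it cannot clash with the real WinNT.h macros.

```c
#include <wchar.h> /* wchar_t, wcscmp, size_t */

/* Hypothetical switch standing in for UNICODE; remove it for the narrow build. */
#define MY_UNICODE

#ifdef MY_UNICODE
typedef wchar_t MY_TCHAR;
#define MY_TEXT_IMPL(quote) L##quote /* pastes L onto the literal: "x" -> L"x" */
#else
typedef char MY_TCHAR;
#define MY_TEXT_IMPL(quote) quote
#endif

/* Outer layer: quote is macro-expanded first, then handed to MY_TEXT_IMPL. */
#define MY_TEXT(quote) MY_TEXT_IMPL(quote)

#define GREETING "A String"

/* MY_TEXT(GREETING) -> MY_TEXT_IMPL("A String") -> L"A String".
   MY_TEXT_IMPL(GREETING) would instead paste L onto the macro name itself
   and yield the undeclared identifier LGREETING. */
static const MY_TCHAR greeting[] = MY_TEXT(GREETING);

/* Characters in greeting, not counting the terminating zero. */
size_t greeting_length(void) { return sizeof greeting / sizeof greeting[0] - 1; }
```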

These types and macros (plus a few less commonly used ones that I do not show here) are used to create source code that can be compiled using either ANSI or Unicode characters and strings, for example:

// If UNICODE defined, a 16-bit character; else an 8-bit character
TCHAR c = TEXT('A');

// If UNICODE defined, an array of 16-bit characters; else 8-bit characters
TCHAR szBuffer[100] = TEXT("A String");
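In practice the character types usually travel with function mappings, so one body of code compiles in both builds. The sketch below is a hypothetical generic-text layer in the same spirit: MY_UNICODE selects the build (for example, pass -DMY_UNICODE to the compiler), and my_tcslen, an invented name, maps to wcslen or strlen. The body of tchar_demo repeats the two declarations from the text and compiles unchanged either way.

```c
#include <stddef.h> /* size_t */
#include <string.h> /* strlen */
#include <wchar.h>  /* wchar_t, wcslen */

/* Hypothetical generic-text layer modeled on TCHAR/TEXT.
   Build with -DMY_UNICODE for wide characters; the default is narrow. */
#ifdef MY_UNICODE
typedef wchar_t MY_TCHAR;
#define MY_TEXT_IMPL(quote) L##quote
#define my_tcslen wcslen
#else
typedef char MY_TCHAR;
#define MY_TEXT_IMPL(quote) quote
#define my_tcslen strlen
#endif
#define MY_TEXT(quote) MY_TEXT_IMPL(quote)

/* Same source for both builds: only the typedef and macros above change.
   Stores the first character in *first and returns the string's length. */
size_t tchar_demo(MY_TCHAR *first) {
    MY_TCHAR c = MY_TEXT('A');
    MY_TCHAR szBuffer[100] = MY_TEXT("A String");
    *first = szBuffer[0];
    return (c == szBuffer[0]) ? my_tcslen(szBuffer) : 0;
}

/* Bytes occupied by a 100-element MY_TCHAR buffer in the current build. */
size_t buffer_bytes(void) { return 100 * sizeof(MY_TCHAR); }
```

Built without MY_UNICODE, buffer_bytes() returns 100; built with it using Microsoft's compiler, where wchar_t is 2 bytes, it returns 200, while tchar_demo reports a length of 8 in both builds.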