ANSI and Unicode Character and String Data Types

I'm sure you're aware that the C language uses the char data type to represent an 8-bit ANSI character. By default, when you declare a literal string in your source code, the C compiler turns the string's characters into an array of 8-bit char data types:

// An 8-bit character
char c = 'A';

// An array of up to 99 8-bit characters and an 8-bit terminating zero.
char szBuffer[100] = "A String";

Microsoft's C/C++ compiler defines a built-in data type, wchar_t, which represents a 16-bit Unicode (UTF-16) character. Because earlier versions of Microsoft's compiler did not offer this built-in data type, the compiler defines this data type only when the /Zc:wchar_t compiler switch is specified. By default, when you create a C++ project in Microsoft Visual Studio, this compiler switch is specified. We recommend that you always specify this compiler switch, as it is better to work with Unicode characters by way of the built-in primitive type understood intrinsically by the compiler.

Note: Prior to the built-in compiler support, a C header file defined a wchar_t data type as follows:

typedef unsigned short wchar_t;
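
One practical consequence of wchar_t being a built-in type rather than a typedef for unsigned short is that overload resolution can tell the two apart. The following is a minimal sketch of that difference; Describe is a hypothetical function name used only for illustration. With the old typedef, both overloads would have the same signature and the code would not compile.

```cpp
#include <string>

// Hypothetical helper, for illustration only: reports which overload was chosen.
std::string Describe(unsigned short) { return "unsigned short"; }

// Compiles only when wchar_t is a distinct built-in type (/Zc:wchar_t with
// Microsoft's compiler). With the old typedef, this would redefine the
// overload above.
std::string Describe(wchar_t) { return "wchar_t"; }
```

Calling Describe(L'A') selects the wchar_t overload, while passing a value of type unsigned short selects the other one.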

Here is how you declare a Unicode character and string:

// A 16-bit character
wchar_t c = L'A';

// An array of up to 99 16-bit characters and a 16-bit terminating zero.
wchar_t szBuffer[100] = L"A String";

An uppercase L before a literal string informs the compiler that the string should be compiled as a Unicode string. When the compiler places the string in the program's data section, it encodes each character using UTF-16, interspersing zero bytes between every ASCII character in this simple case.
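
To make the interspersed zero bytes visible, the sketch below counts them in the in-memory representation of a narrow and a wide literal. CountZeroBytes, kWide, and kNarrow are hypothetical names chosen for this illustration. The code does not assume a particular sizeof(wchar_t): it is 2 with Microsoft's compiler, but commonly 4 on other platforms.

```cpp
#include <cstddef>

// Hypothetical helper: counts zero-valued bytes in a raw memory range.
std::size_t CountZeroBytes(const void* data, std::size_t byteCount) {
    const unsigned char* bytes = static_cast<const unsigned char*>(data);
    std::size_t zeros = 0;
    for (std::size_t i = 0; i < byteCount; ++i) {
        if (bytes[i] == 0) ++zeros;
    }
    return zeros;
}

// Both arrays hold 8 characters plus one terminating zero character.
const wchar_t kWide[] = L"A String";
const char kNarrow[] = "A String";
```

With a 16-bit wchar_t, sizeof(kWide) is 18 and CountZeroBytes(kWide, sizeof(kWide)) returns 10: one zero byte for each of the eight ASCII characters plus two for the terminator. The narrow string occupies 9 bytes and contains a single zero byte, its terminator.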

The Windows team at Microsoft wants to define its own data types to isolate itself a little bit from the C language. And so, the Windows header file, WinNT.h, defines the following data types:

typedef char    CHAR;     // An 8-bit character
typedef wchar_t WCHAR;    // A 16-bit character

Furthermore, the WinNT.h header file defines a bunch of convenience data types for working with pointers to characters and pointers to strings:

// Pointer to 8-bit character(s)
typedef CHAR *PCHAR;
typedef CHAR *PSTR;
typedef CONST CHAR *PCSTR;

// Pointer to 16-bit character(s)
typedef WCHAR *PWCHAR;
typedef WCHAR *PWSTR;
typedef CONST WCHAR *PCWSTR;
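
As a rough illustration of how these pointer types tend to be used, the sketch below takes a read-only string through a PCWSTR and a writable buffer through a PSTR. The typedefs are repeated here as stand-ins so the code compiles without the Windows headers (in a real Windows build you would include <windows.h> instead), and WideLength and UpperAscii are hypothetical helpers, not Windows functions.

```cpp
#include <cstddef>

// Stand-ins mirroring the WinNT.h definitions shown above, so the sketch
// compiles without <windows.h>.
#define CONST const
typedef char CHAR;
typedef wchar_t WCHAR;
typedef CHAR *PSTR;
typedef CONST CHAR *PCSTR;
typedef WCHAR *PWSTR;
typedef CONST WCHAR *PCWSTR;

// Hypothetical helper: length in characters of a read-only wide string.
std::size_t WideLength(PCWSTR psz) {
    std::size_t n = 0;
    while (psz[n] != L'\0') ++n;
    return n;
}

// Hypothetical helper: uppercases ASCII letters in a writable ANSI buffer,
// in place. The non-const PSTR signals that the buffer is modified.
void UpperAscii(PSTR psz) {
    for (; *psz != '\0'; ++psz) {
        if (*psz >= 'a' && *psz <= 'z') {
            *psz = static_cast<CHAR>(*psz - 'a' + 'A');
        }
    }
}
```

The C in PCSTR and PCWSTR marks the pointed-to characters as const, so a function's signature tells the caller whether the string may be modified.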

Note: If you take a look at WinNT.h, you'll find the following definition:

typedef __nullterminated WCHAR *NWPSTR, *LPWSTR, *PWSTR;

The __nullterminated prefix is a header annotation that describes how types are expected to be used as function parameters and return values. In the Enterprise version of Visual Studio, you can set the Code Analysis option in the project properties. This adds the /analyze switch to the compiler's command line, which detects when your code calls functions in a way that breaks the semantics defined by the annotations. Notice that only Enterprise versions of the compiler support this /analyze switch. To keep the code more readable in this book, the header annotations are removed. You should read the "Header Annotations" documentation on MSDN at http://msdn2.microsoft.com/En-US/library/aa383701.aspx for more details about the header annotations language.

In your own source code, it doesn't matter which data type you use, but I'd recommend you try to be consistent to improve maintainability in your code. Personally, as a Windows programmer, I always use the Windows data types because the data types match up with the MSDN documentation, making things easier for everyone reading the code.

It is possible to write your source code so that it can be compiled using ANSI or Unicode characters and strings. In the WinNT.h header file, the following types and macros are defined:

#ifdef UNICODE

typedef WCHAR TCHAR, *PTCHAR, *PTSTR;
typedef CONST WCHAR *PCTSTR;
#define __TEXT(quote) L##quote      // r_winnt

#else

typedef CHAR TCHAR, *PTCHAR, *PTSTR;
typedef CONST CHAR *PCTSTR;
#define __TEXT(quote) quote

#endif

#define TEXT(quote) __TEXT(quote)

These types and macros (plus a few less commonly used ones that I do not show here) are used to create source code that can be compiled using either ANSI or Unicode characters and strings, for example:

// If UNICODE defined, a 16-bit character; else an 8-bit character
TCHAR c = TEXT('A');

// If UNICODE defined, an array of 16-bit characters; else 8-bit characters
TCHAR szBuffer[100] = TEXT("A String");
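
The sketch below shows one way code written against TCHAR and TEXT can stay correct in both builds. It re-creates simplified stand-ins for the WinNT.h definitions so that it compiles without <windows.h>. DEMO_WIDEN is a hypothetical name playing the role of __TEXT, and TLength and TByteSize are hypothetical helpers rather than Windows functions. The two-level macro matters because it lets an argument that is itself a macro be expanded before the L prefix is pasted on.

```cpp
#include <cstddef>

// Simplified stand-ins for the WinNT.h definitions, for illustration only.
#ifdef UNICODE
typedef wchar_t TCHAR;
#define DEMO_WIDEN(quote) L##quote
#else
typedef char TCHAR;
#define DEMO_WIDEN(quote) quote
#endif
#define TEXT(quote) DEMO_WIDEN(quote)
typedef const TCHAR *PCTSTR;

// Hypothetical helper: length in characters (not bytes) of a TCHAR string.
std::size_t TLength(PCTSTR psz) {
    std::size_t n = 0;
    while (psz[n] != TEXT('\0')) ++n;
    return n;
}

// Hypothetical helper: storage in bytes for a string, including its terminator.
std::size_t TByteSize(PCTSTR psz) {
    return (TLength(psz) + 1) * sizeof(TCHAR);
}
```

Code like this should keep character counts and byte counts separate: for TCHAR szBuffer[100], the capacity in characters is sizeof(szBuffer) / sizeof(szBuffer[0]), which is 100 in either build, while sizeof(szBuffer) alone depends on whether UNICODE is defined.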

 
