Wide-Character Library Functions
Now let's try defining a pointer to a string of wide characters:
wchar_t * pw = L"Hello!" ;
And now we call strlen again:
iLength = strlen (pw) ;
Now the troubles begin. First, the C compiler gives you a warning message, probably something along the lines of
'function' : incompatible types - from 'unsigned short *' to 'const char *'
It's telling you that the strlen function is declared as accepting a pointer to a char, and it's getting a pointer to an unsigned short. You can still compile and run the program, but you'll find that iLength is set to 1. What happened?
The 6 characters of the character string "Hello!" have the 16-bit values:
0x0048 0x0065 0x006C 0x006C 0x006F 0x0021
which are stored in memory by Intel processors like so:
48 00 65 00 6C 00 6C 00 6F 00 21 00
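The byte order shown above can be reproduced on any machine by writing each 16-bit value low byte first. This is only a minimal sketch: StoreLittleEndian16 is a hypothetical helper name made up for illustration, not a library function.

```c
#include <stddef.h>
#include <stdint.h>

/* Writes cch 16-bit values into pbOut low byte first, which is the
   order Intel processors use. pbOut must hold at least 2 * cch bytes.
   (StoreLittleEndian16 is a hypothetical helper for illustration.) */
void StoreLittleEndian16 (const uint16_t * pwIn, size_t cch,
                          unsigned char * pbOut)
{
     size_t i ;

     for (i = 0 ; i < cch ; i++)
     {
          pbOut [2 * i]     = (unsigned char) (pwIn [i] & 0xFF) ;  /* low byte  */
          pbOut [2 * i + 1] = (unsigned char) (pwIn [i] >> 8) ;    /* high byte */
     }
}
```

Passing the six values of "Hello!" produces 12 bytes in which every second byte is zero, which is the detail that matters for what follows.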
The strlen function, assuming that it's attempting to find the length of a string of characters, counts the first byte as a character but then assumes that the second byte is a zero byte denoting the end of the string.
The wide-character version of the strlen function is called wcslen ("wide-character string length"), and it's declared both in STRING.H (where the declaration for strlen resides) and WCHAR.H. The strlen function is declared like this:
size_t __cdecl strlen (const char *);
and the wcslen function looks like this:
size_t __cdecl wcslen (const wchar_t *) ;
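The difference between the two functions can be seen without depending on the size of wchar_t on a particular compiler. The sketch below spells out the bytes of the 16-bit string by hand (the array and function names are illustrative assumptions) and compares what strlen and wcslen report.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* The bytes of "Hello!" as 16-bit values stored low byte first,
   followed by a 16-bit terminating zero. */
static const unsigned char abHello [] =
{
     0x48, 0x00, 0x65, 0x00, 0x6C, 0x00, 0x6C, 0x00,
     0x6F, 0x00, 0x21, 0x00, 0x00, 0x00
} ;

/* What strlen sees: it stops at the first zero byte. */
size_t NarrowLengthOfWideBytes (void)
{
     return strlen ((const char *) abHello) ;
}

/* What wcslen sees: it counts wchar_t units up to the zero unit. */
size_t WideLength (void)
{
     return wcslen (L"Hello!") ;
}
```

NarrowLengthOfWideBytes returns 1 because the second byte is zero, while WideLength returns 6.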
All your favorite C run-time library functions that take string arguments have wide-character versions. For example, wprintf is the wide-character version of printf. These functions are declared both in WCHAR.H and in the header file where the normal function is declared.
There are, of course, certain disadvantages to using Unicode. First and foremost is that every string in your program will occupy twice as much space. In addition, you'll observe that the functions in the wide-character run-time library are larger than the usual functions. For this reason, you might want to create two versions of your program: one with ASCII strings and the other with Unicode strings. The best solution would be to maintain a single source code file that you could compile for either ASCII or Unicode.
That's a bit of a problem, though, because the run-time library functions have different names, you're defining characters differently, and then there's that nuisance of preceding the string literals with an L.
One answer is to use the TCHAR.H header file included with Microsoft Visual C++. This header file is not part of the ANSI C standard, so every function and macro definition defined therein is preceded by an underscore. TCHAR.H provides a set of alternative names for the normal run-time library functions requiring string parameters (for example, _tprintf and _tcslen). These are sometimes referred to as "generic" function names because they can refer to either the Unicode or non-Unicode versions of the functions.
If an identifier named _UNICODE is defined and the TCHAR.H header file is included in your program, _tcslen is defined to be wcslen:
#define _tcslen wcslen
If _UNICODE isn't defined, _tcslen is defined to be strlen:
#define _tcslen strlen
And so on. TCHAR.H also solves the problem of the two character data types with a new data type named TCHAR.
If the _UNICODE identifier is defined, TCHAR is wchar_t:
typedef wchar_t TCHAR ;
Otherwise, TCHAR is simply a char:
typedef char TCHAR ;
Now it's time to address that sticky L problem with the string literals. If the _UNICODE identifier is defined, a macro called __T is defined like this:
#define __T(x) L##x
This is fairly obscure syntax, but it's in the ANSI C standard for the C preprocessor. That pair of number signs is called a "token paste," and it causes the letter L to be prepended to the macro parameter. Thus, if the macro parameter is "Hello!", then L##x is L"Hello!".
If the _UNICODE identifier is not defined, the __T macro is simply defined in the following way:
#define __T(x) x
Regardless, two other macros are defined to be the same as __T:
#define _T(x) __T(x)
#define _TEXT(x) __T(x)
Which one you use for your Win32 console programs depends on how concise or verbose you'd like to be.
Basically, you must define your string literals inside the _T or _TEXT macro in the following way:
_TEXT ("Hello!")
Doing so causes the string to be interpreted as composed of wide characters if the _UNICODE identifier is defined and as 8-bit characters if not.
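The pieces described above (a generic character type, a generic function name, and a text macro) can be imitated in portable C to see how they fit together. The following is only a sketch of the pattern, not the contents of TCHAR.H: DEMO_UNICODE, DEMO_TCHAR, DEMO_T, DEMO_T_IMPL, and demo_tcslen are hypothetical names standing in for _UNICODE, TCHAR, _T, __T, and _tcslen.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Define DEMO_UNICODE (a hypothetical stand-in for _UNICODE) on the
   compiler command line to switch the whole file to wide characters. */
#ifdef DEMO_UNICODE
typedef wchar_t DEMO_TCHAR ;
#define DEMO_T_IMPL(x)  L##x        /* token paste: "Hello!" becomes L"Hello!" */
#define demo_tcslen     wcslen
#else
typedef char DEMO_TCHAR ;
#define DEMO_T_IMPL(x)  x
#define demo_tcslen     strlen
#endif

/* The extra level lets an argument that is itself a macro be expanded
   before the L is pasted on. */
#define DEMO_T(x)  DEMO_T_IMPL(x)

/* Length in characters: the same in either build. */
size_t GreetingLength (void)
{
     const DEMO_TCHAR * pGreeting = DEMO_T ("Hello!") ;

     return demo_tcslen (pGreeting) ;
}

/* Storage in bytes, including the terminating zero: this is the part
   that changes with the character type. */
size_t GreetingBytes (void)
{
     return sizeof (DEMO_T ("Hello!")) ;
}
```

The same source compiles either way. Only the storage size of the literal changes, while the character count stays at 6.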
As you saw in the first chapter, a Windows program includes the header file WINDOWS.H. This file includes a number of other header files, including WINDEF.H, which has many of the basic type definitions used in Windows and which itself includes WINNT.H. WINNT.H handles the basic Unicode support.
WINNT.H begins by including the C header file CTYPE.H, which is one of many C header files that have a definition of wchar_t. WINNT.H defines new data types named CHAR and WCHAR:
typedef char CHAR ;
typedef wchar_t WCHAR ; // wc
CHAR and WCHAR are the data types recommended for your use in a Windows program when you need to define an 8-bit character or a 16-bit character. That comment following the WCHAR definition is a suggestion for Hungarian notation: a variable based on the WCHAR data type can be preceded with the letters wc to indicate a wide character.
The WINNT.H header file goes on to define six data types you can use as pointers to 8-bit character strings and four data types you can use as pointers to const 8-bit character strings. I've condensed the actual header file statements a bit to show the data types here:
typedef CHAR * PCHAR, * LPCH, * PCH, * NPSTR, * LPSTR, * PSTR ;
typedef CONST CHAR * LPCCH, * PCCH, * LPCSTR, * PCSTR ;
The N and L prefixes stand for "near" and "long" and refer to the two different sizes of pointers in 16-bit Windows. There is no differentiation between near and long pointers in Win32.
Similarly, WINNT.H defines six data types you can use as pointers to 16-bit character strings and four data types you can use as pointers to const 16-bit character strings:
typedef WCHAR * PWCHAR, * LPWCH, * PWCH, * NWPSTR, * LPWSTR, * PWSTR ;
typedef CONST WCHAR * LPCWCH, * PCWCH, * LPCWSTR, * PCWSTR ;
So far, we have the data types CHAR (which is an 8-bit char) and WCHAR (which is a 16-bit wchar_t) and pointers to CHAR and WCHAR. As in TCHAR.H, WINNT.H defines TCHAR to be the generic character type. If the identifier UNICODE (without the underscore) is defined, TCHAR and pointers to TCHAR are defined based on WCHAR and pointers to WCHAR; if the identifier UNICODE is not defined, TCHAR and pointers to TCHAR are defined based on char and pointers to char:
#ifdef UNICODE
typedef WCHAR TCHAR, * PTCHAR ;
typedef LPWSTR LPTCH, PTCH, PTSTR, LPTSTR ;
typedef LPCWSTR LPCTSTR ;
#else
typedef char TCHAR, * PTCHAR ;
typedef LPSTR LPTCH, PTCH, PTSTR, LPTSTR ;
typedef LPCSTR LPCTSTR ;
#endif
Regardless, the TEXT macro is defined like this:
#define TEXT(quote) __TEXT(quote)
These definitions let you mix ASCII and Unicode character strings in the same program or write a single program that can be compiled for either ASCII or Unicode. If you want to explicitly define 8-bit character variables and strings, use CHAR, PCHAR (or one of the others), and strings with quotation marks. For explicit 16-bit character variables and strings, use WCHAR, PWCHAR, and put an L before the quotation marks. For variables and character strings that will be 8 bit or 16 bit depending on the definition of the UNICODE identifier, use TCHAR, PTCHAR, and the TEXT macro.
In the 16-bit versions of Windows beginning with Windows 1.0 and ending with Windows 3.1, the MessageBox function was located in the dynamic-link library USER.EXE. In the WINDOWS.H header files included in the Windows 3.1 Software Development Kit, the MessageBox function was defined like so:
int WINAPI MessageBox(HWND, LPCSTR, LPCSTR, UINT) ;
Notice that the second and third arguments to the function are pointers to constant character strings. When a Win16 program was compiled and linked, Windows left the call to MessageBox unresolved. A table in the program's .EXE file allowed Windows to dynamically link the call from the program to the MessageBox function located in the USER library.
The 32-bit versions of Windows (that is, all versions of Windows NT, as well as Windows 95 and Windows 98) include USER.EXE for 16-bit compatibility but also have a dynamic-link library named USER32.DLL that contains entry points for the 32-bit versions of the user interface functions, including the 32-bit version of MessageBox.
But here's the key to Windows support of Unicode: In USER32.DLL, there is no entry point for a 32-bit function named MessageBox. Instead, there are two entry points, one named MessageBoxA (the ASCII version) and the other named MessageBoxW (the wide-character version). Every Win32 function that requires a character string argument has two entry points in the operating system! Fortunately, you usually don't have to worry about this. You can simply use MessageBox in your programs. As in the TCHAR header file, the various Windows header files perform the necessary tricks.
Here's how MessageBoxA is defined in WINUSER.H. This is quite similar to the earlier definition of MessageBox:
WINUSERAPI int WINAPI MessageBoxA(HWND hWnd, LPCSTR lpText, LPCSTR lpCaption, UINT uType) ;
And here's MessageBoxW:
WINUSERAPI int WINAPI MessageBoxW(HWND hWnd, LPCWSTR lpText, LPCWSTR lpCaption, UINT uType) ;
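The "necessary tricks" amount to a macro that maps the generic name onto one of the two entry points, depending on whether UNICODE is defined. The sketch below imitates that arrangement in portable C so it can be tried without the Windows headers. DemoCaptionLengthA, DemoCaptionLengthW, DemoCaptionLength, and DEMO_TEXT are hypothetical names, and the functions merely measure a caption rather than display anything.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Two entry points, in the style of MessageBoxA and MessageBoxW.
   (Hypothetical functions: they only measure the caption.) */
size_t DemoCaptionLengthA (const char * pszCaption)
{
     return strlen (pszCaption) ;
}

size_t DemoCaptionLengthW (const wchar_t * pszCaption)
{
     return wcslen (pszCaption) ;
}

/* The generic name is not a function at all, just a macro that
   resolves to one of the two entry points at compile time. */
#ifdef UNICODE
#define DemoCaptionLength  DemoCaptionLengthW
#define DEMO_TEXT(quote)   L##quote
#else
#define DemoCaptionLength  DemoCaptionLengthA
#define DEMO_TEXT(quote)   quote
#endif

/* Code written against the generic name compiles in either build. */
size_t MeasureCaption (void)
{
     return DemoCaptionLength (DEMO_TEXT ("ScrnSize")) ;
}
```

Because the choice is made by the preprocessor, a program can still call either entry point explicitly, which is how ASCII and Unicode strings can be mixed in one program.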
Windows' String Functions
Here is a collection of string functions defined in Windows that calculate string lengths, copy strings, concatenate strings, and compare strings:
iLength = lstrlen (pString) ;
pString = lstrcpy (pString1, pString2) ;
pString = lstrcpyn (pString1, pString2, iCount) ;
pString = lstrcat (pString1, pString2) ;
iComp = lstrcmp (pString1, pString2) ;
iComp = lstrcmpi (pString1, pString2) ;
These work much the same as their C library equivalents. They accept wide-character strings if the UNICODE identifier is defined and regular strings if not. The wide-character version of the lstrlen function (lstrlenW) is implemented in Windows 98.
Using printf in Windows
The bad news is that you can't use printf in a Windows program. Although you can use most of the C run-time library in Windows programs (indeed, many programmers prefer to use the C memory management and file I/O functions over the Windows equivalents), Windows has no concept of standard input and standard output. You can use fprintf in a Windows program, but not printf.
The good news is that you can still display text by using sprintf and other functions in the sprintf family. These functions work just like printf, except that they write the formatted output to a character string buffer that you provide as the function's first argument. You can then do what you want with this character string (such as pass it to MessageBox).
The sprintf function is defined like this:
int sprintf (char * szBuffer, const char * szFormat, ...) ;
The first argument is a character buffer; this is followed by the formatting string. Rather than writing the formatted result in standard output, sprintf stores it in szBuffer.
With sprintf, you still have to worry about the formatting string being in sync with the variables to be formatted, and you also have a new worry: the character buffer you define must be large enough for the result. A Microsoft-specific function named _snprintf solves this problem by introducing another argument that indicates the size of the buffer in characters.
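The idea of passing the buffer size can be sketched with the C99 function snprintf, used here as a portable stand-in for the Microsoft-specific _snprintf. The two are not identical: snprintf always terminates the buffer with a zero when the size is nonzero, whereas _snprintf does not add the terminator when the output fills the buffer. FormatScreenSize is a hypothetical function name.

```c
#include <stddef.h>
#include <stdio.h>

/* Formats a screen size into a caller-supplied buffer of cchBuffer
   characters. Returns 1 if the whole message fit, 0 if it had to be
   truncated or an error occurred. (Hypothetical example function;
   uses C99 snprintf rather than _snprintf.) */
int FormatScreenSize (char * szBuffer, size_t cchBuffer, int cx, int cy)
{
     /* snprintf returns the length the full result would have had,
        not counting the terminating zero. */
     int iLength = snprintf (szBuffer, cchBuffer, "%i by %i", cx, cy) ;

     return iLength >= 0 && (size_t) iLength < cchBuffer ;
}
```

With a 32-character buffer the call succeeds. With a 5-character buffer the function reports truncation and the buffer holds only the first four characters plus the terminator, rather than overrunning.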
A variation of sprintf is vsprintf, which has only three arguments. The vsprintf function is used to implement a function of your own that must perform printf-like formatting of a variable number of arguments. The first two arguments to vsprintf are the same as sprintf: the character buffer for storing the result and the formatting string. The third argument is a pointer to an array of arguments to be formatted. In practice, this pointer actually references variables that have been stored on the stack in preparation for a function call. The va_list, va_start, and va_end macros (defined in STDARG.H) help in working with this stack pointer. The SCRNSIZE program at the end of this chapter demonstrates how to use these macros. The sprintf function can be written in terms of vsprintf like so:
int sprintf (char * szBuffer, const char * szFormat, ...)
{
     int     iReturn ;
     va_list pArgs ;

     va_start (pArgs, szFormat) ;
     iReturn = vsprintf (szBuffer, szFormat, pArgs) ;
     va_end (pArgs) ;
     return iReturn ;
}
The va_start macro sets pArgs to point to the variable on the stack right above the szFormat argument.
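The same three macros serve for "a function of your own." As a hedged sketch, here is a small printf-like function that formats into a caller-supplied buffer. BufferPrintf is a hypothetical name, and the example uses the C99 function vsnprintf so that the buffer size is respected.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* A printf-like function of our own: formats a variable number of
   arguments into szBuffer, writing at most cchBuffer characters
   including the terminating zero. Returns what vsnprintf returns:
   the length the full result would have had.
   (BufferPrintf is a hypothetical name for illustration.) */
int BufferPrintf (char * szBuffer, size_t cchBuffer,
                  const char * szFormat, ...)
{
     int     iReturn ;
     va_list pArgs ;

     va_start (pArgs, szFormat) ;       /* szFormat is the last named argument */
     iReturn = vsnprintf (szBuffer, cchBuffer, szFormat, pArgs) ;
     va_end (pArgs) ;

     return iReturn ;
}
```

A call such as BufferPrintf (szBuffer, sizeof (szBuffer), "%i x %i", 640, 480) leaves the formatted text in szBuffer, ready to be handed to whatever displays it.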
Of course, with the introduction of wide characters, the sprintf functions blossomed in number, creating a thoroughly confusing jumble of function names. Here's a chart that shows all the sprintf functions supported by Microsoft's C run-time library and by Windows.
                                  ASCII        Wide-Character   Generic
Variable Number of Arguments
  Standard Version                sprintf      swprintf         _stprintf
  Max-Length Version              _snprintf    _snwprintf       _sntprintf
  Windows Version                 wsprintfA    wsprintfW        wsprintf
Pointer to Array of Arguments
  Standard Version                vsprintf     vswprintf        _vstprintf
  Max-Length Version              _vsnprintf   _vsnwprintf      _vsntprintf
  Windows Version                 wvsprintfA   wvsprintfW       wvsprintf
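To give a feel for the wide-character column, here is a minimal sketch using swprintf. It assumes the ISO C form of the function, which takes the buffer size as its second argument. Older Microsoft versions of swprintf may omit that argument, so check your compiler's documentation. FormatScreenSizeW is a hypothetical name.

```c
#include <stddef.h>
#include <wchar.h>

/* Formats a screen size as a wide-character string. cchBuffer is the
   buffer size in wide characters, not bytes. Returns the number of
   wide characters written, or 0 if the result did not fit.
   (Hypothetical example; assumes the ISO C signature of swprintf.) */
size_t FormatScreenSizeW (wchar_t * szBuffer, size_t cchBuffer,
                          int cx, int cy)
{
     /* Both the literal and the buffer are wide, so the format
        string needs the L prefix. */
     int iLength = swprintf (szBuffer, cchBuffer, L"%d by %d", cx, cy) ;

     return iLength < 0 ? 0 : (size_t) iLength ;
}
```

Note that the size is counted in characters, which is why the SCRNSIZE program divides sizeof (szBuffer) by sizeof (TCHAR).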
A Formatting Message Box
/*-----------------------------------------------------
   SCRNSIZE.C -- Displays screen size in a message box
                 (c) Charles Petzold, 1998
  -----------------------------------------------------*/

#include <windows.h>
#include <tchar.h>
#include <stdio.h>

int CDECL MessageBoxPrintf (TCHAR * szCaption, TCHAR * szFormat, ...)
{
     TCHAR   szBuffer [1024] ;
     va_list pArgList ;

          // The va_start macro (defined in STDARG.H) is usually equivalent to:
          // pArgList = (char *) &szFormat + sizeof (szFormat) ;

     va_start (pArgList, szFormat) ;

          // The last argument to _vsntprintf points to the arguments

     _vsntprintf (szBuffer, sizeof (szBuffer) / sizeof (TCHAR),
                  szFormat, pArgList) ;

          // The va_end macro just zeroes out pArgList for no good reason

     va_end (pArgList) ;

     return MessageBox (NULL, szBuffer, szCaption, 0) ;
}

int WINAPI WinMain (HINSTANCE hInstance, HINSTANCE hPrevInstance,
                    PSTR szCmdLine, int iCmdShow)
{
     int cxScreen, cyScreen ;

     cxScreen = GetSystemMetrics (SM_CXSCREEN) ;
     cyScreen = GetSystemMetrics (SM_CYSCREEN) ;

     MessageBoxPrintf (TEXT ("ScrnSize"),
                       TEXT ("The screen is %i pixels wide by %i pixels high."),
                       cxScreen, cyScreen) ;
     return 0 ;
}
The program displays the width and height of the video display in pixels by using information obtained from the GetSystemMetrics function. GetSystemMetrics is a useful function for obtaining information about the sizes of various objects in Windows. Indeed, in Chapter 4 I'll use the GetSystemMetrics function to show you how to display and scroll multiple lines of text in a Windows window.
Internationalization and This Book
Preparing your Windows programs for an international market involves more than using Unicode. Internationalization is beyond the scope of this book but is covered extensively in Developing International Software for Windows 95 and Windows NT by Nadine Kano (Microsoft Press, 1995).