Windows Core Programming -- Character Sets

Copyright: CC 4.0 BY-SA

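This article introduces the different kinds of character sets (single-byte, double-byte, and Unicode) and explains in detail how to use the Windows API functions MultiByteToWideChar and WideCharToMultiByte to convert strings between multibyte and Unicode form.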

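1 The real problem that software localization has to solve is how to handle different character sets. In the past, we were used to programming with single-byte character sets.
2 Single-byte character set: a text string is encoded as a sequence of single-byte characters terminated by a zero byte (each character is represented by one byte).
3 Double-byte character set (DBCS): in a DBCS, each character in a string consists of either one byte or two bytes.
4 Unicode character set: Unicode provides a simple, consistent way to represent strings. In a Unicode string as Windows stores it (UTF-16), every character is a 16-bit (two-byte) value; characters outside the Basic Multilingual Plane take two such 16-bit units (a surrogate pair).
5 When Microsoft ported COM from 16-bit Windows to Win32, the company decided that every COM interface method that requires a string would accept only Unicode strings.
6 The C run-time library supports Unicode, even on Windows 98.
7 Notepad in Windows 2000 can open and create both Unicode and ANSI files.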
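To make the 16-bit unit size from note 4 concrete, here is a minimal sketch that encodes one Unicode code point as UTF-16, the encoding Windows uses for Unicode strings. Code points up to U+FFFF (outside the surrogate range) fit in one 16-bit unit; larger ones need a surrogate pair. The function name `encode_utf16` is hypothetical, and the code is portable C rather than a Windows API.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: writes the UTF-16 form of code point cp into out
   (room for 2 units) and returns the number of units used, or 0 if cp is
   not a valid Unicode scalar value. */
size_t encode_utf16(uint32_t cp, uint16_t out[2])
{
    if (cp >= 0xD800 && cp <= 0xDFFF)   /* surrogate code points are not characters */
        return 0;
    if (cp <= 0xFFFF) {                 /* Basic Multilingual Plane: one unit */
        out[0] = (uint16_t)cp;
        return 1;
    }
    if (cp <= 0x10FFFF) {               /* supplementary planes: surrogate pair */
        cp -= 0x10000;                  /* 20 bits remain */
        out[0] = (uint16_t)(0xD800 + (cp >> 10));    /* high surrogate: top 10 bits */
        out[1] = (uint16_t)(0xDC00 + (cp & 0x3FF));  /* low surrogate: bottom 10 bits */
        return 2;
    }
    return 0;                           /* beyond the Unicode range */
}
```

For example, U+4E2D (中) takes the single unit 0x4E2D, while U+1F600 becomes the pair 0xD83D 0xDE00.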
8 The IsTextUnicode function helps distinguish ANSI text from Unicode text:
BOOL IsTextUnicode(CONST PVOID pvBuffer, int cb, PINT pResult);

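The first parameter, pvBuffer, identifies the address of the buffer to test. It is a void pointer because you don't know whether you have an array of ANSI characters or an array of Unicode characters.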

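The second parameter, cb, specifies the number of bytes that pvBuffer points to. Again, because you don't know what the buffer contains, cb is a count of bytes rather than characters. You don't have to specify the entire length of the buffer, but the more bytes IsTextUnicode can test, the more accurate its result.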

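The third parameter, pResult, is the address of an integer that you must initialize before calling IsTextUnicode. The value you store in it tells IsTextUnicode which tests to perform, and on return the integer holds the results of those tests. You can also pass NULL for this parameter, in which case IsTextUnicode performs every test it can (see the Platform SDK documentation for details).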
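IsTextUnicode itself is Windows-only; on Windows you might initialize the integer to IS_TEXT_UNICODE_SIGNATURE | IS_TEXT_UNICODE_REVERSE_SIGNATURE and pass its address as pResult. To show what those two tests look for, here is a portable sketch that checks only for a byte-order mark (BOM). It is a simplified illustration, not a reimplementation of IsTextUnicode (which also applies statistical tests); the function name and return codes are hypothetical.

```c
#include <stddef.h>

/* Hypothetical return codes for this sketch. */
enum { BOM_NONE = 0, BOM_UTF16LE = 1, BOM_UTF16BE = 2 };

/* Examines only the first two bytes of buf (cb bytes long). */
int detect_utf16_bom(const void *buf, size_t cb)
{
    const unsigned char *p = (const unsigned char *)buf;
    if (cb < 2)
        return BOM_NONE;                 /* too short to hold a BOM */
    if (p[0] == 0xFF && p[1] == 0xFE)
        return BOM_UTF16LE;              /* U+FEFF stored little-endian */
    if (p[0] == 0xFE && p[1] == 0xFF)
        return BOM_UTF16BE;              /* byte-reversed signature */
    return BOM_NONE;
}
```

Text without a BOM is the hard case, which is why IsTextUnicode also relies on statistical tests and its answer is only a best guess.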

9 Helper functions for manipulating DBCS strings

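Function	Description
PTSTR CharNext(PCTSTR pszCurrentChar);	Returns the address of the next character in the string
PTSTR CharPrev(PCTSTR pszStart, PCTSTR pszCurrentChar);	Returns the address of the previous character in the string
BOOL IsDBCSLeadByte(BYTE bTestChar);	Returns TRUE if the byte is the first byte of a DBCS character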
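These helpers exist because in a DBCS you cannot step through a string with ptr++. The portable sketch below imitates what CharNext does, assuming code page 932 (Japanese Shift_JIS), whose lead bytes fall in the ranges 0x81 to 0x9F and 0xE0 to 0xFC. All helper names are hypothetical; real Windows code should call CharNext or IsDBCSLeadByte, which use the system's ANSI code page.

```c
#include <stddef.h>

/* Hypothetical helper: lead-byte ranges assumed for code page 932. */
static int is_lead_byte_932(unsigned char b)
{
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

/* Hypothetical CharNext analogue: steps over one 1- or 2-byte character
   and stays put on the terminating zero. */
static const char *next_char_932(const char *p)
{
    if (*p == '\0')
        return p;                          /* at the terminator already */
    if (is_lead_byte_932((unsigned char)*p) && p[1] != '\0')
        return p + 2;                      /* double-byte character */
    return p + 1;                          /* single-byte character */
}

/* Counts characters (not bytes) in a code page 932 string. */
static size_t count_chars_932(const char *s)
{
    size_t n = 0;
    while (*s != '\0') {
        s = next_char_932(s);
        n++;
    }
    return n;
}
```

For the 4-byte string "A\x82\xA0" "B" (A, the hiragana あ, B), strlen reports 4 while count_chars_932 reports 3.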

10 Microsoft's support for Unicode:

• Windows 2000 supports both Unicode and ANSI, so you can develop applications for either.

• Windows 98 supports only ANSI, so you can develop only ANSI applications for it.

• Windows CE supports only Unicode, so you can develop only Unicode applications for it.

11 Unicode data types defined in the Windows header files

Data type	Description
WCHAR	Unicode character
PWSTR	Pointer to a Unicode string
PCWSTR	Pointer to a constant Unicode string

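The Windows headers also use the UNICODE macro to map each generic function name to its ANSI (A) or Unicode (W) version, for example: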

#ifdef UNICODE
#define CreateWindowEx CreateWindowExW
#else
#define CreateWindowEx CreateWindowExA
#endif //!UNICODE
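The same pattern works for your own functions. Here is a minimal sketch, assuming hypothetical functions LengthA/LengthW and a hypothetical MY_TEXT macro that plays the role of the Windows TEXT macro; depending on whether UNICODE is defined, the generic name Length resolves to one or the other at compile time.

```c
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* ANSI and Unicode versions of a hypothetical function. */
size_t LengthA(const char *s)    { return strlen(s); }
size_t LengthW(const wchar_t *s) { return wcslen(s); }

/* Map the generic names, just as the Windows headers do. */
#ifdef UNICODE
#define Length LengthW
#define MY_TEXT(s) L##s      /* "abc" becomes L"abc" */
#else
#define Length LengthA
#define MY_TEXT(s) s         /* "abc" stays "abc" */
#endif
```

Because the selection happens in the preprocessor, it costs nothing at run time: Length(MY_TEXT("hello")) compiles to a direct call of whichever version was chosen.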

Converting strings between Unicode and ANSI

The Windows function MultiByteToWideChar converts a multibyte string to a wide-character string. Its prototype is:

int MultiByteToWideChar(
UINT CodePage,          //code page
DWORD dwFlags,          //character-type options
LPCSTR lpMultiByteStr,  //address of string to map
int cchMultiByte,       //number of bytes in string
LPWSTR lpWideCharStr,   //address of wide-character buffer
int cchWideChar         //size of buffer
);

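The CodePage parameter identifies the code page number associated with the multibyte string. The dwFlags parameter gives you additional control over the conversion, affecting characters with diacritical marks such as accents. These flags are usually not used, and you pass 0 for dwFlags. The lpMultiByteStr parameter specifies the string to convert, and cchMultiByte indicates its length in bytes. If you pass -1 for cchMultiByte, the function determines the length of the (zero-terminated) source string itself.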

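The resulting Unicode string is written to the memory buffer whose address is given by lpWideCharStr. You must specify the maximum size of this buffer, in characters, in cchWideChar. If you call MultiByteToWideChar and pass 0 for cchWideChar, the function does not convert the string; instead, it returns the buffer size (in characters) required for the conversion to succeed. In general, you convert a multibyte string to its Unicode equivalent with the following steps: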

1) Call MultiByteToWideChar, passing NULL for lpWideCharStr and 0 for cchWideChar.
2) Allocate a block of memory large enough to hold the converted Unicode string. Its size is the value returned by the previous call to MultiByteToWideChar; that value is a character count, so multiply it by sizeof(WCHAR) to get the number of bytes.
3) Call MultiByteToWideChar again, this time passing the buffer's address as lpWideCharStr and the size returned by the first call as cchWideChar.
4) Use the converted string.
5) Free the memory block occupied by the Unicode string.
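The five steps above can be sketched in portable C with the standard mbstowcs function, which POSIX allows to be called with a NULL destination to obtain the required length. This is an analogue, not the Windows API: unlike MultiByteToWideChar with -1, mbstowcs reports the length without the terminating zero, and it converts according to the current C locale rather than a code page parameter. The helper name to_wide is hypothetical.

```c
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical helper: returns a heap-allocated wide copy of src,
   or NULL on failure. The caller frees the result (step 5). */
wchar_t *to_wide(const char *src)
{
    /* Step 1: ask for the required size (excludes the terminator). */
    size_t needed = mbstowcs(NULL, src, 0);
    if (needed == (size_t)-1)
        return NULL;                     /* invalid multibyte sequence */

    /* Step 2: allocate room for the characters plus L'\0'; the count is
       in characters, so multiply by the size of a wide character. */
    wchar_t *wide = malloc((needed + 1) * sizeof(wchar_t));
    if (wide == NULL)
        return NULL;

    /* Step 3: convert for real; the terminator fits in needed + 1. */
    mbstowcs(wide, src, needed + 1);
    return wide;                         /* step 4: caller uses it */
}
```

A caller would use the result and then release it with free(), mirroring steps 4 and 5.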

The function WideCharToMultiByte converts a wide-character string to its multibyte equivalent:

int WideCharToMultiByte(
UINT CodePage,         // code page
DWORD dwFlags,         // performance and mapping flags
LPCWSTR lpWideCharStr, // address of wide-character string
int cchWideChar,       // number of characters in string
LPSTR lpMultiByteStr,  // address of buffer for new string
int cchMultiByte,      // size of buffer
LPCSTR lpDefaultChar,  // address of default for unmappable characters
LPBOOL lpUsedDefaultChar   // address of flag set when default char. used
);

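This function is similar to MultiByteToWideChar. Again, CodePage identifies the code page to associate with the newly converted string, and dwFlags gives you additional control over the conversion: these flags affect characters with diacritical marks and characters that the system cannot convert. You usually don't need this degree of control when converting strings, so you pass 0 for dwFlags.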

The lpWideCharStr parameter specifies the memory address of the string to convert, and cchWideChar indicates its length in characters. If you pass -1 for cchWideChar, the function determines the length of the source string itself.

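The resulting multibyte string is written to the buffer indicated by lpMultiByteStr. You must specify the maximum size of this buffer, in bytes, in cchMultiByte. If you pass 0 as the cchMultiByte parameter of WideCharToMultiByte, the function returns the size the destination buffer needs. You can convert a wide-character string to a multibyte string using a sequence of steps similar to the one described above for converting a multibyte string to a wide-character string.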
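The reverse direction can be sketched the same way with the standard wcstombs function, which POSIX also allows to be called with a NULL destination for a sizing pass. Again this is a portable analogue under the current C locale, not WideCharToMultiByte, and the helper name to_narrow is hypothetical. One notable difference: wcstombs fails outright on a character it cannot represent instead of substituting a default character.

```c
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical helper: returns a heap-allocated multibyte copy of src
   in the current C locale's encoding, or NULL on failure. */
char *to_narrow(const wchar_t *src)
{
    /* Sizing pass: bytes needed, excluding the terminating '\0'. */
    size_t needed = wcstombs(NULL, src, 0);
    if (needed == (size_t)-1)
        return NULL;                     /* a character has no representation */

    /* The count is already in bytes, so no multiplication is needed. */
    char *narrow = malloc(needed + 1);
    if (narrow == NULL)
        return NULL;

    wcstombs(narrow, src, needed + 1);   /* conversion pass */
    return narrow;                       /* caller frees it */
}
```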

Notice that WideCharToMultiByte takes two more parameters than MultiByteToWideChar: lpDefaultChar and lpUsedDefaultChar. WideCharToMultiByte uses them only when it encounters a wide character that has no representation in the code page identified by CodePage. If a wide character cannot be converted, the function uses the character pointed to by lpDefaultChar. If this parameter is NULL (the most common case), the function uses the system default character, which is usually a question mark. This is dangerous for file names, because the question mark is a wildcard character.

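The lpUsedDefaultChar parameter points to a Boolean variable that the function sets to TRUE if at least one character in the wide-character string could not be converted to its multibyte equivalent, and to FALSE if all characters were converted successfully. After the function returns, you can test this variable to check whether the wide-character string was converted successfully. Again, you usually pass NULL for this parameter.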
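To make the lpDefaultChar and lpUsedDefaultChar semantics concrete, here is a toy model. It is not WideCharToMultiByte: it assumes a hypothetical target "code page" that can represent only 7-bit ASCII, and the function name narrow_ascii is made up. Each unmappable character is replaced by default_char and *used_default is set, mirroring the two extra parameters.

```c
#include <stddef.h>
#include <wchar.h>

/* Hypothetical toy model: converts src to 7-bit ASCII.
   Returns the number of bytes written including the terminating '\0',
   or 0 if dst (cap bytes) is too small. used_default may be NULL. */
size_t narrow_ascii(const wchar_t *src, char *dst, size_t cap,
                    char default_char, int *used_default)
{
    if (used_default != NULL)
        *used_default = 0;                  /* nothing substituted yet */
    for (size_t i = 0; ; ++i) {
        if (i >= cap)
            return 0;                       /* buffer too small */
        unsigned long wc = (unsigned long)src[i];
        if (wc == 0) {
            dst[i] = '\0';
            return i + 1;                   /* count includes '\0' */
        }
        if (wc < 0x80) {
            dst[i] = (char)wc;              /* representable as-is */
        } else {
            dst[i] = default_char;          /* unmappable: substitute */
            if (used_default != NULL)
                *used_default = 1;
        }
    }
}
```

Converting L"a\u00e9b" with default_char set to '?' yields "a?b" and sets the flag; if that result were used as a file name, the substituted '?' would act as a wildcard.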

For details on using these functions, see the Platform SDK documentation.

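With these two functions, you can easily create both Unicode and ANSI versions of your own functions. For example, you might have a dynamic-link library containing a function that reverses all the characters in a string. You could write the Unicode version of that function like this: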

#include <windows.h>
#include <wchar.h>

BOOL StringReverseW(PWSTR pWideCharStr)
{
    // An empty string is already reversed; returning early also avoids
    // forming a pointer before the start of the buffer.
    if (*pWideCharStr == L'\0')
        return(TRUE);

    // Get a pointer to the last character in the string.
    PWSTR pEndOfStr = pWideCharStr + wcslen(pWideCharStr) - 1;
    wchar_t cCharT;

    // Repeat until we reach the center character in the string.
    while (pWideCharStr < pEndOfStr)
    {
        // Save a character in a temporary variable.
        cCharT = *pWideCharStr;
        // Put the last character in the first character.
        *pWideCharStr = *pEndOfStr;
        // Put the temporary character in the last character.
        *pEndOfStr = cCharT;
        // Move in one character from the left.
        pWideCharStr++;
        // Move in one character from the right.
        pEndOfStr--;
    }

    // The string is reversed; return success.
    return(TRUE);
}

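You could write the ANSI version of this function so that it doesn't perform the actual work of reversing the string at all. Instead, the ANSI version converts the ANSI string to Unicode, passes the Unicode string to StringReverseW, and then converts the reversed string back to ANSI. The function would look something like this: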

#include <windows.h>
#include <string.h>

BOOL StringReverseA(PSTR pMultiByteStr)
{
    PWSTR pWideCharStr;
    int nLenOfWideCharStr;
    BOOL fOk = FALSE;

    // Calculate the number of characters needed to hold the
    // wide-character version of the string (including the terminator).
    nLenOfWideCharStr = MultiByteToWideChar(CP_ACP, 0,
        pMultiByteStr, -1, NULL, 0);
    if (nLenOfWideCharStr == 0)
        return(fOk);

    // Allocate memory from the process's default heap to
    // accommodate the size of the wide-character string.
    // Don't forget that MultiByteToWideChar returns the
    // number of characters, not the number of bytes, so
    // you must multiply by the size of a wide character.
    pWideCharStr = (PWSTR)HeapAlloc(GetProcessHeap(), 0,
        nLenOfWideCharStr * sizeof(WCHAR));
    if (pWideCharStr == NULL)
        return(fOk);

    // Convert the multibyte string to a wide-character string.
    MultiByteToWideChar(CP_ACP, 0, pMultiByteStr, -1,
        pWideCharStr, nLenOfWideCharStr);

    // Call the wide-character version of this
    // function to do the actual work.
    fOk = StringReverseW(pWideCharStr);
    if (fOk)
    {
        // Convert the wide-character string back to a multibyte string.
        // Because cchWideChar is -1, the terminating zero is converted
        // too, so the buffer size must include room for it.
        fOk = (WideCharToMultiByte(CP_ACP, 0, pWideCharStr, -1,
            pMultiByteStr, (int)strlen(pMultiByteStr) + 1,
            NULL, NULL) != 0);
    }

    // Free the memory containing the wide-character string.
    HeapFree(GetProcessHeap(), 0, pWideCharStr);
    return(fOk);
}

Finally, in the header file distributed with the dynamic-link library, you would prototype the two functions like this:

BOOL StringReverseW (PWSTR pWideCharStr);
BOOL StringReverseA (PSTR pMultiByteStr);
#ifdef UNICODE
#define StringReverse StringReverseW
#else
#define StringReverse StringReverseA
#endif // UNICODE