Windows Via C/C++, Learning While Translating (7): Working with Characters and Strings, Part 6


Direwolf


Column: Windows Via C/C++. Tags: windows, character, function, buffer, string, integer


Translating Strings Between Unicode and ANSI


You use the Windows function MultiByteToWideChar to convert multibyte-character strings to wide-character strings. MultiByteToWideChar is shown here:


int MultiByteToWideChar(
   UINT uCodePage,
   DWORD dwFlags,
   PCSTR pMultiByteStr,
   int cbMultiByte,
   PWSTR pWideCharStr,
   int cchWideChar);

The uCodePage parameter identifies a code page number that is associated with the multibyte string. The dwFlags parameter allows you to specify additional control that affects characters with diacritical marks such as accents. Usually the flags aren't used, and 0 is passed in the dwFlags parameter. (For more details about the possible values for this flag, read the MSDN online help at http://msdn2.microsoft.com/en-us/library/ms776413.aspx.) The pMultiByteStr parameter specifies the string to be converted, and the cbMultiByte parameter indicates the length (in bytes) of the string. The function automatically determines the length of the source string if you pass -1 for the cbMultiByte parameter.


The Unicode version of the string resulting from the conversion is written to the buffer located in memory at the address specified by the pWideCharStr parameter. You must specify the maximum size of this buffer (in characters) in the cchWideChar parameter. If you call MultiByteToWideChar passing 0 for the cchWideChar parameter, the function doesn't perform the conversion and instead returns the number of wide characters that the buffer must provide for the conversion to succeed (including the terminating '\0' character when the source length covers it, as it does when cbMultiByte is -1). Typically, you convert a multibyte-character string to its Unicode equivalent by performing the following steps:


1. Call MultiByteToWideChar, passing NULL for the pWideCharStr parameter and 0 for the cchWideChar parameter and -1 for the cbMultiByte parameter.


2. Allocate a block of memory large enough to hold the converted Unicode string. This size is computed based on the value returned by the previous call to MultiByteToWideChar multiplied by sizeof(wchar_t).


3. Call MultiByteToWideChar again, this time passing the address of the buffer as the pWideCharStr parameter and the value returned by the first call to MultiByteToWideChar as the cchWideChar parameter. Do not multiply this value by sizeof(wchar_t): cchWideChar is a count of characters, and passing a byte count would tell the function that the buffer is twice as large as it really is.


4. Use the converted string.


5. Free the memory block occupying the Unicode string.


The function WideCharToMultiByte converts a wide-character string to its multibyte-string equivalent, as shown here:


int WideCharToMultiByte(
   UINT uCodePage,
   DWORD dwFlags,
   PCWSTR pWideCharStr,
   int cchWideChar,
   PSTR pMultiByteStr,
   int cbMultiByte,
   PCSTR pDefaultChar,
   PBOOL pfUsedDefaultChar);

This function is similar to the MultiByteToWideChar function. Again, the uCodePage parameter identifies the code page to be associated with the newly converted string. The dwFlags parameter allows you to specify additional control over the conversion. The flags affect characters with diacritical marks and characters that the system is unable to convert. Usually, you won't need this degree of control over the conversion, and you'll pass 0 for the dwFlags parameter.


The pWideCharStr parameter specifies the address in memory of the string to be converted, and the cchWideChar parameter indicates the length (in characters) of this string. The function determines the length of the source string if you pass -1 for the cchWideChar parameter.


The multibyte version of the string resulting from the conversion is written to the buffer indicated by the pMultiByteStr parameter. You must specify the maximum size of this buffer (in bytes) in the cbMultiByte parameter. Passing 0 as the cbMultiByte parameter of the WideCharToMultiByte function causes the function to return the size required by the destination buffer. You'll typically convert a wide-character string to a multibyte-character string using the same steps described for converting a multibyte string to a wide-character string, except that the return value is already the number of bytes required for the conversion to succeed, so there is no need to multiply it by a character size.


You'll notice that the WideCharToMultiByte function accepts two parameters more than the MultiByteToWideChar function: pDefaultChar and pfUsedDefaultChar. These parameters are used by the WideCharToMultiByte function only if it comes across a wide character that doesn't have a representation in the code page identified by the uCodePage parameter. If the wide character cannot be converted, the function uses the character pointed to by the pDefaultChar parameter. If this parameter is NULL, which is most common, the function uses a system default character. This default character is usually a question mark. This is dangerous for filenames because the question mark is a wildcard character.


The pfUsedDefaultChar parameter points to a Boolean variable that the function sets to TRUE if at least one character in the wide-character string could not be converted to its multibyte equivalent. The function sets the variable to FALSE if all the characters convert successfully. You can test this variable after the function returns to check whether the wide-character string was converted successfully. Again, you usually pass NULL for this parameter.


For a more complete description of how to use these functions, please refer to the Platform SDK documentation.


Exporting ANSI and Unicode DLL Functions


You could use these two functions to easily create both Unicode and ANSI versions of functions. For example, you might have a dynamic-link library containing a function that reverses all the characters in a string. You could write the Unicode version of the function as shown here:


#include <windows.h>
#include <wchar.h>   // wcsnlen_s (Microsoft CRT)

BOOL StringReverseW(PWSTR pWideCharStr, DWORD cchLength) {
   // Get the length of the string, reading at most cchLength characters.
   size_t cch = wcsnlen_s(pWideCharStr, cchLength);
   if (cch == 0)
      return(TRUE);   // Empty string: nothing to reverse.

   // Get a pointer to the last character in the string.
   PWSTR pEndOfStr = pWideCharStr + cch - 1;
   wchar_t cCharT;

   // Repeat until we reach the center character in the string.
   while (pWideCharStr < pEndOfStr) {
      // Save a character in a temporary variable.
      cCharT = *pWideCharStr;

      // Put the last character in the first character.
      *pWideCharStr = *pEndOfStr;

      // Put the temporary character in the last character.
      *pEndOfStr = cCharT;

      // Move in one character from the left.
      pWideCharStr++;

      // Move in one character from the right.
      pEndOfStr--;
   }

   // The string is reversed; return success.
   return(TRUE);
}

And you could write the ANSI version of the function so that it doesn't perform the actual work of reversing the string at all. Instead, you could write the ANSI version so that it converts the ANSI string to Unicode, passes the Unicode string to the StringReverseW function, and then converts the reversed string back to ANSI. The function would look like this:


BOOL StringReverseA(PSTR pMultiByteStr, DWORD cchLength) {
   PWSTR pWideCharStr;
   int nLenOfWideCharStr;
   BOOL fOk = FALSE;

   // Calculate the number of characters needed to hold
   // the wide-character version of the string.
   nLenOfWideCharStr = MultiByteToWideChar(CP_ACP, 0,
      pMultiByteStr, (int)cchLength, NULL, 0);
   if (nLenOfWideCharStr == 0)
      return(fOk);   // The string could not be converted.

   // Allocate memory from the process' default heap to
   // accommodate the size of the wide-character string.
   // Don't forget that MultiByteToWideChar returns the
   // number of characters, not the number of bytes, so
   // you must multiply by the size of a wide character.
   pWideCharStr = (PWSTR)HeapAlloc(GetProcessHeap(), 0,
      nLenOfWideCharStr * sizeof(wchar_t));
   if (pWideCharStr == NULL)
      return(fOk);

   // Convert the multibyte string to a wide-character string.
   MultiByteToWideChar(CP_ACP, 0, pMultiByteStr, (int)cchLength,
      pWideCharStr, nLenOfWideCharStr);

   // Call the wide-character version of this function to do the
   // actual work. Pass the wide-character count: in a double-byte
   // code page it can be smaller than cchLength, which counts bytes.
   fOk = StringReverseW(pWideCharStr, nLenOfWideCharStr);

   if (fOk) {
      // Convert the wide-character string back to a multibyte string.
      // The caller's buffer holds cchLength bytes, and the reversed
      // string normally needs as many bytes as the original did.
      WideCharToMultiByte(CP_ACP, 0, pWideCharStr, nLenOfWideCharStr,
         pMultiByteStr, (int)cchLength, NULL, NULL);
   }

   // Free the memory containing the wide-character string.
   HeapFree(GetProcessHeap(), 0, pWideCharStr);

   return(fOk);
}
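The reason for routing the ANSI version through Unicode is that a byte-by-byte reversal breaks any character stored in more than one byte. The portable sketch below demonstrates the problem with UTF-8 bytes because they are easy to write in a test; double-byte ANSI code pages such as Shift-JIS have the same issue. ReverseBytes is a hypothetical helper, not part of the DLL described here.

```c
#include <stddef.h>
#include <string.h>

// Hypothetical helper: reverses a null-terminated string byte by byte,
// ignoring character boundaries (this is the naive approach to avoid).
void ReverseBytes(char *psz) {
   size_t len = strlen(psz);
   for (size_t i = 0; i < len / 2; i++) {
      char c = psz[i];
      psz[i] = psz[len - 1 - i];
      psz[len - 1 - i] = c;
   }
}
```

Reversing the three bytes of "h\xc3\xa9" (h followed by é in UTF-8) yields 0xA9 0xC3 'h', which is no longer valid UTF-8. After conversion to UTF-16, é is a single wchar_t and moves as a unit. Characters outside the Basic Multilingual Plane still occupy two wchar_t values (a surrogate pair), so StringReverseW as written would split those.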

Finally, in the header file that you distribute with the dynamic-link library, you prototype the two functions as follows:


BOOL StringReverseW(PWSTR pWideCharStr, DWORD cchLength);
BOOL StringReverseA(PSTR pMultiByteStr, DWORD cchLength);

#ifdef UNICODE
#define StringReverse StringReverseW
#else
#define StringReverse StringReverseA
#endif // !UNICODE

Determining If Text Is ANSI or Unicode


The Windows Notepad application allows you to open both Unicode and ANSI files as well as create them. In fact, Figure 2-5 shows Notepad's File Save As dialog box. Notice the different ways that you can save a text file.

Figure 2-5: The Windows Vista Notepad File Save As dialog box

For many applications that open text files and process them, such as compilers, it would be convenient if, after opening a file, the application could determine whether the text file contained ANSI characters or Unicode characters. The IsTextUnicode function exported by AdvApi32.dll and declared in WinBase.h can help make this distinction:


BOOL IsTextUnicode(CONST PVOID pvBuffer, int cb, PINT pResult);

The problem with text files is that there are no hard and fast rules as to their content. This makes it extremely difficult to determine whether the file contains ANSI or Unicode characters. IsTextUnicode uses a series of statistical and deterministic methods to guess at the content of the buffer. Because this is not an exact science, it is possible that IsTextUnicode will return an incorrect result.


The first parameter, pvBuffer, identifies the address of a buffer that you want to test. It is a void pointer because you don't know whether the buffer holds an array of ANSI characters or an array of Unicode characters.


The second parameter, cb, specifies the number of bytes that pvBuffer points to. Again, because you don't know what's in the buffer, cb is a count of bytes rather than a count of characters. Note that you do not have to specify the entire length of the buffer. Of course, the more bytes IsTextUnicode can test, the more accurate a response you're likely to get.


The third parameter, pResult, is the address of an integer that you must initialize before calling IsTextUnicode. You initialize this integer to indicate which tests you want IsTextUnicode to perform. You can also pass NULL for this parameter, in which case IsTextUnicode will perform every test it can. (See the Platform SDK documentation for more details.)


If IsTextUnicode thinks that the buffer contains Unicode text, TRUE is returned; otherwise, FALSE is returned. If specific tests were requested in the integer pointed to by the pResult parameter, the function sets the bits in the integer before returning to reflect the results of each test.


The FileRev sample application presented in Chapter 17, "Memory-Mapped Files," demonstrates the use of the IsTextUnicode function.

