unicode to ansi char

I need to make a detour for a few moments, and discuss how to handle strings in COM code. If you are familiar with how Unicode and ANSI strings work, and know how to convert between the two, then you can skip this section. Otherwise, read on.

Whenever a COM method returns a string, that string will be in Unicode. (Well, all methods that are written to the COM spec, that is!) Unicode is a character encoding scheme, like ASCII, only all characters are 2 bytes long. If you want to get the string into a more manageable state, you should convert it to a TCHAR string.

TCHAR and the _t functions (for example, _tcscpy()) are designed to let you handle Unicode and ANSI strings with the same source code. In most cases, you'll be writing code that uses ANSI strings and the ANSI Windows APIs, so for the rest of this article, I will refer to chars instead of TCHARs, just for simplicity. You should definitely read up on the TCHAR types, though, to be aware of them in case you ever come across them in code written by others.

When you get a Unicode string back from a COM method, you can convert it to a char string in one of several ways:

  1. Call the WideCharToMultiByte() API.
  2. Call the CRT function wcstombs().
  3. Use the CString constructor or assignment operator (MFC only).
  4. Use an ATL string conversion macro.

WideCharToMultiByte()

You can convert a Unicode string to an ANSI string with the WideCharToMultiByte() API. This API's prototype is:

int WideCharToMultiByte (
    UINT    CodePage,
    DWORD   dwFlags,
    LPCWSTR lpWideCharStr,
    int     cchWideChar,
    LPSTR   lpMultiByteStr,
    int     cbMultiByte,
    LPCSTR  lpDefaultChar,
    LPBOOL  lpUsedDefaultChar );

The parameters are:

CodePage
The code page to convert the Unicode characters into. You can pass  CP_ACP to use the current ANSI code page. Code pages are sets of 256 characters. Characters 0-127 are always identical to the ASCII encoding. Characters 128-255 differ, and can contain graphics or letters with diacritics. Each language or region has its own code page, so it's important to use the right code page to get proper display of accented characters.
dwFlags
dwFlags determine how Windows deals with " composite" Unicode characters, which are a letter followed by a diacritic. An example of a  composite character is  è. If this character is in the code page specified in  CodePage, then nothing special happens. However, if it is  not in the code page, Windows has to convert it to something else.
Passing  WC_COMPOSITECHECK makes the API check for non-mapping  composite characters. Passing  WC_SEPCHARS makes Windows break the character into two, the letter followed by the diacritic, for example  e`. Passing  WC_DISCARDNS makes Windows discard the diacritics. Passing WC_DEFAULTCHAR makes Windows replace the  composite characters with a "default" character, specified in the  lpDefaultChar parameter. The default behavior is  WC_SEPCHARS.
lpWideCharStr
The Unicode string to convert.
cchWideChar
The length of  lpWideCharStr in Unicode characters. You will usually pass -1, which indicates that the string is zero-terminated.
lpMultiByteStr
char buffer that will hold the converted string.
cbMultiByte
The size of  lpMultiByteStr, in bytes.
lpDefaultChar
Optional - a one-character ANSI string that contains the "default" character to be inserted when dwFlags contains  WC_COMPOSITECHECK | WC_DEFAULTCHAR and a Unicode character cannot be mapped to an equivalent ANSI character. You can pass NULL to have the API use a system default character (which as of this writing is a question mark).
lpUsedDefaultChar
Optional - a pointer to a  BOOL that will be set to indicate if the default char was ever inserted into the ANSI string. You can pass NULL if you don't care about this information.

Whew, a lot of boring details! Like always, the docs make it seem much more complicated than it really is. Here's an example showing how to use the API:

// Assuming we already have a Unicode string wszSomeString...
char szANSIString [MAX_PATH];

    WideCharToMultiByte ( CP_ACP,                // ANSI code page
                          WC_COMPOSITECHECK,     // Check for accented characters
                          wszSomeString,         // Source Unicode string
                          -1,                    // -1 means string is zero-terminated
                          szANSIString,          // Destination char string
                          sizeof(szANSIString),  // Size of buffer
                          NULL,                  // No default character
                          NULL );                // Don't care about this flag

After this call, szANSIString will contain the ANSI version of the Unicode string.

wcstombs()

The CRT function wcstombs() is a bit simpler, but it just ends up calling WideCharToMultiByte(), so in the end the results are the same. The prototype for wcstombs() is:

size_t wcstombs (
    char*          mbstr,
    const wchar_t* wcstr,
    size_t         count );

The parameters are:

mbstr
char buffer to hold the resulting ANSI string.
wcstr
The Unicode string to convert.
count
The size of the  mbstr buffer, in bytes.

wcstombs() uses the WC_COMPOSITECHECK | WC_SEPCHARS flags in its call toWideCharToMultiByte(). To reuse the earlier example, you can convert a Unicode string with code like this:

    wcstombs ( szANSIString, wszSomeString, sizeof(szANSIString) );

CString

The MFC CString class contains constructors and assignment operators that accept Unicode strings, so you can let CString do the conversion work for you. For example:

// Assuming we already have wszSomeString...

CString str1 ( wszSomeString );    // Convert with a constructor.
CString str2;

    str2 = wszSomeString;          // Convert with an assignment operator.

ATL macros

ATL has a handy set of macros for converting strings. To convert a Unicode string to ANSI, use theW2A() macro (a mnemonic for "wide to ANSI"). Actually, to be more accurate, you should useOLE2A(), where the "OLE" indicates the string came from a COM or OLE source. Anyway, here's an example of how to use these macros.

#include <atlconv.h>

// Again assuming we have wszSomeString...

{
char szANSIString [MAX_PATH];
USES_CONVERSION;  // Declare local variable used by the macros.

    lstrcpy ( szANSIString, OLE2A(wszSomeString) );
}

The OLE2A() macro "returns" a pointer to the converted string, but the converted string is stored in a temporary stack variable, so we need to make our own copy of it with lstrcpy(). Other macros you should look into are W2T() (Unicode to TCHAR), and W2CT() (Unicode string to const TCHAR string).

There is an OLE2CA() macro (Unicode string to a const char string) which we could've used in the code snippet above. OLE2CA() is actually the correct macro for that situation, since the second parameter to lstrcpy() is a const char*, but I didn't want to throw too much at you at once.

Sticking with Unicode

On the other hand, you can just keep the string in Unicode if you won't be doing anythingcomplicated with the string. If you're writing a console app, you can print Unicode strings with thestd::wcout global variable, for example:

    wcout << wszSomeString;

But keep in mind that wcout expects all strings to be in Unicode, so if you have any "normal" strings, you'll still need to output them with std::cout. If you have string literals, prefix them with L to make them Unicode, for example:

    wcout << L"The Oracle says..." << endl << wszOracleResponse;

If you keep a string in Unicode, there are a couple of restrictions:

  • You must use the wcsXXX() string functions, such as wcslen(), on Unicode strings.
  • With very few exceptions, you cannot pass a Unicode string to a Windows API on Windows 9x. To write code that will run on 9x and NT unchanged, you'll need to use the TCHAR types, as described in MSDN.
  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
经导师精心指导并认可、获 98 分的毕业设计项目!【项目资源】:微信小程序。【项目说明】:聚焦计算机相关专业毕设及实战操练,可作课程设计与期末大作业,含全部源码,能直用于毕设,经严格调试,运行有保障!【项目服务】:有任何使用上的问题,欢迎随时与博主沟通,博主会及时解答。 经导师精心指导并认可、获 98 分的毕业设计项目!【项目资源】:微信小程序。【项目说明】:聚焦计算机相关专业毕设及实战操练,可作课程设计与期末大作业,含全部源码,能直用于毕设,经严格调试,运行有保障!【项目服务】:有任何使用上的问题,欢迎随时与博主沟通,博主会及时解答。 经导师精心指导并认可、获 98 分的毕业设计项目!【项目资源】:微信小程序。【项目说明】:聚焦计算机相关专业毕设及实战操练,可作课程设计与期末大作业,含全部源码,能直用于毕设,经严格调试,运行有保障!【项目服务】:有任何使用上的问题,欢迎随时与博主沟通,博主会及时解答。 经导师精心指导并认可、获 98 分的毕业设计项目!【项目资源】:微信小程序。【项目说明】:聚焦计算机相关专业毕设及实战操练,可作课程设计与期末大作业,含全部源码,能直用于毕设,经严格调试,运行有保障!【项目服务】:有任何使用上的问题,欢迎随时与博主沟通,博主会及时解答。
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值