How to use WideCharToMultiByte and MultiByteToWideChar

maweiqi

Categories: VC++, Reposted articles

转载博文同时被 2 个专栏收录

21 篇文章 0 订阅

订阅专栏

VC++

5 篇文章 0 订阅

订阅专栏

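To support Unicode, a program has to convert between multibyte and wide-character strings. Both of these system functions take a code page argument. After running into garbled text (mojibake) in practice, I reread 《Windows核心编程》 (the Chinese edition of Jeffrey Richter's book on Windows programming) and summarized the correct usage.
The code page parameter of WideCharToMultiByte identifies the code page of the destination (multibyte) string.
The code page parameter of MultiByteToWideChar identifies the code page of the source multibyte string.
The commonly used code pages are CP_ACP (or CP_OEMCP) and CP_UTF8.
With CP_ACP, the functions convert between ANSI (the system's active code page) and Unicode (UTF-16).
With CP_UTF8, they convert between UTF-8 and Unicode (UTF-16).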
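The code page argument has to describe the bytes actually being converted. As a portable illustration of what goes wrong otherwise, the sketch below decodes the UTF-8 bytes of "é" as if they were text in a single-byte code page. DecodeLatin1 is a hypothetical helper, std::u16string stands in for the 16-bit Windows wchar_t, and ISO-8859-1 (code page 28591) is used only because its mapping is simply byte value = code point; for the two bytes involved here, 0xC3 and 0xA9, code page 1252 yields the same characters.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: decode bytes as ISO-8859-1 (code page 28591), where
// every byte maps to the code point with the same value. It stands in for a
// real CP_ACP table such as 936 or 1252, which is not reproduced here.
std::u16string DecodeLatin1(const std::string& bytes)
{
    std::u16string out;
    out.reserve(bytes.size());
    for (unsigned char b : bytes)
        out.push_back(static_cast<char16_t>(b));
    return out;
}

// "é" (U+00E9) encoded as UTF-8 is the two bytes C3 A9.
// Decoding them with the wrong code page yields U+00C3 'Ã' and U+00A9 '©'.
const std::string kUtf8EAcute = "\xC3\xA9";
```

The result is two characters where there should be one: the familiar "Ã©" that appears when UTF-8 text is converted with CP_ACP on a Western-locale system.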
The implementations follow:
1. ANSI to Unicode
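```cpp
#include <windows.h>
#include <cstring>
#include <string>
using std::string;
using std::wstring;

// ANSI (the system's active code page, CP_ACP) -> UTF-16
wstring ANSIToUnicode( const string& str )
{
    // First call: cbMultiByte = -1 treats the input as NUL-terminated,
    // and the returned length includes the terminating NUL.
    int unicodeLen = ::MultiByteToWideChar( CP_ACP,
                                            0,
                                            str.c_str(),
                                            -1,
                                            NULL,
                                            0 );
    if ( unicodeLen <= 0 )
        return wstring();  // conversion failed; call GetLastError() for details

    wchar_t* pUnicode = new wchar_t[unicodeLen + 1];
    memset( pUnicode, 0, ( unicodeLen + 1 ) * sizeof( wchar_t ) );
    // Second call: perform the conversion into the buffer.
    ::MultiByteToWideChar( CP_ACP,
                           0,
                           str.c_str(),
                           -1,
                           pUnicode,
                           unicodeLen );
    wstring rt = pUnicode;
    delete[] pUnicode;  // memory from new[] must be released with delete[]
    return rt;
}
```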
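ANSIToUnicode calls MultiByteToWideChar twice: first with a NULL buffer to learn the required length, then again to convert. The sketch below isolates that query-then-convert pattern in portable C++ so it can be compiled and tested without windows.h. WidenFn, TwoPassWiden and WidenLatin1 are hypothetical names; the contract they follow (when -1 is passed, the reported size includes the terminating NUL) mirrors the documented behavior of the real function, and std::u16string stands in for wstring.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical converter signature modeled on MultiByteToWideChar with
// cbMultiByte = -1: when dstCap == 0 it returns the required size in code
// units INCLUDING the terminating NUL; otherwise it writes at most dstCap
// units and returns the number written, or 0 on failure.
using WidenFn = std::function<int(const char* src, char16_t* dst, int dstCap)>;

std::u16string TwoPassWiden(const std::string& src, const WidenFn& widen)
{
    int needed = widen(src.c_str(), nullptr, 0);       // pass 1: query size
    if (needed <= 0)
        return std::u16string();                        // treat as failure
    std::u16string buf(static_cast<std::size_t>(needed), u'\0');
    int written = widen(src.c_str(), &buf[0], needed);  // pass 2: convert
    if (written <= 0)
        return std::u16string();
    buf.resize(static_cast<std::size_t>(written) - 1); // drop the NUL
    return buf;
}

// Stand-in converter for ISO-8859-1 that follows the contract above.
int WidenLatin1(const char* src, char16_t* dst, int dstCap)
{
    std::string s(src);  // stops at the first NUL, as -1 does
    int needed = static_cast<int>(s.size()) + 1;
    if (dstCap == 0)
        return needed;
    if (dstCap < needed)
        return 0;        // buffer too small
    for (std::size_t i = 0; i < s.size(); ++i)
        dst[i] = static_cast<unsigned char>(s[i]);
    dst[s.size()] = u'\0';
    return needed;
}
```

Letting the std::u16string own the buffer removes the manual new[]/delete[] pair. Note a side effect of passing -1: anything after an embedded NUL is dropped, so TwoPassWiden(std::string("ab\0cd", 5), WidenLatin1) yields u"ab". With the real API, passing the explicit length str.size() converts the whole string, and the returned count then does not include a terminating NUL.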
2. Unicode to ANSI
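```cpp
#include <windows.h>
#include <cstring>
#include <string>
using std::string;
using std::wstring;

// UTF-16 -> ANSI (the system's active code page, CP_ACP)
string UnicodeToANSI( const wstring& str )
{
    // First call: cchWideChar = -1 treats the input as NUL-terminated,
    // and the returned byte count includes the terminating NUL.
    int iTextLen = ::WideCharToMultiByte( CP_ACP,
                                          0,
                                          str.c_str(),
                                          -1,
                                          NULL,
                                          0,
                                          NULL,    // lpDefaultChar: system default
                                          NULL );  // lpUsedDefaultChar: not queried
    if ( iTextLen <= 0 )
        return string();  // conversion failed; call GetLastError() for details

    char* pElementText = new char[iTextLen + 1];
    memset( pElementText, 0, sizeof( char ) * ( iTextLen + 1 ) );
    // Second call: perform the conversion into the buffer.
    ::WideCharToMultiByte( CP_ACP,
                           0,
                           str.c_str(),
                           -1,
                           pElementText,
                           iTextLen,
                           NULL,
                           NULL );
    string strText = pElementText;
    delete[] pElementText;
    return strText;
}
```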
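UnicodeToANSI passes NULL for the last two parameters, lpDefaultChar and lpUsedDefaultChar. With CP_ACP, a character that has no equivalent in the ANSI code page is then silently replaced by the system default character (usually '?'), which is another common source of "garbled" output; passing a BOOL pointer as lpUsedDefaultChar lets the caller detect the loss. (For CP_UTF8 both parameters must be NULL.) The sketch below models that behavior portably with a hypothetical NarrowLatin1 that targets ISO-8859-1 instead of a real ANSI code page. std::u16string again stands in for wstring, and surrogate pairs are handled one code unit at a time for brevity, which the real API may not do.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for WideCharToMultiByte with an ISO-8859-1 target.
// Code units above U+00FF cannot be represented, so each one is replaced by
// defaultChar and *usedDefault is set, mirroring lpDefaultChar and
// lpUsedDefaultChar.
std::string NarrowLatin1(const std::u16string& src, char defaultChar,
                         bool* usedDefault)
{
    std::string out;
    out.reserve(src.size());
    bool lost = false;
    for (char16_t c : src) {
        if (c <= 0xFF) {
            out.push_back(static_cast<char>(c));
        } else {
            out.push_back(defaultChar);  // unrepresentable: substitute
            lost = true;
        }
    }
    if (usedDefault)
        *usedDefault = lost;
    return out;
}
```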
3. UTF-8 to Unicode
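```cpp
#include <windows.h>
#include <cstring>
#include <string>
using std::string;
using std::wstring;

// UTF-8 -> UTF-16
wstring UTF8ToUnicode( const string& str )
{
    // First call: cbMultiByte = -1 treats the input as NUL-terminated,
    // and the returned length includes the terminating NUL.
    int unicodeLen = ::MultiByteToWideChar( CP_UTF8,
                                            0,
                                            str.c_str(),
                                            -1,
                                            NULL,
                                            0 );
    if ( unicodeLen <= 0 )
        return wstring();  // conversion failed; call GetLastError() for details

    wchar_t* pUnicode = new wchar_t[unicodeLen + 1];
    memset( pUnicode, 0, ( unicodeLen + 1 ) * sizeof( wchar_t ) );
    // Second call: perform the conversion into the buffer.
    ::MultiByteToWideChar( CP_UTF8,
                           0,
                           str.c_str(),
                           -1,
                           pUnicode,
                           unicodeLen );
    wstring rt = pUnicode;
    delete[] pUnicode;  // memory from new[] must be released with delete[]
    return rt;
}
```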
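To see what the CP_UTF8 conversion actually computes, here is a portable sketch of UTF-8 to UTF-16 decoding. Utf8ToUtf16 is a hypothetical name, and std::u16string stands in for the 16-bit Windows wchar_t (on Linux, wchar_t is 32 bits). Replacing malformed input with U+FFFD is an assumption modeled on what MultiByteToWideChar does on Windows Vista and later when MB_ERR_INVALID_CHARS is not set; the number of U+FFFD characters emitted for a given bad sequence may differ from Windows.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical portable sketch: UTF-8 bytes -> UTF-16 code units.
// Malformed input (bad lead byte, missing continuation byte, overlong
// 3/4-byte form, encoded surrogate, value above U+10FFFF) becomes U+FFFD.
std::u16string Utf8ToUtf16(const std::string& in)
{
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char b = static_cast<unsigned char>(in[i]);
        std::uint32_t cp = 0;
        std::size_t extra = 0;  // number of continuation bytes expected
        if (b < 0x80)                   { cp = b;        extra = 0; }
        else if (b >= 0xC2 && b < 0xE0) { cp = b & 0x1F; extra = 1; }
        else if (b >= 0xE0 && b < 0xF0) { cp = b & 0x0F; extra = 2; }
        else if (b >= 0xF0 && b < 0xF5) { cp = b & 0x07; extra = 3; }
        else { out.push_back(0xFFFD); ++i; continue; }  // invalid lead byte

        std::size_t j = i + 1;
        bool ok = true;
        for (std::size_t k = 0; k < extra; ++k, ++j) {
            if (j >= in.size() ||
                (static_cast<unsigned char>(in[j]) & 0xC0) != 0x80) {
                ok = false;  // truncated or bad continuation byte
                break;
            }
            cp = (cp << 6) | (static_cast<unsigned char>(in[j]) & 0x3F);
        }
        if (ok && ((extra == 2 && cp < 0x800) ||
                   (extra == 3 && (cp < 0x10000 || cp > 0x10FFFF)) ||
                   (cp >= 0xD800 && cp <= 0xDFFF)))
            ok = false;      // overlong, surrogate, or out of range
        if (!ok) {
            out.push_back(0xFFFD);
        } else if (cp >= 0x10000) {  // needs a surrogate pair in UTF-16
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        } else {
            out.push_back(static_cast<char16_t>(cp));
        }
        i = j;  // resume at the first byte not consumed by this sequence
    }
    return out;
}
```

For example, the three bytes E4 B8 AD decode to the single code unit U+4E2D (中), while a four-byte sequence such as F0 9F 98 80 (U+1F600) becomes a surrogate pair, which is why a UTF-16 length can differ from the number of characters.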
4. Unicode to UTF-8
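```cpp
#include <windows.h>
#include <cstring>
#include <string>
using std::string;
using std::wstring;

// UTF-16 -> UTF-8
string UnicodeToUTF8( const wstring& str )
{
    // First call: cchWideChar = -1 treats the input as NUL-terminated,
    // and the returned byte count includes the terminating NUL.
    int iTextLen = ::WideCharToMultiByte( CP_UTF8,
                                          0,
                                          str.c_str(),
                                          -1,
                                          NULL,
                                          0,
                                          NULL,    // must be NULL for CP_UTF8
                                          NULL );  // must be NULL for CP_UTF8
    if ( iTextLen <= 0 )
        return string();  // conversion failed; call GetLastError() for details

    char* pElementText = new char[iTextLen + 1];
    memset( pElementText, 0, sizeof( char ) * ( iTextLen + 1 ) );
    // Second call: perform the conversion into the buffer.
    ::WideCharToMultiByte( CP_UTF8,
                           0,
                           str.c_str(),
                           -1,
                           pElementText,
                           iTextLen,
                           NULL,
                           NULL );
    string strText = pElementText;
    delete[] pElementText;
    return strText;
}
```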
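The reverse direction, UTF-16 to UTF-8 as WideCharToMultiByte(CP_UTF8, ...) performs it, can be sketched the same way. Utf16ToUtf8 is a hypothetical name and std::u16string stands in for wstring. Replacing an unpaired surrogate with U+FFFD (bytes EF BF BD) is an assumption modeled on the default behavior when WC_ERR_INVALID_CHARS is not passed on recent Windows versions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical portable sketch: UTF-16 code units -> UTF-8 bytes.
// Surrogate pairs are combined into one code point; unpaired surrogates
// become U+FFFD.
std::string Utf16ToUtf8(const std::u16string& in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        std::uint32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size() &&
            in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            // High + low surrogate: one supplementary code point.
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;
        } else if (cp >= 0xD800 && cp <= 0xDFFF) {
            cp = 0xFFFD;  // unpaired surrogate
        }
        if (cp < 0x80) {            // 1 byte: 0xxxxxxx
            out.push_back(static_cast<char>(cp));
        } else if (cp < 0x800) {    // 2 bytes: 110xxxxx 10xxxxxx
            out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else if (cp < 0x10000) {  // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
            out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        } else {                    // 4 bytes: 11110xxx + three 10xxxxxx
            out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
            out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
        }
    }
    return out;
}
```

A Chinese character such as 中 (U+4E2D) takes one UTF-16 code unit but three UTF-8 bytes (E4 B8 AD), which is why the size returned by the first WideCharToMultiByte call must be used rather than str.length().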

Appendix: how CP_ACP and CP_OEMCP relate

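CP_ACP and CP_OEMCP refer, respectively, to the Windows (ANSI) code page and the OEM code page of the running system. For East Asian Windows locales such as Simplified Chinese, Traditional Chinese, Japanese, and Korean, the two are the same code page: Simplified Chinese uses 936 (GBK, a superset of GB2312), Traditional Chinese uses 950 (Big5), Korean uses 949, and Japanese uses 932. For Western locales that use alphabetic scripts, the two differ. A typical example is English_US, whose Windows code page is 1252 and whose OEM code page is 437. There is also a third code page, ISO-8859-1, also called Latin-1 or "Western European", an extended-ASCII character set for Western European languages such as English, French, Spanish, and German. All three (1252, 437, and 8859-1) cover English, but they are not identical.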

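Why are there separate Windows and OEM code pages? In the DOS era of the 1980s, screens were still character terminals that could display only 256 characters, and the dot-matrix glyphs for those characters were stored in hardware ROM. DOS read these glyphs through system interrupts that called into drivers and wrote them to video memory. In that era the OEM decided which characters the character set contained and what they looked like on screen, and a PC had exactly one character set and glyph set, with no choice, unless you also plugged in a "汉卡" (Chinese character card) carrying its own font library. Once Microsoft Windows arrived, hardware had advanced enough for the operating system to have its own glyph files: when drawing characters it no longer read the ROM but used glyph files (that is, font files) to write character shapes into video memory. You could choose the typeface, for example a serif font such as Times New Roman or a sans-serif font. The default character set of the operating system was now defined by Microsoft: English_US uses code page 1252, and Simplified Chinese uses code page 936 (GBK, an extension of GB2312). The OEM code page (437 for English_US) is legacy, kept for backward compatibility.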

That is all there is to it. CP_ACP and CP_OEMCP are the UINT values 0 and 1, and their comments in WinNls.h read "default to ANSI code page" and "default to OEM code page". So on Simplified Chinese Windows, both macros denote code page 936.
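The small table below restates the code page pairs listed above and shows how the pseudo code pages CP_ACP (0) and CP_OEMCP (1) resolve under each locale, while any other value, such as CP_UTF8 (65001), is already a concrete code page. LocaleCodePages, kExamples and ResolveCodePage are hypothetical and the locale names are informal labels; on a real system, GetACP() and GetOEMCP() report the actual values.

```cpp
#include <cassert>
#include <cstring>

// Values restated from the text above; illustrative only.
struct LocaleCodePages {
    const char* locale;     // informal locale label, not a Windows identifier
    unsigned ansiCodePage;  // what CP_ACP resolves to
    unsigned oemCodePage;   // what CP_OEMCP resolves to
};

const LocaleCodePages kExamples[] = {
    {"Simplified Chinese",  936,  936},
    {"Traditional Chinese", 950,  950},
    {"Korean",              949,  949},
    {"Japanese",            932,  932},
    {"English_US",          1252, 437},
};

// Hypothetical helper: resolve the pseudo code pages CP_ACP (0) and
// CP_OEMCP (1) for a locale; any other value is returned unchanged.
// Returns 0 for a locale not in the table.
unsigned ResolveCodePage(unsigned codePage, const char* locale)
{
    const unsigned kCpAcp = 0, kCpOemcp = 1;  // values from WinNls.h
    if (codePage != kCpAcp && codePage != kCpOemcp)
        return codePage;
    for (const auto& e : kExamples)
        if (std::strcmp(e.locale, locale) == 0)
            return codePage == kCpAcp ? e.ansiCodePage : e.oemCodePage;
    return 0;
}
```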
