Converting between wstring and UTF-8 in C++11

Latest recommended article published on 2024-09-26 09:28:32

sdhongjun

UTF-8 conversion

/*************************************************************/
/* RFC 3629 defines the mapping as follows :
 *
 * Char. number range  |        UTF-8 octet sequence
 *    (hexadecimal)    |              (binary)
 * --------------------+---------------------------------------------
 * 0000 0000-0000 007F | 0xxxxxxx
 * 0000 0080-0000 07FF | 110xxxxx 10xxxxxx
 * 0000 0800-0000 FFFF | 1110xxxx 10xxxxxx 10xxxxxx
 * 0001 0000-0010 FFFF | 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
 *
 * Encoding a character to UTF-8 proceeds as follows:
 *
 * 1.  Determine the number of octets required from the character number
 *     and the first column of the table above.  It is important to note
 *     that the rows of the table are mutually exclusive, i.e., there is
 *     only one valid way to encode a given character.
 *
 * 2.  Prepare the high-order bits of the octets as per the second
 *     column of the table.
 *
 * 3.  Fill in the bits marked x from the bits of the character number,
 *     expressed in binary.  Start by putting the lowest-order bit of
 *     the character number in the lowest-order position of the last
 *     octet of the sequence, then put the next higher-order bit of the
 *     character number in the next higher-order position of that octet,
 *     etc.  When the x bits of the last octet are filled in, move on to
 *     the next to last octet, then to the preceding one, etc. until all
 *     x bits are filled in.
 *
 * The definition of UTF-8 prohibits encoding character numbers between
 * U+D800 and U+DFFF,...
 */
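
The three encoding steps quoted from RFC 3629 can be sketched by hand, which may help when checking what the library converter produces. This is only a minimal illustration, not part of the original post: the function name `encodeCodePoint` is hypothetical, and it handles a single code point at a time.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper (not from the original post): encodes one Unicode
// code point into UTF-8 following the RFC 3629 table.
std::string encodeCodePoint(char32_t cp)
{
    // Surrogates (U+D800..U+DFFF) and values above U+10FFFF are not encodable.
    if ((cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
        throw std::invalid_argument("not a Unicode scalar value");

    std::string out;
    if (cp <= 0x7F) {
        // 1 octet: 0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp <= 0x7FF) {
        // 2 octets: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp <= 0xFFFF) {
        // 3 octets: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        // 4 octets: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

Each branch corresponds to one row of the table: the lead octet carries the length marker, and every following octet carries six payload bits, filled from the low-order end.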

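#include <codecvt>  // std::codecvt_utf8 (deprecated since C++17)
#include <locale>   // std::wstring_convert (deprecated since C++17)
#include <string>

// Converts a wide string to UTF-8.
// Throws std::range_error if the conversion fails.
// Note: where wchar_t is 16 bits (e.g. Windows), codecvt_utf8<wchar_t> treats
// the input as UCS-2, so characters outside the BMP may not round-trip.
std::string wstringToUtf8(const std::wstring& str)
{
    std::wstring_convert<std::codecvt_utf8<wchar_t>> strCnv;
    return strCnv.to_bytes(str);
}

// Converts a UTF-8 string to a wide string.
// Throws std::range_error if the input is not valid UTF-8.
std::wstring utf8ToWstring(const std::string& str)
{
    std::wstring_convert<std::codecvt_utf8<wchar_t>> strCnv;
    return strCnv.from_bytes(str);
}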
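
Because `from_bytes` throws `std::range_error` on malformed input when the converter was constructed without an error string, callers may prefer a non-throwing wrapper. The sketch below is one possible way to do that. The name `tryUtf8ToWstring` is hypothetical and not from the original post, and the two conversion functions are repeated only so the example is self-contained.

```cpp
#include <codecvt>
#include <locale>
#include <stdexcept>
#include <string>

// Same converters as in the post, repeated so this sketch compiles on its own.
std::string wstringToUtf8(const std::wstring& str)
{
    std::wstring_convert<std::codecvt_utf8<wchar_t>> strCnv;
    return strCnv.to_bytes(str);
}

std::wstring utf8ToWstring(const std::string& str)
{
    std::wstring_convert<std::codecvt_utf8<wchar_t>> strCnv;
    return strCnv.from_bytes(str);
}

// Hypothetical wrapper: reports malformed UTF-8 through the return value
// instead of letting std::range_error propagate to the caller.
bool tryUtf8ToWstring(const std::string& in, std::wstring& out)
{
    try {
        out = utf8ToWstring(in);
        return true;
    } catch (const std::range_error&) {
        return false;  // 'out' is left unchanged on failure
    }
}
```

A caller could then write `std::wstring w; if (tryUtf8ToWstring(bytes, w)) { ... }` and treat a `false` result as invalid input. Note that `<codecvt>` and `std::wstring_convert` are deprecated since C++17, so newer compilers may emit deprecation warnings for this code.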