/*************************************************************/
/* RFC 3629 defines the mapping as follows:
*
* Char. number range  | UTF-8 octet sequence
*    (hexadecimal)    | (binary)
* --------------------+---------------------------------------------
* 0000 0000-0000 007F | 0xxxxxxx
* 0000 0080-0000 07FF | 110xxxxx 10xxxxxx
* 0000 0800-0000 FFFF | 1110xxxx 10xxxxxx 10xxxxxx
* 0001 0000-0010 FFFF | 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
*
* Encoding a character to UTF-8 proceeds as follows:
*
* 1. Determine the number of octets required from the character number
*    and the first column of the table above. It is important to note
*    that the rows of the table are mutually exclusive, i.e., there is
*    only one valid way to encode a given character.
*
* 2. Prepare the high-order bits of the octets as per the second
*    column of the table.
*
* 3. Fill in the bits marked x from the bits of the character number,
*    expressed in binary. Start by putting the lowest-order bit of
*    the character number in the lowest-order position of the last
*    octet of the sequence, then put the next higher-order bit of the
*    character number in the next higher-order position of that octet,
*    etc. When the x bits of the last octet are filled in, move on to
*    the next to last octet, then to the preceding one, etc. until all
*    x bits are filled in.
*
* The definition of UTF-8 prohibits encoding character numbers between
* U+D800 and U+DFFF,...
*/
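#include <codecvt>
#include <locale>
#include <string>

// Note: std::wstring_convert and <codecvt> are deprecated since C++17.
// Where wchar_t is 16 bits (e.g. Windows), std::codecvt_utf8<wchar_t> handles
// UCS-2 only; std::codecvt_utf8_utf16<wchar_t> is needed for surrogate pairs.

// Encode a wide string as UTF-8; throws std::range_error on failure.
std::string wstringToUtf8(const std::wstring& str)
{
    std::wstring_convert<std::codecvt_utf8<wchar_t>> strCnv;
    return strCnv.to_bytes(str);
}

// Decode UTF-8 into a wide string; throws std::range_error on failure.
std::wstring utf8ToWstring(const std::string& str)
{
    std::wstring_convert<std::codecvt_utf8<wchar_t>> strCnv;
    return strCnv.from_bytes(str);
}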
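
The three encoding steps quoted from RFC 3629 can also be sketched by hand, which may help when the standard conversion facets are unavailable. The block below is only an illustrative sketch: `encodeCodePoint` is a hypothetical helper name that does not appear in the original post. It assumes the input is a single code point held in a `std::uint32_t`, and it rejects the prohibited range U+D800 to U+DFFF as well as values above U+10FFFF.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical helper (not from the original post): encode one Unicode
// code point as UTF-8 by following the RFC 3629 table row by row.
std::string encodeCodePoint(std::uint32_t cp)
{
    // RFC 3629 prohibits U+D800..U+DFFF and ends the table at U+10FFFF.
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        throw std::invalid_argument("not an encodable character number");

    std::string out;
    if (cp <= 0x7F) {
        // Row 1: 0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp <= 0x7FF) {
        // Row 2: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp <= 0xFFFF) {
        // Row 3: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        // Row 4: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

Each branch picks the octet count from the first column of the table, sets the fixed high-order bits from the second column, and fills the `x` bits six at a time starting from the low end of the character number. Converting a whole `std::wstring` this way would additionally require combining UTF-16 surrogate pairs on platforms where `wchar_t` is 16 bits; the sketch does not attempt that.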
C++11: Converting between wstring and UTF-8