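Localization library
The locale facilities include internationalization support for character classification and string collation, numeric, monetary, and date/time formatting and parsing, and message retrieval. Locale settings control the behavior of stream I/O, the regular expression library, and other components of the C++ standard library.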
Facet categories
Conversion between character encodings, including UTF-8, UTF-16, and UTF-32
std::codecvt
template< class InternT, class ExternT, class StateT > class codecvt;
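The class std::codecvt encapsulates the conversion of character strings, including wide and multibyte strings, from one encoding to another. All file I/O operations performed through std::basic_fstream<CharT> use the std::codecvt<CharT, char, std::mbstate_t> facet of the locale imbued in the stream.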
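The standard library provides the following standalone (locale-independent) specializations:
Defined in header <locale> | |
std::codecvt<char, char, std::mbstate_t> | identity conversion |
std::codecvt<char16_t, char, std::mbstate_t> | conversion between UTF-16 and UTF-8 (since C++11)(deprecated in C++20) |
std::codecvt<char16_t, char8_t, std::mbstate_t> | conversion between UTF-16 and UTF-8 (since C++20) |
std::codecvt<char32_t, char, std::mbstate_t> | conversion between UTF-32 and UTF-8 (since C++11)(deprecated in C++20) |
std::codecvt<char32_t, char8_t, std::mbstate_t> | conversion between UTF-32 and UTF-8 (since C++20) |
std::codecvt<wchar_t, char, std::mbstate_t> | conversion between the system's native wide character set and the single-byte narrow character set |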
In addition, every locale object constructed in a C++ program implements its own (locale-specific) versions of these specializations.
Member types
Member type | Definition |
intern_type | InternT |
extern_type | ExternT |
state_type | StateT |
Invoking do_length: computing the length of the ExternT string that would be consumed by conversion into a given InternT buffer
std::codecvt<InternT,ExternT,StateT>::length,
std::codecvt<InternT,ExternT,StateT>::do_length
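public: int length( StateT& state, const ExternT* from, const ExternT* from_end, std::size_t max ) const; | (1) |
protected: virtual int do_length( StateT& state, const ExternT* from, const ExternT* from_end, std::size_t max ) const; | (2) |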
1) Public member function; calls the member function do_length of the most derived class.
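2) Given the initial conversion state state, attempts to convert ExternT characters from the character array defined by [from, from_end) into at most max InternT characters, and returns the number of ExternT characters that such a conversion would consume. Modifies state as if by executing do_in(state, from, from_end, from, to, to+max, to) for some imaginary [to, to+max) output buffer.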
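Return value
The number of ExternT characters that would be consumed if converted by do_in() until all from_end-from characters were consumed, or max InternT characters were produced, or a conversion error occurred.
The non-converting specialization std::codecvt<char, char, std::mbstate_t> returns std::min(max, from_end-from).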
Example (Linux)
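#include <cwchar>
#include <iostream>
#include <locale>
#include <string>

int main()
{
    // Narrow multibyte encoding. The byte counts assume the compiler stores
    // this literal as UTF-8: z (1 byte), U+00DF (2), U+6C34 (3), U+1D10B (4).
    std::string s = "z\u00df\u6c34\U0001d10b";
    std::mbstate_t mb = std::mbstate_t();
    // std::locale throws std::runtime_error if "en_US.utf8" is not installed.
    std::cout << "Only the first "
              << std::use_facet<std::codecvt<wchar_t, char, std::mbstate_t>>(
                     std::locale("en_US.utf8")
                 ).length(mb, &s[0], &s[s.size()], 2)
              << " bytes out of " << s.size() << " would be consumed"
                 " to produce the first 2 characters" << std::endl;
    return 0;
}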
Output
Only the first 3 bytes out of 10 would be consumed to produce the first 2 characters
Example (Windows)
#include <clocale>
#include <cstdlib>
#include <cwchar>
#include <iostream>
#include <locale>
#include <string>
#include <vector>
#include <Windows.h>

// Locale names collected by the enumeration callback.
std::vector<std::wstring> locals;

// Callback for EnumSystemLocalesEx: records each locale name and continues.
BOOL CALLBACK MyFuncLocaleEx(LPWSTR pStr, DWORD dwFlags, LPARAM lparam)
{
    locals.push_back(pStr);
    return TRUE;
}

// Converts a wide string to a narrow string under the "chs" C locale.
// Note that, despite its name, this function goes from wide to narrow.
std::string stows(const std::wstring& ws)
{
    std::string curLocale = setlocale(LC_ALL, NULL); // typically "C"
    setlocale(LC_ALL, "chs");
    // Assumes at most 2 bytes per wide character, plus the terminator.
    std::vector<char> dest(2 * ws.size() + 1, '\0');
    // Leave the last byte untouched so the result is always terminated.
    wcstombs(dest.data(), ws.c_str(), dest.size() - 1);
    std::string result = dest.data();
    setlocale(LC_ALL, curLocale.c_str()); // restore the previous C locale
    return result;
}

int main()
{
    EnumSystemLocalesEx(MyFuncLocaleEx, LOCALE_ALTERNATE_SORTS, NULL, NULL);
    for (std::vector<std::wstring>::const_iterator str = locals.begin();
         str != locals.end(); ++str)
    {
        // The byte length of this literal depends on the compiler's
        // execution character set.
        std::string str1 = "z\u00df\u6c34\U0001d10b";
        std::mbstate_t mbstate = std::mbstate_t();
        std::wcout << *str;
        // std::locale throws std::runtime_error for an unsupported name.
        std::cout << " Only the first "
                  << std::use_facet<std::codecvt<wchar_t, char, std::mbstate_t>>(
                         std::locale(stows(*str))
                     ).length(mbstate, &str1[0], &str1[str1.size()], 2)
                  << " bytes out of " << str1.size() << " would be consumed"
                     " to produce the first 2 characters" << std::endl;
    }
    return 0;
}
Output
de-DE_phoneb Only the first 2 bytes out of 6 would be consumed to produce the first 2 characters
es-ES_tradnl Only the first 2 bytes out of 6 would be consumed to produce the first 2 characters
hu-HU_technl Only the first 2 bytes out of 6 would be consumed to produce the first 2 characters
ja-JP_radstr Only the first 2 bytes out of 6 would be consumed to produce the first 2 characters
ka-GE_modern Only the first 2 bytes out of 6 would be consumed to produce the first 2 characters
x-IV_mathan Only the first 2 bytes out of 6 would be consumed to produce the first 2 characters
zh-CN_phoneb Only the first 2 bytes out of 6 would be consumed to produce the first 2 characters
zh-CN_stroke Only the first 2 bytes out of 6 would be consumed to produce the first 2 characters
zh-HK_radstr Only the first 2 bytes out of 6 would be consumed to produce the first 2 characters
zh-MO_radstr Only the first 2 bytes out of 6 would be consumed to produce the first 2 characters
zh-MO_stroke Only the first 2 bytes out of 6 would be consumed to produce the first 2 characters
zh-SG_phoneb Only the first 2 bytes out of 6 would be consumed to produce the first 2 characters
zh-SG_stroke Only the first 2 bytes out of 6 would be consumed to produce the first 2 characters
zh-TW_pronun Only the first 2 bytes out of 6 would be consumed to produce the first 2 characters
zh-TW_radstr Only the first 2 bytes out of 6 would be consumed to produce the first 2 characters