Boost: UTF-8 Codecvt Facet (converting between Unicode and UTF-8)

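I saw that someone had already written code for converting between UTF-8 and Unicode, so I am mentioning this facet in passing in the hope that it helps.
Below are the bit widths of some common encoding forms.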

Examples of fixed-width encoding forms:

Type                      Each character encoded as   Notes
7-bit                     a single 7-bit quantity     example: ISO 646
8-bit G0/G1               a single 8-bit quantity     with constraints on use of C0 and C1 spaces
8-bit                     a single 8-bit quantity     with no constraints on use of C1 space
8-bit EBCDIC              a single 8-bit quantity     with the EBCDIC conventions rather than ASCII conventions
16-bit (UCS-2)            a single 16-bit quantity    within a code space of 0..FFFF
32-bit (UCS-4)            a single 32-bit quantity    within a code space of 0..7FFFFFFF
32-bit (UTF-32)           a single 32-bit quantity    within a code space of 0..10FFFF
16-bit DBCS process code  a single 16-bit quantity    example: UNIX widechar implementations of Asian CCS's
32-bit DBCS process code  a single 32-bit quantity    example: UNIX widechar implementations of Asian CCS's
DBCS Host                 two 8-bit quantities        following IBM host conventions
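
The code-space limits for UCS-2, UCS-4 and UTF-32 listed above can be written down as simple range checks. The following is only a minimal sketch; the names `fits_ucs2`, `fits_utf32`, `fits_ucs4` and `check_code_space_examples` are hypothetical, and the checks look at the upper bound only (they do not, for example, treat surrogate values specially).

```cpp
#include <cassert>
#include <cstdint>

// Upper bounds of the code spaces given in the table.
constexpr std::uint32_t kUcs2Max  = 0xFFFF;      // 16-bit (UCS-2)
constexpr std::uint32_t kUtf32Max = 0x10FFFF;    // 32-bit (UTF-32)
constexpr std::uint32_t kUcs4Max  = 0x7FFFFFFF;  // 32-bit (UCS-4)

// Hypothetical helpers: does the value fit in the given code space?
constexpr bool fits_ucs2(std::uint32_t cp)  { return cp <= kUcs2Max; }
constexpr bool fits_utf32(std::uint32_t cp) { return cp <= kUtf32Max; }
constexpr bool fits_ucs4(std::uint32_t cp)  { return cp <= kUcs4Max; }

// Small illustration of how the three ranges nest.
inline void check_code_space_examples() {
    assert(fits_ucs2(0x4E2D));     // a CJK ideograph fits in 16 bits
    assert(!fits_ucs2(0x1F600));   // beyond FFFF: not a single UCS-2 unit
    assert(fits_utf32(0x1F600));   // but within 0..10FFFF
    assert(!fits_utf32(0x110000)); // outside the UTF-32 code space
    assert(fits_ucs4(0x110000));   // still inside the larger UCS-4 space
}
```

A value that passes `fits_ucs4` but fails `fits_utf32` lies in the part of the UCS-4 range that the table excludes from UTF-32.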

Examples of variable-width encoding forms:

Name     Characters are encoded as                          Notes
UTF-8    a mix of one to four 8-bit code units in Unicode,  used only with Unicode/10646
         and one to six code units in 10646
UTF-16   a mix of one to two 16-bit code units              used only with Unicode/10646
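
To make "one to four 8-bit code units" concrete, here is a minimal hand-written sketch of UTF-8 encoding for a single code point in the Unicode range 0..10FFFF, plus a count of the 16-bit units UTF-16 would need. This is not the Boost facet's implementation; `encode_utf8`, `utf16_units` and `check_encode_examples` are hypothetical names, and the longer five- and six-unit 10646 forms are deliberately not covered.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Encode one code point as UTF-8 (1 to 4 bytes).
// Returns an empty string for surrogates (D800..DFFF) and values above 10FFFF.
std::string encode_utf8(std::uint32_t cp) {
    std::string out;
    if (cp <= 0x7F) {                       // 1 byte: 0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp <= 0x7FF) {               // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp <= 0xFFFF) {              // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        if (cp >= 0xD800 && cp <= 0xDFFF) return out;  // surrogate: not encodable
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp <= 0x10FFFF) {            // 4 bytes: 11110xxx then three 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Number of 16-bit code units UTF-16 needs; only meaningful for 0..10FFFF.
int utf16_units(std::uint32_t cp) {
    assert(cp <= 0x10FFFF);
    return cp <= 0xFFFF ? 1 : 2;  // above FFFF a surrogate pair is required
}

// Small illustration: the byte count grows with the code point.
inline void check_encode_examples() {
    assert(encode_utf8(0x41).size() == 1);
    assert(encode_utf8(0xE9).size() == 2);
    assert(encode_utf8(0x4E2D).size() == 3);
    assert(encode_utf8(0x1F600).size() == 4);
}
```

For example, `encode_utf8(0x4E2D)` yields the three bytes E4 B8 AD, while the same character takes a single 16-bit unit in UTF-16.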

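Boost provides a UTF-8 Codecvt Facet that converts between UTF-8 and UCS-4 (Unicode-32). Note that the usage example typedefs wchar_t as ucs4_t, which can only hold a full UCS-4 value on platforms where wchar_t is 32 bits wide.
It is used as follows: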

  //...
  // My encoding type
  typedef wchar_t ucs4_t;

  std::locale old_locale;
  std::locale utf8_locale(old_locale,new utf8_codecvt_facet<ucs4_t>);

  // Set a New global locale
  std::locale::global(utf8_locale);

  // Send the UCS-4 data out, converting it to UTF-8
  {
    std::wofstream ofs("data.ucd");
    ofs.imbue(utf8_locale);
    std::copy(ucs4_data.begin(),ucs4_data.end(),
          std::ostream_iterator<ucs4_t,ucs4_t>(ofs));
  }
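
The usage excerpt only covers the UCS-4 to UTF-8 direction. As a rough illustration of the opposite conversion, which a codecvt facet performs when a stream is read, here is a minimal standalone decoder. It is a sketch under stated assumptions, not Boost's code: `decode_utf8` and `check_decode_example` are hypothetical names, only one- to four-byte sequences are handled, decoding simply stops at the first malformed sequence, and overlong forms are not rejected.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Decode a UTF-8 byte string into UCS-4 code points.
// Stops at the first malformed or truncated sequence and returns what was decoded so far.
std::vector<std::uint32_t> decode_utf8(const std::string& in) {
    std::vector<std::uint32_t> out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        std::uint32_t cp = 0;
        std::size_t extra = 0;  // number of continuation bytes expected
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else break;  // stray continuation byte or unsupported lead byte

        if (in.size() - i <= extra) break;  // sequence is cut off

        bool ok = true;
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80) { ok = false; break; }  // not 10xxxxxx
            cp = (cp << 6) | (c & 0x3F);                    // append 6 payload bits
        }
        if (!ok) break;

        out.push_back(cp);
        i += extra + 1;
    }
    return out;
}

// Small illustration: "A" followed by U+4E2D (bytes E4 B8 AD).
inline void check_decode_example() {
    const std::vector<std::uint32_t> got = decode_utf8("A\xE4\xB8\xAD");
    assert(got.size() == 2 && got[0] == 0x41 && got[1] == 0x4E2D);
}
```

With the facet imbued on a std::wifstream, this step happens inside the stream, so the reading code only ever sees wide characters.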

 