Character Set Encodings (GBK, BIG5, Unicode, etc.) and C++ string/wstring

I. Background

<span style="font-family:'Times New Roman';FONT-SIZE: 12pt">1. Character</span><span style="font-family:'Times New Roman';FONT-SIZE: 9.5pt">: A character is the smallest abstract unit of text. It has no fixed shape (a particular shape is a glyph) and no value. "A" is a character, and "€" (the sign of the common currency of Germany, France and many other European countries) is also a character. "中" and "国" are two Han characters. A character only stands for a symbol and carries no numeric value of its own.</span>
<span style="font-family:'Times New Roman';FONT-SIZE: 9.5pt">2. Character set: A character set is a collection of characters. For example, Han characters were first invented by the Chinese and are used in writing Chinese, Japanese, Korean and Vietnamese. This also shows how the two relate: characters make up character sets (ISO-8859-1, GB2312/GBK, Unicode).</span>
<span style="font-family:'Times New Roman';FONT-SIZE: 9.5pt">3. Code point: Every character in a character set is assigned a "code point". Each code point has a specific, unique numeric value called its scalar value, usually written in hexadecimal.</span>
<span style="font-family:'Times New Roman';FONT-SIZE: 9.5pt">4. Code unit: In each encoding form, a code point is mapped to one or more code units. A "code unit" is the single unit of a given encoding form, and its size equals the bit width of that encoding form:</span>
<span style="font-family:'Times New Roman';FONT-SIZE: 9.5pt">UTF-8: a code unit is 8 bits. Because the code unit is small, a code point is often mapped to several code units: one, two, three or four.</span>
<span style="font-family:'Times New Roman';FONT-SIZE: 9.5pt">UTF-16: a code unit is 16 bits, twice the size of an 8-bit code unit. Code points with a scalar value below U+10000 are encoded in a single code unit; code points from U+10000 upward take two code units (a surrogate pair).</span>
<span style="font-family:'Times New Roman';FONT-SIZE: 9.5pt">UTF-32: a code unit is 32 bits, which is large enough for every code point to be encoded as a single code unit.</span>
<span style="font-family:'Times New Roman';FONT-SIZE: 9.5pt">GB18030: a code unit is 8 bits. Because the code unit is small, a code point is often mapped to several code units: one, two or four.</span><span style="font-family:'Times New Roman';FONT-SIZE: 9.5pt">
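The code-unit counts listed above for the three UTF forms can be sketched in C++ as follows. This is a minimal illustration, not part of the original notes; the function names are made up for the example, and 0 is returned for values that are not Unicode scalar values (surrogates, or anything above U+10FFFF).

```cpp
#include <cstddef>

// True for the surrogate range, which is not a valid scalar value.
static bool is_surrogate(char32_t cp) {
    return cp >= 0xD800 && cp <= 0xDFFF;
}

// Number of 8-bit code units UTF-8 needs for one code point (0 if invalid).
std::size_t utf8_units(char32_t cp) {
    if (is_surrogate(cp)) return 0;
    if (cp < 0x80) return 1;
    if (cp < 0x800) return 2;
    if (cp < 0x10000) return 3;
    if (cp <= 0x10FFFF) return 4;
    return 0;
}

// Number of 16-bit code units UTF-16 needs for one code point (0 if invalid).
std::size_t utf16_units(char32_t cp) {
    if (is_surrogate(cp)) return 0;
    if (cp < 0x10000) return 1;
    if (cp <= 0x10FFFF) return 2;
    return 0;
}

// UTF-32 always uses exactly one 32-bit code unit for a valid code point.
std::size_t utf32_units(char32_t cp) {
    return (!is_surrogate(cp) && cp <= 0x10FFFF) ? 1 : 0;
}
```

For instance, the Han character U+4E2D needs three code units in UTF-8 but only one in UTF-16 and UTF-32.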
</span><span style="font-family:'Times New Roman';FONT-SIZE: 9.5pt">5. Example:</span>
<span style="font-family:'Times New Roman';FONT-SIZE: 9.5pt">The characters of "中国北京香蕉是个大笨蛋" form a character set of my own, which I call aka. The code point of each character is:</span><span style="font-family:'Times New Roman';FONT-SIZE: 9.5pt">
</span><span style="font-family:'Times New Roman';FONT-SIZE: 9.5pt">北 00000001</span><span style="font-family:'Times New Roman';FONT-SIZE: 9.5pt">
</span><span style="font-family:'Times New Roman';FONT-SIZE: 9.5pt">京 00000010</span><span style="font-family:'Times New Roman';FONT-SIZE: 9.5pt">
</span><span style="font-family:'Times New Roman';FONT-SIZE: 9.5pt">香 10000001</span><span style="font-family:'Times New Roman';FONT-SIZE: 9.5pt">
</span><span style="font-family:'Times New Roman';FONT-SIZE: 9.5pt">蕉 10000010</span><span style="font-family:'Times New Roman';FONT-SIZE: 9.5pt">
</span><span style="font-family:'Times New Roman';FONT-SIZE: 9.5pt">是 10000100</span><span style="font-family:'Times New Roman';FONT-SIZE: 9.5pt">
</span><span style="font-family:'Times New Roman';FONT-SIZE: 9.5pt">个 10001000</span><span style="font-family:'Times New Roman';FONT-SIZE: 9.5pt">
</span><span style="font-family:'Times New Roman';FONT-SIZE: 9.5pt">大 10010000</span><span style="font-family:'Times New Roman';FONT-SIZE: 9.5pt">
</span><span style="font-family:'Times New Roman';FONT-SIZE: 9.5pt">笨 10100000</span><span style="font-family:'Times New Roman';FONT-SIZE: 9.5pt">
</span><span style="font-family:'Times New Roman';FONT-SIZE: 9.5pt">蛋 11000000</span><span style="font-family:'Times New Roman';FONT-SIZE: 9.5pt">
</span><span style="font-family:'Times New Roman';FONT-SIZE: 9.5pt">中 00000100</span><span style="font-family:'Times New Roman';FONT-SIZE: 9.5pt">
</span><span style="font-family:'Times New Roman';FONT-SIZE: 9.5pt">国 00001000</span><span style="font-family:'Times New Roman';FONT-SIZE: 9.5pt">
</span><span style="font-family:'Times New Roman';FONT-SIZE: 9.5pt">Below is an encoding scheme of my own, zixia (8-bit). It gives the code unit for every character of the aka character set:</span><span style="font-family:'Times New Roman';FONT-SIZE: 9.5pt">
</span><span style="font-family:'Times New Roman';FONT-SIZE: 9.5pt">北 10000001</span><span style="font-family:'Times New Roman';FONT-SIZE: 9.5pt">
</span><span style="font-family:'Times New Roman';FONT-SIZE: 9.5pt">京 10000010</span><span style="font-family:'Times New Roman';FONT-SIZE: 9.5pt">
</span><span style="font-family:'Times New Roman';FONT-SIZE: 9.5pt">香 00000001</span><span style="font-family:'Times New Roman';FONT-SIZE: 9.5pt">
</span><span style="font-family:'Times New Roman';FONT-SIZE: 9.5pt">蕉 00000010</span><span style="font-family:'Times New Roman';FONT-SIZE: 9.5pt">
</span><span style="font-family:'Times New Roman';FONT-SIZE: 9.5pt">是 00000100</span><span style="font-family:'Times New Roman';FONT-SIZE: 9.5pt">
</span><span style="font-family:'Times New Roman';FONT-SIZE: 9.5pt">个 00001000</span><span style="font-family:'Times New Roman';FONT-SIZE: 9.5pt">
</span><span style="font-family:'Times New Roman';FONT-SIZE: 9.5pt">大 00010000</span><span style="font-family:'Times New Roman';FONT-SIZE: 9.5pt">
</span><span style="font-family:'Times New Roman';FONT-SIZE: 9.5pt">笨 00100000</span><span style="font-family:'Times New Roman';FONT-SIZE: 9.5pt">
</span><span style="font-family:'Times New Roman';FONT-SIZE: 9.5pt">蛋 01000000</span><span style="font-family:'Times New Roman';FONT-SIZE: 9.5pt">
</span><span style="font-family:'Times New Roman';FONT-SIZE: 9.5pt">中 10000100</span><span style="font-family:'Times New Roman';FONT-SIZE: 9.5pt">
</span><span style="font-family:'Times New Roman';FONT-SIZE: 9.5pt">国 10001000</span><span style="font-family:'Times New Roman';FONT-SIZE: 9.5pt">
</span><span style="font-family:'Times New Roman';FONT-SIZE: 9.5pt">A text file is simply binary data that gets interpreted as text under some encoding, for example a file containing 00000001000000100000010000001000000100000010000001000000. If I open it in a notepad that supports the zixia encoding and the aka character set, it is displayed, following the encoding scheme, as "香蕉是个大笨蛋".</span><span style="font-family:'Times New Roman';FONT-SIZE: 9.5pt">
</span><span style="font-family:'Times New Roman';FONT-SIZE: 9.5pt">If I save the same characters to another file as GBK, the bytes will certainly not be these; they will be:</span><span style="font-family:'Times New Roman';FONT-SIZE: 9.5pt">
</span><span style="font-family:'Times New Roman';FONT-SIZE: 9.5pt">1100111111100011 1011110110110110 1100101011000111 1011100011110110 1011010011110011 1011000110111111 1011010110110000 110100001010</span>
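Comparing the two tables above, each zixia code unit happens to be the aka code point with its top bit flipped. Under that observation (an inference from the tables, not something the notes state), the toy encoding can be sketched in C++ like this; the function names are hypothetical.

```cpp
#include <cstdint>
#include <vector>

// aka code point -> zixia code unit: flip the highest bit.
std::uint8_t zixia_encode(std::uint8_t aka_code_point) {
    return static_cast<std::uint8_t>(aka_code_point ^ 0x80);
}

// zixia code unit -> aka code point: the same flip undoes the encoding.
std::uint8_t zixia_decode(std::uint8_t code_unit) {
    return static_cast<std::uint8_t>(code_unit ^ 0x80);
}

// Decode a whole "text file" (a sequence of zixia code units).
std::vector<std::uint8_t> zixia_decode_text(const std::vector<std::uint8_t>& units) {
    std::vector<std::uint8_t> code_points;
    code_points.reserve(units.size());
    for (std::uint8_t u : units) {
        code_points.push_back(zixia_decode(u));
    }
    return code_points;
}
```

Decoding the first byte of the example file, 00000001, gives code point 10000001, which the aka table assigns to 香. The same bytes read under any other scheme would map to different characters or to none at all.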

II. Character Sets

<span style="font-family:'Times New Roman';FONT-SIZE: 9.5pt">1. Common character sets</span>
<span style="font-family:'Times New Roman';color:#ff00;FONT-SIZE: 9.5pt">ASCII</span><span style="font-family:'Times New Roman';FONT-SIZE: 9.5pt"> and its extended character sets</span>
<span style="font-family:'Times New Roman';FONT-SIZE: 9.5pt">Purpose: represent English and Western European languages.</span>
<span style="font-family:'Times New Roman';FONT-SIZE: 9.5pt">Size: ASCII uses 7 bits and can represent 128 characters; its extensions use 8 bits and represent 256 characters.</span>
<span style="font-family:'Times New Roman';FONT-SIZE: 9.5pt">Range: ASCII runs from 00 to 7F, the extensions from 00 to FF.</span>
<span style="font-family:'Times New Roman';color:#ff00;FONT-SIZE: 9.5pt">ISO-8859-1</span><span style="font-family:'Times New Roman';FONT-SIZE: 9.5pt"> character set</span>
<span style="font-family:'Times New Roman';FONT-SIZE: 9.5pt">Purpose: extends ASCII to cover Western European languages (Greek is handled by a separate part of the series, ISO-8859-7).</span>
<span style="font-family:'Times New Roman';FONT-SIZE: 9.5pt">Size: 8 bits.</span>
<span style="font-family:'Times New Roman';FONT-SIZE: 9.5pt">Range: 00 to FF, compatible with the ASCII character set.</span>
<span style="font-family:'Times New Roman';color:#ff00;FONT-SIZE: 9.5pt">GB2312</span><span style="font-family:'Times New Roman';FONT-SIZE: 9.5pt"> character set</span>
<span style="font-family:'Times New Roman';FONT-SIZE: 9.5pt">Purpose: the national Simplified Chinese character set, compatible with ASCII.</span>
<span style="font-family:'Times New Roman';FONT-SIZE: 9.5pt">Size: 2 bytes per character; 7445 symbols, including 6763 Han characters, which covers almost all frequently used Han characters.</span>
<span style="font-family:'Times New Roman';FONT-SIZE: 9.5pt">Range: high byte from A1 to F7, low byte from A1 to FE. Adding 0xA0 to the high and low bytes of a character's position in the code table gives its encoded bytes.</span>
<span style="font-family:'Times New Roman';color:#ff00;FONT-SIZE: 9.5pt">BIG5</span><span style="font-family:'Times New Roman';FONT-SIZE: 9.5pt"> character set</span>
<span style="font-family:'Times New Roman';FONT-SIZE: 9.5pt">Purpose: a unified encoding for Traditional Chinese.</span>
<span style="font-family:'Times New Roman';FONT-SIZE: 9.5pt">Size: 2 bytes per character; 13053 Han characters.</span>
<span style="font-family:'Times New Roman';FONT-SIZE: 9.5pt">Range: high byte from A1 to F9; low byte from 40 to 7E and from A1 to FE.</span>
<span style="font-family:'Times New Roman';color:#ff00;FONT-SIZE: 9.5pt">GBK</span><span style="font-family:'Times New Roman';FONT-SIZE: 9.5pt"> character set</span>
<span style="font-family:'Times New Roman';FONT-SIZE: 9.5pt">Purpose: an extension of GB2312 that adds support for Traditional characters and stays compatible with GB2312.</span>
<span style="font-family:'Times New Roman';FONT-SIZE: 9.5pt">Size: 2 bytes per character; 21886 characters.</span>
<span style="font-family:'Times New Roman';FONT-SIZE: 9.5pt">Range: high byte from 81 to FE, low byte from 40 to FE (7F is not used).</span>
<span style="font-family:'Times New Roman';color:#ff00;FONT-SIZE: 9.5pt">GB18030</span><span style="font-family:'Times New Roman';FONT-SIZE: 9.5pt"> character set</span>
<span style="font-family:'Times New Roman';FONT-SIZE: 9.5pt">Purpose: covers Chinese, Japanese, Korean and more; compatible with GBK.</span>
<span style="font-family:'Times New Roman';FONT-SIZE: 9.5pt">Size: variable length (1 byte for ASCII, otherwise 2 or 4 bytes); 27484 characters.</span>
<span style="font-family:'Times New Roman';FONT-SIZE: 9.5pt">Range: 1-byte form from 00 to 7F; 2-byte form with high byte from 81 to FE and low byte from 40 to 7E and 80 to FE; 4-byte form with the first and third bytes from 81 to FE and the second and fourth bytes from 30 to 39.</span>
<span style="font-family:'Times New Roman';color:#ff00;FONT-SIZE: 9.5pt">UCS</span><span style="font-family:'Times New Roman';FONT-SIZE: 9.5pt"> character set</span>
<span style="font-family:'Times New Roman';FONT-SIZE: 9.5pt">Purpose: the international standard ISO 10646 defines the Universal Character Set (UCS). It is a standard of the same kind as Unicode, maintained by a different body, and UCS-2 is compatible with Unicode.</span>
<span style="font-family:'Times New Roman';FONT-SIZE: 9.5pt">Size: it has two forms, UCS-2 and UCS-4, using 2 bytes and 4 bytes respectively.</span>
<span style="font-family:'Times New Roman';FONT-SIZE: 9.5pt">Range: for the characters UCS-2 can represent, the UCS-4 form is simply the UCS-2 value with 0x0000 prepended.</span>
<span style="font-family:'Times New Roman';color:#ff00;FONT-SIZE: 9.5pt">Unicode</span><span style="font-family:'Times New Roman';FONT-SIZE: 9.5pt"> character set</span>
<span style="font-family:'Times New Roman';FONT-SIZE: 9.5pt">Purpose: a single unified encoding for some 650 of the world's languages; compatible with ISO-8859-1.</span>
<span style="font-family:'Times New Roman';FONT-SIZE: 9.5pt">Size: the Unicode character set has several encoding forms: UTF-8, UTF-16 and UTF-32.</span><span style="font-family:'Times New Roman';FONT-SIZE: 9.5pt">
</span><span style="font-family:'Times New Roman';FONT-SIZE: 9.5pt">2. Classification by the languages represented</span>
<span style="font-family:'Times New Roman';FONT-SIZE: 9.5pt">Language                          Character set          Formal name</span>
<span style="font-family:'Times New Roman';FONT-SIZE: 9.5pt">English, Western European         ASCII, ISO-8859-1      SBCS (single-byte)</span>
<span style="font-family:'Times New Roman';FONT-SIZE: 9.5pt">Simplified Chinese                GB2312                 MBCS (multi-byte)</span>
<span style="font-family:'Times New Roman';FONT-SIZE: 9.5pt">Traditional Chinese               BIG5                   MBCS (multi-byte)</span>
<span style="font-family:'Times New Roman';FONT-SIZE: 9.5pt">Simplified + Traditional Chinese  GBK                    MBCS (multi-byte)</span>
<span style="font-family:'Times New Roman';FONT-SIZE: 9.5pt">Chinese, Japanese, Korean         GB18030                MBCS (multi-byte)</span>
<span style="font-family:'Times New Roman';FONT-SIZE: 9.5pt">All languages                     Unicode, UCS           Unicode (wide character)</span>
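The GB2312 range rule above (add 0xA0 to each half of a character's table position) can be sketched as follows. This assumes the values being offset are the row and cell numbers of the GB2312 code table, with rows 1 to 87 and cells 1 to 94 so that the result stays inside the A1 to F7 and A1 to FE ranges given above; the function name is hypothetical.

```cpp
#include <array>
#include <cstdint>

// Convert a (row, cell) table position into the two GB2312 bytes.
// Returns false when the position is outside the assumed table bounds.
bool table_position_to_gb2312(int row, int cell, std::array<std::uint8_t, 2>& out) {
    if (row < 1 || row > 87 || cell < 1 || cell > 94) {
        return false;
    }
    out[0] = static_cast<std::uint8_t>(row + 0xA0);   // high byte: A1..F7
    out[1] = static_cast<std::uint8_t>(cell + 0xA0);  // low byte:  A1..FE
    return true;
}
```

For row 16, cell 1 this yields the bytes B0 A1.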

III. Encodings

<span style="font-family:'Times New Roman';FONT-SIZE: 9.5pt">UTF-8: a variable-length encoding (1 byte for ASCII, 2 for e.g. Greek letters, 3 for Han characters, 4 for supplementary-plane characters). In network transmission, one corrupted byte does not affect the other characters, whereas in a double-byte encoding a single wrong byte can throw off everything after it. The rules are:</span>
<span style="font-family:'Times New Roman';FONT-SIZE: 9.5pt">A single-byte character has its highest bit set to 0. For a multi-byte character, the number of consecutive 1 bits at the top of the first byte gives the number of bytes in the sequence, and every following byte starts with 10. The original UTF-8 design allowed up to 6 bytes; the current definition is limited to 4 bytes, which is enough to reach U+10FFFF.</span>
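The bit layout just described can be turned into a small encoder. The following C++ sketch (function name chosen for this example) encodes one code point into UTF-8 bytes and returns an empty string for values that are not valid scalar values.

```cpp
#include <string>

// Encode one Unicode scalar value as UTF-8.
// Returns an empty string for surrogates and values above U+10FFFF.
std::string utf8_encode(char32_t cp) {
    std::string out;
    if (cp < 0x80) {
        // 0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        // 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        if (cp >= 0xD800 && cp <= 0xDFFF) return out;  // surrogates are invalid
        // 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp <= 0x10FFFF) {
        // 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

Because every continuation byte starts with 10 and every lead byte does not, a decoder that lands in the middle of a sequence can skip forward to the next lead byte, which is why one damaged byte does not corrupt the rest of the text.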
<span style="font-family:'Times New Roman';FONT-SIZE: 9.5pt">UTF-16: uses 2-byte code units (characters above U+FFFF take two code units, i.e. 4 bytes). The different parts of Unicode are each based on existing standards, to make conversion easy. 0x0000 to 0x007F are the ASCII characters, and 0x0080 to 0x00FF are the ISO-8859-1 extension of ASCII. The Greek alphabet uses 0x0370 to 0x03FF, Cyrillic uses 0x0400 to 0x04FF, Armenian uses 0x0530 to 0x058F, and Hebrew uses 0x0590 to 0x05FF. The ideographs of China, Japan and Korea (collectively CJK) occupy 0x3000 to 0x9FFF. Because the byte 0x00 has a special meaning in C, in operating-system file names and elsewhere, text often has to be saved as UTF-8 to get rid of these 0x00 bytes. For example:</span>
<span style="font-family:'Times New Roman';FONT-SIZE: 9.5pt">UTF-16: 0x0080 = 0000 0000 1000 0000</span>
<span style="font-family:'Times New Roman';FONT-SIZE: 9.5pt">UTF-8:  0xC280 = 1100 0010 1000 0000</span>
<span style="font-family:'Times New Roman';FONT-SIZE: 9.5pt">UTF-32: uses 4 bytes.</span>
<span style="font-family:'Times New Roman';FONT-SIZE: 9.5pt">Pros and cons</span>
<span style="font-family:'Times New Roman';FONT-SIZE: 9.5pt">UTF-8, UTF-16 and UTF-32 can each represent every Unicode character in the valid code space (U+0000 to U+10FFFF).</span>
<span style="font-family:'Times New Roman';FONT-SIZE: 9.5pt">In UTF-8 an ASCII character takes only 1 byte, so storage is efficient; it suits text that is mostly Latin characters.</span>
<span style="font-family:'Times New Roman';FONT-SIZE: 9.5pt">For most non-Latin characters (such as Chinese and Japanese), UTF-16 needs the least storage: 2 bytes per character.</span>
<span style="font-family:'Times New Roman';FONT-SIZE: 9.5pt">The Windows NT kernel is Unicode (UTF-16), so text in UTF-16 needs no conversion when calling system APIs and is processed faster.</span>
<span style="font-family:'Times New Roman';FONT-SIZE: 9.5pt">UTF-16 and UTF-32 come in Big Endian and Little Endian variants, while UTF-8 has no byte-order issue, which makes UTF-8 well suited to transmission and communication.</span>
<span style="font-family:'Times New Roman';FONT-SIZE: 9.5pt">UTF-32 uses 4 bytes for every character. Processing is fast, but a lot of space is wasted and transmission is slowed down, so it is rarely used.</span>
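To make the UTF-16 rule concrete, here is a minimal C++ sketch (function name chosen for this example) that encodes one code point as UTF-16 code units. Code points below U+10000 become a single unit; higher ones are split into a surrogate pair.

```cpp
#include <string>

// Encode one Unicode scalar value as UTF-16 code units.
// Returns an empty string for surrogates and values above U+10FFFF.
std::u16string utf16_encode(char32_t cp) {
    std::u16string out;
    if (cp >= 0xD800 && cp <= 0xDFFF) {
        return out;  // surrogate code points cannot be encoded
    }
    if (cp < 0x10000) {
        out += static_cast<char16_t>(cp);
    } else if (cp <= 0x10FFFF) {
        char32_t v = cp - 0x10000;                           // 20 bits remain
        out += static_cast<char16_t>(0xD800 + (v >> 10));    // high surrogate
        out += static_cast<char16_t>(0xDC00 + (v & 0x3FF));  // low surrogate
    }
    return out;
}
```

U+0080 comes out as the single unit 0x0080, matching the example above, while U+1F600 becomes the pair 0xD83D 0xDE00.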

IV. How to Identify a Character Set

<span style="font-family:'Times New Roman';FONT-SIZE: 9.5pt">1. Byte order</span>
<span style="font-family:'Times New Roman';FONT-SIZE: 9.5pt">First, a word on how byte order affects encoding. There are two byte orders, Big Endian and Little Endian, and different processors may use different ones, so data being exchanged has to say which byte order it was written in. In Big Endian the most significant byte is stored at the lowest address and the least significant byte at the highest; Little Endian is the reverse. For example, for 0x03AB:</span>
<span style="font-family:'Times New Roman';FONT-SIZE: 9.5pt">Big Endian byte order:</span>
<span style="font-family:'Times New Roman';FONT-SIZE: 9.5pt">0000: 03</span>
<span style="font-family:'Times New Roman';FONT-SIZE: 9.5pt">0001: AB</span>
<span style="font-family:'Times New Roman';FONT-SIZE: 9.5pt">Little Endian byte order:</span>
<span style="font-family:'Times New Roman';FONT-SIZE: 9.5pt">0000: AB</span>
<span style="font-family:'Times New Roman';FONT-SIZE: 9.5pt">0001: 03</span><span style="font-family:'Times New Roman';FONT-SIZE: 9.5pt">
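The two layouts above can be produced explicitly with shifts, independent of the byte order of the machine running the code. A minimal sketch, with function names made up for this example:

```cpp
#include <array>
#include <cstdint>

// Serialize a 16-bit value with the most significant byte first.
std::array<std::uint8_t, 2> to_big_endian(std::uint16_t value) {
    return {{static_cast<std::uint8_t>(value >> 8),
             static_cast<std::uint8_t>(value & 0xFF)}};
}

// Serialize a 16-bit value with the least significant byte first.
std::array<std::uint8_t, 2> to_little_endian(std::uint16_t value) {
    return {{static_cast<std::uint8_t>(value & 0xFF),
             static_cast<std::uint8_t>(value >> 8)}};
}
```

For 0x03AB, `to_big_endian` gives the bytes 03 AB and `to_little_endian` gives AB 03, matching the address listings above.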
</span><span style="font-family:'Times New Roman';FONT-SIZE: 9.5pt">2. Identifying the encoding</span>
<span style="font-family:'Times New Roman';FONT-SIZE: 9.5pt">Unicode: the first few bytes of the data can identify which encoding of the Unicode character set is in use. This marker is called the Byte Order Mark (BOM):</span>
<span style="font-family:'Times New Roman';FONT-SIZE: 9.5pt">UTF-8: EF BB BF (a well-formed UTF-8 sequence, see above, that carries no character meaning in UCS, i.e. Unicode)</span>
<span style="font-family:'Times New Roman';FONT-SIZE: 9.5pt">UTF-16 Big Endian: FE FF (no character meaning in UCS-2)</span>
<span style="font-family:'Times New Roman';FONT-SIZE: 9.5pt">UTF-16 Little Endian: FF FE (no character meaning in UCS-2)</span>
<span style="font-family:'Times New Roman';FONT-SIZE: 9.5pt">UTF-32 Big Endian: 00 00 FE FF (no character meaning in UCS-4)</span>
<span style="font-family:'Times New Roman';FONT-SIZE: 9.5pt">UTF-32 Little Endian: FF FE 00 00 (no character meaning in UCS-4)</span>
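The BOM table above maps directly onto a prefix check. The sketch below is one way to write it in C++; the `Bom` enum and `detect_bom` name are made up for this example, and it only inspects the leading bytes without validating the rest of the data.

```cpp
#include <cstddef>
#include <string>

enum class Bom { None, Utf8, Utf16BE, Utf16LE, Utf32BE, Utf32LE };

// Inspect the first bytes of a buffer and report which BOM, if any, it starts with.
Bom detect_bom(const std::string& bytes) {
    auto b = [&](std::size_t i) { return static_cast<unsigned char>(bytes[i]); };
    const std::size_t n = bytes.size();
    // Check the 4-byte marks first: FF FE 00 00 also begins with the UTF-16 LE mark.
    if (n >= 4 && b(0) == 0x00 && b(1) == 0x00 && b(2) == 0xFE && b(3) == 0xFF) return Bom::Utf32BE;
    if (n >= 4 && b(0) == 0xFF && b(1) == 0xFE && b(2) == 0x00 && b(3) == 0x00) return Bom::Utf32LE;
    if (n >= 3 && b(0) == 0xEF && b(1) == 0xBB && b(2) == 0xBF) return Bom::Utf8;
    if (n >= 2 && b(0) == 0xFE && b(1) == 0xFF) return Bom::Utf16BE;
    if (n >= 2 && b(0) == 0xFF && b(1) == 0xFE) return Bom::Utf16LE;
    return Bom::None;
}
```

The order of the checks is a design choice: a UTF-16 Little Endian file whose first character is U+0000 would also start with FF FE 00 00, so this sketch resolves that ambiguity in favour of UTF-32. Text without a BOM returns `Bom::None` and needs other heuristics.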
<span style="font-family:'Times New Roman';FONT-SIZE: 9.5pt">GB2312: the highest bit of both the high byte and the low byte is 1.</span>
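As a small illustration of the GB2312 rule, the check below tests the high bit and, more strictly, the A1 to F7 and A1 to FE byte ranges given in the character-set list earlier. The function names are hypothetical, and a pair passing this test is only plausibly GB2312, since other encodings use the same byte values.

```cpp
#include <cstdint>

// True when the highest bit of the byte is 1, i.e. the byte is not ASCII.
bool high_bit_set(std::uint8_t byte) {
    return (byte & 0x80) != 0;
}

// True when the two bytes fall inside the GB2312 double-byte ranges.
bool is_gb2312_pair(std::uint8_t high, std::uint8_t low) {
    return high >= 0xA1 && high <= 0xF7 && low >= 0xA1 && low <= 0xFE;
}
```

The GBK bytes CF E3 from the earlier example pass this check, while a GBK-only pair such as 81 40 does not.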
<span style="font-family:'Times New Roman';FONT-SIZE: 9.5pt">BIG5, GBK and GB18030: the highest bit of the high byte is 1. The operating system has a default encoding, often GBK, and others can be downloaded and installed. Checking the highest bit of the high byte tells you whether a byte is ASCII or the start of a Chinese character.</span>
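Combining the high-bit test with the GB18030 byte ranges listed in section II, the length of each character sequence can be worked out from its first two bytes. The following C++ sketch uses a hypothetical function name and returns 0 for input that is truncated or outside those ranges.

```cpp
#include <cstddef>
#include <string>

// Length in bytes (1, 2 or 4) of the GB18030 sequence starting at pos,
// or 0 if the bytes there are truncated or do not fit the byte ranges.
std::size_t gb18030_sequence_length(const std::string& s, std::size_t pos) {
    auto b = [&](std::size_t i) { return static_cast<unsigned char>(s[i]); };
    if (pos >= s.size()) return 0;
    if (b(pos) <= 0x7F) return 1;  // highest bit 0: ASCII
    if (b(pos) < 0x81 || b(pos) > 0xFE || pos + 1 >= s.size()) return 0;
    const unsigned char second = b(pos + 1);
    if ((second >= 0x40 && second <= 0x7E) || (second >= 0x80 && second <= 0xFE)) {
        return 2;  // two-byte form
    }
    if (second >= 0x30 && second <= 0x39) {
        if (pos + 3 >= s.size()) return 0;
        const unsigned char third = b(pos + 2);
        const unsigned char fourth = b(pos + 3);
        if (third >= 0x81 && third <= 0xFE && fourth >= 0x30 && fourth <= 0x39) {
            return 4;  // four-byte form
        }
    }
    return 0;
}
```

Walking a buffer by repeatedly adding the returned length splits it into characters; a return value of 0 is the point where the caller would stop or report an error.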
</pre><pre style="TEXT-ALIGN: left; LINE-HEIGHT: 25px; BACKGROUND-COLOR: rgb(255,255,255); MARGIN-TOP: 0px; WORD-WRAP: break-word; WHITE-SPACE: pre-wrap; MARGIN-BOTTOM: 0px; COLOR: rgb(51,51,51); FONT-SIZE: 14px" class="p0" name="code"><span style="font-family:宋体;FONT-SIZE: 10.5pt">Source: </span><a target=_blank style="OUTLINE-STYLE: none; COLOR: rgb(61,129,238); TEXT-DECORATION: none" href="http://blog.minidx.com/2008/12/06/1689.html"><span style="font-family:'Times New Roman';color:#00ff;FONT-SIZE: 10.5pt"><u>http://blog.minidx.com/2008/12/06/1689.html</u></span></a><span style="font-family:'Times New Roman';FONT-SIZE: 10.5pt"> </span>
</pre><h1 style="TEXT-ALIGN: left; PADDING-BOTTOM: 8px; BACKGROUND-COLOR: rgb(153,153,153); MARGIN-TOP: 0pt; PADDING-LEFT: 10px; PADDING-RIGHT: 10px; FONT-FAMILY: 微软雅黑, sans-serif; MARGIN-BOTTOM: 0pt; COLOR: rgb(255,255,255); FONT-SIZE: 26px; FONT-WEIGHT: normal; PADDING-TOP: 8px"><a target=_blank name="t5"></a><span style="font-family:Arial;FONT-SIZE: 16pt"><strong>Character Encoding Notes: ASCII, Unicode and UTF-8</strong></span></h1><pre style="TEXT-ALIGN: left; LINE-HEIGHT: 25px; BACKGROUND-COLOR: rgb(255,255,255); MARGIN-TOP: 0px; WORD-WRAP: break-word; WHITE-SPACE: pre-wrap; MARGIN-BOTTOM: 0px; COLOR: rgb(51,51,51); FONT-SIZE: 14px" class="p0" name="code"><span style="font-family:'Times New Roman';FONT-SIZE: 10.5pt"> </span><span style="font-family:宋体;FONT-SIZE: 10.5pt">Source: </span><a target=_blank style="OUTLINE-STYLE: none; COLOR: rgb(61,129,238); TEXT-DECORATION: none" href="http://www.ruanyifeng.com/blog/2007/10/ascii_unicode_and_utf-8.html"><span style="font-family:'Times New Roman';color:#00ff;FONT-SIZE: 10.5pt"><u>http://www.ruanyifeng.com/blog/2007/10/ascii_unicode_and_utf-8.html</u></span></a><span style="font-family:'Times New Roman';FONT-SIZE: 10.5pt"> </span><span style="font-family:宋体;FONT-SIZE: 10.5pt"> </span>
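<span style="font-family:'Times New Roman';color:#111111;FONT-SIZE: 16.5pt">At noon today I suddenly wanted to work out how Unicode and UTF-8 relate to each other, so I started looking things up online.</span>
<span style="font-family:'Times New Roman';color:#111111;FONT-SIZE: 16.5pt">The question turned out to be more complicated than I expected. I read from after lunch until 9 in the evening before I had a first grasp of it.</span>
<span style="font-family:'Times New Roman';color:#111111;FONT-SIZE: 16.5pt">What follows are my notes, written mainly to organize my own thinking. Still, I have tried to keep them easy to follow, in the hope that they are useful to others. After all, character encoding is a cornerstone of computing, and anyone who wants to use computers well needs to know a little about it.</span>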

1. ASCII

<span style="font-family:'Times New Roman';color:#111111;FONT-SIZE: 16.5pt">As we know, inside a computer all information is ultimately represented as a string of binary digits. Each binary digit (bit) has two states, 0 and 1, so eight bits can be combined into 256 states; this group of eight is called a byte. In other words, one byte can represent 256 different states, each corresponding to one symbol, for 256 symbols in all, from 00000000 to 11111111.</span>
<span style="font-family:'Times New Roman';color:#111111;FONT-SIZE: 16.5pt">In the 1960s the United States drew up a character encoding that standardized the relationship between English characters and bit patterns. It is called ASCII and is still in use today.</span>
<span style="font-family:'Times New Roman';color:#111111;FONT-SIZE: 16.5pt">ASCII defines the encoding of 128 characters. For example, the space "SPACE" is 32 (binary 00100000) and the capital letter A is 65 (binary 01000001). These 128 symbols (including 32 control symbols that cannot be printed) use only the lower 7 bits of a byte; the leading bit is always 0.</span>
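Since every ASCII byte has its leading bit set to 0, checking whether a buffer is pure ASCII comes down to testing that one bit. A minimal C++ sketch, with a function name chosen for this example:

```cpp
#include <string>

// True when every byte has its highest bit clear, i.e. lies in 0..127.
bool is_ascii(const std::string& bytes) {
    for (unsigned char c : bytes) {
        if (c & 0x80) {
            return false;  // highest bit set: not an ASCII byte
        }
    }
    return true;
}
```

A string such as "A " passes, while any byte of 128 or above makes the function return false.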

2. Non-ASCII encodings

<span style="font-family:'Times New Roman';color:#111111;FONT-SIZE: 16.5pt">英语用128<span style="font-family:宋体;">个符号编码就够了,但是用来表示其他语言,</span>128<span style="font-family:宋体;">个符号是不够的。比如,在法语中,字母上方有注音符号,它就无法用</span>ASCII<span style="font-family:宋体;">码表示。于是,一些欧洲国家就决定,利用字节中闲置的最高位编入新的符号。比如,法语中的</span>é<span style="font-family:宋体;">的编码为</span>130<span style="font-family:宋体;">(二进制</span>10000010<span style="font-family:宋体;">)。这样一来,这些欧洲国家使用的编码体系,可以表示最多</span>256<span style="font-family:宋体;">个符号。</span></span>
<span style="font-family:'Times New Roman';color:#111111;FONT-SIZE: 16.5pt">但是,这里又出现了新的问题。不同的国家有不同的字母,因此,哪怕它们都使用256<span style="font-family:宋体;">个符号的编码方式,代表的字母却不一样。比如,</span>130<span style="font-family:宋体;">在法语编码中代表了</span>é<span style="font-family:宋体;">,在希伯来语编码中却代表了字母</span>Gimel (ג)<span style="font-family:宋体;">,在俄语编码中又会代表另一个符号。但是不管怎样,所有这些编码方式中,</span>0—127<span style="font-family:宋体;">表示的符号是一样的,不一样的只是</span>128—255<span style="font-family:宋体;">的这一段。</span></span>
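The fact that one byte value maps to different characters can be sketched in C++ as follows. The code page names are my assumption and are not named in the text: IBM code pages 437, 862 and 866 are examples in which byte 130 is é, Gimel and the Cyrillic letter В respectively. The function decode_byte is hypothetical and tabulates only this single byte.

```cpp
#include <cassert>
#include <string>

// Decodes one byte under a named single-byte code page and returns the
// Unicode code point. Only byte 0x82 (130) is tabulated; a real decoder
// would carry a full 128-entry table for every code page.
// Unknown combinations yield U+FFFD (REPLACEMENT CHARACTER).
char32_t decode_byte(unsigned char byte, const std::string& code_page) {
    if (byte < 0x80) {
        return byte;  // 0-127 means the same thing in all of these code pages
    }
    if (byte == 0x82) {
        if (code_page == "cp437") return 0x00E9;  // é (Western European)
        if (code_page == "cp862") return 0x05D2;  // ג (Hebrew, Gimel)
        if (code_page == "cp866") return 0x0412;  // В (Cyrillic)
    }
    return 0xFFFD;
}
```

The same input byte therefore produces three different code points depending only on the code page that the reader assumes.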
<span style="font-family:'Times New Roman';color:#111111;FONT-SIZE: 16.5pt">Asian scripts use far more symbols; Chinese alone has on the order of 100,000 characters. A single byte can represent only 256 symbols, which is clearly not enough, so several bytes must be used for one symbol. For example, the common encoding for Simplified Chinese is GB2312, which uses two bytes per Chinese character, so in theory it can represent at most 256x256=65536 symbols.</span>
<span style="font-family:'Times New Roman';color:#111111;FONT-SIZE: 16.5pt">Chinese encodings deserve an article of their own and are not covered in these notes. The only point made here is that, although they also use several bytes per symbol, the GB family of Chinese encodings is entirely unrelated to the Unicode and UTF-8 described below.</span>

3. Unicode

<span style="font-family:'Times New Roman';color:#111111;FONT-SIZE: 16.5pt">As the previous section explained, many encodings exist in the world, and the same binary number can be interpreted as different symbols. To open a text file you therefore have to know its encoding; reading it with the wrong one produces garbled text. Why do e-mails so often arrive garbled? Because the sender and the recipient use different encodings.</span>
<span style="font-family:'Times New Roman';color:#111111;FONT-SIZE: 16.5pt">Now imagine an encoding that includes every symbol in the world and gives each one a unique code. Garbled text would disappear. That is Unicode: as its name suggests, an encoding for all symbols.</span>
<span style="font-family:'Times New Roman';color:#111111;FONT-SIZE: 16.5pt">Unicode is of course a very large set; at its current size it has room for more than one million symbols. Every symbol has a different code: U+0639 is the Arabic letter Ain, U+0041 is the English uppercase letter A, and U+4E25 is the Chinese character "严". The full table of symbols can be looked up at </span><a target=_blank style="OUTLINE-STYLE: none; COLOR: rgb(61,129,238); TEXT-DECORATION: none" href="http://www.unicode.org/"><span style="font-family:'Times New Roman';color:#00ff;FONT-SIZE: 16.5pt"><u>unicode.org</u></span></a><span style="font-family:'Times New Roman';color:#111111;FONT-SIZE: 16.5pt">, or in a dedicated </span><a target=_blank style="OUTLINE-STYLE: none; COLOR: rgb(61,129,238); TEXT-DECORATION: none" href="http://www.chi2ko.com/tool/CJK.htm"><span style="font-family:'Times New Roman';color:#00ff;FONT-SIZE: 16.5pt"><u>Chinese character lookup table</u></span></a><span style="font-family:'Times New Roman';color:#111111;FONT-SIZE: 16.5pt">.</span>

4. The problem with Unicode

<span style="font-family:'Times New Roman';color:#111111;FONT-SIZE: 16.5pt">Note that Unicode is only a symbol set. It specifies the binary code of each symbol, but not how that binary code should be stored.</span>
<span style="font-family:'Times New Roman';color:#111111;FONT-SIZE: 16.5pt">For example, the Unicode code of the Chinese character "严" is hexadecimal 4E25, which is a full 15 bits in binary (100111000100101), so representing this symbol takes at least 2 bytes. Symbols with larger codes may need 3 or 4 bytes.</span>
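The bit count quoted above is easy to verify. The sketch below counts the significant bits of a code and rounds up to whole bytes; the function names bit_length and raw_bytes_needed are illustrative and not part of any library.

```cpp
#include <cassert>
#include <cstdint>

// Number of significant binary digits in a code (0 is counted as 1 bit).
int bit_length(std::uint32_t code_point) {
    int bits = 1;
    while (code_point >>= 1) {
        ++bits;
    }
    return bits;
}

// Whole bytes needed to hold that many bits with no encoding scheme at all.
int raw_bytes_needed(std::uint32_t code_point) {
    return (bit_length(code_point) + 7) / 8;
}
```

For 0x4E25 this gives 15 bits and 2 bytes, for 0x41 it gives 7 bits and 1 byte, and for the largest Unicode code, 0x10FFFF, it gives 21 bits and 3 bytes.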
<span style="font-family:'Times New Roman';color:#111111;FONT-SIZE: 16.5pt">This raises two serious problems. First, how can Unicode be told apart from ASCII? How does the computer know that three bytes represent one symbol rather than three separate symbols? Second, we already know that one byte is enough for an English letter. If Unicode required every symbol to use three or four bytes, every English letter would be preceded by two or three bytes of 0, which is a huge waste of storage: text files would become two or three times larger, which is unacceptable.</span>
<span style="font-family:'Times New Roman';color:#111111;FONT-SIZE: 16.5pt">The consequences were: 1) several storage formats for Unicode appeared, that is, there are many different binary formats that can represent Unicode; 2) Unicode could not be widely adopted for a long time, until the Internet came along.</span>

5. UTF-8

<span style="font-family:'Times New Roman';color:#111111;FONT-SIZE: 16.5pt">The spread of the Internet created strong demand for a single, unified encoding. UTF-8 is the most widely used implementation of Unicode on the Internet. Other implementations include UTF-16 and UTF-32, but they are hardly used on the Internet.</span><span style="font-family:'Times New Roman';color:#111111;FONT-SIZE: 16.5pt"><strong>To repeat, the relationship is that UTF-8 is one of the ways of implementing Unicode.</strong></span>
<span style="font-family:'Times New Roman';color:#111111;FONT-SIZE: 16.5pt">The most important feature of UTF-8 is that it is a variable-length encoding. It uses 1 to 4 bytes per symbol, with the length depending on the symbol.</span>
<span style="font-family:'Times New Roman';color:#111111;FONT-SIZE: 16.5pt">The encoding rules of UTF-8 are simple; there are only two:</span>
<span style="font-family:'Times New Roman';color:#111111;FONT-SIZE: 16.5pt">1) For a single-byte symbol, the first bit of the byte is 0 and the remaining 7 bits hold the symbol's Unicode code. For English letters, UTF-8 is therefore identical to ASCII.</span>
<span style="font-family:'Times New Roman';color:#111111;FONT-SIZE: 16.5pt">2) For an n-byte symbol (n>1), the first n bits of the first byte are 1 and bit n+1 is 0; the first two bits of every following byte are 10. All the remaining bits not mentioned are filled with the symbol's Unicode code.</span>
<span style="font-family:'Times New Roman';color:#111111;FONT-SIZE: 16.5pt">The table below summarizes the rules; the letter x marks the bits available for the code.</span>
<span style="font-family:'Times New Roman';color:#111111;BACKGROUND-COLOR: rgb(224,223,204); FONT-SIZE: 16.5pt">Unicode range       | UTF-8 encoding</span><span style="font-family:'Times New Roman';color:#111111;BACKGROUND-COLOR: rgb(224,223,204); FONT-SIZE: 16.5pt">
</span><span style="font-family:'Times New Roman';color:#111111;BACKGROUND-COLOR: rgb(224,223,204); FONT-SIZE: 16.5pt">(hexadecimal)       | (binary)</span><span style="font-family:'Times New Roman';color:#111111;BACKGROUND-COLOR: rgb(224,223,204); FONT-SIZE: 16.5pt">
</span><span style="font-family:'Times New Roman';color:#111111;BACKGROUND-COLOR: rgb(224,223,204); FONT-SIZE: 16.5pt">--------------------+---------------------------------------------</span><span style="font-family:'Times New Roman';color:#111111;BACKGROUND-COLOR: rgb(224,223,204); FONT-SIZE: 16.5pt">
</span><span style="font-family:'Times New Roman';color:#111111;BACKGROUND-COLOR: rgb(224,223,204); FONT-SIZE: 16.5pt">0000 0000-0000 007F | 0xxxxxxx</span><span style="font-family:'Times New Roman';color:#111111;BACKGROUND-COLOR: rgb(224,223,204); FONT-SIZE: 16.5pt">
</span><span style="font-family:'Times New Roman';color:#111111;BACKGROUND-COLOR: rgb(224,223,204); FONT-SIZE: 16.5pt">0000 0080-0000 07FF | 110xxxxx 10xxxxxx</span><span style="font-family:'Times New Roman';color:#111111;BACKGROUND-COLOR: rgb(224,223,204); FONT-SIZE: 16.5pt">
</span><span style="font-family:'Times New Roman';color:#111111;BACKGROUND-COLOR: rgb(224,223,204); FONT-SIZE: 16.5pt">0000 0800-0000 FFFF | 1110xxxx 10xxxxxx 10xxxxxx</span><span style="font-family:'Times New Roman';color:#111111;BACKGROUND-COLOR: rgb(224,223,204); FONT-SIZE: 16.5pt">
</span><span style="font-family:'Times New Roman';color:#111111;BACKGROUND-COLOR: rgb(224,223,204); FONT-SIZE: 16.5pt">0001 0000-0010 FFFF | 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx</span>
<span style="font-family:'Times New Roman';color:#111111;FONT-SIZE: 16.5pt">Next, taking the Chinese character "严" as the example again, here is how the UTF-8 encoding is worked out.</span>
<span style="font-family:'Times New Roman';color:#111111;FONT-SIZE: 16.5pt">The Unicode code of "严" is 4E25 (100111000100101). According to the table, 4E25 falls in the range of the third row (0000 0800-0000 FFFF), so the UTF-8 encoding of "严" needs three bytes, in the format "1110xxxx 10xxxxxx 10xxxxxx". Then, starting from the last binary digit of "严", fill the x positions of the format from back to front, and pad any remaining x positions with 0. The result is that the UTF-8 encoding of "严" is "11100100 10111000 10100101", which is E4B8A5 in hexadecimal.</span>
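The two rules and the table translate almost directly into code. The following is a minimal sketch of an encoder, not a production implementation: the name encode_utf8 is my own, and the function does not reject the surrogate range D800-DFFF, which a strict encoder would.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Encodes one Unicode code into UTF-8 bytes, following the four table rows.
std::string encode_utf8(char32_t cp) {
    std::string out;
    if (cp <= 0x7F) {                       // 0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp <= 0x7FF) {               // 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp <= 0xFFFF) {              // 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp <= 0x10FFFF) {            // 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        throw std::invalid_argument("code above U+10FFFF");
    }
    return out;
}
```

Calling encode_utf8(0x4E25) returns the three bytes E4 B8 A5, the same result as the manual derivation above.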

6. Converting between Unicode and UTF-8

<span style="font-family:'Times New Roman';color:#111111;FONT-SIZE: 16.5pt">As the example in the previous section shows, the Unicode code of "严" is 4E25 while its UTF-8 encoding is E4B8A5; the two are different. Converting between them can be done by a program.</span>
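As an illustration of such a program, here is a sketch of the reverse direction, from UTF-8 bytes back to the Unicode code. The function name decode_utf8_first is made up for this example, and the checks are deliberately minimal: truncated or malformed input yields U+FFFD, but overlong encodings are not rejected.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Decodes the first symbol of a UTF-8 byte string and returns its Unicode code.
// Returns U+FFFD (REPLACEMENT CHARACTER) for empty, truncated or malformed input.
char32_t decode_utf8_first(const std::string& bytes) {
    if (bytes.empty()) return 0xFFFD;
    const unsigned char lead = static_cast<unsigned char>(bytes[0]);
    std::size_t length = 0;
    char32_t cp = 0;
    if (lead < 0x80)                { length = 1; cp = lead; }         // 0xxxxxxx
    else if ((lead & 0xE0) == 0xC0) { length = 2; cp = lead & 0x1F; }  // 110xxxxx
    else if ((lead & 0xF0) == 0xE0) { length = 3; cp = lead & 0x0F; }  // 1110xxxx
    else if ((lead & 0xF8) == 0xF0) { length = 4; cp = lead & 0x07; }  // 11110xxx
    else return 0xFFFD;  // a continuation byte or invalid lead byte
    if (bytes.size() < length) return 0xFFFD;
    for (std::size_t i = 1; i < length; ++i) {
        const unsigned char c = static_cast<unsigned char>(bytes[i]);
        if ((c & 0xC0) != 0x80) return 0xFFFD;  // must look like 10xxxxxx
        cp = (cp << 6) | (c & 0x3F);            // append the 6 payload bits
    }
    return cp;
}
```

Given the bytes E4 B8 A5, the function returns 0x4E25.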
<span style="font-family:'Times New Roman';color:#111111;FONT-SIZE: 16.5pt">On Windows, the simplest way to convert is the built-in Notepad program, Notepad.exe. After opening the file, choose "Save As" from the "File" menu; a dialog appears with an "Encoding" drop-down list at the very bottom.</span>
<a target=_blank style="OUTLINE-STYLE: none; COLOR: rgb(61,129,238); TEXT-DECORATION: none" href="http://www.ruanyifeng.com/blog/2007/10/bg2007102801.jpg"><img style="BORDER-BOTTOM: rgb(221,221,221) 0px; BORDER-LEFT: rgb(221,221,221) 0px; BORDER-TOP: rgb(221,221,221) 0px; BORDER-RIGHT: rgb(221,221,221) 0px" alt="Notepad Save As dialog with the Encoding drop-down list" src="http://www.ruanyifeng.com/blog/2007/10/bg2007102801.jpg" width="501" height="228" /></a>
<span style="font-family:'Times New Roman';color:#111111;FONT-SIZE: 16.5pt">It offers four options: ANSI, Unicode, Unicode big endian and UTF-8.</span>
<span style="font-family:'Times New Roman';color:#111111;FONT-SIZE: 16.5pt">1) ANSI is the default encoding. For English files it is ASCII; for Simplified Chinese files it is GB2312 (only on Simplified Chinese Windows; Traditional Chinese Windows uses Big5).</span>
<span style="font-family:'Times New Roman';color:#111111;FONT-SIZE: 16.5pt">2) Unicode here means the UCS-2 encoding, which stores the Unicode code of each character directly in two bytes. This option uses the little endian format.</span>
<span style="font-family:'Times New Roman';color:#111111;FONT-SIZE: 16.5pt">3) Unicode big endian is the counterpart of the previous option. The next section explains what little endian and big endian mean.</span>
<span style="font-family:'Times New Roman';color:#111111;FONT-SIZE: 16.5pt">4) UTF-8 is the encoding described in the previous section.</span>
<span style="font-family:'Times New Roman';color:#111111;FONT-SIZE: 16.5pt">After choosing the "Encoding", click the "Save" button and the file is converted to the new encoding immediately.</span>

7. Little endian and Big endian

<span style="font-family:'Times New Roman';color:#111111;FONT-SIZE: 16.5pt">The previous section mentioned that a Unicode code can be stored directly in UCS-2 format. Take the Chinese character "严": its Unicode code is 4E25, which needs two bytes, one being 4E and the other 25. Storing 4E first and 25 second is the Big endian way; storing 25 first and 4E second is the Little endian way.</span>
<span style="font-family:'Times New Roman';color:#111111;FONT-SIZE: 16.5pt">These two odd names come from Gulliver's Travels by Jonathan Swift. In the book, a civil war breaks out in Lilliput over whether an egg should be cracked open at the big end (Big-Endian) or at the little end (Little-Endian). Six wars were fought over this question; one emperor lost his life and another lost his throne.</span>
<span style="font-family:'Times New Roman';color:#111111;FONT-SIZE: 16.5pt">So storing the first (high-order) byte first is the "big end" way (Big endian), and storing the second (low-order) byte first is the "little end" way (Little endian).</span>
<span style="font-family:'Times New Roman';color:#111111;FONT-SIZE: 16.5pt">This naturally raises a question: how does the computer know which of the two byte orders a given file uses?</span>
<span style="font-family:'Times New Roman';color:#111111;FONT-SIZE: 16.5pt">The Unicode specification defines a character that is placed at the very beginning of a file to indicate the byte order. Its name is "ZERO WIDTH NO-BREAK SPACE" and its code is FEFF; in this role it is commonly called the byte order mark (BOM). It is exactly two bytes long, and FF is one greater than FE.</span>
<span style="font-family:'Times New Roman';color:#111111;FONT-SIZE: 16.5pt">If the first two bytes of a text file are FE FF, the file is big endian; if the first two bytes are FF FE, the file is little endian.</span>
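The detection rule can be sketched in a few lines of C++. The names ByteOrder and detect_byte_order are hypothetical. The sketch looks only at the first two bytes, exactly as described above; it does not try to tell a UCS-2 file from other formats whose marker begins with the same bytes.

```cpp
#include <cassert>
#include <string>

enum class ByteOrder { BigEndian, LittleEndian, Unknown };

// Inspects the first two bytes of a file's contents for the FEFF marker.
ByteOrder detect_byte_order(const std::string& data) {
    if (data.size() >= 2) {
        const unsigned char b0 = static_cast<unsigned char>(data[0]);
        const unsigned char b1 = static_cast<unsigned char>(data[1]);
        if (b0 == 0xFE && b1 == 0xFF) return ByteOrder::BigEndian;
        if (b0 == 0xFF && b1 == 0xFE) return ByteOrder::LittleEndian;
    }
    return ByteOrder::Unknown;  // no marker: the byte order must be known some other way
}
```

A file that begins with FF FE 25 4E is reported as little endian, and one that begins with FE FF 4E 25 as big endian.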

8. Example

<span style="font-family:'Times New Roman';color:#111111;FONT-SIZE: 16.5pt">Here is a concrete example.</span>
<span style="font-family:'Times New Roman';color:#111111;FONT-SIZE: 16.5pt">Open Notepad (Notepad.exe), create a new text file whose only content is the character "严", and save it in turn as ANSI, Unicode, Unicode big endian and UTF-8.</span>
<span style="font-family:'Times New Roman';color:#111111;FONT-SIZE: 16.5pt">Then, in the text editor </span><a target=_blank style="OUTLINE-STYLE: none; COLOR: rgb(61,129,238); TEXT-DECORATION: none" href="http://www.google.cn/search?aq=t&oq=UltraEdit&complete=1&hl=zh-CN&newwindow=1&rlz=1B3GGGL_zh-CNCN216CN216&q=ultraedit+%E4%B8%8B%E8%BD%BD&btnG=Google+%E6%90%9C%E7%B4%A2&meta="><span style="font-family:'Times New Roman';color:#00ff;FONT-SIZE: 16.5pt"><u>UltraEdit</u></span></a><span style="font-family:'Times New Roman';color:#111111;FONT-SIZE: 16.5pt">, use the "hex mode" to inspect how each file is encoded internally.</span>
<span style="font-family:'Times New Roman';color:#111111;FONT-SIZE: 16.5pt">1) ANSI: the file consists of the two bytes "D1 CF", which is exactly the GB2312 encoding of "严". This also suggests that GB2312 is stored in big endian order.</span>
<span style="font-family:'Times New Roman';color:#111111;FONT-SIZE: 16.5pt">2) Unicode: the file consists of the four bytes "FF FE 25 4E". "FF FE" indicates little endian storage, and the actual code is 4E25.</span>
<span style="font-family:'Times New Roman';color:#111111;FONT-SIZE: 16.5pt">3) Unicode big endian: the file consists of the four bytes "FE FF 4E 25". "FE FF" indicates big endian storage.</span>
<span style="font-family:'Times New Roman';color:#111111;FONT-SIZE: 16.5pt">4) UTF-8: the file consists of the six bytes "EF BB BF E4 B8 A5". The first three bytes, "EF BB BF", indicate that this is UTF-8; the last three, "E4B8A5", are the encoding of "严" itself, stored in the same order as the encoding is written.</span>
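The byte sequences observed for the two "Unicode" options can be reproduced with a short function. This is a sketch under my own assumptions: the name ucs2_with_bom is invented, the input is limited to a single 16-bit code, and the output imitates the file layout described above (marker first, then the character).

```cpp
#include <cassert>
#include <string>

// Serializes the FEFF marker followed by one 16-bit code, in the chosen byte order.
std::string ucs2_with_bom(char16_t unit, bool big_endian) {
    const char16_t units[] = {0xFEFF, unit};  // marker first, then the character
    std::string out;
    for (char16_t u : units) {
        const char high = static_cast<char>(u >> 8);
        const char low = static_cast<char>(u & 0xFF);
        if (big_endian) {
            out += high;
            out += low;
        } else {
            out += low;
            out += high;
        }
    }
    return out;
}
```

ucs2_with_bom(0x4E25, false) yields FF FE 25 4E and ucs2_with_bom(0x4E25, true) yields FE FF 4E 25, matching items 2) and 3).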

9. Further reading

<span style="font-family:'Times New Roman';color:#111111;FONT-SIZE: 16.5pt">* </span><a target=_blank style="OUTLINE-STYLE: none; COLOR: rgb(61,129,238); TEXT-DECORATION: none" href="http://www.joelonsoftware.com/articles/Unicode.html"><span style="font-family:'Times New Roman';color:#00ff;FONT-SIZE: 16.5pt"><u>The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets</u></span></a><span style="font-family:'Times New Roman';color:#111111;FONT-SIZE: 16.5pt"> (the most basic facts about character sets)</span>
<span style="font-family:'Times New Roman';color:#111111;FONT-SIZE: 16.5pt">* </span><a target=_blank style="OUTLINE-STYLE: none; COLOR: rgb(61,129,238); TEXT-DECORATION: none" href="http://www.pconline.com.cn/pcedu/empolder/gj/other/0505/616631.html"><span style="font-family:'Times New Roman';color:#00ff;FONT-SIZE: 16.5pt"><u>Talking about Unicode encoding</u></span></a>
<span style="font-family:'Times New Roman';color:#111111;FONT-SIZE: 16.5pt">* </span><a target=_blank style="OUTLINE-STYLE: none; COLOR: rgb(61,129,238); TEXT-DECORATION: none" href="http://www.ietf.org/rfc/rfc3629.txt"><span style="font-family:'Times New Roman';color:#00ff;FONT-SIZE: 16.5pt"><u>RFC3629: UTF-8, a transformation format of ISO 10646</u></span></a><span style="font-family:'Times New Roman';color:#111111;FONT-SIZE: 16.5pt"> (how UTF-8 is specified)</span>
<span style="font-family:'Times New Roman';color:#111111;FONT-SIZE: 16.5pt">(End)</span>
</pre><h1 style="TEXT-ALIGN: left; PADDING-BOTTOM: 8px; BACKGROUND-COLOR: rgb(153,153,153); MARGIN-TOP: 0pt; PADDING-LEFT: 10px; PADDING-RIGHT: 10px; FONT-FAMILY: 微软雅黑, sans-serif; MARGIN-BOTTOM: 0pt; COLOR: rgb(255,255,255); FONT-SIZE: 26px; FONT-WEIGHT: normal; PADDING-TOP: 8px"><a target=_blank name="t15"></a><a target=_blank style="OUTLINE-STYLE: none; COLOR: rgb(61,129,238); TEXT-DECORATION: none" href="http://www.cnblogs.com/xiaoyz/archive/2008/10/11/1308860.html"><span style="font-family:Arial;FONT-SIZE: 16pt"><strong>Representing Chinese and English strings in C++ (string, wstring)</strong></span></a></h1><pre style="TEXT-ALIGN: left; LINE-HEIGHT: 25px; BACKGROUND-COLOR: rgb(255,255,255); MARGIN-TOP: 0px; WORD-WRAP: break-word; WHITE-SPACE: pre-wrap; MARGIN-BOTTOM: 0px; COLOR: rgb(51,51,51); FONT-SIZE: 14px" class="p0" name="code"><span style="font-family:微软雅黑;color:#000000;FONT-SIZE: 13.5pt">(As the material above shows, </span><span style="font-family:微软雅黑;color:#000000;FONT-SIZE: 13.5pt">string is perfectly able to store Chinese text, because in multi-byte encodings such as GBK and UTF-8 only '\0' is encoded as the byte 0 and no byte of any other character is 0; correct display and character-level operations, however, are not guaranteed!</span><span style="font-family:微软雅黑;color:#000000;FONT-SIZE: 13.5pt">)</span>
<span style="font-family:'Times New Roman';FONT-SIZE: 10.5pt">      In C++, the string class is an instantiation of the class template basic_string:</span>
<span style="font-family:'Courier New';color:#000000;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">template <</span><span style="font-family:'Courier New';color:#00ff;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">class</span><span style="font-family:'Courier New';color:#000000;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt"> _Elem, </span><span style="font-family:'Courier New';color:#00ff;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">class</span><span style="font-family:'Courier New';color:#000000;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt"> traits = char_traits<_Elem>, </span><span style="font-family:'Courier New';color:#00ff;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">class</span><span style="font-family:'Courier New';color:#000000;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt"> _Ax = allocator<_Elem>></span><span style="font-family:'Courier New';color:#000000;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">
</span><span style="font-family:'Courier New';color:#00ff;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">class</span><span style="font-family:'Courier New';color:#000000;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt"> basic_string{</span><span style="font-family:'Courier New';color:#000000;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt"> /* ... */ </span><span style="font-family:'Courier New';color:#000000;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">};</span>
<span style="font-family:'Times New Roman';FONT-SIZE: 10.5pt"> </span><span style="font-family:'Times New Roman';LINE-HEIGHT: 15px">     The first parameter, _Elem, is the character type. The second parameter, traits, defaults to char_traits, which defines the type together with its character operations such as comparison, equality and assignment. The third parameter, _Ax, defaults to the allocator class and describes the memory model; different memory models (for example stack, heap or segmented memory) imply different pointer behavior.</span>
<span style="font-family:'Times New Roman';FONT-SIZE: 10.5pt">     The C++ standard defines two string types, string and wstring:</span>
<span style="font-family:'Courier New';color:#000000;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">typedef basic_string<</span><span style="font-family:'Courier New';color:#00ff;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">char</span><span style="font-family:'Courier New';color:#000000;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">> </span><span style="font-family:'Courier New';color:#00ff;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">string</span><span style="font-family:'Courier New';color:#000000;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">;</span><span style="font-family:'Courier New';color:#000000;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">
</span><span style="font-family:'Courier New';color:#000000;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">typedef basic_string<wchar_t> wstring;</span>
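The two typedefs can be confirmed at compile time. A minimal sketch, assuming a C++11 compiler; the helper names narrow_unit_size and wide_unit_size are invented for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <type_traits>

// These fail to compile if the standard typedefs were anything else.
static_assert(std::is_same<std::string, std::basic_string<char>>::value,
              "string is basic_string<char>");
static_assert(std::is_same<std::wstring, std::basic_string<wchar_t>>::value,
              "wstring is basic_string<wchar_t>");

// Size in bytes of one element of each string type.
constexpr std::size_t narrow_unit_size() { return sizeof(std::string::value_type); }
constexpr std::size_t wide_unit_size() { return sizeof(std::wstring::value_type); }
```

narrow_unit_size() is always 1. wide_unit_size() equals sizeof(wchar_t), which depends on the platform: typically 2 on Windows and 4 on Linux.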
<span style="font-family:'Times New Roman';FONT-SIZE: 10.5pt"> </span><span style="font-family:'Times New Roman';LINE-HEIGHT: 15px">     The former, string, is the commonly used type and can be thought of as a char[], which matches _Elem=char in its definition. wstring uses wchar_t, the wide character type, which is meant for non-ASCII text such as Unicode, Chinese, Japanese and Korean. For wchar_t, C++ provides counterparts to the char-based facilities, because both sets are defined from the same templates in the way shown above. That is why there are also wide stream objects such as wcout, wcin and wcerr.</span>
<span style="font-family:'Times New Roman';FONT-SIZE: 10.5pt">     In fact string can hold Chinese as well, but it spreads one Chinese character over several chars (2 in GBK, 3 in UTF-8). If a Chinese character is instead treated as one wchar_t unit, it occupies a single element of a wstring, and the same goes for other non-English scripts and encodings. (On Windows wchar_t is 16 bits, so characters above U+FFFF still take two elements.) Only this truly meets the needs of string manipulation, especially for internationalization work.</span>
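The difference in element counts can be seen directly from size(). In this sketch the character "严" is written with escape sequences so that the result does not depend on the encoding of the source file; the function names yan_gbk, yan_utf8 and yan_wide are invented for illustration.

```cpp
#include <cassert>
#include <string>

// "严" as GB2312/GBK bytes: two char elements.
inline std::string yan_gbk() { return "\xD1\xCF"; }

// "严" as UTF-8 bytes: three char elements.
inline std::string yan_utf8() { return "\xE4\xB8\xA5"; }

// "严" as a wide string: a single wchar_t element holding 0x4E25.
inline std::wstring yan_wide() { return L"\u4E25"; }
```

yan_gbk().size() is 2, yan_utf8().size() is 3 and yan_wide().size() is 1, so indexing or reversing the narrow strings element by element would split the character apart.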
<span style="font-family:'Times New Roman';FONT-SIZE: 10.5pt">     The following program shows the difference between the two.</span>
<span style="font-family:'Courier New';color:#000000;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">#include <iostream></span><span style="font-family:'Courier New';color:#000000;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">
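</span><span style="font-family:'Courier New';color:#000000;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">#include <</span><span style="font-family:'Courier New';color:#00ff;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">string</span><span style="font-family:'Courier New';color:#000000;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">></span><span style="font-family:'Courier New';color:#000000;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">
</span><span style="font-family:'Courier New';color:#000000;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">#include <locale></span><span style="font-family:'Courier New';color:#000000;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">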
</span><span style="font-family:'Courier New';color:#00ff;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">using</span><span style="font-family:'Courier New';color:#000000;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt"> </span><span style="font-family:'Courier New';color:#00ff;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">namespace</span><span style="font-family:'Courier New';color:#000000;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt"> std;</span><span style="font-family:'Courier New';color:#000000;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">
</span><span style="font-family:'Courier New';color:#000000;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">
</span><span style="font-family:'Courier New';color:#00ff;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">#define</span><span style="font-family:'Courier New';color:#000000;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt"> tab "\t"</span><span style="font-family:'Courier New';color:#000000;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">
</span><span style="font-family:'Courier New';color:#000000;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">
</span><span style="font-family:'Courier New';color:#00ff;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">int</span><span style="font-family:'Courier New';color:#000000;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt"> main()</span><span style="font-family:'Courier New';color:#000000;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">
</span><span style="font-family:'Courier New';color:#000000;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">{</span><span style="font-family:'Courier New';color:#000000;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">
</span><span style="font-family:'Courier New';color:#000000;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">    locale def;</span><span style="font-family:'Courier New';color:#000000;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">
</span><span style="font-family:'Courier New';color:#000000;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">    cout<<def.name()<<endl;</span><span style="font-family:'Courier New';color:#000000;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">
</span><span style="font-family:'Courier New';color:#000000;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">    locale current = cout.getloc();</span><span style="font-family:'Courier New';color:#000000;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">
</span><span style="font-family:'Courier New';color:#000000;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">    cout<<current.name()<<endl;</span><span style="font-family:'Courier New';color:#000000;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">
</span><span style="font-family:'Courier New';color:#000000;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">
</span><span style="font-family:'Courier New';color:#000000;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">    </span><span style="font-family:'Courier New';color:#00ff;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">float</span><span style="font-family:'Courier New';color:#000000;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt"> val=</span><span style="font-family:'Courier New';color:#80080;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">1234.56</span><span style="font-family:'Courier New';color:#000000;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">;</span><span style="font-family:'Courier New';color:#000000;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">
</span><span style="font-family:'Courier New';color:#000000;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">    cout<<val<<endl;</span><span style="font-family:'Courier New';color:#000000;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">
</span><span style="font-family:'Courier New';color:#000000;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">
</span><span style="font-family:'Courier New';color:#000000;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">    </span><span style="font-family:'Courier New';color:#0800;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">// switch to Simplified Chinese ("chs" is a Windows-specific locale name)</span><span style="font-family:'Courier New';color:#0800;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">
</span><span style="font-family:'Courier New';color:#000000;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">    cout.imbue(locale(</span><span style="font-family:'Courier New';color:#8000;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">"chs"</span><span style="font-family:'Courier New';color:#000000;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">));</span><span style="font-family:'Courier New';color:#000000;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">
</span><span style="font-family:'Courier New';color:#000000;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">    current=cout.getloc();</span><span style="font-family:'Courier New';color:#000000;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">
</span><span style="font-family:'Courier New';color:#000000;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">    cout<<current.name()<<endl;</span><span style="font-family:'Courier New';color:#000000;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">
</span><span style="font-family:'Courier New';color:#000000;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">    cout<<val<<endl;</span><span style="font-family:'Courier New';color:#000000;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">
</span><span style="font-family:'Courier New';color:#000000;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">
</span><span style="font-family:'Courier New';color:#000000;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">    </span><span style="font-family:'Courier New';color:#0800;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">// The code above only demonstrates locale; the actual example starts below. locale is shown first because the example uses imbue.</span><span style="font-family:'Courier New';color:#0800;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">
</span><span style="font-family:'Courier New';color:#000000;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">    cout<<</span><span style="font-family:'Courier New';color:#8000;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">"*********************************"</span><span style="font-family:'Courier New';color:#000000;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt"><<endl;</span><span style="font-family:'Courier New';color:#000000;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">
</span><span style="font-family:'Courier New';color:#000000;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">
</span><span style="font-family:'Courier New';color:#000000;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">    </span><span style="font-family:'Courier New';color:#0800;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">// For localized output (text/time/currency, etc.) wcout must be imbued with a locale so it can convert wide characters; "chs" selects Simplified Chinese (a Windows/MSVC locale name)</span><span style="font-family:'Courier New';color:#0800;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">
</span><span style="font-family:'Courier New';color:#000000;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">    wcout.imbue(std::locale(</span><span style="font-family:'Courier New';color:#8000;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">"chs"</span><span style="font-family:'Courier New';color:#000000;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">));</span><span style="font-family:'Courier New';color:#000000;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">
</span><span style="font-family:'Courier New';color:#000000;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">
</span><span style="font-family:'Courier New';color:#000000;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">    </span><span style="font-family:'Courier New';color:#0800;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">// string, English text: reverses correctly, and the second character prints correctly</span><span style="font-family:'Courier New';color:#0800;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">
</span><span style="font-family:'Courier New';color:#000000;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">    </span><span style="font-family:'Courier New';color:#00ff;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">string</span><span style="font-family:'Courier New';color:#000000;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt"> str1(</span><span style="font-family:'Courier New';color:#8000;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">"ABCabc"</span><span style="font-family:'Courier New';color:#000000;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">);</span><span style="font-family:'Courier New';color:#000000;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">
</span><span style="font-family:'Courier New';color:#000000;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">    </span><span style="font-family:'Courier New';color:#00ff;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">string</span><span style="font-family:'Courier New';color:#000000;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt"> str11(str1.rbegin(),str1.rend());</span><span style="font-family:'Courier New';color:#000000;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">
</span><span style="font-family:'Courier New';color:#000000;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">    cout<<</span><span style="font-family:'Courier New';color:#8000;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">"UK\ts1\t:"</span><span style="font-family:'Courier New';color:#000000;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt"><<str1<<tab<<str1[</span><span style="font-family:'Courier New';color:#80080;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">1</span><span style="font-family:'Courier New';color:#000000;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">]<<tab<<str11<<endl;</span><span style="font-family:'Courier New';color:#000000;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">
</span><span style="font-family:'Courier New';color:#000000;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">
</span><span style="font-family:'Courier New';color:#000000;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">    </span><span style="font-family:'Courier New';color:#0800;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">// wstring, English text: reverses correctly, and the second character prints correctly</span><span style="font-family:'Courier New';color:#0800;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">
</span><span style="font-family:'Courier New';color:#000000;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">    wstring str2=L</span><span style="font-family:'Courier New';color:#8000;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">"ABCabc"</span><span style="font-family:'Courier New';color:#000000;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">;</span><span style="font-family:'Courier New';color:#000000;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">
</span><span style="font-family:'Courier New';color:#000000;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">    wstring str22(str2.rbegin(),str2.rend());</span><span style="font-family:'Courier New';color:#000000;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">
</span><span style="font-family:'Courier New';color:#000000;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">    wcout<<</span><span style="font-family:'Courier New';color:#8000;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">"UK\tws2\t:"</span><span style="font-family:'Courier New';color:#000000;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt"><<str2<<tab<<str2[</span><span style="font-family:'Courier New';color:#80080;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">1</span><span style="font-family:'Courier New';color:#000000;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">]<<tab<<str22<<endl;</span><span style="font-family:'Courier New';color:#000000;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">
</span><span style="font-family:'Courier New';color:#000000;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">
</span><span style="font-family:'Courier New';color:#000000;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">    </span><span style="font-family:'Courier New';color:#0800;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">// string, Chinese text: reversing byte by byte yields garbage, and str3[1] is a single byte rather than the second character</span><span style="font-family:'Courier New';color:#0800;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">
</span><span style="font-family:'Courier New';color:#000000;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">    </span><span style="font-family:'Courier New';color:#00ff;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">string</span><span style="font-family:'Courier New';color:#000000;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt"> str3(</span><span style="font-family:'Courier New';color:#8000;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">"<span style="font-family:宋体;">你好么?</span>"</span><span style="font-family:'Courier New';color:#000000;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">);</span><span style="font-family:'Courier New';color:#000000;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">
</span><span style="font-family:'Courier New';color:#000000;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">    </span><span style="font-family:'Courier New';color:#00ff;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">string</span><span style="font-family:'Courier New';color:#000000;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt"> str33(str3.rbegin(),str3.rend());</span><span style="font-family:'Courier New';color:#000000;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">
</span><span style="font-family:'Courier New';color:#000000;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">    cout<<</span><span style="font-family:'Courier New';color:#8000;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">"CHN\ts3\t:"</span><span style="font-family:'Courier New';color:#000000;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt"><<str3<<tab<<str3[</span><span style="font-family:'Courier New';color:#80080;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">1</span><span style="font-family:'Courier New';color:#000000;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">]<<tab<<str33<<endl;</span><span style="font-family:'Courier New';color:#000000;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">
</span><span style="font-family:'Courier New';color:#000000;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">
</span><span style="font-family:'Courier New';color:#000000;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">    </span><span style="font-family:'Courier New';color:#0800;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">// Correct way to print the second character, assuming a two-byte encoding such as GBK: output bytes 2 and 3 together</span><span style="font-family:'Courier New';color:#0800;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">
</span><span style="font-family:'Courier New';color:#000000;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">    cout<<</span><span style="font-family:'Courier New';color:#8000;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">"CHN\ts3\t:RIGHT\t"</span><span style="font-family:'Courier New';color:#000000;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt"><<str3[</span><span style="font-family:'Courier New';color:#80080;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">2</span><span style="font-family:'Courier New';color:#000000;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">]<<str3[</span><span style="font-family:'Courier New';color:#80080;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">3</span><span style="font-family:'Courier New';color:#000000;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">]<<endl;</span><span style="font-family:'Courier New';color:#000000;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">
</span><span style="font-family:'Courier New';color:#000000;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">
</span><span style="font-family:'Courier New';color:#000000;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">    </span><span style="font-family:'Courier New';color:#0800;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">// wstring, Chinese text: reverses correctly, and the second character prints correctly</span><span style="font-family:'Courier New';color:#0800;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">
</span><span style="font-family:'Courier New';color:#000000;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">    wstring str4=L</span><span style="font-family:'Courier New';color:#8000;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">"<span style="font-family:宋体;">你好么?</span>"</span><span style="font-family:'Courier New';color:#000000;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">;</span><span style="font-family:'Courier New';color:#000000;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">
</span><span style="font-family:'Courier New';color:#000000;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">    wstring str44(str4.rbegin(),str4.rend());</span><span style="font-family:'Courier New';color:#000000;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">
</span><span style="font-family:'Courier New';color:#000000;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">    wcout<<</span><span style="font-family:'Courier New';color:#8000;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">"CHN\tws4\t:"</span><span style="font-family:'Courier New';color:#000000;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt"><<str4<<tab<<str4[</span><span style="font-family:'Courier New';color:#80080;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">1</span><span style="font-family:'Courier New';color:#000000;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">]<<tab<<str44<<endl;</span><span style="font-family:'Courier New';color:#000000;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">
</span><span style="font-family:'Courier New';color:#000000;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">
</span><span style="font-family:'Courier New';color:#000000;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">    wstring str5(str1.begin(),str1.end());</span><span style="font-family:'Courier New';color:#0800;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">// This construction is only valid when the string holds single-byte (ASCII) characters</span><span style="font-family:'Courier New';color:#0800;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">
</span><span style="font-family:'Courier New';color:#000000;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">    wstring str55(str5.rbegin(),str5.rend());</span><span style="font-family:'Courier New';color:#000000;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">
</span><span style="font-family:'Courier New';color:#000000;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">    wcout<<</span><span style="font-family:'Courier New';color:#8000;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">"CHN\tws5\t:"</span><span style="font-family:'Courier New';color:#000000;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt"><<str5<<tab<<str5[</span><span style="font-family:'Courier New';color:#80080;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">1</span><span style="font-family:'Courier New';color:#000000;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">]<<tab<<str55<<endl;</span><span style="font-family:'Courier New';color:#000000;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">
</span><span style="font-family:'Courier New';color:#000000;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">
</span><span style="font-family:'Courier New';color:#000000;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">    wstring str6(str3.begin(),str3.end());</span><span style="font-family:'Courier New';color:#0800;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">// This construction fails: every byte of a multibyte character is widened on its own, so the result is garbage</span><span style="font-family:'Courier New';color:#0800;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">
</span><span style="font-family:'Courier New';color:#000000;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">    wstring str66(str6.rbegin(),str6.rend());</span><span style="font-family:'Courier New';color:#000000;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">
</span><span style="font-family:'Courier New';color:#000000;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">    wcout<<</span><span style="font-family:'Courier New';color:#8000;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">"CHN\tws6\t:"</span><span style="font-family:'Courier New';color:#000000;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt"><<str6<<tab<<str6[</span><span style="font-family:'Courier New';color:#80080;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">1</span><span style="font-family:'Courier New';color:#000000;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">]<<tab<<str66<<endl;</span><span style="font-family:'Courier New';color:#000000;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">
</span><span style="font-family:'Courier New';color:#000000;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">
</span><span style="font-family:'Courier New';color:#000000;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">    </span><span style="font-family:'Courier New';color:#00ff;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">return</span><span style="font-family:'Courier New';color:#000000;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt"> </span><span style="font-family:'Courier New';color:#80080;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">0</span><span style="font-family:'Courier New';color:#000000;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">;</span><span style="font-family:'Courier New';color:#000000;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">
</span><span style="font-family:'Courier New';color:#000000;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">}</span><span style="font-family:'Courier New';color:#000000;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">
</span><span style="font-family:'Courier New';color:#000000;BACKGROUND-COLOR: rgb(245,245,245); FONT-SIZE: 9.5pt">
</span>
<span style="font-family:'Times New Roman';FONT-SIZE: 10.5pt"> </span><span style="font-family:'Times New Roman';LINE-HEIGHT: 15px">The results are as follows: (omitted)</span>
</pre><pre style="TEXT-ALIGN: left; LINE-HEIGHT: 25px; BACKGROUND-COLOR: rgb(255,255,255); MARGIN-TOP: 0px; WORD-WRAP: break-word; WHITE-SPACE: pre-wrap; MARGIN-BOTTOM: 0px; COLOR: rgb(51,51,51); FONT-SIZE: 14px" class="p0" name="code"><span style="font-family:'Times New Roman';FONT-SIZE: 10.5pt">     The first part of the output shows the effect of localization: a comma is inserted after every three digits of a number. Localization also affects how times, text, and other values are formatted.</span>
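<span style="font-family:'Times New Roman';FONT-SIZE: 10.5pt">     The digit grouping described above can also be reproduced without relying on a system locale such as "chs". The following is only a minimal sketch: the names comma_grouping and format_grouped are made up for illustration and do not appear in the original example. It installs a custom std::numpunct facet that asks for a comma after every three digits.</span>

```cpp
#include <locale>
#include <sstream>
#include <string>

// Hypothetical facet (not part of the original example): it requests
// a ',' separator after every group of three digits.
struct comma_grouping : std::numpunct<char> {
protected:
    char do_thousands_sep() const override { return ','; }
    std::string do_grouping() const override { return "\3"; }
};

// Formats an integer through a stream imbued with the facet above.
// The locale object takes ownership of the facet pointer.
std::string format_grouped(long value) {
    std::ostringstream out;
    out.imbue(std::locale(std::locale::classic(), new comma_grouping));
    out << value;
    return out.str();
}
```

<span style="font-family:'Times New Roman';FONT-SIZE: 10.5pt">     Under these assumptions, format_grouped(1234567) should yield "1,234,567", which is the same kind of grouping the "chs" locale applies in the listing.</span>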
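<span style="font-family:'Times New Roman';FONT-SIZE: 10.5pt">     The rest of the output shows how to use string and wstring correctly. The third case stores Chinese characters in a string, so indexing and reversing operate on individual bytes and produce errors. The last line is also wrong, and it corrupts the output that follows: the separator and the newline are missing. (Ignore the UK/CHN labels on the last two cases; they only show that constructing a wstring from a Chinese string in this way is incorrect.)</span>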
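<span style="font-family:'Times New Roman';FONT-SIZE: 10.5pt">     To make the byte-versus-character problem concrete, here is a small sketch of reversing a narrow string character by character. It assumes the bytes are GBK-encoded, treats any byte in the range 0x81 to 0xFE as the lead byte of a two-byte character, and does not validate the trail byte. The helper names split_gbk and reverse_gbk are hypothetical and not taken from the original example.</span>

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Splits a GBK-encoded byte string into characters of 1 or 2 bytes.
// Assumption: a byte in 0x81..0xFE starts a two-byte character;
// the trail byte is not validated in this sketch.
std::vector<std::string> split_gbk(const std::string& s) {
    std::vector<std::string> chars;
    for (std::size_t i = 0; i < s.size();) {
        unsigned char b = static_cast<unsigned char>(s[i]);
        bool two_bytes = b >= 0x81 && b <= 0xFE && i + 1 < s.size();
        std::size_t len = two_bytes ? 2 : 1;
        chars.push_back(s.substr(i, len));
        i += len;
    }
    return chars;
}

// Reverses the order of characters while keeping the two bytes of
// each Chinese character together.
std::string reverse_gbk(const std::string& s) {
    std::vector<std::string> chars = split_gbk(s);
    std::string out;
    for (auto it = chars.rbegin(); it != chars.rend(); ++it) {
        out += *it;
    }
    return out;
}
```

<span style="font-family:'Times New Roman';FONT-SIZE: 10.5pt">     Unlike string(s.rbegin(), s.rend()), which reverses single bytes, this keeps each two-byte character intact. Converting to wstring, as in the str4 case, is usually the simpler choice when a wide-character API is available.</span>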
<span style="font-family:'Times New Roman';FONT-SIZE: 10.5pt">     Chapter 12, "Language Support", of the book 《掌握标准C++类》 is devoted to internationalization and localization in C++. C++ provides standard facilities for I18N, and software developers can use that chapter as a reference.</span>
<span style="font-family:'Times New Roman';FONT-SIZE: 10.5pt">       The C++ standard library is broad and deep, and its functionality is fairly complete. There is more to learn.</span>
<span style="font-family:宋体;"><span style="LINE-HEIGHT: 15px">Excerpted from: </span></span><span style="font-family:verdana, Arial, Helvetica, sans-serif;LINE-HEIGHT: 15px; WHITE-SPACE: normal"><a target=_blank style="OUTLINE-STYLE: none; COLOR: rgb(61,129,238); TEXT-DECORATION: none" href="http://www.cnblogs.com/xiaoyz/archive/2008/10/11/1308860.html"><span>http://www.cnblogs.com/xiaoyz/archive/2008/10/11/1308860.html</span></a></span>