多字节字符与宽字节字符

转载自:http://blog.csdn.net/langb2014/article/details/52471256?locationNum=2&fps=1

char与wchar_t

我们知道C++基本数据类型中表示字符的有两种:char、wchar_t。 
char叫多字节字符,一个char占一个字节,之所以叫多字节字符是因为它表示一个字时可能是一个字节也可能是多个字节。一个英文字符(如’s’)用一个char(一个字节)表示,一个中文汉字(如’中’)用3个char(三个字节)表示,看下面的例子。

<code class="language-C++ hljs cpp has-numbering" style="display: block; padding: 0px; color: inherit; box-sizing: border-box; font-family: 'Source Code Pro', monospace;font-size:undefined; white-space: pre; border-radius: 0px; word-wrap: normal; background: transparent;"><span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">void</span> TestChar() { <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">char</span> ch1 = <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">'s'</span>; <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">// 正确</span> <span class="hljs-built_in" style="color: rgb(102, 0, 102); box-sizing: border-box;">cout</span> << <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">"ch1:"</span> << ch1 << endl; <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">char</span> ch2 = <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">'中'</span>; <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">// 错误,一个char不能完整存放一个汉字信息</span> <span class="hljs-built_in" style="color: rgb(102, 0, 102); box-sizing: border-box;">cout</span> << <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">"ch2:"</span> << ch2 << endl; <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">char</span> str[<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">4</span>] = <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">"中"</span>; <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">//前三个字节存放汉字'中',最后一个字节存放字符串结束符\0</span> <span class="hljs-built_in" style="color: rgb(102, 0, 102); box-sizing: border-box;">cout</span> << <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">"str:"</span> << str << endl; <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">//char str2[2] = "国"; // 错误:'str2' : array bounds overflow</span> <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">//cout << str2 << endl;</span> }</code><ul class="pre-numbering" style="box-sizing: border-box; position: absolute; width: 50px; top: 0px; left: 0px; margin: 0px; padding: 6px 0px 40px; border-right-width: 1px; border-right-style: solid; border-right-color: rgb(221, 221, 221); list-style: none; text-align: right; background-color: rgb(238, 238, 238);"><li style="box-sizing: border-box; padding: 0px 5px;">1</li><li style="box-sizing: border-box; padding: 0px 5px;">2</li><li style="box-sizing: border-box; padding: 0px 5px;">3</li><li style="box-sizing: border-box; padding: 0px 5px;">4</li><li style="box-sizing: border-box; padding: 0px 5px;">5</li><li style="box-sizing: border-box; padding: 0px 5px;">6</li><li style="box-sizing: border-box; padding: 0px 5px;">7</li><li style="box-sizing: border-box; padding: 0px 5px;">8</li><li style="box-sizing: border-box; padding: 0px 5px;">9</li><li style="box-sizing: border-box; padding: 0px 5px;">10</li><li style="box-sizing: border-box; padding: 0px 5px;">11</li><li style="box-sizing: border-box; padding: 0px 5px;">12</li></ul><ul class="pre-numbering" style="box-sizing: border-box; position: absolute; width: 50px; top: 0px; left: 0px; margin: 0px; padding: 6px 0px 40px; border-right-width: 1px; border-right-style: solid; border-right-color: rgb(221, 221, 221); list-style: none; text-align: right; background-color: rgb(238, 238, 238);"><li style="box-sizing: border-box; padding: 0px 5px;">1</li><li style="box-sizing: border-box; padding: 0px 5px;">2</li><li style="box-sizing: border-box; padding: 0px 5px;">3</li><li style="box-sizing: border-box; padding: 0px 5px;">4</li><li style="box-sizing: border-box; padding: 0px 5px;">5</li><li style="box-sizing: border-box; padding: 0px 5px;">6</li><li style="box-sizing: border-box; padding: 0px 5px;">7</li><li style="box-sizing: border-box; padding: 0px 5px;">8</li><li style="box-sizing: border-box; padding: 0px 5px;">9</li><li style="box-sizing: border-box; padding: 0px 5px;">10</li><li style="box-sizing: border-box; padding: 0px 5px;">11</li><li style="box-sizing: border-box; padding: 0px 5px;">12</li></ul>

结点如下:

ch1:s 
ch2: 
str:中

wchar_t被称为宽字符,一个wchar_t占2个字节。之所以叫宽字符是因为所有的字都要用两个字节(即一个wchar_t)来表示,不管是英文还是中文。看下面的例子:

<code class="language-C++ hljs objectivec has-numbering" style="display: block; padding: 0px; color: inherit; box-sizing: border-box; font-family: 'Source Code Pro', monospace;font-size:undefined; white-space: pre; border-radius: 0px; word-wrap: normal; background: transparent;"><span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">void</span> TestWchar_t() { wcout<span class="hljs-variable" style="color: rgb(102, 0, 102); box-sizing: border-box;">.imbue</span>(locale(<span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">"chs"</span>)); <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">// 将wcout的本地化语言设置为中文</span> <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">wchar_t</span> wch1 = L<span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">'s'</span>; <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">// 正确</span> wcout << <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">"wch1:"</span> << wch1 << endl; <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">wchar_t</span> wch2 = L<span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">'中'</span>; <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">// 正确,一个汉字用一个wchar_t表示</span> wcout << <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">"wch2:"</span> << wch2 << endl; <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">wchar_t</span> wstr[<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">2</span>] = L<span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">"中"</span>; <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">// 前两个字节(前一个wchar_t)存放汉字'中',最后两个字节(后一个wchar_t)存放字符串结束符\0</span> wcout << <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">"wstr:"</span> << wstr << endl; <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">wchar_t</span> wstr2[<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">3</span>] = L<span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">"中国"</span>; wcout << <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">"wstr2:"</span> << wstr2 << endl; }</code><ul class="pre-numbering" style="box-sizing: border-box; position: absolute; width: 50px; top: 0px; left: 0px; margin: 0px; padding: 6px 0px 40px; border-right-width: 1px; border-right-style: solid; border-right-color: rgb(221, 221, 221); list-style: none; text-align: right; background-color: rgb(238, 238, 238);"><li style="box-sizing: border-box; padding: 0px 5px;">1</li><li style="box-sizing: border-box; padding: 0px 5px;">2</li><li style="box-sizing: border-box; padding: 0px 5px;">3</li><li style="box-sizing: border-box; padding: 0px 5px;">4</li><li style="box-sizing: border-box; padding: 0px 5px;">5</li><li style="box-sizing: border-box; padding: 0px 5px;">6</li><li style="box-sizing: border-box; padding: 0px 5px;">7</li><li style="box-sizing: border-box; padding: 0px 5px;">8</li><li style="box-sizing: border-box; padding: 0px 5px;">9</li><li style="box-sizing: border-box; padding: 0px 5px;">10</li><li style="box-sizing: border-box; padding: 0px 5px;">11</li><li style="box-sizing: border-box; padding: 0px 5px;">12</li><li style="box-sizing: border-box; padding: 0px 5px;">13</li><li style="box-sizing: border-box; padding: 0px 5px;">14</li></ul><ul class="pre-numbering" style="box-sizing: border-box; position: absolute; width: 50px; top: 0px; left: 0px; margin: 0px; padding: 6px 0px 40px; border-right-width: 1px; border-right-style: solid; border-right-color: rgb(221, 221, 221); list-style: none; text-align: right; background-color: rgb(238, 238, 238);"><li style="box-sizing: border-box; padding: 0px 5px;">1</li><li style="box-sizing: border-box; padding: 0px 5px;">2</li><li style="box-sizing: border-box; padding: 0px 5px;">3</li><li style="box-sizing: border-box; padding: 0px 5px;">4</li><li style="box-sizing: border-box; padding: 0px 5px;">5</li><li style="box-sizing: border-box; padding: 0px 5px;">6</li><li style="box-sizing: border-box; padding: 0px 5px;">7</li><li style="box-sizing: border-box; padding: 0px 5px;">8</li><li style="box-sizing: border-box; padding: 0px 5px;">9</li><li style="box-sizing: border-box; padding: 0px 5px;">10</li><li style="box-sizing: border-box; padding: 0px 5px;">11</li><li style="box-sizing: border-box; padding: 0px 5px;">12</li><li style="box-sizing: border-box; padding: 0px 5px;">13</li><li style="box-sizing: border-box; padding: 0px 5px;">14</li></ul>

结果如下:

ch1:s 
ch2:中 
str:中 
str2:中国

说明: 
1. 用常量字符给wchar_t变量赋值时,前面要加L。如: wchar_t wch2 = L’中’; 
2. 用常量字符串给wchar_t数组赋值时,前面要加L。如: wchar_t wstr2[3] = L”中国”; 
3. 如果不加L,对于英文可以正常,但对于非英文(如中文)会出错。

string与wstring

字符数组可以表示一个字符串,但它是一个定长的字符串,我们在使用之前必须知道这个数组的长度。为方便字符串的操作,STL为我们定义好了字符串的类string和wstring。大家对string肯定不陌生,但wstring可能就用的少了。

string是普通的多字节版本,是基于char的,对char数组进行的一种封装。

wstring是Unicode版本,是基于wchar_t的,对wchar_t数组进行的一种封装。

string 与 wstring的相关转换:

以下的两个方法是跨平台的,可在Windows下使用,也可在Linux下使用。

<code class="language-C++ hljs cpp has-numbering" style="display: block; padding: 0px; color: inherit; box-sizing: border-box; font-family: 'Source Code Pro', monospace;font-size:undefined; white-space: pre; border-radius: 0px; word-wrap: normal; background: transparent;"><span class="hljs-preprocessor" style="color: rgb(68, 68, 68); box-sizing: border-box;">#include <cstdlib></span> <span class="hljs-preprocessor" style="color: rgb(68, 68, 68); box-sizing: border-box;">#include <string.h></span> <span class="hljs-preprocessor" style="color: rgb(68, 68, 68); box-sizing: border-box;">#include <string></span> <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">// wstring => string</span> <span class="hljs-built_in" style="color: rgb(102, 0, 102); box-sizing: border-box;">std</span>::<span class="hljs-built_in" style="color: rgb(102, 0, 102); box-sizing: border-box;">string</span> WString2String(<span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">const</span> <span class="hljs-built_in" style="color: rgb(102, 0, 102); box-sizing: border-box;">std</span>::wstring& ws) { <span class="hljs-built_in" style="color: rgb(102, 0, 102); box-sizing: border-box;">std</span>::<span class="hljs-built_in" style="color: rgb(102, 0, 102); box-sizing: border-box;">string</span> strLocale = setlocale(LC_ALL, <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">""</span>); <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">const</span> <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">wchar_t</span>* wchSrc = ws.c_str(); size_t nDestSize = wcstombs(NULL, wchSrc, <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">0</span>) + <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">1</span>; <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">char</span> *chDest = <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">new</span> <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">char</span>[nDestSize]; <span class="hljs-built_in" style="color: rgb(102, 0, 102); box-sizing: border-box;">memset</span>(chDest,<span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">0</span>,nDestSize); wcstombs(chDest,wchSrc,nDestSize); <span class="hljs-built_in" style="color: rgb(102, 0, 102); box-sizing: border-box;">std</span>::<span class="hljs-built_in" style="color: rgb(102, 0, 102); box-sizing: border-box;">string</span> strResult = chDest; <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">delete</span> []chDest; setlocale(LC_ALL, strLocale.c_str()); <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">return</span> strResult; } <span class="hljs-comment" style="color: rgb(136, 0, 0); box-sizing: border-box;">// string => wstring</span> <span class="hljs-built_in" style="color: rgb(102, 0, 102); box-sizing: border-box;">std</span>::wstring String2WString(<span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">const</span> <span class="hljs-built_in" style="color: rgb(102, 0, 102); box-sizing: border-box;">std</span>::<span class="hljs-built_in" style="color: rgb(102, 0, 102); box-sizing: border-box;">string</span>& s) { <span class="hljs-built_in" style="color: rgb(102, 0, 102); box-sizing: border-box;">std</span>::<span class="hljs-built_in" style="color: rgb(102, 0, 102); box-sizing: border-box;">string</span> strLocale = setlocale(LC_ALL, <span class="hljs-string" style="color: rgb(0, 136, 0); box-sizing: border-box;">""</span>); <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">const</span> <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">char</span>* chSrc = s.c_str(); size_t nDestSize = mbstowcs(NULL, chSrc, <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">0</span>) + <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">1</span>; <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">wchar_t</span>* wchDest = <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">new</span> <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">wchar_t</span>[nDestSize]; wmemset(wchDest, <span class="hljs-number" style="color: rgb(0, 102, 102); box-sizing: border-box;">0</span>, nDestSize); mbstowcs(wchDest,chSrc,nDestSize); <span class="hljs-built_in" style="color: rgb(102, 0, 102); box-sizing: border-box;">std</span>::wstring wstrResult = wchDest; <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">delete</span> []wchDest; setlocale(LC_ALL, strLocale.c_str()); <span class="hljs-keyword" style="color: rgb(0, 0, 136); box-sizing: border-box;">return</span> wstrResult; }</code><ul class="pre-numbering" style="box-sizing: border-box; position: absolute; width: 50px; top: 0px; left: 0px; margin: 0px; padding: 6px 0px 40px; border-right-width: 1px; border-right-style: solid; border-right-color: rgb(221, 221, 221); list-style: none; text-align: right; background-color: rgb(238, 238, 238);"><li style="box-sizing: border-box; padding: 0px 5px;">1</li><li style="box-sizing: border-box; padding: 0px 5px;">2</li><li style="box-sizing: border-box; padding: 0px 5px;">3</li><li style="box-sizing: border-box; padding: 0px 5px;">4</li><li style="box-sizing: border-box; padding: 0px 5px;">5</li><li style="box-sizing: border-box; padding: 0px 5px;">6</li><li style="box-sizing: border-box; padding: 0px 5px;">7</li><li style="box-sizing: border-box; padding: 0px 5px;">8</li><li style="box-sizing: border-box; padding: 0px 5px;">9</li><li style="box-sizing: border-box; padding: 0px 5px;">10</li><li style="box-sizing: border-box; padding: 0px 5px;">11</li><li style="box-sizing: border-box; padding: 0px 5px;">12</li><li style="box-sizing: border-box; padding: 0px 5px;">13</li><li style="box-sizing: border-box; padding: 0px 5px;">14</li><li style="box-sizing: border-box; padding: 0px 5px;">15</li><li style="box-sizing: border-box; padding: 0px 5px;">16</li><li style="box-sizing: border-box; padding: 0px 5px;">17</li><li style="box-sizing: border-box; padding: 0px 5px;">18</li><li style="box-sizing: border-box; padding: 0px 5px;">19</li><li style="box-sizing: border-box; padding: 0px 5px;">20</li><li style="box-sizing: border-box; padding: 0px 5px;">21</li><li style="box-sizing: border-box; padding: 0px 5px;">22</li><li style="box-sizing: border-box; padding: 0px 5px;">23</li><li style="box-sizing: border-box; padding: 0px 5px;">24</li><li style="box-sizing: border-box; padding: 0px 5px;">25</li><li style="box-sizing: border-box; padding: 0px 5px;">26</li><li style="box-sizing: border-box; padding: 0px 5px;">27</li><li style="box-sizing: border-box; padding: 0px 5px;">28</li><li style="box-sizing: border-box; padding: 0px 5px;">29</li><li style="box-sizing: border-box; padding: 0px 5px;">30</li><li style="box-sizing: border-box; padding: 0px 5px;">31</li><li style="box-sizing: border-box; padding: 0px 5px;">32</li><li style="box-sizing: border-box; padding: 0px 5px;">33</li></ul><ul class="pre-numbering" style="box-sizing: border-box; position: absolute; width: 50px; top: 0px; left: 0px; margin: 0px; padding: 6px 0px 40px; border-right-width: 1px; border-right-style: solid; border-right-color: rgb(221, 221, 221); list-style: none; text-align: right; background-color: rgb(238, 238, 238);"><li style="box-sizing: border-box; padding: 0px 5px;">1</li><li style="box-sizing: border-box; padding: 0px 5px;">2</li><li style="box-sizing: border-box; padding: 0px 5px;">3</li><li style="box-sizing: border-box; padding: 0px 5px;">4</li><li style="box-sizing: border-box; padding: 0px 5px;">5</li><li style="box-sizing: border-box; padding: 0px 5px;">6</li><li style="box-sizing: border-box; padding: 0px 5px;">7</li><li style="box-sizing: border-box; padding: 0px 5px;">8</li><li style="box-sizing: border-box; padding: 0px 5px;">9</li><li style="box-sizing: border-box; padding: 0px 5px;">10</li><li style="box-sizing: border-box; padding: 0px 5px;">11</li><li style="box-sizing: border-box; padding: 0px 5px;">12</li><li style="box-sizing: border-box; padding: 0px 5px;">13</li><li style="box-sizing: border-box; padding: 0px 5px;">14</li><li style="box-sizing: border-box; padding: 0px 5px;">15</li><li style="box-sizing: border-box; padding: 0px 5px;">16</li><li style="box-sizing: border-box; padding: 0px 5px;">17</li><li style="box-sizing: border-box; padding: 0px 5px;">18</li><li style="box-sizing: border-box; padding: 0px 5px;">19</li><li style="box-sizing: border-box; padding: 0px 5px;">20</li><li style="box-sizing: border-box; padding: 0px 5px;">21</li><li style="box-sizing: border-box; padding: 0px 5px;">22</li><li style="box-sizing: border-box; padding: 0px 5px;">23</li><li style="box-sizing: border-box; padding: 0px 5px;">24</li><li style="box-sizing: border-box; padding: 0px 5px;">25</li><li style="box-sizing: border-box; padding: 0px 5px;">26</li><li style="box-sizing: border-box; padding: 0px 5px;">27</li><li style="box-sizing: border-box; padding: 0px 5px;">28</li><li style="box-sizing: border-box; padding: 0px 5px;">29</li><li style="box-sizing: border-box; padding: 0px 5px;">30</li><li style="box-sizing: border-box; padding: 0px 5px;">31</li><li style="box-sizing: border-box; padding: 0px 5px;">32</li><li style="box-sizing: border-box; padding: 0px 5px;">33</li></ul>

字符集(Charcater Set)与字符编码(Encoding)


  • 0
    点赞
  • 1
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值