Encoding in C#.NET: the Encoding Class

2007-03-23 Author: 龚军勇 http://www.flyy.info/blog/index.php?blogid=274

description:

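C# provides the Encoding class (in the System.Text namespace) for character encoding. Four of the concrete encoding classes derived from it are: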

1、ASCIIEncoding
2、UnicodeEncoding
3、UTF7Encoding
4、UTF8Encoding

There is also the familiar GB2312 encoding, but it has no dedicated public class under Encoding. It can still be used by looking it up by name: Encoding encoder = Encoding.GetEncoding("gb2312");
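A minimal sketch of that lookup follows. It assumes .NET Core 3.0 or later (including .NET 5+), where code-page encodings such as GB2312 have to be registered through CodePagesEncodingProvider before GetEncoding can find them; on the classic .NET Framework the name resolves without that step. The names Gb2312Lookup and GetGb2312 are illustrative and do not come from the original article.

```csharp
using System;
using System.Text;

public static class Gb2312Lookup
{
    // Hypothetical helper: returns the encoding registered under the name "gb2312".
    public static Encoding GetGb2312()
    {
        // Assumption: running on .NET Core 3.0+ / .NET 5+, where code-page
        // encodings are only available after the provider is registered.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        return Encoding.GetEncoding("gb2312");
    }

    public static void Main()
    {
        Encoding encoder = GetGb2312();
        Console.WriteLine(encoder.EncodingName + " (code page " + encoder.CodePage + ")");

        // Each of the two Chinese characters takes two bytes in GB2312.
        Console.WriteLine(encoder.GetByteCount("测试")); // 4
    }
}
```

If the lookup is used in many places, calling RegisterProvider once at program start is probably the simpler design.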

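The encodings are not necessarily compatible with one another, especially when Chinese text is involved. This matters a great deal when decoding bytes that were read from a file.
     For example, data written with the Unicode encoding (UTF-16 in .NET) cannot be decoded correctly with the ASCII or UTF-8 encodings. This is because the Unicode encoding uses two bytes for each character (four for characters outside the Basic Multilingual Plane), whereas ASCII uses one byte and UTF-8 uses one to four.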
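The size difference and its effect on decoding can be sketched as follows. This is an illustration only; the class Utf16Mismatch and the helper DecodeUtf16BytesWith are hypothetical names, not part of the original article.

```csharp
using System;
using System.Text;

public static class Utf16Mismatch
{
    // Encodes text as UTF-16 little-endian (Encoding.Unicode), then decodes
    // the resulting bytes with a different, mismatched encoding.
    public static string DecodeUtf16BytesWith(Encoding wrongDecoder, string text)
    {
        byte[] utf16Bytes = Encoding.Unicode.GetBytes(text);
        return wrongDecoder.GetString(utf16Bytes);
    }

    public static void Main()
    {
        string text = "Test";
        Console.WriteLine(Encoding.Unicode.GetByteCount(text)); // 8: two bytes per character
        Console.WriteLine(Encoding.UTF8.GetByteCount(text));    // 4: one byte per ASCII character
        Console.WriteLine(Encoding.UTF8.GetByteCount("测试"));   // 6: three bytes per character here

        // Every second UTF-16 byte of "Test" is 0, so the UTF-8 decoder
        // yields a NUL character after each letter.
        string wrong = DecodeUtf16BytesWith(Encoding.UTF8, text);
        Console.WriteLine(wrong.Length);  // 8
        Console.WriteLine(wrong == text); // False
    }
}
```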

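1. First, look at the contents of the encoded byte arrays
     We wrote a small program to verify the point above.
     The string string str="测试文本 Test Text" was encoded with GB2312 and with UTF8, which gave the following byte arrays: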
     -------------------------------GB2312-----------------
   178   226   202   212   206   196   177   190   32   84   101   115   116   32   84   101   120   116

     ------------------------------UTF8------------------
   230   181   139   232   175   149   230   150   135   230   156   172   32   84   101   115   116   32   84   101   120   116
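   As the output shows, the two encodings produce different bytes for the Chinese characters (two bytes each in GB2312, three each in UTF8), but identical bytes for the English letters and the spaces.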
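A byte dump like the one above can be reproduced with a short console sketch. It is not the author's WinForms program; the names ByteDump and Dump are made up for illustration, and the CodePagesEncodingProvider registration assumes .NET Core 3.0 or later (the classic .NET Framework resolves "gb2312" without it).

```csharp
using System;
using System.Text;

public static class ByteDump
{
    // Returns the encoded bytes of text as space-separated decimal values.
    public static string Dump(Encoding encoder, string text)
    {
        byte[] buff = encoder.GetBytes(text);
        string[] parts = Array.ConvertAll(buff, b => b.ToString());
        return string.Join(" ", parts);
    }

    public static void Main()
    {
        // Assumption: .NET Core 3.0+ / .NET 5+, so GB2312 must be registered first.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        string str = "测试文本 Test Text";
        Console.WriteLine("GB2312: " + Dump(Encoding.GetEncoding("gb2312"), str));
        Console.WriteLine("UTF8:   " + Dump(Encoding.UTF8, str));
    }
}
```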

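2. Saving the two encodings to text files
    The same string string str="测试文本 Test Text" was encoded with GB2312 and with UTF8, and each result was saved to its own text file. The UTF8 version of the code is as follows: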

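// Requires: using System; using System.IO; using System.Text;
private void GetUTF8()
  {
   // str is a field holding the test string "测试文本 Test Text"
   Encoding encoder = Encoding.UTF8;
   byte[] buff = encoder.GetBytes(str);
   this.textBox1.Text += Environment.NewLine + "-------------------------------UTF8------------------" + Environment.NewLine;

   // Show every encoded byte as a decimal value
   for (int i = 0; i < buff.Length; ++i)
   {
    this.textBox1.Text += "   " + buff[i];
   }

   // Write the raw bytes to disk; using closes the stream even if Write throws
   using (FileStream fs = File.Create(@"E:\WinForm\WindowsApplication1\WindowsApplication1\UTF8.txt"))
   {
    fs.Write(buff, 0, buff.Length);
   }
  }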

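Both files show the Chinese text correctly when opened in Notepad. Next, the files are read back and displayed:
A. Reading the GB2312-encoded file with GB2312 displays the text correctly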

   // File.ReadAllBytes reads the whole file and closes it; a single FileStream.Read call may return fewer bytes
   byte[] buff = File.ReadAllBytes(@"E:\WinForm\WindowsApplication1\WindowsApplication1\GB2312.txt");
   Encoding encoder = Encoding.GetEncoding("gb2312");
   string str1 = encoder.GetString(buff, 0, buff.Length);
   this.textBox1.Text = "-------------GB2312 Encode----------------" + Environment.NewLine + str1 + Environment.NewLine;

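B. Reading the UTF8-encoded file with GB2312 displays the following (the UTF8 bytes 230 181 of the first character are taken as one GB2312 pair, which is the character 娴):
   娴????? Test Text
C. Reading the GB2312-encoded file with UTF8 displays the following:
    ? Test Text
This is exactly the garbled text we run into so often. In most cases the cause is that the encoding used for reading does not match the one used for writing.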
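The mismatch can also be reproduced in memory, without files. The following is a sketch under the assumption of .NET Core 3.0 or later (hence the CodePagesEncodingProvider registration); MismatchDemo, Gb2312 and Transcode are hypothetical names used only for this illustration. The exact replacement characters printed may vary with the runtime and the console font.

```csharp
using System;
using System.Text;

public static class MismatchDemo
{
    // Hypothetical helper: looks up GB2312 after registering code-page encodings.
    public static Encoding Gb2312()
    {
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        return Encoding.GetEncoding("gb2312");
    }

    // Encodes text with one encoding and decodes the bytes with another.
    public static string Transcode(string text, Encoding writer, Encoding reader)
    {
        byte[] buff = writer.GetBytes(text);
        return reader.GetString(buff);
    }

    public static void Main()
    {
        string str = "测试文本 Test Text";
        Encoding gb2312 = Gb2312();

        // Matching encodings: the text survives the round trip.
        Console.WriteLine(Transcode(str, gb2312, gb2312));

        // UTF8 bytes read as GB2312: the Chinese part turns into unrelated characters.
        Console.WriteLine(Transcode(str, Encoding.UTF8, gb2312));

        // GB2312 bytes read as UTF8: invalid sequences become U+FFFD replacement characters.
        Console.WriteLine(Transcode(str, gb2312, Encoding.UTF8));
    }
}
```

In both mismatched cases the ASCII part " Test Text" comes through intact, because GB2312 and UTF8 encode those characters with the same single bytes.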

In general we use UTF8 for encoding, and it is the usual default for data transmitted over the network.
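One way to avoid the mismatch is to name the encoding explicitly on both the write and the read side. The sketch below does this with File.WriteAllText and File.ReadAllText; the temporary file and the names Utf8RoundTrip and WriteThenRead are assumptions made for the example, not part of the original article.

```csharp
using System;
using System.IO;
using System.Text;

public static class Utf8RoundTrip
{
    // Writes text to a temporary file as UTF-8 and reads it back as UTF-8.
    public static string WriteThenRead(string text)
    {
        string path = Path.GetTempFileName(); // temporary file, for illustration only
        try
        {
            // UTF8Encoding(false) writes no byte order mark at the start of the file.
            File.WriteAllText(path, text, new UTF8Encoding(false));
            return File.ReadAllText(path, Encoding.UTF8);
        }
        finally
        {
            File.Delete(path);
        }
    }

    public static void Main()
    {
        string str = "测试文本 Test Text";
        Console.WriteLine(WriteThenRead(str) == str); // True
    }
}
```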