Why Reading an XML File Sometimes Fails

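An in-memory XML document built with a StringBuilder loads into XmlDocument.LoadXml without any problem when it is converted with ToString; in that case the encoding is the default Unicode, UTF-16. But if the XML is written to a MemoryStream with the Encoding property set to Encoding.UTF8 and then converted to a String with Encoding.UTF8.GetString(stream.ToArray());, loading it into XmlDocument fails with an unrecognized-character error. Changing the Encoding to new UTF8Encoding() makes the error go away. Why?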
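The scenario can be reproduced with a minimal sketch like the one below. It assumes the XML was written through an XmlWriter configured via XmlWriterSettings.Encoding (the original does not say which writer was used); BuildXml and Loads are hypothetical helper names introduced only for illustration.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

// Hypothetical helper: writes a tiny document into a MemoryStream using the
// given encoding, then decodes the raw bytes back into a string.
static string BuildXml(Encoding encoding)
{
    var settings = new XmlWriterSettings { Encoding = encoding };
    using var stream = new MemoryStream();
    using (var writer = XmlWriter.Create(stream, settings))
    {
        writer.WriteStartElement("root");
        writer.WriteEndElement();
    }
    return Encoding.UTF8.GetString(stream.ToArray());
}

// Hypothetical helper: reports whether XmlDocument.LoadXml accepts the string.
static bool Loads(string xml)
{
    try
    {
        new XmlDocument().LoadXml(xml);
        return true;
    }
    catch (XmlException)
    {
        return false;
    }
}

string withEncodingUtf8 = BuildXml(Encoding.UTF8);
string withNewUtf8Encoding = BuildXml(new UTF8Encoding());

Console.WriteLine(Loads(withEncodingUtf8));    // the failing case
Console.WriteLine(Loads(withNewUtf8Encoding)); // the working case
```

Only the encoding passed to the writer differs between the two calls, which narrows the cause down to the encoding object itself.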

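The secret lies in the encoding's preamble, the marker bytes placed in front of the data. It shows up when you inspect the bytes of the converted result: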

System.Text.Encoding.UTF8.GetBytes(ss);

In the failing case, three extra bytes, EF BB BF, appear at the front. According to MSDN, they are there to identify the encoding:

Optionally, the Encoding provides a preamble which is an array of bytes that can be prefixed to the sequence of bytes resulting from the encoding process. If the preamble contains a byte order mark (In Unicode, code point U+FEFF), it helps the decoder determine the byte order and the transformation format or UTF. The Unicode byte order mark is serialized as follows (in hexadecimal):

  • UTF-8: EF BB BF

  • UTF-16 big-endian byte order: FE FF

  • UTF-16 little-endian byte order: FF FE

  • UTF-32 big-endian byte order: 00 00 FE FF

  • UTF-32 little-endian byte order: FF FE 00 00
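The byte sequences listed above can be checked against the encodings that ship with .NET. The following is a small sketch; PreambleHex is a hypothetical helper name, and the choice of which Encoding instances to print is mine rather than the original author's.

```csharp
using System;
using System.Text;

// Hypothetical helper: formats an encoding's preamble as space-separated hex.
static string PreambleHex(Encoding encoding) =>
    BitConverter.ToString(encoding.GetPreamble()).Replace("-", " ");

// UTF-8
Console.WriteLine(PreambleHex(Encoding.UTF8));
// UTF-16 big-endian
Console.WriteLine(PreambleHex(Encoding.BigEndianUnicode));
// UTF-16 little-endian
Console.WriteLine(PreambleHex(Encoding.Unicode));
// UTF-32 big-endian (no static property, so it is constructed explicitly)
Console.WriteLine(PreambleHex(new UTF32Encoding(bigEndian: true, byteOrderMark: true)));
// UTF-32 little-endian
Console.WriteLine(PreambleHex(Encoding.UTF32));
```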

So why does new UTF8Encoding() work without any problem?

To quote an explanation from an English-speaking developer:

The hex value you see sets the byte order mark of the text. If you are using UTF8, it should be 3 bytes long with the value 0xEFBBBF. You can actually see it by calling Encoding.UTF8.GetPreamble().

I searched more thoroughly, and there is actually a difference between the two calls you were making:

Encoding.UTF8 returns an instance of UTF8Encoding(true), so you get an encoding that provides the UTF8 preamble to writers.

When you called UTF8Encoding(), the default is to call UTF8Encoding(false), which does not provide the UTF8 preamble (the preamble will then be an empty byte array).

So when you used Encoding.UTF8, the preamble was emitted, rendering your data invalid.

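XmlDocument.LoadXml evidently cannot handle these leading bytes: once decoded, they become a U+FEFF character at the very start of the string, ahead of the XML content, so the call throws.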
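The difference between the two encodings, and the way the preamble survives decoding, can be seen in a short sketch. The sample string "<a/>" and the variable names are assumptions made for illustration.

```csharp
using System;
using System.Text;

byte[] defaultPreamble = Encoding.UTF8.GetPreamble();     // preamble of the static instance
byte[] plainPreamble = new UTF8Encoding().GetPreamble();  // preamble of the default constructor

Console.WriteLine(defaultPreamble.Length); // 3 bytes: EF BB BF
Console.WriteLine(plainPreamble.Length);   // 0 bytes: empty array

// GetBytes on its own does not prepend the preamble; a writer has to emit it.
byte[] encoded = Encoding.UTF8.GetBytes("<a/>");
Console.WriteLine(encoded.Length);         // 4 bytes, one per character

// Simulate what a stream writer produces: preamble first, then the content.
byte[] withPreamble = new byte[defaultPreamble.Length + encoded.Length];
defaultPreamble.CopyTo(withPreamble, 0);
encoded.CopyTo(withPreamble, defaultPreamble.Length);

// GetString does not strip the preamble; it decodes it to U+FEFF.
string decoded = Encoding.UTF8.GetString(withPreamble);
Console.WriteLine(decoded[0] == '\uFEFF');
Console.WriteLine(decoded.Length);
```

Note that the preamble is written by stream-based writers that consult GetPreamble(), not by GetBytes, which is why the problem only appears on the MemoryStream path.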

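Personally, I think it is better not to add the prefix inside the program or in memory; keep the data bare.

When saving to a file, where internationalization concerns such as byte-order detection come into play, adding the prefix is fine, but it is best to strip it when reading the data back into memory.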
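As a rough sketch of stripping the prefix on the way into memory, the options below all start from bytes that carry a UTF-8 byte order mark. The byte array stands in for file contents and is an assumption for illustration; which option fits best depends on how the data arrives.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

// Stand-in for the bytes of a file saved with a UTF-8 byte order mark.
byte[] fileBytes = { 0xEF, 0xBB, 0xBF, (byte)'<', (byte)'r', (byte)'/', (byte)'>' };

// Option 1: StreamReader detects the byte order mark and consumes it.
string text;
using (var reader = new StreamReader(new MemoryStream(fileBytes), Encoding.UTF8,
                                     detectEncodingFromByteOrderMarks: true))
{
    text = reader.ReadToEnd();
}
var fromReader = new XmlDocument();
fromReader.LoadXml(text);

// Option 2: trim a leading U+FEFF from a string that was already decoded.
string trimmed = Encoding.UTF8.GetString(fileBytes).TrimStart('\uFEFF');
var fromTrimmed = new XmlDocument();
fromTrimmed.LoadXml(trimmed);

// Option 3: hand the raw stream to XmlDocument.Load, which deals with the
// byte order mark itself, so no intermediate string is needed.
var fromStream = new XmlDocument();
fromStream.Load(new MemoryStream(fileBytes));

Console.WriteLine(fromReader.DocumentElement?.Name);
Console.WriteLine(fromTrimmed.DocumentElement?.Name);
Console.WriteLine(fromStream.DocumentElement?.Name);
```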
