Does Encoding Conversion Lose Information?

 

    Does encoding conversion lose information?

    From what I have found so far, the answer is yes: information can be lost. Here is why:

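import java.net.URLDecoder;
import java.net.URLEncoder;

// ISO-8859-1 has no mapping for this character, so the encoder substitutes '?' (0x3F).
String m = URLEncoder.encode("聶", "iso-8859-1");
System.out.println(m);  // %3F

// Decoding afterwards can only return the '?', whichever charset is used.
String g = URLDecoder.decode(m, "gbk");
System.out.println(g);  // ?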

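 The encoding step replaced the character with a single '?' byte (0x3F). The original information is gone, so no later decoding step can restore it.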

 

 

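Why, then, is it said that switching the display encoding in Eclipse loses nothing? Because there is no encoding step involved, only decoding. However many times you switch, nothing is lost: the feature only looks for a suitable way to decode the bytes, and the originally encoded bytes never change.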

 

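// Encode with GBK, which does contain the character.
String mk = URLEncoder.encode("聶", "gbk");

// Wrong decoder: the result is garbled, but mk itself is untouched.
String i = URLDecoder.decode(mk, "iso-8859-1");
System.out.println("i = " + i);

// Right decoder: the original character comes back.
String ik = URLDecoder.decode(mk, "gbk");
System.out.println("ik = " + ik);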
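The same decode-only behavior can be sketched on a raw byte array, without URL encoding. This is a minimal illustration, not how Eclipse is implemented. It assumes UTF-8 as the stored encoding instead of GBK, only because UTF-8 and ISO-8859-1 are guaranteed to exist on every Java platform. The class name `DecodeOnlyDemo` and the helper `view` are hypothetical names introduced here.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class DecodeOnlyDemo {
    // Decoding reads the bytes and builds a new String; it never writes to the array.
    static String view(byte[] stored, Charset charset) {
        return new String(stored, charset);
    }

    public static void main(String[] args) {
        // "\u8076" is the character used in the examples above.
        // Assumption: the "file" is stored as UTF-8 (3 bytes for this character).
        byte[] stored = "\u8076".getBytes(StandardCharsets.UTF_8);
        byte[] snapshot = stored.clone();

        // Try a wrong decoder first, then the right one, as an editor would.
        String wrong = view(stored, StandardCharsets.ISO_8859_1);
        String right = view(stored, StandardCharsets.UTF_8);

        System.out.println("wrong view length = " + wrong.length());              // 3
        System.out.println("right view matches = " + right.equals("\u8076"));     // true
        System.out.println("bytes unchanged = " + Arrays.equals(stored, snapshot)); // true
    }
}
```

As long as the stored bytes are never re-encoded from a wrongly decoded String, the right decoder can always be tried again later.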

 

 

Added 2010.06.08:

Summary: where garbled text comes from


In the Java runtime world (compile-time issues aside), garbled text originates in two places, namely the two functions below. Sometimes a framework calls one of them for you, so what you receive is already a String converted from the byte array that arrived over the network.

  • getBytes(String charset): encodes a Unicode String with the specified charset. If that charset (for example, iso-8859-1) has no mapping for a character, the character is encoded as 0x3F, which is a question mark. The information is lost and cannot be restored.
  • new String(byte[] bytes, String charset): decodes a byte array with the specified charset. If some bytes are not valid in that charset, for example a stretch of bytes that is not well-formed UTF-8, the resulting Unicode string contains "\uFFFD" at that position. This is the REPLACEMENT CHARACTER, and it is usually displayed as a question mark.
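Both substitutions can be observed directly. The sketch below uses the `Charset` overloads of `getBytes` and `new String` rather than the `String charset` overloads named above, only to avoid the checked exception. The class name `ReplacementDemo` and its helper methods are hypothetical names introduced for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ReplacementDemo {
    // Encoding side: unmappable characters become the charset's replacement byte.
    static byte[] encodeLatin1(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1);
    }

    // Decoding side: malformed input becomes U+FFFD.
    static String decodeUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "\u8076" has no mapping in ISO-8859-1.
        byte[] lossy = encodeLatin1("\u8076");
        System.out.println(Arrays.toString(lossy)); // [63], i.e. 0x3F '?'

        // 0xFF can never appear in well-formed UTF-8.
        byte[] invalid = { (byte) 0xFF };
        String decoded = decodeUtf8(invalid);
        System.out.println(decoded.equals("\uFFFD")); // true
    }
}
```

The two cases differ in one important way: after `getBytes` the original character is gone from the byte array, while after a bad `new String` the original byte array is still intact and can be decoded again with another charset.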

    So garbled text typically appears in the following situations:

         1. A file encoded in one charset is read and parsed with another.

             The result is garbled text. This happens often when opening files

             in the operating system.
         2. A byte stream received over the network is decoded with the wrong charset.

             The result is the wrong Unicode string.
         3. A Unicode string is encoded correctly, but with a charset that differs from the console's,

             and is then sent to the console for display. It shows up garbled.
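The console case (situation 3) can be simulated in memory. The sketch below writes through a `PrintStream` with one encoding and then decodes the resulting bytes with another, standing in for a terminal that expects a different charset. It is an assumption-laden illustration: `ConsoleEncodingDemo` and `bytesWritten` are hypothetical names, and a `ByteArrayOutputStream` replaces the real console.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.StandardCharsets;

public class ConsoleEncodingDemo {
    // Returns the bytes a PrintStream with the given encoding would send to its sink.
    static byte[] bytesWritten(String text, String streamEncoding) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try {
            PrintStream out = new PrintStream(sink, true, streamEncoding);
            out.print(text);
            out.flush();
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("unknown encoding: " + streamEncoding, e);
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        String text = "\u8076";

        // The program encodes its output as UTF-8 ...
        byte[] sent = bytesWritten(text, "UTF-8");

        // ... but the simulated console decodes it as ISO-8859-1.
        String shownByWrongConsole = new String(sent, StandardCharsets.ISO_8859_1);
        String shownByRightConsole = new String(sent, StandardCharsets.UTF_8);

        System.out.println(shownByWrongConsole.equals(text)); // false: garbled
        System.out.println(shownByRightConsole.equals(text)); // true
    }
}
```

The bytes sent are correct in both cases. Only the decoding on the receiving side differs, which is why matching the stream encoding to the console's encoding resolves this kind of garbling.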

 

from:

http://www.codeweblog.com/java-depth-analysis-of-the-character-encoding/

 

 
