Does Encoding Conversion Lose Information?

bloodnight

Does encoding conversion lose information?

From what I have found so far, the answer is yes: information can be lost. Here is why:

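// Requires java.net.URLEncoder and java.net.URLDecoder; both calls can throw UnsupportedEncodingException.
// "聶" does not exist in ISO-8859-1, so the encoder substitutes '?' (byte 0x3F).
String m = URLEncoder.encode("聶", "iso-8859-1");
System.out.println(m); // %3F
// Decoding as GBK cannot bring the character back: only the question mark is left.
String g = URLDecoder.decode(m, "gbk");
System.out.println(g); // ?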

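After this conversion the character's original bytes are gone, replaced by a single 0x3F byte (a question mark), so the original can never be restored.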
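The same loss can be observed without URL encoding at all. The following is a minimal sketch, not part of the original post: the class name `RoundTripCheck` and the helper `survivesRoundTrip` are hypothetical names chosen for illustration. It encodes a string with a charset, decodes the bytes with the same charset, and checks whether the text came back unchanged. The character 聶 is written as the escape `\u8076` so the result does not depend on the source file's own encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class RoundTripCheck {
    // Hypothetical helper: true if the text is unchanged after
    // encode -> decode with the same charset.
    static boolean survivesRoundTrip(String text, Charset charset) {
        byte[] bytes = text.getBytes(charset);      // unmappable characters are replaced here
        String decoded = new String(bytes, charset);
        return text.equals(decoded);
    }

    public static void main(String[] args) {
        String s = "\u8076"; // U+8076, the character used in the example above

        // ISO-8859-1 has no mapping for U+8076, so the round trip fails.
        System.out.println(survivesRoundTrip(s, StandardCharsets.ISO_8859_1)); // false

        // UTF-8 can represent every Unicode character, so nothing is lost.
        System.out.println(survivesRoundTrip(s, StandardCharsets.UTF_8));      // true

        // The loss can be detected up front by asking the encoder.
        System.out.println(StandardCharsets.ISO_8859_1.newEncoder().canEncode(s)); // false
    }
}
```

Checking `canEncode` before converting is one way to refuse a lossy conversion rather than silently writing question marks.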

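Switching the display encoding in Eclipse, on the other hand, does not lose information. That is because no encoding step is involved, only decoding. However you switch, nothing is lost: the feature exists only to find a suitable way of decoding, and the original encoded bytes stay unchanged.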

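// Requires java.net.URLEncoder and java.net.URLDecoder; both calls can throw UnsupportedEncodingException.
// GBK can represent "聶", so the percent-encoded bytes keep all the information.
String mk = URLEncoder.encode("聶", "gbk");
// Decoding with the wrong charset shows garbled text, but mk itself is untouched.
String i = URLDecoder.decode(mk, "iso-8859-1");
System.out.println("i = " + i);
// Decoding the same value with the right charset restores the character.
String ik = URLDecoder.decode(mk, "gbk");
System.out.println("ik = " + ik); // ik = 聶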
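The decode-only behaviour can be sketched at the byte level as well. This example is an illustration rather than anything Eclipse actually runs, and the class name `DecodeOnlyView` and the method `view` are hypothetical. It also uses UTF-8 in place of GBK for the stored bytes, because UTF-8 is guaranteed to be available on every Java runtime. A byte array stands in for the file contents, and it is decoded twice with different charsets while the bytes themselves never change.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class DecodeOnlyView {
    // Hypothetical helper: decoding builds a new String and never modifies raw.
    static String view(byte[] raw, Charset charset) {
        return new String(raw, charset);
    }

    public static void main(String[] args) {
        // Stand-in for the bytes of a file on disk (U+8076 stored as UTF-8).
        byte[] raw = "\u8076".getBytes(StandardCharsets.UTF_8);
        byte[] snapshot = raw.clone();

        String wrong = view(raw, StandardCharsets.ISO_8859_1); // garbled view
        String right = view(raw, StandardCharsets.UTF_8);      // correct view

        System.out.println(wrong.equals("\u8076"));        // false
        System.out.println(right.equals("\u8076"));        // true
        System.out.println(Arrays.equals(raw, snapshot));  // true: the bytes were never changed
    }
}
```

Trying one charset after another is therefore harmless as long as the bytes are only read and never written back.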

Added 2010.06.08:

Summary of garbled text

In the Java runtime (setting aside garbling introduced at compile time), garbled text originates in two places, namely the two functions mentioned here. Sometimes a framework calls one of them for us, so what we receive is already a String converted from the byte array that came over the network.

getBytes(String charset): encodes a Unicode String with the specified charset. If that charset (for example, iso-8859-1) has no representation for a character, the character is encoded as 0x3F (a question mark). The information is lost and cannot be restored.
new String(byte[] bytes, String charset): decodes a byte array with the specified charset. If some of the bytes are not valid in that charset, for example a byte sequence that is not valid UTF-8 being decoded as UTF-8, the resulting Unicode string contains "\uFFFD" at that position. This character is called 'REPLACEMENT CHARACTER' and is usually displayed as a question mark.
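Both loss points can be reproduced in a few lines. This is a minimal sketch: the class name `LossPoints` and its two helper methods are hypothetical, and the `Charset` overloads are used instead of the `String charset` ones only to avoid the checked `UnsupportedEncodingException`.

```java
import java.nio.charset.StandardCharsets;

class LossPoints {
    // Hypothetical helper: encoding side, unmappable characters become 0x3F.
    static byte[] encodeLatin1(String text) {
        return text.getBytes(StandardCharsets.ISO_8859_1);
    }

    // Hypothetical helper: decoding side, malformed input becomes U+FFFD.
    static String decodeUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] encoded = encodeLatin1("\u8076");
        System.out.println(encoded.length);                        // 1
        System.out.println(Integer.toHexString(encoded[0] & 0xFF)); // 3f

        // 0xFF can never appear in well-formed UTF-8.
        String decoded = decodeUtf8(new byte[] { (byte) 0xFF });
        System.out.println(decoded.equals("\uFFFD"));               // true
    }
}
```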

Garbled text therefore usually appears in the following situations:

1. A file written in one encoding is read using a different encoding. This always produces garbled text, and it happens frequently when opening files in the operating system.
2. A byte stream received over the network is decoded with the wrong encoding, which yields the wrong Unicode string.
3. A correct Unicode string is encoded with a charset that differs from the console's and is then sent to the console for display, where it appears garbled.
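Whether a wrong decode such as situation 2 can be undone depends on the charset that was wrongly applied. The sketch below is an illustration under stated assumptions, with the hypothetical names `WrongDecodeRecovery` and `redecode`. Java's ISO-8859-1 maps every byte value to exactly one character, so a string wrongly decoded with it still carries the original bytes and can be re-decoded. A wrong decode as UTF-8 replaces malformed bytes with U+FFFD, after which the original bytes are gone.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class WrongDecodeRecovery {
    // Hypothetical helper: turn the garbled text back into bytes using the
    // charset that was wrongly applied, then decode with the actual one.
    static String redecode(String garbled, Charset wronglyUsed, Charset actual) {
        return new String(garbled.getBytes(wronglyUsed), actual);
    }

    public static void main(String[] args) {
        // Recoverable: UTF-8 bytes wrongly decoded as ISO-8859-1.
        String original = "\u8076";
        byte[] wire = original.getBytes(StandardCharsets.UTF_8);
        String garbled = new String(wire, StandardCharsets.ISO_8859_1);
        String fixed = redecode(garbled, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8);
        System.out.println(fixed.equals(original)); // true

        // Not recoverable: bytes that are not valid UTF-8, wrongly decoded as UTF-8.
        byte[] notUtf8 = { (byte) 0xFF, (byte) 0xFE };
        String replaced = new String(notUtf8, StandardCharsets.UTF_8);
        byte[] back = replaced.getBytes(StandardCharsets.UTF_8);
        System.out.println(Arrays.equals(back, notUtf8)); // false
    }
}
```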

From:

http://www.codeweblog.com/java-depth-analysis-of-the-character-encoding/
