Java character UTF-8 encoding, Java string encoding (UTF-8)

I have come across this line of legacy code, which I am trying to figure out:

String newString = new String(oldString.getBytes("UTF-8"), "UTF-8");

As far as I can understand, it is encoding and decoding using the same charset.
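Spelled out as two separate steps, the legacy one-liner first encodes the String's UTF-16 chars into UTF-8 bytes and then decodes those bytes back into a new String. A minimal sketch, assuming Java 7+ so that `StandardCharsets.UTF_8` is available (unlike the `"UTF-8"` string overloads, it throws no checked `UnsupportedEncodingException`); the class and method names are hypothetical. The sample text is "héllo", where 'é' (U+00E9) takes two bytes in UTF-8.

```java
import java.nio.charset.StandardCharsets;

public class RoundTripSteps {

    // Step 1: encode the String's UTF-16 chars into UTF-8 bytes.
    // Step 2: decode those bytes back into a new String with the same charset.
    static String roundTrip(String oldString) {
        byte[] utf8 = oldString.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String oldString = "h\u00e9llo"; // "hello" with an e-acute; U+00E9 needs two UTF-8 bytes
        byte[] utf8 = oldString.getBytes(StandardCharsets.UTF_8);
        System.out.println(utf8.length);                            // 6
        System.out.println(roundTrip(oldString).equals(oldString)); // true
    }
}
```

For ordinary text like this the round trip gives back an equal String, which is why the line looks like a no-op.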

How is this different from the following?

String newString = oldString;

Is there any scenario in which the two lines will have different outputs?

P.S.: Just to clarify, yes, I am aware of the excellent article on encoding by Joel Spolsky!

Answer

This could be a complicated way of doing

String newString = new String(oldString);

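This shortens the String if the underlying char[] it uses is much longer than the string itself. That could happen on JVMs before Java 7u6, where substring() shared its parent string's char[] and new String(String) copied only the part actually in use.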
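The copy made by `new String(oldString)` can be observed directly: the result is `equals` to the original but is a different object, whereas plain assignment just shares the same reference. A minimal sketch; the class and method names are hypothetical.

```java
public class CopyVsAssign {

    // Plain assignment: both variables refer to the very same String object.
    static boolean sameObjectAfterAssign(String oldString) {
        String newString = oldString;
        return newString == oldString;
    }

    // Copy constructor: same characters, but always a distinct String object.
    static boolean sameObjectAfterCopy(String oldString) {
        String newString = new String(oldString);
        return newString == oldString;
    }

    public static void main(String[] args) {
        String oldString = "hello";
        System.out.println(sameObjectAfterAssign(oldString));        // true
        System.out.println(sameObjectAfterCopy(oldString));          // false
        System.out.println(new String(oldString).equals(oldString)); // true
    }
}
```

Code that compares Strings with `equals` cannot tell the two apart; only identity checks (`==`) or memory usage differ.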

More specifically, though, it has the effect of checking that every character can be UTF-8 encoded.
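If the goal really is to check that every character can be UTF-8 encoded, the check can be made explicit instead of relying on a silent substitution. A minimal sketch, assuming Java 7+; `isUtf8Encodable` is a hypothetical helper built on `CharsetEncoder.canEncode(CharSequence)`, which returns false for unencodable input (such as an unpaired surrogate) instead of replacing it.

```java
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

public class Utf8Check {

    // Returns true if every char in s can be encoded as UTF-8.
    // A new encoder is created per call because CharsetEncoder instances are
    // stateful and not safe for use by multiple threads.
    static boolean isUtf8Encodable(String s) {
        CharsetEncoder encoder = StandardCharsets.UTF_8.newEncoder();
        return encoder.canEncode(s);
    }

    public static void main(String[] args) {
        System.out.println(isUtf8Encodable("hello"));        // true
        System.out.println(isUtf8Encodable("\uD83D\uDE00")); // true: valid surrogate pair (U+1F600)
        System.out.println(isUtf8Encodable("\uD800"));       // false: unpaired high surrogate
    }
}
```

Unlike the round trip, this neither allocates a new String nor hides the problem behind a replacement character.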

A String can contain some "characters" that cannot be encoded, and these are turned into '?'.

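Any char between \uD800 and \uDFFF that is not part of a valid surrogate pair (a high surrogate \uD800-\uDBFF immediately followed by a low surrogate \uDC00-\uDFFF) cannot be encoded and will be turned into '?'. For example: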

String oldString = "\uD800";
String newString = new String(oldString.getBytes("UTF-8"), "UTF-8");
System.out.println(newString.equals(oldString));

prints

false
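Extending the example above shows what the lone surrogate actually turns into, and that a properly paired surrogate (here U+1F600, written as its two UTF-16 chars) survives the round trip unchanged. A sketch using `StandardCharsets.UTF_8`, which behaves like the "UTF-8" name but throws no checked exception; the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class SurrogateRoundTrip {

    // Same round trip as the legacy line, without the checked exception.
    static String roundTrip(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String lone = "\uD800";       // high surrogate with no low surrogate after it
        String pair = "\uD83D\uDE00"; // U+1F600 as a valid high/low surrogate pair

        System.out.println(roundTrip(lone));              // ?
        System.out.println(roundTrip(lone).equals(lone)); // false
        System.out.println(roundTrip(pair).equals(pair)); // true
    }
}
```

So the two lines in the question only produce different Strings when the input contains unpaired surrogates.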
