Java charset check: how do I determine the charset of a string in Java?

In my application I get user info from LDAP, and sometimes the full username arrives in the wrong charset. For example:

ТеÑÑ61 ТеÑÑовиÑ61

Other times it is in English or Russian and displayed correctly. When the username changes, it is updated in the database, so even if I correct the value in the database by hand, that won't solve the problem.

I can fix it before saving by doing this:

new String(incorrect.getBytes("ISO-8859-1"), "UTF-8");
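To see why this one-liner works, here is a minimal sketch that reproduces the problem, assuming (as the fix itself implies) that the LDAP value is UTF-8 bytes that were decoded as ISO-8859-1 somewhere along the way. The class and method names are hypothetical, and the sketch uses the `StandardCharsets` constants instead of charset-name strings, so no checked `UnsupportedEncodingException` has to be handled.

```java
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    // "Тест61", written with Unicode escapes so the file compiles under any javac source encoding
    static final String ORIGINAL = "\u0422\u0435\u0441\u0442" + "61";

    // Simulates the suspected bug: UTF-8 bytes decoded as if they were ISO-8859-1
    static String garble(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // The repair from the question: ISO-8859-1 maps chars U+0000..U+00FF back to the
    // same byte values, so this recovers the original UTF-8 bytes and decodes them properly
    static String repair(String garbled) {
        byte[] raw = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String garbled = garble(ORIGINAL);
        // Each Cyrillic letter is 2 bytes in UTF-8, so it turns into 2 Latin-1 characters
        System.out.println(ORIGINAL.length() + " -> " + garbled.length()); // 6 -> 10
        System.out.println(repair(garbled).equals(ORIGINAL));             // true
    }
}
```

In UTF-8, `с` and `т` are the byte pairs D1 81 and D1 82, so after the bad decode each becomes `Ñ` followed by an invisible C1 control character (U+0081 or U+0082), which is why they show up as a bare `Ñ`. If those invisible characters are stripped anywhere before the repair, it cannot restore the original text.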

However, if I apply it to a string that already contains correct Russian characters (e.g. "Тест61 Тестович61"), I get something like "????61 ????????61".
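The `?` characters come from `getBytes("ISO-8859-1")` itself: Cyrillic letters cannot be represented in ISO-8859-1, and `String.getBytes` silently replaces unmappable characters with the charset's replacement byte, `?`. A minimal sketch of a guard, assuming the only corruption to undo is the UTF-8-read-as-ISO-8859-1 kind: apply the repair only if the string is fully encodable in ISO-8859-1 and the resulting bytes are valid UTF-8. `SafeRepair` and `repairIfMojibake` are hypothetical names, and this is a heuristic, not real charset detection.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class SafeRepair {
    // Heuristic: undo UTF-8-read-as-ISO-8859-1 only when that is reversible and yields valid UTF-8
    static String repairIfMojibake(String s) {
        // Correctly decoded Russian text is not encodable in ISO-8859-1; getBytes() would
        // turn each Cyrillic letter into '?', so leave such strings alone
        if (!StandardCharsets.ISO_8859_1.newEncoder().canEncode(s)) {
            return s;
        }
        byte[] raw = s.getBytes(StandardCharsets.ISO_8859_1);
        CharsetDecoder strictUtf8 = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            return strictUtf8.decode(ByteBuffer.wrap(raw)).toString();
        } catch (CharacterCodingException e) {
            // Not valid UTF-8, so most likely genuine Latin-1 text: keep it as-is
            return s;
        }
    }

    public static void main(String[] args) {
        String russian = "\u0422\u0435\u0441\u0442" + "61"; // "Тест61"
        // The failure from the question: unmappable characters become '?'
        System.out.println(new String(russian.getBytes(StandardCharsets.ISO_8859_1),
                StandardCharsets.ISO_8859_1)); // ????61
        String garbled = new String(russian.getBytes(StandardCharsets.UTF_8),
                StandardCharsets.ISO_8859_1);
        System.out.println(repairIfMojibake(garbled).equals(russian)); // true
        System.out.println(repairIfMojibake(russian).equals(russian)); // true
    }
}
```

ASCII-only names pass through unchanged because ASCII bytes are valid in both charsets. The guard can still misfire on genuine Latin-1 text whose bytes happen to form valid UTF-8, so it narrows the problem rather than solving it.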

Can you please suggest a way to determine the charset of a string?

Solution

Strings in Java, AFAIK, do not retain their original encoding; a String is just a sequence of UTF-16 code units, with no record of the charset of the bytes it was decoded from.
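A small illustration of that point, as a sketch: the same text encoded as UTF-8 and as windows-1251 (a common Cyrillic code page, chosen here only as an example) gives different bytes, but once decoded the two `String`s are equal, so nothing in a `String` can tell you which charset its bytes came from. The class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class NoCharsetInString {
    static final Charset WINDOWS_1251 = Charset.forName("windows-1251");

    // Encode with one charset, then decode with the same charset
    static String roundTrip(String text, Charset cs) {
        return new String(text.getBytes(cs), cs);
    }

    public static void main(String[] args) {
        String text = "\u0422\u0435\u0441\u0442"; // "Тест"
        System.out.println(text.getBytes(StandardCharsets.UTF_8).length); // 8
        System.out.println(text.getBytes(WINDOWS_1251).length);           // 4
        String a = roundTrip(text, StandardCharsets.UTF_8);
        String b = roundTrip(text, WINDOWS_1251);
        // Different bytes on the wire, identical Strings in memory
        System.out.println(a.equals(b)); // true
    }
}
```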

You want to detect the charset of the original stream or bytes; that is why I think calling String.getBytes() on the already-decoded string is too late.

Ideally, if you can get hold of the input stream you are reading from, you can run it through something like this: http://code.google.com/p/juniversalchardet/

There are plenty of other charset detectors out there as well.
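If adding a detector library is not an option, a much cruder standard-library fallback can at least work on the raw bytes: try a short list of candidate charsets in order and take the first one that decodes without errors. This is not how juniversalchardet (a Java port of Mozilla's universal charset detector) works; it is a swapped-in heuristic, and the candidate list below is an assumption for this LDAP case. UTF-8 must come first, because a single-byte code page such as windows-1251 accepts almost any byte sequence. `GuessCharset` and `guess` are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Optional;

public class GuessCharset {
    // Assumed candidates for this use case; order matters, strictest first
    static final List<Charset> CANDIDATES = List.of(
            StandardCharsets.UTF_8,
            Charset.forName("windows-1251"));

    // Returns the first candidate that decodes the bytes without any error
    static Optional<Charset> guess(byte[] data) {
        for (Charset cs : CANDIDATES) {
            try {
                cs.newDecoder()
                        .onMalformedInput(CodingErrorAction.REPORT)
                        .onUnmappableCharacter(CodingErrorAction.REPORT)
                        .decode(ByteBuffer.wrap(data));
                return Optional.of(cs);
            } catch (CharacterCodingException e) {
                // Not valid in this charset; try the next candidate
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        String text = "\u0422\u0435\u0441\u0442" + "61"; // "Тест61"
        byte[] asUtf8 = text.getBytes(StandardCharsets.UTF_8);
        byte[] as1251 = text.getBytes(Charset.forName("windows-1251"));
        System.out.println(guess(asUtf8).map(Charset::name).orElse("unknown")); // UTF-8
        System.out.println(guess(as1251).map(Charset::name).orElse("unknown")); // windows-1251
    }
}
```

ASCII-only input is always reported as UTF-8, which is harmless because it reads the same either way; genuinely ambiguous byte sequences are where a statistical detector such as juniversalchardet earns its keep.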
