iconv java: is there an equivalent of iconv's //TRANSLIT in Java?

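This post looks at how to do charset conversion with transliteration in Java, similar to the UNIX `iconv` command. There is no directly equivalent library, but Unicode normalization (such as NFKD) can strip accents from characters, after which non-ASCII characters can be filtered out. This is a lossy process, however, and information can be lost, especially for characters outside ASCII. A safer approach is to build your own lookup table of character substitutions based on the range of input characters you expect.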

Is there a way to transliterate characters between charsets in Java? Something similar to the Unix command (or the equivalent PHP function):

iconv -f UTF-8 -t ASCII//TRANSLIT < some_doc.txt > new_doc.txt

Preferably it should operate on strings and not involve files at all.

I know you can change encodings with the String constructor, but that doesn't transliterate characters that don't exist in the target charset.
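A quick way to see the limitation described in the question: encoding a String to US-ASCII does not transliterate anything, it just substitutes a replacement byte for each unmappable character. A minimal sketch (the class and method names are illustrative, not from any library):

```java
import java.nio.charset.StandardCharsets;

class EncodeDemo {
    // Encodes to US-ASCII and decodes back. String.getBytes(Charset) does not
    // throw on unmappable characters; it substitutes the charset's default
    // replacement, which is '?' for US-ASCII.
    static String roundTripAscii(String input) {
        byte[] ascii = input.getBytes(StandardCharsets.US_ASCII);
        return new String(ascii, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        // "caf\u00e9" is "cafe" with an acute accent on the e; the unicode
        // escape keeps this source file ASCII-only.
        System.out.println(roundTripAscii("caf\u00e9")); // prints "caf?"
    }
}
```

If you would rather get an error than a silent `?`, a `CharsetEncoder` obtained from `StandardCharsets.US_ASCII.newEncoder()` reports unmappable characters by default, so its `encode(CharBuffer)` method throws a `CharacterCodingException` instead.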

Solution

I'm not aware of any libraries that do exactly what iconv purports to do (which doesn't seem very well defined). However, you can use "normalization" in Java to do things like remove accents from characters. This process is well defined by Unicode standards.

I think NFKD (compatibility decomposition) followed by a filtering of non-ASCII characters might get you close to what you want. Obviously, this is a lossy process; you can never recover all of the information that was in the original string, so be careful.

import java.text.Normalizer;

/* `accented` is the input String to convert. */
/* Decompose the original "accented" string into base characters plus combining marks. */
String decomposed = Normalizer.normalize(accented, Normalizer.Form.NFKD);

/* Build a new String containing only the ASCII characters. */
StringBuilder buf = new StringBuilder();
for (int idx = 0; idx < decomposed.length(); ++idx) {
    char ch = decomposed.charAt(idx);
    if (ch < 128) {
        buf.append(ch);
    }
}
String filtered = buf.toString();
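For experimenting, here is the answer's snippet wrapped in a self-contained class (the names `AsciiFold` and `toAscii` are hypothetical). The second call shows what the "K" in NFKD contributes: compatibility characters such as ligatures and superscripts are also mapped to plain equivalents, not just accented letters.

```java
import java.text.Normalizer;

class AsciiFold {
    // NFKD-decompose, then keep only chars below 128 (the answer's approach).
    static String toAscii(String accented) {
        String decomposed = Normalizer.normalize(accented, Normalizer.Form.NFKD);
        StringBuilder buf = new StringBuilder(decomposed.length());
        for (int idx = 0; idx < decomposed.length(); ++idx) {
            char ch = decomposed.charAt(idx);
            if (ch < 128) {
                buf.append(ch);
            }
        }
        return buf.toString();
    }

    public static void main(String[] args) {
        // Accented letters decompose into base letter + combining mark,
        // and the marks (all >= 128) are filtered out.
        System.out.println(toAscii("Cr\u00e8me br\u00fbl\u00e9e")); // Creme brulee
        // Compatibility mappings: U+FB01 (the "fi" ligature) becomes "fi"
        // and U+00B2 (superscript two) becomes "2".
        System.out.println(toAscii("\ufb01le\u00b2")); // file2
    }
}
```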

With the filtering used here, you might render some strings unreadable. For example, a string of Chinese characters would be filtered away completely because none of them have an ASCII representation (this is more like iconv's //IGNORE).
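If silently discarding characters is a concern, one variation (an assumption on our part, not part of the answer) is to drop only the combining marks that NFKD splits off and emit `?` for every other code point that has no ASCII form, so the loss stays visible in the output. The sketch iterates by code point, so a character outside the BMP (stored as a surrogate pair) yields a single `?`; `AsciiFoldVisible` is a hypothetical name.

```java
import java.text.Normalizer;

class AsciiFoldVisible {
    // Like the answer's filter, but characters with no ASCII form are
    // replaced by '?' instead of being dropped silently.
    static String toAscii(String input) {
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFKD);
        StringBuilder buf = new StringBuilder(decomposed.length());
        decomposed.codePoints().forEach(cp -> {
            if (cp < 128) {
                buf.append((char) cp);
            } else if (Character.getType(cp) != Character.NON_SPACING_MARK) {
                // One '?' per code point, not per UTF-16 char.
                buf.append('?');
            }
            // Non-spacing marks (accents split off by NFKD) are dropped.
        });
        return buf.toString();
    }

    public static void main(String[] args) {
        // "caf\u00e9" followed by two Chinese characters (Beijing).
        System.out.println(toAscii("caf\u00e9 \u5317\u4eac")); // cafe ??
    }
}
```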

Overall, it would be safer to build your own lookup table of valid character substitutions, or at least of combining characters (accents and things) that are safe to strip. The best solution depends on the range of input characters you expect to handle.
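A minimal sketch of the lookup-table approach, assuming a hypothetical table of substitutions for letters that NFKD leaves intact (such as German sharp s, Danish/Norwegian o with stroke, Polish l with stroke) and treating every character in Unicode's non-spacing mark category (Mn) as safe to strip. A stricter version could list specific marks instead; the table contents here are illustrative and should come from the inputs you actually expect.

```java
import java.text.Normalizer;
import java.util.Map;

class TableFold {
    // Hypothetical substitution table for letters that have no Unicode
    // decomposition, so NFKD alone cannot reduce them to ASCII.
    private static final Map<Integer, String> SUBSTITUTIONS = Map.of(
            (int) '\u00df', "ss",  // sharp s
            (int) '\u00e6', "ae",  // ae ligature
            (int) '\u00c6', "AE",  // AE ligature
            (int) '\u00f8', "o",   // o with stroke
            (int) '\u00d8', "O",   // O with stroke
            (int) '\u0142', "l",   // l with stroke
            (int) '\u0141', "L");  // L with stroke

    // Apply the table, then NFKD, then drop only non-spacing marks.
    // Anything else (including CJK text) passes through unchanged.
    static String fold(String input) {
        StringBuilder substituted = new StringBuilder(input.length());
        input.codePoints().forEach(cp -> {
            String replacement = SUBSTITUTIONS.get(cp);
            if (replacement != null) {
                substituted.append(replacement);
            } else {
                substituted.appendCodePoint(cp);
            }
        });
        String decomposed = Normalizer.normalize(substituted, Normalizer.Form.NFKD);
        StringBuilder out = new StringBuilder(decomposed.length());
        decomposed.codePoints()
                .filter(cp -> Character.getType(cp) != Character.NON_SPACING_MARK)
                .forEach(out::appendCodePoint);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(fold("Stra\u00dfe"));           // Strasse
        System.out.println(fold("\u0141\u00f3d\u017a"));   // Lodz
    }
}
```

Because unmapped characters are kept, the result may still contain non-ASCII text; chain it with the answer's ASCII filter if the output must be strictly ASCII.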
