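You can use java.text.Normalizer and a little bit of regex to get rid of the diacritical marks, of which there are far more than the ones you have collected so far.
Here's an SSCCE; copy'n'paste'n'run it on Java 6: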
package com.stackoverflow.q2653739;

import java.text.Normalizer;
import java.text.Normalizer.Form;

public class Test {

    public static void main(String... args) {
        System.out.println(removeDiacriticalMarks("Gračišće"));
    }

    public static String removeDiacriticalMarks(String string) {
        // NFD splits an accented letter into its base letter plus combining mark(s),
        // e.g. "č" becomes "c" + U+030C; the regex then strips those marks.
        return Normalizer.normalize(string, Form.NFD)
            .replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
    }
}
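This should yield
Gracisce
At least, it does here in Eclipse with the console character encoding set to UTF-8 (Window > Preferences > General > Workspace > Text File Encoding). Make sure the same is set in your environment as well.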
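To see what the regex actually removes, a small helper can print the NFD form as code points. The class DecompositionDemo and the method nfdCodePoints are hypothetical names for this sketch, not part of the original answer. It also illustrates a limitation: letters such as đ have no canonical decomposition, so Normalizer leaves them intact and the regex has nothing to strip. The input strings use \u escapes so the sketch does not depend on the source file encoding.

```java
import java.text.Normalizer;

public class DecompositionDemo {

    // Hypothetical helper: returns the code points of the NFD form as "U+XXXX" tokens.
    public static String nfdCodePoints(String input) {
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < decomposed.length()) {
            int codePoint = decomposed.codePointAt(i);
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(String.format("U+%04X", codePoint));
            i += Character.charCount(codePoint);
        }
        return out.toString();
    }

    public static void main(String... args) {
        // "č" (U+010D): base 'c' followed by U+030C COMBINING CARON, which the regex strips.
        System.out.println(nfdCodePoints("\u010D")); // prints "U+0063 U+030C"
        // "đ" (U+0111): no canonical decomposition, so NFD (and thus the regex) leaves it alone.
        System.out.println(nfdCodePoints("\u0111")); // prints "U+0111"
    }
}
```

Note that \p{InCombiningDiacriticalMarks} matches only the Combining Diacritical Marks block (U+0300 to U+036F). Combining marks from other blocks, such as U+20D7 COMBINING RIGHT ARROW ABOVE, are not matched; the broader general-category class \p{M} may be worth considering if you need those as well.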
As an alternative, maintain a Map<Character, Character>:
import java.util.HashMap;
import java.util.Map;

Map<Character, Character> charReplacementMap = new HashMap<Character, Character>();
charReplacementMap.put('š', 's');
charReplacementMap.put('đ', 'd');
// Put more here.

String originalString = "Gračišće";
StringBuilder builder = new StringBuilder();
for (char currentChar : originalString.toCharArray()) {
    // Use the mapped replacement if there is one, otherwise keep the character as-is.
    Character replacementChar = charReplacementMap.get(currentChar);
    builder.append(replacementChar != null ? replacementChar : currentChar);
}
String newString = builder.toString();
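The two approaches can also be combined: normalize and strip the combining marks first, then use a small map only for the letters that NFD cannot decompose (such as đ and Đ). This is a minimal sketch, assuming a hypothetical DiacriticFolder class and LEFTOVERS table that are not part of the original answer; the table entries are only examples and you would extend them for your own data.

```java
import java.text.Normalizer;
import java.util.HashMap;
import java.util.Map;

public class DiacriticFolder {

    // Hypothetical table for letters without a canonical decomposition.
    private static final Map<Character, Character> LEFTOVERS = new HashMap<Character, Character>();

    static {
        LEFTOVERS.put('\u0111', 'd'); // đ
        LEFTOVERS.put('\u0110', 'D'); // Đ
        // Put more here, e.g. '\u0142' (ł) -> 'l', which NFD does not decompose either.
    }

    public static String fold(String input) {
        // Step 1: decompose and drop combining marks (handles č, š, ć, ...).
        String stripped = Normalizer.normalize(input, Normalizer.Form.NFD)
            .replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
        // Step 2: map the remaining characters that NFD left untouched.
        StringBuilder builder = new StringBuilder(stripped.length());
        for (char c : stripped.toCharArray()) {
            Character replacement = LEFTOVERS.get(c);
            builder.append(replacement != null ? replacement.charValue() : c);
        }
        return builder.toString();
    }

    public static void main(String... args) {
        // "Gračišće Đurđevac", written with \u escapes to avoid source-encoding issues.
        System.out.println(fold("Gra\u010Di\u0161\u0107e \u0110ur\u0111evac")); // prints "Gracisce Durdevac"
    }
}
```

This keeps the map small, because only the letters that Normalizer cannot split into base letter plus mark need an entry.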