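I've just started learning java.text.Normalizer, and it looked straightforward. I have a "weird" dash to deal with (specifically U+2013, EN DASH) that I want to convert to the normal dash character ("-", U+002D), so I wrote some quick test code: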

    import java.text.Normalizer;

    public class Test {
        public static void main(String[] args) {
            String weirdDash = "–";  // U+2013 EN DASH
            String normalDash = "-"; // U+002D HYPHEN-MINUS
            boolean b = Normalizer.isNormalized(weirdDash, Normalizer.Form.NFD);
            if (!b) {
                System.out.println("Java thinks the weird dash is normal");
                return;
            }
            String normalizedWeirdDash = Normalizer.normalize(weirdDash, Normalizer.Form.NFD);
            if (normalizedWeirdDash.equals(normalDash)) {
                System.out.println("Yay!");
            } else {
                System.out.println("Boo! normalized weird dash "
                        + (normalizedWeirdDash.equals(weirdDash) ? "didn't change" : "= " + normalizedWeirdDash));
            }
        }
    }

The output of main() is "Boo! normalized weird dash didn't change".
How is that possible? It means Normalizer.isNormalized returned false, yet calling normalize() on the same string (with the same Normalizer.Form) didn't change it at all.
What am I missing?
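
One way to test that reading is to print the return value of isNormalized directly instead of inferring it from which branch ran. The sketch below does that for every Normalizer.Form; the class name NormalizerCheck and the helper report() are hypothetical names chosen for illustration, not part of the code above.

```java
import java.text.Normalizer;

public class NormalizerCheck {
    // Builds a one-line report: what isNormalized returns for s under the
    // given form, and whether normalize() leaves s unchanged.
    static String report(String s, Normalizer.Form form) {
        boolean normalized = Normalizer.isNormalized(s, form);
        boolean unchanged = Normalizer.normalize(s, form).equals(s);
        return form + ": isNormalized=" + normalized + ", unchanged=" + unchanged;
    }

    public static void main(String[] args) {
        String weirdDash = "\u2013"; // EN DASH, written as an escape to avoid source-encoding issues
        for (Normalizer.Form form : Normalizer.Form.values()) {
            System.out.println(report(weirdDash, form));
        }
    }
}
```

As far as I can tell, U+2013 has no decomposition mapping in Unicode, so each line should report isNormalized=true and unchanged=true. If so, the two calls agree with each other, and the `!b` branch in my test is the one that would fire for a string that is not normalized.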
Edit
This version prints "true":
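
    public class Test {
        public static void main(String[] args) {
            String weirdDash = "–";  // U+2013 EN DASH
            String normalDash = "-"; // U+002D HYPHEN-MINUS
            // A literal character replacement; no regex is needed for a single code point.
            String newDash = weirdDash.replace('\u2013', '-');
            System.out.println(newDash.equals(normalDash));
        }
    }

So if all else fails, I can use that. But out of curiosity, what is Normalizer actually for?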
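
For comparison, here is a minimal sketch of two cases where Normalizer does change a string, based on my understanding of the Unicode normalization forms: canonical decomposition (NFD) splits "é" into "e" plus a combining accent, and compatibility composition (NFKC) turns the "ﬁ" ligature into "fi". The class NormalizerDemo and the helpers stripAccents() and foldCompat() are hypothetical names for illustration, and the manual dash replacement assumes U+2013 is left untouched by every form.

```java
import java.text.Normalizer;

public class NormalizerDemo {
    // NFD splits accented letters into base letter + combining mark;
    // the regex then removes the combining marks (Unicode category M).
    static String stripAccents(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    // NFKC replaces compatibility characters such as the "fi" ligature (U+FB01).
    // The en dash is mapped by hand afterwards (assumption: Normalizer leaves it as-is).
    static String foldCompat(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFKC).replace('\u2013', '-');
    }

    public static void main(String[] args) {
        System.out.println(stripAccents("caf\u00E9"));       // "café" with precomposed é
        System.out.println(foldCompat("\uFB01le 1\u20132")); // "ﬁle 1–2" with ligature and en dash
    }
}
```

Under those assumptions, the two lines printed should be "cafe" and "file 1-2", which suggests Normalizer handles equivalent encodings of the same text rather than mapping look-alike punctuation to ASCII.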