Converting Chinese Characters to Pinyin
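Apps often need to sort Chinese text: contact names, product names, city names, and so on. Such words are usually ordered by their pinyin, which creates the need to convert Chinese characters to pinyin.
Android built-in library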
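The Contacts app that ships with Android supports sorting contacts by name. Its built-in Chinese-to-pinyin source is located at "packages\providers\ContactsProvider\src\com\android\providers\contacts\HanziToPinyin.java". Early versions of this utility class hard-coded the character tables directly in the Java file, an approach that does not work correctly on Android 4.2 and above. From 4.2 onward the utility calls down into JNI instead: HanziToPinyin.java uses the Transliterator class from the core library package libcore.icu, and Transliterator declares several native methods.
Both HanziToPinyin and Transliterator belong to the platform source rather than the SDK, so an app cannot call their methods directly. The only way to use their APIs is to copy both Java files into the app project. Note that Transliterator.java must be placed in a package named libcore.icu, because the class binds to JNI functions, and JNI requires the package name, class name, and method names to match exactly. For a detailed explanation of JNI, see 《Android开发笔记(六十九)JNI实战》 (Android Development Notes (69): JNI in Practice).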
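To make the package-name requirement concrete, here is a minimal sketch of how the default JNI lookup derives a native symbol name from the fully qualified class name and the method name. The class JniNameSketch, the method name "create", and the package com.example.app are hypothetical and used only for illustration. The platform may also register its native methods explicitly rather than rely on this lookup, but that registration is likewise keyed on the fully qualified class name.

```java
public class JniNameSketch {

    // Escapes one name following the JNI short-name mangling rules:
    // '.' or '/' becomes '_', '_' becomes "_1", ';' becomes "_2",
    // '[' becomes "_3", and any other non-alphanumeric character
    // becomes "_0" followed by four lowercase hex digits.
    public static String mangle(String name) {
        StringBuilder sb = new StringBuilder();
        for (char c : name.toCharArray()) {
            if (c == '.' || c == '/') {
                sb.append('_');
            } else if (c == '_') {
                sb.append("_1");
            } else if (c == ';') {
                sb.append("_2");
            } else if (c == '[') {
                sb.append("_3");
            } else if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
                    || (c >= '0' && c <= '9')) {
                sb.append(c);
            } else {
                sb.append(String.format("_0%04x", (int) c));
            }
        }
        return sb.toString();
    }

    // Builds the symbol the JVM would look for when resolving a native method.
    public static String jniSymbol(String className, String methodName) {
        return "Java_" + mangle(className) + "_" + mangle(methodName);
    }

    public static void main(String[] args) {
        // "create" is a hypothetical native method name.
        System.out.println(jniSymbol("libcore.icu.Transliterator", "create"));
        // The same class copied into a different (hypothetical) package
        // yields a different symbol, so the native side is not found.
        System.out.println(jniSymbol("com.example.app.Transliterator", "create"));
    }
}
```

The two printed names differ only in the package part, which is why moving Transliterator.java out of libcore.icu breaks the binding to the native code.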
Below is the source code of HanziToPinyin.java:
import android.text.TextUtils;
import android.util.Log;
import java.util.ArrayList;
import libcore.icu.Transliterator;

public class HanziToPinyin {
    private static final String TAG = "HanziToPinyin";
    private static HanziToPinyin sInstance;
    private Transliterator mPinyinTransliterator;
    private Transliterator mAsciiTransliterator;

    public static class Token {
        public static final String SEPARATOR = " ";
        public static final int LATIN = 1;
        public static final int PINYIN = 2;
        public static final int UNKNOWN = 3;

        public Token() {
        }

        public Token(int type, String source, String target) {
            this.type = type;
            this.source = source;
            this.target = target;
        }

        public int type;
        public String source;
        public String target;
    }

    private HanziToPinyin() {
        try {
            mPinyinTransliterator = new Transliterator("Han-Latin/Names; Latin-Ascii; Any-Upper");
            mAsciiTransliterator = new Transliterator("Latin-Ascii");
        } catch (RuntimeException e) {
            Log.w(TAG, "Han-Latin/Names transliterator data is missing," + " HanziToPinyin is disabled");
        }
    }

    public boolean hasChineseTransliterator() {
        return mPinyinTransliterator != null;
    }

    public static HanziToPinyin getInstance() {
        synchronized (HanziToPinyin.class) {
            if (sInstance == null) {
                sInstance = new HanziToPinyin();
            }
            return sInstance;
        }
    }

    private void tokenize(char character, Token token) {
        token.source = Character.toString(character);
        // ASCII
        if (character < 128) {
            token.type = Token.LATIN;
            token.target = token.source;
            return;
        }
        // Extended Latin. Transcode these to ASCII equivalents
        if