Converting Chinese Characters to Pinyin on Android

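Contents

1. Requirements and Analysis

2. pinyin4j

3. jpinyin

3.1 Main Features of JPinyin

3.2 Integration

3.3 Usage and Results

4. ASCII Code Mapping

4.1 The TextPinyinUtil Utility Class

4.2 Usage and Results

Previous Posts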


1. Requirements and Analysis

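        Requirement: take a person's name recognized by speech recognition, match it against the contacts stored in the app, and call that contact.

        Speech recognition can return a homophone written with different characters from the stored name, so comparing pinyin is more robust than comparing the characters themselves. After some research, I found these common ways to convert Chinese characters to pinyin:

        (1) pinyin4j (open-source library)

        (2) jpinyin (open-source library)

        (3) ASCII code mapping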
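All three options serve the same matching step. Below is a minimal sketch of that step, assuming some function that turns Chinese text into toneless pinyin is available (any of the approaches below could back it): the recognized name and every contact name are converted, whitespace is dropped, and the resulting strings are compared. `Contact`, `findContactsByPinyin` and the `toPinyin` parameter are hypothetical names, not part of any library.

```kotlin
// Hypothetical contact model; a real app would read this from its own contact store.
data class Contact(val name: String, val phone: String)

/**
 * Returns the contacts whose name sounds the same as [recognizedName].
 * [toPinyin] is any function that turns Chinese text into toneless pinyin
 * (for example a wrapper around a pinyin library); it is an assumption of this sketch.
 */
fun findContactsByPinyin(
    recognizedName: String,
    contacts: List<Contact>,
    toPinyin: (String) -> String
): List<Contact> {
    // Normalize: convert to pinyin, drop whitespace, lower-case.
    fun normalize(text: String) = toPinyin(text).filterNot { it.isWhitespace() }.lowercase()
    val target = normalize(recognizedName)
    return contacts.filter { normalize(it.name) == target }
}
```

If the recognizer returns 张为 for the stored contact 张伟, both normalize to zhangwei and the contact is found. When several contacts share the same pinyin, the app still has to ask the user which one to call.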

2. pinyin4j

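        pinyin4j converts both Simplified and Traditional characters into several romanization systems: Hanyu Pinyin, Tongyong Pinyin, Wade-Giles, MPS2, Yale and Gwoyeu Romatzyh. It returns every reading of a polyphonic character and lets you customize the output format.

        Looking at the source (excerpt below), the core methods take a single char and return all of its readings without any context. The only String-level method, toHanyuPinyinString, simply joins the first reading of each character via getFirstHanyuPinyinString, which the source itself marks @deprecated because the first reading may be wrong in a given context. That does not meet our needs, so I moved on to jpinyin, an improved take on pinyin4j.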

/**
 * A class provides several utility functions to convert Chinese characters
 * (both Simplified and Tranditional) into various Chinese Romanization
 * representations
 * 
 * @author Li Min (xmlerlimin@gmail.com)
 */
public class PinyinHelper {
  /**
   * Get all unformmatted Hanyu Pinyin presentations of a single Chinese
   * character (both Simplified and Tranditional)
   * 
   * <p>
   * For example, <br/> If the input is '间', the return will be an array with
   * two Hanyu Pinyin strings: <br/> "jian1" <br/> "jian4" <br/> <br/> If the
   * input is '李', the return will be an array with single Hanyu Pinyin
   * string: <br/> "li3"
   * 
   * <p>
   * <b>Special Note</b>: If the return is "none0", that means the input
   * Chinese character exists in Unicode CJK talbe, however, it has no
   * pronounciation in Chinese
   * 
   * @param ch
   *            the given Chinese character
   * 
   * @return a String array contains all unformmatted Hanyu Pinyin
   *         presentations with tone numbers; null for non-Chinese character
   * 
   */
  static public String[] toHanyuPinyinStringArray(char ch) {
    return getUnformattedHanyuPinyinStringArray(ch);
  }

  static public String[] toHanyuPinyinStringArray(char ch, HanyuPinyinOutputFormat outputFormat)
      throws BadHanyuPinyinOutputFormatCombination {
    return getFormattedHanyuPinyinStringArray(ch, outputFormat);
  }

  // ... (other methods omitted)
  /**
   * Get all unformmatted Tongyong Pinyin presentations of a single Chinese
   * character (both Simplified and Tranditional)
   * 
   * @param ch
   *            the given Chinese character
   * 
   * @return a String array contains all unformmatted Tongyong Pinyin
   *         presentations with tone numbers; null for non-Chinese character
   * 
   * @see #toHanyuPinyinStringArray(char)
   * 
   */
  static public String[] toTongyongPinyinStringArray(char ch) {
    return convertToTargetPinyinStringArray(ch, PinyinRomanizationType.TONGYONG_PINYIN);
  }

  static public String toHanyuPinyinString(String str, HanyuPinyinOutputFormat outputFormat,
      String seperater) throws BadHanyuPinyinOutputFormatCombination {

    StringBuffer resultPinyinStrBuf = new StringBuffer();

    for (int i = 0; i < str.length(); i++) {
      String mainPinyinStrOfChar = getFirstHanyuPinyinString(str.charAt(i), outputFormat);

      if (null != mainPinyinStrOfChar) {
        resultPinyinStrBuf.append(mainPinyinStrOfChar);
        if (i != str.length() - 1) { // avoid appending at the end
          resultPinyinStrBuf.append(seperater);
        }
      } else {
        resultPinyinStrBuf.append(str.charAt(i));
      }
    }

    return resultPinyinStrBuf.toString();
  }

  /**
   * Get the first Hanyu Pinyin of a Chinese character <b> This function will
   * be removed in next release. </b>
   * 
   * @param ch
   *            The given Unicode character
   * @param outputFormat
   *            Describes the desired format of returned Hanyu Pinyin string
   * @return Return the first Hanyu Pinyin of given Chinese character; return
   *         null if the input is not a Chinese character
   * 
   * @deprecated DO NOT use it again because the first retrived pinyin string
   *             may be a wrong pronouciation in a certain sentence context.
   *             <b> This function will be removed in next release. </b>
   */
  static private String getFirstHanyuPinyinString(char ch, HanyuPinyinOutputFormat outputFormat)
      throws BadHanyuPinyinOutputFormatCombination {
    String[] pinyinStrArray = getFormattedHanyuPinyinStringArray(ch, outputFormat);

    if ((null != pinyinStrArray) && (pinyinStrArray.length > 0)) {
      return pinyinStrArray[0];
    } else {
      return null;
    }
  }

  // ! Hidden constructor
  private PinyinHelper() {}
}
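Because the core API works one character at a time, a string-level conversion has to be assembled from per-character readings. The sketch below uses a tiny hand-written reading table (hypothetical data; the real library ships a full table and may order readings differently) to contrast two strategies: joining each character's first reading, which is the strategy of toHanyuPinyinString above, and enumerating every combination, which suits name matching because a recognized name can be compared against all possible pronunciations.

```kotlin
// Toy per-character reading table (hypothetical data, toneless readings).
val readings: Map<Char, List<String>> = mapOf(
    '重' to listOf("zhong", "chong"),
    '庆' to listOf("qing"),
    '长' to listOf("chang", "zhang"),
    '大' to listOf("da")
)

// Same strategy as toHanyuPinyinString: take the first reading of each character,
// keep characters that have no reading unchanged.
fun firstReadings(text: String): String =
    text.map { c -> readings[c]?.firstOrNull() ?: c.toString() }.joinToString("")

// Cartesian product of all readings; useful when any pronunciation may match.
fun allReadings(text: String): List<String> =
    text.fold(listOf("")) { prefixes, c ->
        val options = readings[c] ?: listOf(c.toString())
        prefixes.flatMap { prefix -> options.map { prefix + it } }
    }
```

With this table, firstReadings("重庆") yields the wrong zhongqing, while allReadings("重庆") contains both zhongqing and chongqing. For names the number of combinations stays small (two two-reading characters give four candidates), so checking every candidate is cheap.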

3. jpinyin

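        JPinyin is an open-source Java library for converting Chinese characters to pinyin. It builds on pinyin4j's feature set with a number of improvements.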

3.1 Main Features of JPinyin

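        1. Accurate, complete character table: of the 20,903 characters in the Unicode range 4E00–9FA5 plus 3007 (〇), JPinyin converts all except 46 variant characters (variants have no standard pinyin).

        2. Fast conversion: in tests, converting the 20,902 characters in the range 4E00–9FA5 took about 100 ms.

        3. Multiple output formats: with tone marks, without tones, with tone numbers, and pinyin initials only.

        4. Recognition of common polyphonic characters, including in words, idioms and place names.

        5. Conversion between Simplified and Traditional Chinese.

        6. Support for user-defined dictionaries.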
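The formats in item 3 are mechanical transformations of one another. As an illustration only (not JPinyin's internal code), the sketch below turns a tone-number syllable such as zhong4 into the tone-mark, toneless and initial-letter forms. It applies the standard placement rule: the mark goes on a or e when present, on the o of ou, and otherwise on the last vowel; v stands for ü. The function names are hypothetical.

```kotlin
// Tone-marked variants of each vowel, indexed by tone 1..4.
val toneMarks = mapOf(
    'a' to "āáǎà", 'e' to "ēéěè", 'i' to "īíǐì",
    'o' to "ōóǒò", 'u' to "ūúǔù", 'ü' to "ǖǘǚǜ"
)

/** Converts a tone-number syllable like "zhong4" or "lv4" to tone-mark form ("zhòng", "lǜ"). */
fun toToneMark(numbered: String): String {
    val tone = numbered.lastOrNull()?.digitToIntOrNull()
    // Strip the tone digit (if any) and write ü for the v used in ASCII-only pinyin.
    val base = (if (tone != null) numbered.dropLast(1) else numbered).replace('v', 'ü')
    if (tone == null || tone !in 1..4) return base // neutral tone or no tone: no mark
    val index = when {
        'a' in base -> base.indexOf('a')
        'e' in base -> base.indexOf('e')
        "ou" in base -> base.indexOf("ou")
        else -> base.indexOfLast { it in toneMarks }
    }
    if (index < 0) return base
    val marked = toneMarks.getValue(base[index])[tone - 1]
    return base.substring(0, index) + marked + base.substring(index + 1)
}

/** Toneless form: drop the tone digit (v is kept for ü, as ASCII-only output usually does). */
fun toToneless(numbered: String): String = numbered.trimEnd { it.isDigit() }

/** Initial letter, as used by short-pinyin output. */
fun toInitial(numbered: String): String = numbered.take(1)
```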

3.2 Integration

        Add the dependency to app/build.gradle:

dependencies {
    implementation 'com.github.stuxuhai:jpinyin:1.1.7'
}

3.3 Usage and Results

import com.github.stuxuhai.jpinyin.PinyinFormat
import com.github.stuxuhai.jpinyin.PinyinHelper

// Inside an Activity/Fragment that uses view binding
val content: String = binding!!.etContent.text.toString()
// Input: 上海 重庆 重量 长安 长大
if (content.isNotEmpty()) {
    // shànghǎi chóngqìng zhòngliàng chángān chángdà
    val withToneMark = PinyinHelper.convertToPinyinString(content, "", PinyinFormat.WITH_TONE_MARK)
    // shang4hai3 chong2qing4 zhong4liang4 chang2an1 chang2da4
    val withToneNumber = PinyinHelper.convertToPinyinString(content, "", PinyinFormat.WITH_TONE_NUMBER)
    // shanghai chongqing zhongliang changan changda
    val withoutTone = PinyinHelper.convertToPinyinString(content, "", PinyinFormat.WITHOUT_TONE)
    // sh cq zl ca cd
    val shortPinyin = PinyinHelper.getShortPinyin(content)
}

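        The polyphone 重 is resolved correctly: 重庆 → chóngqìng and 重量 → zhòngliàng.

        长 is not: 长大 comes out as chángdà, while the correct reading is zhǎngdà (长安, chángān, is correct).

        Whether this level of accuracy is acceptable is up to the product team.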
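One way to fix misreadings such as 长大 in your own code is to consult a small phrase table first and fall back to per-character conversion elsewhere, similar in spirit to the user-defined dictionaries in item 6 of the feature list. The sketch below is standalone and does not use JPinyin's API: `toPinyinWithPhrases`, the phrase table and the `fallback` function are hypothetical. It uses greedy longest match, so a longer phrase beats a shorter one starting at the same position.

```kotlin
/**
 * Converts [text] to space-separated toneless pinyin, preferring entries in [phrases]
 * (greedy longest match) and using [fallback] for single characters otherwise.
 */
fun toPinyinWithPhrases(
    text: String,
    phrases: Map<String, List<String>>,
    fallback: (Char) -> String
): String {
    val maxLen = phrases.keys.maxOfOrNull { it.length } ?: 0
    val syllables = mutableListOf<String>()
    var i = 0
    while (i < text.length) {
        // Try the longest phrase that starts at position i, then shorter ones.
        var matched = false
        for (len in minOf(maxLen, text.length - i) downTo 1) {
            val entry = phrases[text.substring(i, i + len)]
            if (entry != null) {
                syllables += entry
                i += len
                matched = true
                break
            }
        }
        // No phrase starts here: convert this single character.
        if (!matched) {
            syllables += fallback(text[i])
            i++
        }
    }
    return syllables.joinToString(" ")
}
```

Separating syllables with a space also keeps chang an distinguishable from chan gan, which a plain concatenation such as changan cannot do.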

4. ASCII Code Mapping

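        Pros: the whole method fits in a single class; it is simple and needs no third-party dependency.

        Cons: it is far less capable than the two libraries above. Despite the name, the "ASCII" value it uses is the character's GBK (GB2312) code. Only the 3,755 level-1 characters of GB2312 (codes B0A1 to D7F9) are ordered by pinyin, so only they can be looked up: rarer characters cannot be converted reliably, each character always gets one fixed reading (no polyphone handling), and no tones are produced.

        How it works: pyvalue holds, in ascending order, the code of the first character of each syllable, stored as the unsigned GBK code minus 0x10000 (so 啊, B0A1, is -20319), and pystr holds the matching syllable. A character's pinyin is the syllable of the last boundary that is less than or equal to its code.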
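The lookup can be sketched in isolation. This is a sketch, not the author's code: `gb2312Level1Key` and `lookupSyllable` are hypothetical helper names, and the test table is just the first six entries of pyvalue/pystr from the class below. It computes the key with unsigned byte masking, rejects anything outside the GB2312 level-1 block, and uses binary search (valid because pyvalue is sorted in ascending order) instead of a linear scan.

```kotlin
// Keys use the same form as pyvalue: unsigned GBK code minus 0x10000,
// so 啊 (B0A1) becomes -20319. Returns null for anything outside GB2312 level 1.
fun gb2312Level1Key(ch: Char): Int? {
    val bytes = ch.toString().toByteArray(charset("GBK"))
    if (bytes.size != 2) return null // ASCII, or unmappable (encoded as '?')
    val high = bytes[0].toInt() and 0xFF
    val low = bytes[1].toInt() and 0xFF
    val code = (high shl 8) or low
    // Level-1 hanzi occupy B0A1..D7F9 and are the only ones ordered by pinyin.
    if (code !in 0xB0A1..0xD7F9 || low < 0xA1) return null
    return code - 0x10000
}

/** Finds the syllable whose range contains [key]: the last boundary that is <= key. */
fun lookupSyllable(key: Int, boundaries: IntArray, syllables: Array<String>): String? {
    val idx = boundaries.binarySearch(key)
    // Not found: binarySearch returns -(insertionPoint) - 1; the boundary we want is insertionPoint - 1.
    val i = if (idx >= 0) idx else -idx - 2
    return if (i >= 0) syllables[i] else null
}
```

With the full tables the call would be lookupSyllable(key, pyvalue, pystr); both arrays are private to the companion object of TextPinyinUtil, so this assumes the helper lives there.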

4.1 The TextPinyinUtil Utility Class

package com.scc.pinyin

/**
 * TextPinyinUtil
 * @desc Converts Chinese characters to (toneless) pinyin via a GB2312 code-table lookup
 * @date 2024/8/19
 */
class TextPinyinUtil {

    // Converts one character to its GBK code. Two-byte codes are returned as
    // (unsigned code - 0x10000), the form used by pyvalue; ASCII is returned as-is; 0 on error.
    private fun getChsAscii(chs: String): Int {
        var asc = 0
        try {
            val bytes = chs.toByteArray(charset("GBK"))
            if (bytes.isEmpty() || bytes.size > 2) {
                throw IllegalArgumentException("illegal resource string")
            }
            if (bytes.size == 1) {
                asc = bytes[0].toInt()
            }
            if (bytes.size == 2) {
                // Mask to unsigned before combining the high and low bytes
                val highByte = bytes[0].toInt() and 0xFF
                val lowByte = bytes[1].toInt() and 0xFF
                asc = (highByte shl 8 or lowByte) - 0x10000
            }
        } catch (e: Exception) {
            println("ERROR:TextPinyinUtil-getChsAscii(String chs)$e")
        }
        return asc
    }

    // Converts a single character
    fun convert(str: String): String? {
        val ascii = getChsAscii(str)
        // Single-byte (ASCII) characters are returned unchanged
        if (ascii in 1 until 160) {
            return ascii.toChar().toString()
        }
        // Only GB2312 level-1 characters (B0A1..D7F9) are sorted by pinyin;
        // anything else (symbols, level-2 characters, errors) cannot be looked up
        val lowByte = ascii and 0xFF
        if (ascii < pyvalue[0] || ascii > 0xD7F9 - 0x10000 || lowByte < 0xA1) {
            return null
        }
        // pyvalue is ascending: take the last syllable boundary <= the code
        for (i in pyvalue.size - 1 downTo 0) {
            if (pyvalue[i] <= ascii) {
                return pystr[i]
            }
        }
        return null
    }

    // Converts a whole string; non-ASCII characters without a pinyin become "unknown".
    // Uses a local StringBuilder so the shared instance is safe to call from several threads.
    fun getPinyin(chs: String): String {
        val buffer = StringBuilder()
        for (ch in chs) {
            val key = ch.toString()
            val value = if (ch.code > 0x7F) convert(key) ?: "unknown" else key
            buffer.append(value)
        }
        return buffer.toString()
    }

    companion object {
        private val pyvalue = intArrayOf(
            -20319,
            -20317,
            -20304,
            -20295,
            -20292,
            -20283,
            -20265,
            -20257,
            -20242,
            -20230,
            -20051,
            -20036,
            -20032,
            -20026,
            -20002,
            -19990,
            -19986,
            -19982,
            -19976,
            -19805,
            -19784,
            -19775,
            -19774,
            -19763,
            -19756,
            -19751,
            -19746,
            -19741,
            -19739,
            -19728,
            -19725,
            -19715,
            -19540,
            -19531,
            -19525,
            -19515,
            -19500,
            -19484,
            -19479,
            -19467,
            -19289,
            -19288,
            -19281,
            -19275,
            -19270,
            -19263,
            -19261,
            -19249,
            -19243,
            -19242,
            -19238,
            -19235,
            -19227,
            -19224,
            -19218,
            -19212,
            -19038,
            -19023,
            -19018,
            -19006,
            -19003,
            -18996,
            -18977,
            -18961,
            -18952,
            -18783,
            -18774,
            -18773,
            -18763,
            -18756,
            -18741,
            -18735,
            -18731,
            -18722,
            -18710,
            -18697,
            -18696,
            -18526,
            -18518,
            -18501,
            -18490,
            -18478,
            -18463,
            -18448,
            -18447,
            -18446,
            -18239,
            -18237,
            -18231,
            -18220,
            -18211,
            -18201,
            -18184,
            -18183,
            -18181,
            -18012,
            -17997,
            -17988,
            -17970,
            -17964,
            -17961,
            -17950,
            -17947,
            -17931,
            -17928,
            -17922,
            -17759,
            -17752,
            -17733,
            -17730,
            -17721,
            -17703,
            -17701,
            -17697,
            -17692,
            -17683,
            -17676,
            -17496,
            -17487,
            -17482,
            -17468,
            -17454,
            -17433,
            -17427,
            -17417,
            -17202,
            -17185,
            -16983,
            -16970,
            -16942,
            -16915,
            -16733,
            -16708,
            -16706,
            -16689,
            -16664,
            -16657,
            -16647,
            -16474,
            -16470,
            -16465,
            -16459,
            -16452,
            -16448,
            -16433,
            -16429,
            -16427,
            -16423,
            -16419,
            -16412,
            -16407,
            -16403,
            -16401,
            -16393,
            -16220,
            -16216,
            -16212,
            -16205,
            -16202,
            -16187,
            -16180,
            -16171,
            -16169,
            -16158,
            -16155,
            -15959,
            -15958,
            -15944,
            -15933,
            -15920,
            -15915,
            -15903,
            -15889,
            -15878,
            -15707,
            -15701,
            -15681,
            -15667,
            -15661,
            -15659,
            -15652,
            -15640,
            -15631,
            -15625,
            -15454,
            -15448,
            -15436,
            -15435,
            -15419,
            -15416,
            -15408,
            -15394,
            -15385,
            -15377,
            -15375,
            -15369,
            -15363,
            -15362,
            -15183,
            -15180,
            -15165,
            -15158,
            -15153,
            -15150,
            -15149,
            -15144,
            -15143,
            -15141,
            -15140,
            -15139,
            -15128,
            -15121,
            -15119,
            -15117,
            -15110,
            -15109,
            -14941,
            -14937,
            -14933,
            -14930,
            -14929,
            -14928,
            -14926,
            -14922,
            -14921,
            -14914,
            -14908,
            -14902,
            -14894,
            -14889,
            -14882,
            -14873,
            -14871,
            -14857,
            -14678,
            -14674,
            -14670,
            -14668,
            -14663,
            -14654,
            -14645,
            -14630,
            -14594,
            -14429,
            -14407,
            -14399,
            -14384,
            -14379,
            -14368,
            -14355,
            -14353,
            -14345,
            -14170,
            -14159,
            -14151,
            -14149,
            -14145,
            -14140,
            -14137,
            -14135,
            -14125,
            -14123,
            -14122,
            -14112,
            -14109,
            -14099,
            -14097,
            -14094,
            -14092,
            -14090,
            -14087,
            -14083,
            -13917,
            -13914,
            -13910,
            -13907,
            -13906,
            -13905,
            -13896,
            -13894,
            -13878,
            -13870,
            -13859,
            -13847,
            -13831,
            -13658,
            -13611,
            -13601,
            -13406,
            -13404,
            -13400,
            -13398,
            -13395,
            -13391,
            -13387,
            -13383,
            -13367,
            -13359,
            -13356,
            -13343,
            -13340,
            -13329,
            -13326,
            -13318,
            -13147,
            -13138,
            -13120,
            -13107,
            -13096,
            -13095,
            -13091,
            -13076,
            -13068,
            -13063,
            -13060,
            -12888,
            -12875,
            -12871,
            -12860,
            -12858,
            -12852,
            -12849,
            -12838,
            -12831,
            -12829,
            -12812,
            -12802,
            -12607,
            -12597,
            -12594,
            -12585,
            -12556,
            -12359,
            -12346,
            -12320,
            -12300,
            -12120,
            -12099,
            -12089,
            -12074,
            -12067,
            -12058,
            -12039,
            -11867,
            -11861,
            -11847,
            -11831,
            -11798,
            -11781,
            -11604,
            -11589,
            -11536,
            -11358,
            -11340,
            -11339,
            -11324,
            -11303,
            -11097,
            -11077,
            -11067,
            -11055,
            -11052,
            -11045,
            -11041,
            -11038,
            -11024,
            -11020,
            -11019,
            -11018,
            -11014,
            -10838,
            -10832,
            -10815,
            -10800,
            -10790,
            -10780,
            -10764,
            -10587,
            -10544,
            -10533,
            -10519,
            -10331,
            -10329,
            -10328,
            -10322,
            -10315,
            -10309,
            -10307,
            -10296,
            -10281,
            -10274,
            -10270,
            -10262,
            -10260,
            -10256,
            -10254
        )
        private val pystr = arrayOf(
            "a",
            "ai",
            "an",
            "ang",
            "ao",
            "ba",
            "bai",
            "ban",
            "bang",
            "bao",
            "bei",
            "ben",
            "beng",
            "bi",
            "bian",
            "biao",
            "bie",
            "bin",
            "bing",
            "bo",
            "bu",
            "ca",
            "cai",
            "can",
            "cang",
            "cao",
            "ce",
            "ceng",
            "cha",
            "chai",
            "chan",
            "chang",
            "chao",
            "che",
            "chen",
            "cheng",
            "chi",
            "chong",
            "chou",
            "chu",
            "chuai",
            "chuan",
            "chuang",
            "chui",
            "chun",
            "chuo",
            "ci",
            "cong",
            "cou",
            "cu",
            "cuan",
            "cui",
            "cun",
            "cuo",
            "da",
            "dai",
            "dan",
            "dang",
            "dao",
            "de",
            "deng",
            "di",
            "dian",
            "diao",
            "die",
            "ding",
            "diu",
            "dong",
            "dou",
            "du",
            "duan",
            "dui",
            "dun",
            "duo",
            "e",
            "en",
            "er",
            "fa",
            "fan",
            "fang",
            "fei",
            "fen",
            "feng",
            "fo",
            "fou",
            "fu",
            "ga",
            "gai",
            "gan",
            "gang",
            "gao",
            "ge",
            "gei",
            "gen",
            "geng",
            "gong",
            "gou",
            "gu",
            "gua",
            "guai",
            "guan",
            "guang",
            "gui",
            "gun",
            "guo",
            "ha",
            "hai",
            "han",
            "hang",
            "hao",
            "he",
            "hei",
            "hen",
            "heng",
            "hong",
            "hou",
            "hu",
            "hua",
            "huai",
            "huan",
            "huang",
            "hui",
            "hun",
            "huo",
            "ji",
            "jia",
            "jian",
            "jiang",
            "jiao",
            "jie",
            "jin",
            "jing",
            "jiong",
            "jiu",
            "ju",
            "juan",
            "jue",
            "jun",
            "ka",
            "kai",
            "kan",
            "kang",
            "kao",
            "ke",
            "ken",
            "keng",
            "kong",
            "kou",
            "ku",
            "kua",
            "kuai",
            "kuan",
            "kuang",
            "kui",
            "kun",
            "kuo",
            "la",
            "lai",
            "lan",
            "lang",
            "lao",
            "le",
            "lei",
            "leng",
            "li",
            "lia",
            "lian",
            "liang",
            "liao",
            "lie",
            "lin",
            "ling",
            "liu",
            "long",
            "lou",
            "lu",
            "lv",
            "luan",
            "lue",
            "lun",
            "luo",
            "ma",
            "mai",
            "man",
            "mang",
            "mao",
            "me",
            "mei",
            "men",
            "meng",
            "mi",
            "mian",
            "miao",
            "mie",
            "min",
            "ming",
            "miu",
            "mo",
            "mou",
            "mu",
            "na",
            "nai",
            "nan",
            "nang",
            "nao",
            "ne",
            "nei",
            "nen",
            "neng",
            "ni",
            "nian",
            "niang",
            "niao",
            "nie",
            "nin",
            "ning",
            "niu",
            "nong",
            "nu",
            "nv",
            "nuan",
            "nue",
            "nuo",
            "o",
            "ou",
            "pa",
            "pai",
            "pan",
            "pang",
            "pao",
            "pei",
            "pen",
            "peng",
            "pi",
            "pian",
            "piao",
            "pie",
            "pin",
            "ping",
            "po",
            "pu",
            "qi",
            "qia",
            "qian",
            "qiang",
            "qiao",
            "qie",
            "qin",
            "qing",
            "qiong",
            "qiu",
            "qu",
            "quan",
            "que",
            "qun",
            "ran",
            "rang",
            "rao",
            "re",
            "ren",
            "reng",
            "ri",
            "rong",
            "rou",
            "ru",
            "ruan",
            "rui",
            "run",
            "ruo",
            "sa",
            "sai",
            "san",
            "sang",
            "sao",
            "se",
            "sen",
            "seng",
            "sha",
            "shai",
            "shan",
            "shang",
            "shao",
            "she",
            "shen",
            "sheng",
            "shi",
            "shou",
            "shu",
            "shua",
            "shuai",
            "shuan",
            "shuang",
            "shui",
            "shun",
            "shuo",
            "si",
            "song",
            "sou",
            "su",
            "suan",
            "sui",
            "sun",
            "suo",
            "ta",
            "tai",
            "tan",
            "tang",
            "tao",
            "te",
            "teng",
            "ti",
            "tian",
            "tiao",
            "tie",
            "ting",
            "tong",
            "tou",
            "tu",
            "tuan",
            "tui",
            "tun",
            "tuo",
            "wa",
            "wai",
            "wan",
            "wang",
            "wei",
            "wen",
            "weng",
            "wo",
            "wu",
            "xi",
            "xia",
            "xian",
            "xiang",
            "xiao",
            "xie",
            "xin",
            "xing",
            "xiong",
            "xiu",
            "xu",
            "xuan",
            "xue",
            "xun",
            "ya",
            "yan",
            "yang",
            "yao",
            "ye",
            "yi",
            "yin",
            "ying",
            "yo",
            "yong",
            "you",
            "yu",
            "yuan",
            "yue",
            "yun",
            "za",
            "zai",
            "zan",
            "zang",
            "zao",
            "ze",
            "zei",
            "zen",
            "zeng",
            "zha",
            "zhai",
            "zhan",
            "zhang",
            "zhao",
            "zhe",
            "zhen",
            "zheng",
            "zhi",
            "zhong",
            "zhou",
            "zhu",
            "zhua",
            "zhuai",
            "zhuan",
            "zhuang",
            "zhui",
            "zhun",
            "zhuo",
            "zi",
            "zong",
            "zou",
            "zu",
            "zuan",
            "zui",
            "zun",
            "zuo"
        )
        val instance = TextPinyinUtil()

        /**
         * Checks whether the text contains Chinese characters
         * @param text the text to check
         * @return true if at least one character is in the CJK Unified Ideographs range 4E00..9FBB
         */
        fun isChinaString(text: String): Boolean {
            return text.any { it.code in 0x4E00..0x9FBB }
        }
    }
}

4.2 Usage and Results

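val content = "上海 重庆 重量 长安 长大"
// shanghai zhongqing zhongliang changan changda
val pinyin = TextPinyinUtil.instance.getPinyin(content)

        Each character always gets the single reading implied by its position in the code table, so 重庆 comes out as zhongqing (correct: chongqing) and 长大 as changda (correct: zhangda).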

Previous Posts

Android MediaRecorder: video recording and fixing errors: https://shuaici.blog.csdn.net/article/details/141216305

Android SDK pitfalls: iFlytek speech synthesis: https://shuaici.blog.csdn.net/article/details/141169429

Author: 帅次