Analysis of an Algorithm for Converting Chinese Characters to Full Pinyin

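A recent project required me to convert Simplified Chinese to pinyin, so I looked into the available algorithms. Most of what I found on Google only extracts the first letter of each character's pinyin. Eventually I came across a class called sunrise.spell that converts Chinese to full pinyin. It works quite well, so I analyzed its algorithm. Overall the algorithm is fairly simple, and I'm sharing my analysis here.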

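Let's start with some background. Chinese users know the GB2312 encoding well, so I'll only briefly go over its encoding rules. GB2312 covers symbols, digits, Latin letters, Japanese kana, box-drawing characters and more, but its main part is Chinese characters. Each character uses a 16-bit (two-byte) code. Simplified Chinese characters occupy the range from B0A1 to F7FE; the complete code table is at http://ash.jp/code/cn/gb2312tbl.htm. Written byte by byte in decimal, that range runs from [176 | 161] to [247 | 254]. Every Chinese character can therefore be represented by two values: for example, "啊" is [176 | 161] and "我" is [206 | 210].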
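To make the byte arithmetic concrete, here is a minimal C# sketch that reads the two GB2312 bytes of a character and checks whether they fall in the Hanzi block. The class name `Gb2312Codes` and its methods are hypothetical. The sketch assumes .NET Core 3.0+ / .NET 5+, where the GB2312 code page must first be enabled through `CodePagesEncodingProvider`.

```csharp
using System;
using System.Text;

// Hypothetical helper: shows the two-byte GB2312 code of a character in decimal.
public static class Gb2312Codes
{
    private static readonly Encoding Gb2312 = CreateGb2312();

    private static Encoding CreateGb2312()
    {
        // On .NET Core 3.0+ / .NET 5+ code-page encodings are not available until registered.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        return Encoding.GetEncoding("gb2312");
    }

    // Returns (high byte, low byte) of c as ints, e.g. 啊 -> (176, 161).
    public static (int High, int Low) Of(char c)
    {
        byte[] bytes = Gb2312.GetBytes(c.ToString());
        if (bytes.Length != 2)
            throw new ArgumentException($"U+{(int)c:X4} is not a double-byte GB2312 character", nameof(c));
        return (bytes[0], bytes[1]);
    }

    // True if the byte pair lies in the Hanzi block B0A1..F7FE:
    // high byte 0xB0..0xF7 (176..247), low byte 0xA1..0xFE (161..254).
    public static bool IsHanzi(int high, int low) =>
        high >= 0xB0 && high <= 0xF7 && low >= 0xA1 && low <= 0xFE;
}
```

For example, `Gb2312Codes.Of('我')` yields (206, 210). On .NET the name "gb2312" is served by code page 936 (GBK), a superset of GB2312. In that case the `IsHanzi` range check is what keeps GBK-only characters out.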

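This way every Chinese character can be located by a two-dimensional coordinate, so we can build a two-dimensional table that maps characters to their pinyin. We do ignore some special cases, such as characters with several pronunciations (多音字). One pinyin syllable can correspond to many characters, and the number of distinct syllables is small to begin with. We therefore first build a syllable table that lists every possible syllable as a one-dimensional array: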
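A minimal sketch of the two-dimensional lookup described above follows. It assumes a grid of 72 rows (high bytes 176..247) by 94 columns (low bytes 161..254) whose cells hold positions in the syllable table. This layout, the class `SpellGrid` and its methods are illustrative assumptions, not necessarily how sunrise.spell stores its data.

```csharp
using System;

// Hypothetical 2D lookup: (high byte, low byte) -> grid cell -> syllable.
public static class SpellGrid
{
    public const int Rows = 0xF7 - 0xB0 + 1; // high bytes 176..247 -> 72 rows
    public const int Cols = 0xFE - 0xA1 + 1; // low bytes 161..254 -> 94 columns

    // Maps a GB2312 byte pair to its (row, column) coordinate in the grid.
    public static (int Row, int Col) ToCell(int high, int low)
    {
        if (high < 0xB0 || high > 0xF7 || low < 0xA1 || low > 0xFE)
            throw new ArgumentOutOfRangeException(nameof(high), "byte pair is outside B0A1..F7FE");
        return (high - 0xB0, low - 0xA1);
    }

    // Each cell stores a position in the syllable table, so a syllable shared
    // by many characters is stored once and referenced from many cells.
    public static string Lookup(short[,] grid, string[] syllables, int high, int low)
    {
        var (row, col) = ToCell(high, low);
        return syllables[grid[row, col]];
    }
}
```

With this layout, 啊 = [176 | 161] lands in cell (0, 0) and 我 = [206 | 210] in cell (30, 49). A full grid has 72 × 94 = 6768 cells.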

```csharp
// Pinyin syllable table: every syllable the converter can output, stored once.
// If a second table stores positions into this array, reordering entries would
// break that mapping, so the order is kept as is, including a few entries that are
// not alphabetical ("ji" after "ge", "qiao" after "jiang", "yao" after "nue").
// "v" is the usual keyboard stand-in for ü, e.g. "lv" = lü.
private static readonly string[] _spellMusicCode = new string[] {
    "a", "ai", "an", "ang", "ao", "ba", "bai", "ban", "bang", "bao",
    "bei", "ben", "beng", "bi", "bian", "biao", "bie", "bin", "bing", "bo",
    "bu", "ca", "cai", "can", "cang", "cao", "ce", "ceng", "cha", "chai",
    "chan", "chang", "chao", "che", "chen", "cheng", "chi", "chong", "chou", "chu",
    "chuai", "chuan", "chuang", "chui", "chun", "chuo", "ci", "cong", "cou", "cu",
    "cuan", "cui", "cun", "cuo", "da", "dai", "dan", "dang", "dao", "de",
    "deng", "di", "dian", "diao", "die", "ding", "diu", "dong", "dou", "du",
    "duan", "dui", "dun", "duo", "e", "en", "er", "fa", "fan", "fang",
    "fei", "fen", "feng", "fu", "fou", "ga", "gai", "gan", "gang", "gao",
    "ge", "ji", "gen", "geng", "gong", "gou", "gu", "gua", "guai", "guan",
    "guang", "gui", "gun", "guo", "ha", "hai", "han", "hang", "hao", "he",
    "hei", "hen", "heng", "hong", "hou", "hu", "hua", "huai", "huan", "huang",
    "hui", "hun", "huo", "jia", "jian", "jiang", "qiao", "jiao", "jie", "jin",
    "jing", "jiong", "jiu", "ju", "juan", "jue", "jun", "ka", "kai", "kan",
    "kang", "kao", "ke", "ken", "keng", "kong", "kou", "ku", "kua", "kuai",
    "kuan", "kuang", "kui", "kun", "kuo", "la", "lai", "lan", "lang", "lao",
    "le", "lei", "leng", "li", "lia", "lian", "liang", "liao", "lie", "lin",
    "ling", "liu", "long", "lou", "lu", "luan", "lue", "lun", "luo", "ma",
    "mai", "man", "mang", "mao", "me", "mei", "men", "meng", "mi", "mian",
    "miao", "mie", "min", "ming", "miu", "mo", "mou", "mu", "na", "nai",
    "nan", "nang", "nao", "ne", "nei", "nen", "neng", "ni", "nian", "niang",
    "niao", "nie", "nin", "ning", "niu", "nong", "nu", "nuan", "nue", "yao",
    "nuo", "o", "ou", "pa", "pai", "pan", "pang", "pao", "pei", "pen",
    "peng", "pi", "pian", "piao", "pie", "pin", "ping", "po", "pou", "pu",
    "qi", "qia", "qian", "qiang", "qie", "qin", "qing", "qiong", "qiu", "qu",
    "quan", "que", "qun", "ran", "rang", "rao", "re", "ren", "reng", "ri",
    "rong", "rou", "ru", "ruan", "rui", "run", "ruo", "sa", "sai", "san",
    "sang", "sao", "se", "sen", "seng", "sha", "shai", "shan", "shang", "shao",
    "she", "shen", "sheng", "shi", "shou", "shu", "shua", "shuai", "shuan", "shuang",
    "shui", "shun", "shuo", "si", "song", "sou", "su", "suan", "sui", "sun",
    "suo", "ta", "tai", "tan", "tang", "tao", "te", "teng", "ti", "tian",
    "tiao", "tie", "ting", "tong", "tou", "tu", "tuan", "tui", "tun", "tuo",
    "wa", "wai", "wan", "wang", "wei", "wen", "weng", "wo", "wu", "xi",
    "xia", "xian", "xiang", "xiao", "xie", "xin", "xing", "xiong", "xiu", "xu",
    "xuan", "xue", "xun", "ya", "yan", "yang", "ye", "yi", "yin", "ying",
    "yo", "yong", "you", "yu", "yuan", "yue", "yun", "za", "zai", "zan",
    "zang", "zao", "ze", "zei", "zen", "zeng", "zha", "zhai", "zhan", "zhang",
    "zhao", "zhe", "zhen", "zheng", "zhi", "zhong", "zhou", "zhu", "zhua", "zhuai",
    "zhuan", "zhuang", "zhui", "zhun", "zhuo", "zi", "zong", "zou", "zu", "zuan",
    "zui", "zun", "zuo", "", "ei", "m", "n", "dia", "cen", "nou",
    "jv", "qv", "xv", "lv", "nv"
};
```
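The syllable table on its own only lists possible outputs. Below is a minimal end-to-end sketch of how a converter could consume it. The sketch assumes a hypothetical 72 × 94 `grid` whose cells hold positions in the syllable array (such as `_spellMusicCode`); the grid's contents are not part of this excerpt. The class `PinyinSketch` is likewise hypothetical, and .NET Core 3.0+ / .NET 5+ is assumed for the GB2312 encoding.

```csharp
using System.Text;

// Hypothetical converter: GB2312 bytes -> grid cell -> syllable, character by character.
public static class PinyinSketch
{
    private static readonly Encoding Gb2312 = CreateGb2312();

    private static Encoding CreateGb2312()
    {
        // Required on .NET Core 3.0+ / .NET 5+ before GB2312 can be requested by name.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        return Encoding.GetEncoding("gb2312");
    }

    // syllables: the syllable table; grid: 72 x 94 positions into syllables.
    public static string ToPinyin(string text, string[] syllables, short[,] grid)
    {
        var sb = new StringBuilder(text.Length * 3);
        foreach (char ch in text)
        {
            byte[] b = Gb2312.GetBytes(ch.ToString());
            bool isHanzi = b.Length == 2
                && b[0] >= 0xB0 && b[0] <= 0xF7   // rows: high bytes 176..247
                && b[1] >= 0xA1 && b[1] <= 0xFE;  // columns: low bytes 161..254
            if (isHanzi)
                sb.Append(syllables[grid[b[0] - 0xB0, b[1] - 0xA1]]);
            else
                sb.Append(ch); // digits, Latin letters, punctuation, etc. pass through unchanged
        }
        return sb.ToString();
    }
}
```

Because each cell holds exactly one position, a polyphonic character always gets the single syllable stored for it. This is consistent with ignoring 多音字 as described above.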