T9算法

最新推荐文章于 2020-11-29 20:12:51 发布

hellolinshoujie

最新推荐文章于 2020-11-29 20:12:51 发布

阅读量8.1k

点赞数

文章标签：算法输入法 string idea java

android 联系人搜索算法的确有点小复杂，最近研究了好多时日，才将它实现。需要完整的javaT9算法的请联系我，qq：715216366 验证信息：湖北枣阳，注意哦，要收取些很小很小的费用以表共享。

T9输入法，名字听起来陌生，可是大家却经常使用它。可以说T9输入法是输入法历史中的一次革命。至少自T9输入法开始，输入法有长足的进步。
如图手机中九个数字键。26个英文字母被分配到2至9这8个数字键上。以前想输入英文单词的时候总是需要连续多次按某个键，才能得到目标字母。比如想输入“hello”，就需要按两次4，两次3，三次5，三次5，三次6.输入一个单词需要数十次的按键，更何况还有经常按错的情况。编辑一条短信非常麻烦。
T9输入法很好的解决了这一问题。用户利用9个数字键输入非常杂的英文单词并且不用重复按某个字符。系统将会根据已有词库找出可能性最大的单词。例如，目标单词为“hello”,只需输入4,3,5,5,6即可，系统会自动过滤掉不合法的单词如“gdjjm”。排除非法单词，只考虑合法单词，这样大大加速了输入速度。
然而这只是第一步，因为每次输入数字之后，可能有多个候选单词前缀与此匹配。这就需要按可能性提供可能性最大的（最常用）的英文单词。比如系统已知两个单词："idea","hello"。idea最常用（可能度最高）。那么当依次输入4,3,5,5,6时，对于每次按键系统给出的候选单词应该是：
i (4)
id (3)
hel (5)
hell (5)
hello (6)
这样的话，通过T9输入法就可以大大提高输入速度和简洁度。
如何实现T9输入法呢？
（1）建立词库
将大量单词储备起来便于快速查找，字典树无疑是很好的选择。每个节点将存储此前缀的可能度，结构如下：

[java] view plain copy print ?

/*
* 字典树
*/
public class Trie {
public Node root = new Node();//字典树根节点
public class Node{
public int probablity;
public Node[] next;
public Node() {
this.probablity = 0;
next = new Node[26];
}
}
public void insert(String str, int probablity){
Node p = root;
int i = 0;
for(i = 0; i < str.length(); i++){
if(p.next[str.charAt(i) - 'a'] != null){
p = p.next[str.charAt(i) - 'a'];
p.probablity += probablity;
} else {
Node q = new Node();
q.probablity = probablity;
p.next[str.charAt(i) - 'a'] = q;
p = p.next[str.charAt(i) - 'a'];
}
}
}
public int search(String str){
Node p = root;
int i = 0;
for(i = 0; i < str.length(); i++){
if(p.next[str.charAt(i) - 'a'] != null){
p = p.next[str.charAt(i) - 'a'];
} else {
return 0;
}
}
return p.probablity;
}
}

（2）查找
建立好的字典树，每个节点最多有26个孩子分别代表下一个字符。当给出一个查找串“43556”的时候，利用广度优先遍历来搜索长度x的所有前缀的可能度。找出最大的即可。广搜的时候要用到队列。
举例模拟过程：
当输入4（g,h,i）之后，通过字典树遍历发现只有"h","i"，找出可能读最大的输出（i），并将h,i都入队。
输入3(d,e,f)后，拿出队列中的"h",分别组成"hd","he","hf",去字典树中查，将其中合法串入队，记录最大可能度对应串。再将h出队。对"i"做同样操作。最终找出最大的可能读最大串。
……
可以看出广搜的过程中大量剪枝，以及常用单词的长度最多不过十几个字母。所以字典树的深度也是十几，加上大量剪枝，性能还不错。

[java] view plain copy print ?

static int[][] ref = { //手机键盘数字-字母映射表
{0},
{0},
{3, 0, 1, 2},//按钮2，三个字母a,b,c
{3, 3, 4, 5},
{3, 6, 7, 8},
{3, 9, 10, 11},
{3, 12, 13, 14},
{4, 15, 16, 17, 18},
{3, 19, 20, 21},
{4, 22, 23, 24, 25}
};
public static Queue<String> queue = new LinkedList<String>(); //用于广度优先遍历过程中的队列
public static Trie trie; //字典树
public static void BFS(String str){
queue.clear();
queue.offer("");
int i = 0;
for(i = 0; '1' != str.charAt(i); i++){
int max_probablity = 0; //最大可能性值
String probaStr = "";
int pre_amount = queue.size(); //队列中之前串的个数
int j = 0;
for(j = 0; j < pre_amount; j++){
int k = 0;
String preStr = queue.peek();
for(k = 1; k <= ref[str.charAt(i) - '0'][0]; k++){
String tempStr = preStr.concat((char)(ref[str.charAt(i) - '0'][k] + 97) + "");
if(trie.search(tempStr) > 0){
queue.offer(tempStr);
}
if(trie.search(tempStr) > max_probablity){
max_probablity = trie.search(tempStr);
probaStr = tempStr;
}
}
queue.poll();
}
if("".equals(probaStr)){
break;
} else {
System.out.println(probaStr);
}
}
while('1' != str.charAt(i++)){
System.out.println("MANUALLY");
}
}

static int[][] ref = {	//手机键盘数字-字母映射表
	{0},
	{0},
	{3, 0, 1, 2},//按钮2，三个字母a,b,c
	{3, 3, 4, 5},
	{3, 6, 7, 8},
	{3, 9, 10, 11},
	{3, 12, 13, 14},
	{4, 15, 16, 17, 18},
	{3, 19, 20, 21},
	{4, 22, 23, 24, 25}
};
public static Queue<String> queue = new LinkedList<String>();	//用于广度优先遍历过程中的队列
public static Trie trie;	//字典树
	
public static void BFS(String str){
	queue.clear();
	queue.offer("");
		
	int i = 0;
	for(i = 0; '1' != str.charAt(i); i++){
		int max_probablity = 0; //最大可能性值
		String probaStr = "";
		int pre_amount = queue.size(); //队列中之前串的个数
		int j = 0;
		for(j = 0; j < pre_amount; j++){
			int k = 0;
			String preStr = queue.peek();
			for(k = 1; k <= ref[str.charAt(i) - '0'][0]; k++){
				String tempStr = preStr.concat((char)(ref[str.charAt(i) - '0'][k] + 97) + "");
				if(trie.search(tempStr) > 0){
					queue.offer(tempStr);
				}
				if(trie.search(tempStr) > max_probablity){
					max_probablity = trie.search(tempStr);
					probaStr = tempStr; 
				}
					
			}
			queue.poll();
		}
		if("".equals(probaStr)){
			break;
		} else {
			System.out.println(probaStr);
		}
	}
	while('1' != str.charAt(i++)){
		System.out.println("MANUALLY");
	}
		
}

思考：
1.以上模拟过程只是给出最大可能度的串，也可以按可能度将所有匹配串按顺序都给出，然后由用户选择。
2.感觉T9输入法打破了之前输入法的僵局，虽然只是用在手机输入。但是个人认为当前流行的智能键盘输入法也采用了类似的处理。比如利用搜狗输入法依次输入"hao""zi""wei""zhi"
给出的最大候选依次为：“好”“耗子”“好滋味”“好自为之”。是不是有点T9输入法的味道？如果有兴趣的话可以交流讨论!

这个题就是实现T9输入法。T9在手机上很常见，是为了方便在数字键盘上输入英文字母而发明的输入法。打汉字一般都用拼音，而拼音都是用英文字母输入的，所以T9对提高中文输入效率也很有帮助。这个题要实现的，是T9的智能英文输入，要输入idea，就不必按3下4，1下3，2下3，1下2了，直接按4332就能打出来，确实就方便了不少

本题先输入一个单词表，包括单词以及该单词的权值。然后输入一些数字串，要求模拟手机输入的过程，每输入一个数字，就输出对应的单词（如果没有对应的就输出MANUALLY），如果输入的数字会对应不同的单词的前缀，就输出权值之和最高的前缀（如果权值一样就按字母表顺序）。用Sample来说明，输入了hell，hello，idea这3个单词，权值对应分别为3,4,8，开始输入数字：输入4，4可以对应i和h，i是idea的前缀，权值之和为8，h是hell和hello的前缀，权值之和是3+4=7，输出权值较大的i；继续输入3,43对应的可以是he和id，同样因为id的权值大于he，就输出id；接下来输入5,435就只能对应hel了……依此类推，每次输出的都是权值之和最高的词

分析：本题的关键就是要找出权值之和最高的单词前缀，这就会想到trie树了，这是解决字符串前缀问题的好办法，构造树的方法见我的代码。要注意的是，输入的是数字，而一个数字可以对应多个字母，需要有一个比较的过程，把权值最大的字母输出来。我这里用java的PriorityQueue优先队列来完成这个操作，优先的规则是层次从小到大，权值从大到小，字母表顺序

我一开始没有用优先队列，用的是贪心，每次选择权值最大的字母输出，结果WA了一次，是没考虑到每次只选最大的，会有可能在后面找到的并不是权值最大的前缀。在discuss里面看到一组数据：

3
abyb 3
abye 13
aaza 4
1
22921
才发现我错得跟他一样，输出的是
a
ab
aby
abyb
但正确的结果应该是
a
ab
aby
aaza

每个可能的结点都要扩展，并按照前面所述的原则加入到优先队列之中。使用BFS进行层次遍历。在优先队列中，每一层的第一个结点的字母就是符合要求的。我在结点类node中加入了一个属性father标记父结点，在遍历过程中获取每一层的第一个结点后，就利用father属性将结果反推出来然后输出

现在开始用java交题，觉得java还是蛮爽的，有这么多现成的类可以拿来用，eclipse的代码编辑能力也比dev c++,VC6之类的要强得多了。虽然执行效率没C/C++高，但能更快过题的java也不失为一个好语言。用gvim生成了一个分色的代码，就贴在下面了，内存4464K时间485MS

import java.util.Comparator;
import java.util.HashMap;
import java.util.PriorityQueue;
import java.util.Scanner;

class node {
char ch;// 字符
int val;// 权值
int lv;// 层次
node father;// 父结点
HashMap<Character, node> next;

node(char ch, int val, int lv, node father) {
this.ch = ch;
this.val = val;
this.lv = lv;
this.father = father;
next = new HashMap<Character, node>();
}

node() {
next = new HashMap<Character, node>();
lv = 0;
}
}

class trie {
private static char t9[][] = { {}, {}, {'a','b','c' }, {'d','e','f' }, {'g','h', 'i' }, {'j','k','l' }, {'m','n','o' }, {'p','q', 'r','s' }, {'t','u','v' }, {'w','x','y', 'z' } };

node root = null;

public void add(String str, int val) {
node curr = root;
for (int i =0; i < str.length(); i++) {
char ch = str.charAt(i);
// 没有结点就建立结点
node next;
if ((next = curr.next.get(ch)) ==null) {
curr.next.put(ch, new node(ch, val, curr.lv +1, curr));
curr = curr.next.get(ch);
} else {
curr = next;
curr.val += val;
}
}
}

public void find(String num_str) {
int length;
int num[];
// 找出1的位置，确定结束点
int one_pos = num_str.indexOf('1');
if (one_pos < 0) {
length = num_str.length();
} else {
length = one_pos;
}
// 字符转换为数字
num = new int[length];
for (int i =0; i < length; i++) {
num[i] = num_str.charAt(i) - '0';
}
// 建立优先队列
PriorityQueue<node> pq = new PriorityQueue<node>(26,new Comparator<node>() {
// 优先规则：层次从小到大，权值从大到小，字母表顺序
public int compare(node a, node b) {
if (a.lv == b.lv) {
if (a.val == b.val) {
return a.ch - b.ch;
} else {
return b.val - a.val;
}
} else {
return a.lv - b.lv;
}
}
});
// 把根加入队列
pq.add(root);
// 对trie树进行BFS遍历
int lv = 0;
while (pq.size() > 0) {
// 把所有可能接下来的结点列出，加入队列
node curr = pq.poll();
// 反推结果
if (curr.lv > lv) {
char res[] = new char[curr.lv];
node c = curr;
while (c != root) {
res[c.lv - 1] = c.ch;
c = c.father;
}
System.out.println(res);
lv++;
}
if (lv < num.length) {
int dig = num[curr.lv];
for (int i =0; i < t9[dig].length; i++) {
node next;
if ((next = curr.next.get(t9[dig][i])) !=null) {
pq.add(next);
}
}
}
}
// 填空
for (int i = lv; i < length; i++) {
System.out.println("MANUALLY");
}
}

trie() {
root = new node();
}
}

public class Main {
public static void main(String[] args)throws Exception {
Scanner cin = new Scanner(System.in);
int cas = cin.nextInt();
for (int th =1; th <= cas; th++) {
System.out.printf("Scenario #%d:\n", th);
// 对每一个测试用例建立一棵trie树
trie tr = new trie();
int w = cin.nextInt();
while (w-- > 0) {
tr.add(cin.next(), cin.nextInt());
}
int m = cin.nextInt();
while (m-- > 0) {
String src = cin.next();
tr.find(src);
System.out.println();
}
System.out.println();
}
}
}

/**

* 摘要：T9拼音输入法模块

* 描述：像手机键盘一样数字键和字母键复用，一个按键上有三或四个英文字母，

* 在输入拼音时选择需要的字母来组成拼音，T9只需输入该字母所在的按键一次，

* 程序自动组成合理的拼音，大大减少了输入时按键的次数，能够大大提高输入法的效率。

* 模块：char * t9PY_Get(char (*GetKey)(void))

* 传入读取键值函数，返回汉字首地址

* A&T：Lzy 2012-1-6

#include <string.h>

#include <stdio.h>

#include "py_mb.h"

#include "t9py_indexa.h"

#include "ty_config.h"

char chinese_word[3];

const struct t9PY_index * cpt9PY_Mb[16];//主要用于存放匹配的拼音码表地址，只有cpt9PY_Mb[0]存放的是一个不匹配的拼音码表地址.

//匹配并不是指相比较的字符长短一样，不是完全匹配.例如：34跟346是匹配的，34跟34是完全匹配(在这里我们不需要使用完全匹配)

//========================================================================

// 语法格式： char T9PY_Get_Match_PY_MB(char *p_PadInput,struct t9PY_index code ** List_match_PY_Mb)

// 实现功能：获取与输入相匹配的T9拼音列表

// 参数： p_PadInput 输入的按键序列，由'0'~'9'组成的字符串

// List_match_PY_Mb 存储"获取到的拼音索引地址"的数组

// 返回值：获取到完全匹配拼音数量

// 移植注意： List_match_PY_Mb所指向的存储结构,即用于存放匹配的拼音码表地址的存储结构

//========================================================================

char T9PY_Get_Match_PY_MB(char *p_PadInput,const struct t9PY_index **List_match_PY_Mb)

{

const struct t9PY_index *p_PY_CurrenIndex,*p_PY_LastIndex,*p_PY_TempIndex;

char i,j,cInputStrLength;

char T9PY_Match_NUM=0; //完全匹配拼音数量

j=0; //j为匹配最大值

if(*p_PadInput == '\0')

return(0); //如果输入空字符返回0//

cInputStrLength=strlen(p_PadInput); //获取输入拼音串长度//

p_PY_CurrenIndex = &(t9PY_index2[0]); //首索引地址赋值,p_PY_CurrenIndex为当前拼音索引地址

p_PY_LastIndex = t9PY_index2+sizeof(t9PY_index2)/sizeof(t9PY_index2[0]); //最后索引地址之后的地址（作下一语句比较用）

while(p_PY_CurrenIndex < p_PY_LastIndex) //遍历字母索引表.或者使用语句: while((p_PY_CurrenIndex->t9PY_T9[0])!='\n')

{

for(i=0;i<cInputStrLength;i++)

{

if(*(p_PadInput+i) != *((*p_PY_CurrenIndex).t9PY_T9+i)) //检查是否匹配

{

if (i+1 > j)

{

j=i+1; //j为不匹配字符串中前面字符匹配的最大值,例如:被比较字符串（键盘输入的字符串）为985,相比较的2个字符串（码表索引里面的字符串）为983和953，则j=2

p_PY_TempIndex=p_PY_CurrenIndex;

}

break; //不匹配,退出//

}

if((i==cInputStrLength) && (T9PY_Match_NUM<16)) //匹配,最多16组.匹配并不是指相比较的字符长短一样，不是完全匹配

{

List_match_PY_Mb[T9PY_Match_NUM]=p_PY_CurrenIndex;//

T9PY_Match_NUM++;

}

p_PY_CurrenIndex++;

}

if(j!=cInputStrLength) //不匹配但最多匹配字符的1组

List_match_PY_Mb[0]=p_PY_TempIndex;//

return (T9PY_Match_NUM); //输出完全匹配组数,0为无果而终//

}

/**

* 功能：获取汉字

* 输入：传入读取键值函数

* 返回：汉字保存的首地址

char * t9PY_Get(char (*GetKey)(void))

{

int PYEnter=0; /* 标识汉字是否已经输入 */

int HZok=0;

int flag = 1; /* 标记数组中汉字是否显示完毕 */

char cpt9PY_Mblen;

unsigned char temp, all;

unsigned char t9PYn = 0; /* 当前的第几个拼音 */

char in_line[16] = {0x00}; /* 存放依次从键盘输入的字符组成的字符串 */

char tempchar,Add=0,i=0;

const struct t9PY_index *cpTemp;

chinese_word[2]='\0';

while(!HZok)

{

tempchar = (*GetKey)(); /* 读取键值 */

switch (tempchar)

{

case key0:

case key1:

case key2:

case key3:

case key4:

case key5:

case key6:

case key7:

case key8:

case key9:

if (~PYEnter) //没有按下确定键,则字符存入in_line中,根据此字符串,获得节外匹配的拼音

{

in_line[i]=tempchar; /* 存入字符 */

i++;

Add = 0;

cpt9PY_Mblen = T9PY_Get_Match_PY_MB(in_line, cpt9PY_Mb);

if(cpt9PY_Mblen==0) /* 输入没有相应的拼音 */

goto del;

}

break;

case kleft: /* 前一拼音 */

if (t9PYn > 0)

t9PYn--;

break;

case kright: /* 后一拼音 */

t9PYn++;

if (t9PYn >= cpt9PY_Mblen) t9PYn --; /* 判断是否已到最后一个拼音 */

break;

case PageUp: /* 前翻页 */

if (Add >= 12) Add -= 12; /* 前6个汉字*/

break;

case PageDown: /* 后翻页 */

if (Add < strlen((*cpTemp).PY_mb) - 12)

Add += 12;

break;

case kdel: /* 删除前一个字筗 */

del: if (i>0)

{

i--;

in_line[i]=0x00;

Add=0;

cpt9PY_Mblen = T9PY_Get_Match_PY_MB(in_line, cpt9PY_Mb);

}

break;

case kenter: //输入状态和选字状态切换

PYEnter ^= 1;

break;

default :

break;

}

if (PYEnter)

{

cpTemp=cpt9PY_Mb[t9PYn]; /* 获得当前拼音 */

printf ("\n选字：");

tempchar = (*GetKey)(); /* 获得要选的数字 */

if((cpTemp != PY_mb_space) && (tempchar>='1') && (tempchar<='6'))

{

HZok=1; /* 改变循环条件，到这里汉字已选上 */

t9PYn=0; /* 清除拼音标记变量 */

/* 判断输入的数字在否大于汉字个数 */

while(strlen((*cpTemp).PY_mb) < strlen((*cpTemp).PY_mb+(Add/2)*2+(tempchar-'0')*2+1))

tempchar = (*GetKey)();

chinese_word[0]=*((*cpTemp).PY_mb+(Add/2)*2+(tempchar-'0')*2+1);

chinese_word[1]=*((*cpTemp).PY_mb+(Add/2)*2+(tempchar-'0')*2+2);

return (chinese_word); /* 返回获得的汉字*/

}

else

{

printf ("\n拼音: ");

for (temp=t9PYn;temp<cpt9PY_Mblen; temp++) /* 输出所有拼音*/

{

cpTemp=cpt9PY_Mb[temp];

printf ("%s ",(*(cpTemp)).PY);

}

cpTemp=cpt9PY_Mb[t9PYn];

printf ("\n汉字: ");

for(temp = 0; temp < 6; temp++) /* 显示当前拼音的基于页(Add)前6个汉字 */

{

//Add除于2后又乘于2的作用是为了获得偶数(因为拼音码表首字符是'@'而不是汉字)

chinese_word[0]=*((*cpTemp).PY_mb+(Add/2)*2+temp*2+1);

chinese_word[1]=*((*cpTemp).PY_mb+(Add/2)*2+temp*2+2);

/* 判断汉字是否已全部输出 */

if(chinese_word[0] == 0)

{

flag = 0;

break;

}

printf("%s ",chinese_word);

flag = 1;

}

if(flag) printf("-> "); /* 提示还有汉字没有显示完全 */

}

/********程序测试部分************/

/* 获取键值 */

char key(void)

{

int key;

scanf("%c",&key);

getchar(); /*获取回车符 */

printf("输入 %c", key);

return key;

}

int main(void)

{

/* char input_string[]="243";

int num, i;

num = T9PY_Get_Match_PY_MB(input_string, cpt9PY_Mb);

for(i = 0; i < num; i++)

printf("%s %s\n",(*(cpt9PY_Mb[i])).PY,(*(cpt9PY_Mb[i])).PY_mb); */

printf("%s\n",t9PY_Get(key)); /* 汉字是两个字符所以需要用字符串方式打印 */

}

//地址；http://blog.chinaunix.net/uid-24219701-id-3066063.html

hellolinshoujie

关注

0
点赞
踩
8

收藏

觉得还不错? 一键收藏
1
评论
复制链接

分享到 QQ

分享到新浪微博

扫一扫