(1)编辑距离是测量一个字符串转换成另外一个字符串需要操作(操作包括: 插入 删除 置换)的最小次数。
编辑距离可以用来计算两字符串的相似度,另外也可以通过余弦方法来计算两字符串的相似度
(2)算法实现采用动态规划算法,其求解过程类似于求两字符串的最长公共序列(LCS)
下面是算法实现:
public class Distance
{
public static int getDistance(String s1, String s2)
{
int len1 = s1.length();
int len2 = s2.length();
int[][] d = new int[len1+1][len2+1];
int i=0, j=0;
for(i=0; i<=len1; i++)
d[i][0] = i;
for(j=0; j<=len2; j++)
d[0][j] = j;
for (i = 1; i < len1+1; i++)
for (j = 1; j < len2+1; j++)
{
int cost = 1;
if(s1.charAt(i-1) == s2.charAt(j-1))
{
cost = 0;
}
int delete = d[i - 1][j] + 1;
int insert = d[i][j - 1] + 1;
int substitution = d[i - 1][j - 1] + cost;
d[i][j] = min(delete, insert, substitution);
}
return (d[len1][len2]);
}
public static int min(int d,int i,int s)
{
int temp = 0;
if(d>i)
temp = i;
else
temp = d;
return s<temp?s:temp;
}
public static void main(String args[])
{
String s1= "kitten";
String s2 = "sitting";
System.out.println(Distance.getDistance(s1, s2));
}
}