Levenshtein distance
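Introduction to Levenshtein distance
Levenshtein distance, also called minimum edit distance, measures how far apart two strings are: it is the minimum number of single-character edits needed to turn one string into the other.
The allowed edits are:
1. Substitute one character
2. Insert one character
3. Delete one character
Each operation adds 1 to the distance.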
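To make the three rules concrete, here is a small sketch that applies each edit to a string. The class SingleEdits and its method names are hypothetical helpers introduced only for illustration; they are not part of the original post.

```java
// Hypothetical helper class illustrating the three edit operations.
class SingleEdits {

    // Rule 1: substitute the character at position index with c.
    static String substitute(String s, int index, char c) {
        StringBuilder sb = new StringBuilder(s);
        sb.setCharAt(index, c);
        return sb.toString();
    }

    // Rule 2: insert c so that it ends up at position index.
    static String insert(String s, int index, char c) {
        return new StringBuilder(s).insert(index, c).toString();
    }

    // Rule 3: delete the character at position index.
    static String delete(String s, int index) {
        return new StringBuilder(s).deleteCharAt(index).toString();
    }

    public static void main(String[] args) {
        // Three edits, each costing 1, turn "kitten" into "sitting".
        String step1 = substitute("kitten", 0, 's'); // "sitten"
        String step2 = substitute(step1, 4, 'i');    // "sittin"
        String step3 = insert(step2, 6, 'g');        // "sitting"
        System.out.println(step1 + " -> " + step2 + " -> " + step3);
    }
}
```

Each call changes exactly one character, so the chain above shows that "kitten" and "sitting" are at most 3 edits apart.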
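Algorithm
Define a matrix in which matrix[i][j] is the minimum edit distance from the first i characters of str1 to the first j characters of str2. The boundary values follow directly: matrix[i][0] = i (delete all i characters) and matrix[0][j] = j (insert all j characters).
Every other cell can be reached from three neighbors. Below, str1[i] denotes the i-th character of str1, counting from 1:
1. From matrix[i-1][j-1]: the first i-1 characters of str1 have already been turned into the first j-1 characters of str2. If the i-th character of str1 equals the j-th character of str2, nothing more is needed; otherwise one substitution is required:
matrix[i][j] = matrix[i-1][j-1] + (str1[i]==str2[j] ? 0 : 1)
2. From matrix[i][j-1]: the first i characters of str1 have already been turned into the first j-1 characters of str2, so inserting one more character completes the transformation:
matrix[i][j] = matrix[i][j-1] + 1
3. From matrix[i-1][j]: likewise, the first i-1 characters of str1 have already been turned into the first j characters of str2, so deleting the i-th character of str1 completes it:
matrix[i][j] = matrix[i-1][j] + 1
Taking the cheapest of the three gives the state transition equation:
matrix[i][j] = minimum(matrix[i-1][j-1] + (str1[i]==str2[j] ? 0 : 1), matrix[i][j-1] + 1, matrix[i-1][j] + 1)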
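The state transition equation can also be transcribed almost literally as a recursive function with memoization. The sketch below is one possible reading of it, not the author's implementation: the class RecursiveEditDistance and the helper solve are hypothetical names, and the base cases (an empty prefix costs as many edits as the other prefix is long) are an assumption that matches the usual definition.

```java
import java.util.Arrays;

// Hypothetical top-down version of the recurrence, for illustration only.
class RecursiveEditDistance {

    static int distance(String str1, String str2) {
        // memo[i][j] caches the distance between the first i chars of str1
        // and the first j chars of str2; -1 marks "not computed yet".
        int[][] memo = new int[str1.length() + 1][str2.length() + 1];
        for (int[] row : memo) {
            Arrays.fill(row, -1);
        }
        return solve(str1, str2, str1.length(), str2.length(), memo);
    }

    private static int solve(String a, String b, int i, int j, int[][] memo) {
        if (i == 0) return j; // insert the remaining j characters
        if (j == 0) return i; // delete the remaining i characters
        if (memo[i][j] != -1) return memo[i][j];

        // charAt is 0-based, so the i-th character is at index i - 1.
        int cost = (a.charAt(i - 1) == b.charAt(j - 1)) ? 0 : 1;
        int substitute = solve(a, b, i - 1, j - 1, memo) + cost;
        int insert = solve(a, b, i, j - 1, memo) + 1;
        int delete = solve(a, b, i - 1, j, memo) + 1;

        memo[i][j] = Math.min(substitute, Math.min(insert, delete));
        return memo[i][j];
    }

    public static void main(String[] args) {
        System.out.println(distance("kitten", "sitting")); // prints 3
    }
}
```

The memo table plays the same role as the matrix: each (i, j) pair is solved once, so the running time stays proportional to the product of the two string lengths.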
Java implementation
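public class EditDistance {

    public static void main(String[] args) {
        String str1 = "speech";
        String str2 = "eechits";
        int distance = getEditDistance(str1, str2);
        System.out.println("Edit distance: " + distance);
    }

    // Returns the Levenshtein distance between str1 and str2.
    public static int getEditDistance(String str1, String str2) {
        int n = str1.length();
        int m = str2.length();

        // If either string is empty, the distance is the length of the other.
        if (n == 0) {
            return m;
        }
        if (m == 0) {
            return n;
        }

        // matrix[i][j]: distance from the first i chars of str1 to the first j chars of str2.
        int[][] matrix = new int[n + 1][m + 1];

        // First column: delete all i characters of str1.
        for (int i = 0; i <= n; i++) {
            matrix[i][0] = i;
        }
        // First row: insert all j characters of str2.
        for (int j = 0; j <= m; j++) {
            matrix[0][j] = j;
        }

        for (int i = 1; i <= n; i++) {
            // charAt is 0-based, so the i-th character is at index i - 1.
            char str1_i = str1.charAt(i - 1);
            for (int j = 1; j <= m; j++) {
                char str2_j = str2.charAt(j - 1);
                int cost = (str1_i == str2_j) ? 0 : 1;
                matrix[i][j] = minimum(
                        matrix[i][j - 1] + 1,         // insertion
                        matrix[i - 1][j] + 1,         // deletion
                        matrix[i - 1][j - 1] + cost); // match or substitution
            }
        }

        print(matrix, n, m);
        return matrix[n][m];
    }

    // Returns the smallest of three values.
    public static int minimum(int a, int b, int c) {
        return Math.min(a, Math.min(b, c));
    }

    // Prints the (n+1) x (m+1) distance matrix, one row per line.
    public static void print(int[][] matrix, int n, int m) {
        for (int i = 0; i <= n; i++) {
            for (int j = 0; j <= m; j++) {
                System.out.print(matrix[i][j] + " ");
            }
            System.out.println();
        }
    }
}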