DP5 编辑距离 Edit Distance @geeksforgeeks

Problem:Given two strings of size m, n and set of operations replace (R), insert (I) and delete (D) all at equal cost. Find minimum number of edits (operations) required to convert one string into another.

Identifying Recursive Methods:

What will be sub-problem in this case? Consider finding edit distance of part of the strings, say small prefix. Let us denote them as [1...i] and [1...j] for some 1< i < m and 1 < j < n. Clearly it is solving smaller instance of final problem,denote it as E(i, j). Our goal is finding E(m, n) and minimizing the cost.

In the prefix, we can right align the strings in three ways (i, -), (-, j) and (i, j). Thehyphensymbol (-) representing no character. An example can make it more clear.

Given strings SUNDAY and SATURDAY. We want to convert SUNDAY into SATURDAY with minimum edits. Let us pick i = 2 and j = 4 i.e. prefix strings are SUN and SATUrespectively (assume the strings indices start at 1). The right most characters can be aligned in three different ways.

Case 1:Align characters U and U. They are equal, no edit is required. We still left with the problem of i = 1 and j = 3, E(i-1, j-1).

Case 2:Align right character from first string and no character from second string. We need a deletion (D) here. We still left with problem of i = 1 and j = 4, E(i-1, j).

Case 3:Align right character from second string and no character from first string. We need an insertion (I) here. We still left with problem of i = 2 and j = 3, E(i, j-1).

Combining all the subproblems minimum cost of aligning prefix strings ending at i and j given by

E(i, j) = min([E(i-1, j) + D],[E(i, j-1) + I], [E(i-1, j-1) + R if i,j characters are not same])

We still not yet done. What will be base case(s)?

When both of the strings are of size 0, the cost is 0. When only one of the string is zero, we need edit operations as that of non-zero length string. Mathematically,

E(0, 0) = 0,E(i, 0) = i,E(0, j) = j

Now it is easy to complete recursive method. Go through the code for recursive algorithm (edit_distance_recursive).

Dynamic Programming Method:

We can calculate the complexity of recursive expression fairly easily.

T(m, n) = T(m-1, n-1) + T(m, n-1) + T(m-1, n) + C

The complexity of T(m, n) can be calculated by successivesubstitutionmethod or solving homogeneous equation of two variables. It will result in an exponential complexity algorithm. It is evident from the recursion tree that it will be solving subproblems again and again. Few strings result in many overlapping subproblems (try the below program with stringsexponentialandpolynomialand note the delay in recursive method).

We can tabulate the repeating subproblems and look them up when required next time (bottom up). A two dimensional array formed by the strings can keep track of the minimum cost till the current charactercomparison. The visualization code will help in understanding the construction of matrix.

The time complexity of dynamic programming method isO(mn)as we need to construct the table fully. The space complexity is alsoO(mn). If we need only the cost of edit, we just needO(min(m, n))space as it is required only to keep track of the current row and previous row.

Usually the costs D, I and R are not same. In such case the problem can be represented as an acyclicdirected graph (DAG) with weights on each edge, and finding shortest path gives edit distance.

Applications:

There are many practical applications of edit distance algorithm, referLuceneAPI for sample. Another example, display all the words in a dictionary that are near proximity to a given word\incorrectly spelled word.




package DP;

import java.util.Arrays;

/**
 * Edit Distance 
 * 
 * Given two words word1 and word2, find the minimum number of steps required to convert word1 to word2. (each operation is counted as 1 step.)

You have the following 3 operations permitted on a word:

a) Insert a character
b) Delete a character
c) Replace a character
 */
public class EditDistance {

	static int[][] dist = null;
	
	public static void main(String[] args) {
		String word1 = "sdfsssjdfhsb";
		String word2 = "cvdsfadfgkdfgj";
		
		dist = new int[word1.length()+1][word2.length()+1];
		for(int[] row : dist){
			Arrays.fill(row, -1);
		}
		
		System.out.println(minDistance(word1, word2));
		System.out.println(minDistance2(word1, word2));
	}
	
	public static int min3(int a, int b, int c){
		return Math.min(Math.min(a, b), c);
	}
	
	// DP bottom-up
	public static int minDistance(String word1, String word2) {
		int[][] distance = new int[word1.length()+1][word2.length()+1];
		
		// 边界情况:当其中一个string为空时,只要一直添加或删除就可以
		for(int i=0; i<=word1.length(); i++){
			distance[i][0] = i;
		}
		for(int j=1; j<=word2.length(); j++){
			distance[0][j] = j;
		}
		
		// 递推,[i][j]处可以由左,上,左上3种情况而来
		for(int i=1; i<=word1.length(); i++){
			for(int j=1; j<=word2.length(); j++){
				distance[i][j] = min3(distance[i-1][j]+1,	// 从上演变
											distance[i][j-1]+1,	// 从左演变
											distance[i-1][j-1]+(word1.charAt(i-1)==word2.charAt(j-1) ? 0 : 1));	// 从左上演变,考虑是否需要替换
			}
		}
		return distance[word1.length()][word2.length()];		// 返回右下角
    }

	// 递归,太慢
	public static int minDistance2(String word1, String word2) {
		return rec(word1, word1.length(), word2, word2.length());
//		return rec2(word1, word1.length(), word2, word2.length());
	}
	
	public static int rec(String word1, int len1, String word2, int len2){
		if(len1 == 0){
			return len2;
		}
		if(len2 == 0){
			return len1;
		}
		if(word1.charAt(len1-1) == word2.charAt(len2-1)){
			return rec(word1, len1-1, word2, len2-1);
		}else{
			return min3(rec(word1, len1-1, word2, len2-1) + 1, 
							rec(word1, len1, word2, len2-1) + 1, 
							rec(word1, len1-1, word2, len2) + 1);
		}
	}
	
	// 添加全局数组,保存状态,用空间换时间 DP top-down
	public static int rec2(String word1, int len1, String word2, int len2){
		if(len1 == 0){
			return len2;
		}
		if(len2 == 0){
			return len1;
		}
		
		if(word1.charAt(len1-1) == word2.charAt(len2-1)){
			if(dist[len1-1][len2-1] == -1){
				dist[len1-1][len2-1] = rec2(word1, len1-1, word2, len2-1);
			}
			return dist[len1-1][len2-1];
		}else{
			if(dist[len1-1][len2-1] == -1){
				dist[len1-1][len2-1] = rec2(word1, len1-1, word2, len2-1);
			}
			if(dist[len1][len2-1] == -1){
				dist[len1][len2-1] = rec2(word1, len1, word2, len2-1);
			}
			if(dist[len1-1][len2] == -1){
				dist[len1-1][len2] = rec2(word1, len1-1, word2, len2);
			}
			dist[len1][len2] = min3(dist[len1-1][len2-1]+1, dist[len1][len2-1]+1, dist[len1-1][len2]+1);
			return dist[len1][len2];
		}
	}
}


http://www.geeksforgeeks.org/dynamic-programming-set-5-edit-distance/
http://www.youtube.com/watch?v=CB425OsE4Fo


  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值