Dynamic Programming Interview Questions: Levenshtein Distance

It’s sad to say, but this series is finally coming to an end. I decided to end it because I don’t want to drag it out for too long, and also because I think seven is a great number to end with. Lucky seven, you know?

OK, so for today’s problem, we’ll be looking at computing the Levenshtein distance (aka edit distance) between two words.

“The Levenshtein distance (a.k.a. edit distance) between two words is the minimum number of single-character edits (insertions, deletions or substitutions) required to change one word into the other.” — Wikipedia

The Problem

Note: This problem comes from LeetCode.

Given two words word1 and word2, find the minimum number of operations required to convert word1 to word2.

You have the following 3 operations permitted on a word:

  • Insert a character
  • Delete a character
  • Replace a character

Example 1:

Input: word1 = "horse", word2 = "ros"
Output: 3
Explanation:
horse -> rorse (replace 'h' with 'r')
rorse -> rose (remove 'r')
rose -> ros (remove 'e')

Example 2:

Input: word1 = "intention", word2 = "execution"
Output: 5
Explanation:
intention -> inention (remove 't')
inention -> enention (replace 'i' with 'e')
enention -> exention (replace 'n' with 'x')
exention -> exection (replace 'n' with 'c')
exection -> execution (insert 'u')

Our goal seems straightforward, but as with most dynamic programming problems, building the foundation for the solution is the hardest part.

I should have mentioned this at the start of the series, but better late than never: dynamic programming works well on problems that have the optimal substructure property, meaning the optimal solution to the problem can be built from optimal solutions to its subproblems. Divide-and-conquer problems share this property. The difference is that in DP the subproblems overlap: the same smaller subproblem is needed by several larger ones, so storing each answer once through memoization or tabulation avoids recomputing it and greatly reduces the time complexity.
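
To see the overlap concretely, here is a minimal sketch that runs a naive recursion (no memoization) using the base and recursive cases derived later in this post, and counts how often each subproblem is visited. A subproblem is identified by the pair (i, j), meaning "convert word1[:i] into word2[:j]"; the helper name count_calls is hypothetical and not from the original article.

```python
from collections import Counter
from typing import Tuple


def count_calls(word1: str, word2: str) -> Tuple[int, Counter]:
    """Run a naive (unmemoized) edit-distance recursion and count
    how many times each (i, j) prefix pair is visited."""
    calls: Counter = Counter()

    def naive(i: int, j: int) -> int:
        # (i, j) stands for converting word1[:i] into word2[:j]
        calls[(i, j)] += 1
        if i == 0:
            return j  # insert the remaining j letters
        if j == 0:
            return i  # delete the remaining i letters
        if word1[i - 1] == word2[j - 1]:
            return naive(i - 1, j - 1)  # matching last letters are free
        return 1 + min(
            naive(i, j - 1),      # insert
            naive(i - 1, j),      # delete
            naive(i - 1, j - 1),  # replace
        )

    distance = naive(len(word1), len(word2))
    return distance, calls


distance, calls = count_calls("intention", "execution")
print("distance:", distance)
print("distinct subproblems:", len(calls))
print("total calls:", sum(calls.values()))
print("visits to (4, 4):", calls[(4, 4)])
```

Every visit beyond the first one to a given (i, j) is repeated work; memoization or tabulation solves each pair once, so at most (len(word1) + 1) * (len(word2) + 1) subproblems are computed.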

This means the key to solving any DP problem is to first break the main problem down into subproblems. Here, they are easy to spot:

  • Problem: Convert word1 to word2 by performing insert, delete, or replace operations on word1.
  • Subproblem: Convert word1[:i] (the first i letters of word1) to word2[:j] (the first j letters of word2) by performing insert, delete, or replace operations on word1.

Now that we’ve broken down the problem, we’ll need to establish the base case(s).

Base Case

The base case is when either word1 or word2 is an empty string.

  • If word1 is an empty string, we have to insert k characters into word1, where k is the length of word2, so the cost here is k.
  • If word2 is an empty string, we have to delete k characters from word1, where k is the length of word1, so the cost here is also k.
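
In code, the two base cases amount to the following check. This is a minimal sketch; the name base_case_cost and the choice to return None when neither word is empty are illustrative assumptions, not part of the original article.

```python
from typing import Optional


def base_case_cost(word1: str, word2: str) -> Optional[int]:
    """Return the edit cost if either word is empty, otherwise None."""
    if not word1:
        # word1 is empty: insert every letter of word2
        return len(word2)
    if not word2:
        # word2 is empty: delete every letter of word1
        return len(word1)
    # Neither word is empty, so a recursive case applies instead
    return None


print(base_case_cost("", "ros"))       # 3 insertions
print(base_case_cost("horse", ""))     # 5 deletions
print(base_case_cost("horse", "ros"))  # None
```

When both words are empty, the first branch returns len("") = 0, which matches the intuition that no edits are needed.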

Data Structure and Tabulation

But now we need to store our base cases somewhere, so what would be a good data structure to represent them? When comparing two sequences, a matrix is usually a good fit: one string goes along one axis, the other string goes along the second axis, and each cell holds the Levenshtein distance between the corresponding prefixes of the two strings. For example, with HONDA and HYUNDAI, we have:

[Figure: Levenshtein distance matrix for HONDA and HYUNDAI. Image source: Cuelogic]

Working Our Way Up

When working our way up, it can be useful to think about a recursive definition for our solution table. We know the base case, but what goes into the recursive case?

Let n = len(word1) and m = len(word2). Because we index from zero and also need a row and a column for the empty prefix, the table is an (n+1) * (m+1) matrix. This lets us include the empty-string cases: dist_table[i][0] means word2's prefix is empty (rather than referring to the first letter of word2), and similarly, dist_table[0][j] means word1's prefix is empty rather than referring to its first letter.
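
A minimal sketch of setting up dist_table this way, with only the base cases (row 0 and column 0) filled in, using the HONDA and HYUNDAI example. The helper name init_table is hypothetical.

```python
from typing import List


def init_table(word1: str, word2: str) -> List[List[int]]:
    """Create an (n+1) x (m+1) table holding only the base cases."""
    n, m = len(word1), len(word2)
    # Row i stands for word1[:i] and column j for word2[:j];
    # index 0 is the empty prefix. The comprehension builds
    # independent rows (unlike [[0] * (m + 1)] * (n + 1)).
    dist_table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dist_table[i][0] = i  # word2 prefix empty: delete i letters
    for j in range(m + 1):
        dist_table[0][j] = j  # word1 prefix empty: insert j letters
    return dist_table


table = init_table("HONDA", "HYUNDAI")
for row in table:
    print(row)
```

The remaining cells stay 0 until the recursive cases below fill them in.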

First, we need to ask ourselves: when does going from word1 to word2 cost an operation, and when does it not?

Case A: No cost

This one is straightforward: the only time no operation is needed is when the letters being compared (the last letter of each word) are the same. As a recursive definition, this means:

Note: Python slicing works so that word[:k] is the substring of word from index 0 to index k-1, so index k is not included. With a negative index, word[:-1] is everything except the last letter.

# if the last letters are the same, they cost nothing
if word1[-1] == word2[-1]:
    return lev_dist(word1[:-1], word2[:-1])

So we simply recurse on both substrings, excluding each word's last letter.

Case B: When there is a cost

There is a cost whenever the last letters differ, so Case B is essentially the opposite of Case A. Let’s break it down further into which operations can be applied when the letters differ and exactly how each one affects our recursive definition.

  • Insert: This increases the length of word1 by 1, as we append the last letter of word2 to word1 so that both words now end with the same letter. We can then recurse as in Case A by dropping that shared letter from both words, which leaves all of word1 and word2 without its last letter:
return lev_dist(word1, word2[:-1]) + 1

Notice how we add 1 for the insert operation and that word1 essentially stays the same.

  • Delete: This removes the last letter of word1, reducing its length by 1, and we can simply recurse like so:
return lev_dist(word1[:-1], word2) + 1
  • Replace: This replaces the last letter of word1 with word2’s last letter so that Case A applies, and we can recurse similarly:
return lev_dist(word1[:-1], word2[:-1]) + 1
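
To see why each operation leads to those particular recursive calls, here is a small sketch that performs each operation on the words from Example 1 (horse and ros) and then drops the shared last letter where Case A applies. It only manipulates strings to show the resulting arguments; it does not compute a distance.

```python
word1, word2 = "horse", "ros"

# Insert: append word2's last letter to word1, so both words end in "s".
inserted = word1 + word2[-1]
assert inserted[-1] == word2[-1]
# Dropping the shared last letter (Case A) leaves word1 and word2[:-1].
insert_args = (inserted[:-1], word2[:-1])

# Delete: remove word1's last letter; word2 is untouched.
delete_args = (word1[:-1], word2)

# Replace: overwrite word1's last letter with word2's, then drop it from both.
replaced = word1[:-1] + word2[-1]
assert replaced[-1] == word2[-1]
replace_args = (replaced[:-1], word2[:-1])

print(insert_args)   # ('horse', 'ro')
print(delete_args)   # ('hors', 'ros')
print(replace_args)  # ('hors', 'ro')
```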

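Putting this all together, we now have our final recursive definition: if either word is empty, the distance is the length of the other word; if the last letters match, the distance equals the distance between the two words with their last letters removed (Case A); otherwise, it is 1 plus the minimum of the insert, delete, and replace recursions from Case B.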
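
Here is a minimal top-down sketch of that definition in Python. It follows the lev_dist calls from the cases above; the memoization through functools.lru_cache and the inner helper solve are additions for illustration, not code from the original article. Without the cache, the same pairs of substrings would be recomputed many times.

```python
from functools import lru_cache


def lev_dist(word1: str, word2: str) -> int:
    """Top-down (memoized) Levenshtein distance."""

    @lru_cache(maxsize=None)
    def solve(w1: str, w2: str) -> int:
        # Base cases: one word is empty
        if not w1:
            return len(w2)  # insert every remaining letter of w2
        if not w2:
            return len(w1)  # delete every remaining letter of w1
        # Case A: matching last letters cost nothing
        if w1[-1] == w2[-1]:
            return solve(w1[:-1], w2[:-1])
        # Case B: pay 1 for the cheapest of insert, delete, replace
        return 1 + min(
            solve(w1, w2[:-1]),       # insert
            solve(w1[:-1], w2),       # delete
            solve(w1[:-1], w2[:-1]),  # replace
        )

    return solve(word1, word2)


print(lev_dist("horse", "ros"))            # 3
print(lev_dist("intention", "execution"))  # 5
```

Slicing copies strings, and the recursion can go len(word1) + len(word2) levels deep, so for long inputs an index-based version or the bottom-up table is the safer choice.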

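All that’s left to do now is to convert this same idea into a bottom-up approach: fill dist_table starting from the base cases in row 0 and column 0, so that each cell dist_table[i][j] holds the distance between word1[:i] and word2[:j], and the final answer ends up in the bottom-right cell.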
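
A minimal bottom-up sketch under the same conventions (row i for word1[:i], column j for word2[:j]). The function name lev_dist_table is hypothetical. Each cell is filled from its left, upper, and upper-left neighbors, which correspond to the insert, delete, and replace recursions.

```python
def lev_dist_table(word1: str, word2: str) -> int:
    """Bottom-up (tabulated) Levenshtein distance."""
    n, m = len(word1), len(word2)
    dist_table = [[0] * (m + 1) for _ in range(n + 1)]

    # Base cases: converting to or from an empty prefix
    for i in range(n + 1):
        dist_table[i][0] = i  # delete i letters
    for j in range(m + 1):
        dist_table[0][j] = j  # insert j letters

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if word1[i - 1] == word2[j - 1]:
                # Case A: matching last letters cost nothing
                dist_table[i][j] = dist_table[i - 1][j - 1]
            else:
                # Case B: 1 + cheapest of the three operations
                dist_table[i][j] = 1 + min(
                    dist_table[i][j - 1],      # insert
                    dist_table[i - 1][j],      # delete
                    dist_table[i - 1][j - 1],  # replace
                )
    return dist_table[n][m]


print(lev_dist_table("HONDA", "HYUNDAI"))  # 3
```

This runs in O(n * m) time and space. Because each row depends only on the row above it, keeping just two rows would reduce the space to O(m).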

Conclusion

This brings us to the end of the Dynamic Programming Series! Hopefully, you have learned a lot about how to approach DP problems and how to solve them, so that you can ace your next interview!

Good luck and thanks for reading!

Source: https://medium.com/better-programming/dynamic-programming-interview-questions-levenshtein-distance-d415cb5e36ca
