1. Basic Concepts
The longest common subsequence and the longest common substring are not the same thing.
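A subsequence keeps elements in their original relative order but does not require them to be adjacent. The longest common subsequence (LCS), as the name suggests, is the longest of all subsequences shared by two sequences. A substring is a stricter kind of subsequence: its elements must appear contiguously in the original string. For example, with s1 = 'ABCBDAB' and s2 = 'BDCABA', 'BCBA' is a common subsequence of length 4, while the longest common substrings ('AB' and 'BD') have length only 2.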
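To make the two definitions concrete, here is a small sketch that checks each one directly on the example strings. The helper names `is_subsequence` and `is_substring` are illustrative and not part of the original code.

```python
def is_subsequence(sub, seq):
    """True if sub's elements appear in seq in the same order (gaps allowed)."""
    it = iter(seq)
    # `x in it` consumes the iterator up to the first match, so order is preserved
    return all(x in it for x in sub)

def is_substring(sub, seq):
    """True if sub appears in seq as one contiguous block."""
    n = len(sub)
    return any(list(seq[k:k + n]) == list(sub) for k in range(len(seq) - n + 1))

s1, s2 = 'ABCBDAB', 'BDCABA'
print(is_subsequence('BCBA', s1), is_subsequence('BCBA', s2))  # True True
print(is_substring('BCBA', s1), is_substring('BCBA', s2))      # False False
print(is_substring('AB', s1), is_substring('AB', s2))          # True True
```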
Brute-force search is not practical for LCS: a sequence of length n has 2^n subsequences, so enumerating them all takes exponential time.
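The sketch below is the brute-force approach the paragraph rules out, included only to show where the 2^n cost comes from: it tries the subsequences of s1 from longest to shortest and tests each one against s2, which can mean checking up to 2^len(s1) candidates. The function name `lcs_brute_force` is illustrative.

```python
from itertools import combinations

def lcs_brute_force(s1, s2):
    """LCS length by exhaustive search; only usable for very short inputs."""
    def is_subsequence(sub, seq):
        it = iter(seq)
        return all(x in it for x in sub)

    # Try every subsequence of s1, longest first: C(n, k) candidates of length k,
    # 2**n candidates over all k in the worst case
    for k in range(len(s1), -1, -1):
        for idx in combinations(range(len(s1)), k):
            if is_subsequence([s1[i] for i in idx], s2):
                return k
    return 0

print(lcs_brute_force('ABCBDAB', 'BDCABA'))  # 4
```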
2. Dynamic Programming Solution for the Longest Common Subsequence
To find the LCS of s1 and s2, compare their last elements:
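- If the last elements are equal (s1[i] == s2[j]), the LCS length of s1 and s2 equals the LCS length of {s1 without its last element} and {s2 without its last element}, plus 1.
- Otherwise, it equals max{LCS length of (s1 without its last element, s2), LCS length of (s1, s2 without its last element)}.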
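The two cases above translate almost word for word into a recursive function. This is a top-down sketch that uses `functools.lru_cache` for memoization (a swapped-in technique, not the table-based approach the article uses); without the cache, the recursion would revisit the same pair of prefixes exponentially many times. `lcs_length` is an illustrative name.

```python
from functools import lru_cache

def lcs_length(s1, s2):
    @lru_cache(maxsize=None)
    def solve(i, j):
        # solve(i, j) = LCS length of the prefixes s1[:i] and s2[:j]
        if i == 0 or j == 0:
            return 0
        if s1[i - 1] == s2[j - 1]:
            # last elements equal: drop both and add 1
            return solve(i - 1, j - 1) + 1
        # otherwise: drop the last element of s1 or of s2, keep the longer result
        return max(solve(i - 1, j), solve(i, j - 1))

    return solve(len(s1), len(s2))

print(lcs_length('ABCBDAB', 'BDCABA'))  # 4
```

The recursion can go as deep as len(s1) + len(s2), so for long inputs Python's default recursion limit (1000) becomes a problem; the bottom-up table avoids that.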
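Let the two-dimensional array c[i][j] record the LCS length of the first i elements of s1 and the first j elements of s2 (positions counted from 1, so s1[i] is the i-th element; the Python code indexes from 0, hence s1[i-1]). This gives the state transition equation:

$$
c[i][j] =
\begin{cases}
0, & i = 0 \text{ or } j = 0 \\
c[i-1][j-1] + 1, & i, j > 0 \text{ and } s_1[i] = s_2[j] \\
\max\left(c[i][j-1],\ c[i-1][j]\right), & i, j > 0 \text{ and } s_1[i] \neq s_2[j]
\end{cases}
$$

The LCS length of the full sequences is c[m][n], where m and n are the lengths of s1 and s2.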
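As a worked instance, take s1 = 'ABCBDAB' and s2 = 'BDCABA' (the inputs used in the code) and evaluate two cells by hand, counting positions from 1:

$$
\begin{aligned}
c[2][5] &: s_1[2] = \text{B} = s_2[5] \;\Rightarrow\; c[2][5] = c[1][4] + 1 = 1 + 1 = 2 \\
c[3][4] &: s_1[3] = \text{C} \neq \text{A} = s_2[4] \;\Rightarrow\; c[3][4] = \max(c[3][3],\ c[2][4]) = \max(2, 1) = 2
\end{aligned}
$$

Both values agree with the corresponding entries of the table the program prints.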
Python implementation:
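```python
# -*- coding:utf-8 -*-
# Longest common subsequence
import numpy as np

def lcs(s1, s2):
    # c[i][j] = LCS length of s1[:i] and s2[:j]; row 0 and column 0 stay 0
    c = np.zeros((len(s1)+1, len(s2)+1))
    for i in range(0, len(s1)+1):
        for j in range(0, len(s2)+1):
            if i == 0 or j == 0:
                c[i][j] = 0
            elif s1[i-1] == s2[j-1]:
                # last elements match: extend the LCS of both shorter prefixes by 1
                c[i][j] = c[i-1][j-1] + 1
            else:
                # no match: drop the last element of s1 or of s2, keep the better result
                c[i][j] = max(c[i][j-1], c[i-1][j])
    return c[len(s1)][len(s2)], c

# s1 = [1, 3, 4, 5, 6, 7, 7, 8]
# s2 = [3, 5, 7, 4, 8, 6, 7, 8, 2]
s1 = 'ABCBDAB'
s2 = 'BDCABA'
# Longest common subsequence
# (the result is not named `max`, which would shadow the built-in max() used in lcs())
length, c = lcs(s1, s2)
print(length)
print(c)
```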
Output:

```
4.0
[[0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 1. 1. 1.]
 [0. 1. 1. 1. 1. 2. 2.]
 [0. 1. 1. 2. 2. 2. 2.]
 [0. 1. 1. 2. 2. 3. 3.]
 [0. 1. 2. 2. 2. 3. 3.]
 [0. 1. 2. 2. 3. 3. 4.]
 [0. 1. 2. 2. 3. 4. 4.]]
```
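The table c holds more than the length: walking back from the bottom-right cell recovers one actual LCS. The sketch below rebuilds the same table with plain Python lists (so it does not depend on NumPy) and then backtracks. `lcs_sequence` is an illustrative name, and when several LCSs exist, the tie-breaking rule (move up when both neighbours are equal) decides which one is returned.

```python
def lcs_sequence(s1, s2):
    m, n = len(s1), len(s2)
    # Same recurrence as lcs(): c[i][j] = LCS length of s1[:i] and s2[:j]
    c = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if s1[i - 1] == s2[j - 1]:
                c[i][j] = c[i - 1][j - 1] + 1
            else:
                c[i][j] = max(c[i][j - 1], c[i - 1][j])

    # Backtrack from c[m][n], collecting matched elements in reverse order
    out = []
    i, j = m, n
    while i > 0 and j > 0:
        if s1[i - 1] == s2[j - 1]:
            out.append(s1[i - 1])
            i, j = i - 1, j - 1
        elif c[i - 1][j] >= c[i][j - 1]:
            i -= 1  # an LCS exists that does not use s1[i-1]
        else:
            j -= 1  # an LCS exists that does not use s2[j-1]
    return out[::-1]

print(''.join(lcs_sequence('ABCBDAB', 'BDCABA')))  # BCBA
```

Returning a list rather than a string keeps the function usable with the integer lists in the commented-out example as well.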
3. Dynamic Programming Solution for the Longest Common Substring
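For the substring, let c[i][j] be the length of the longest common substring that ends exactly at s1[i] and s2[j] (positions counted from 1). Because a substring must be contiguous, a mismatch resets the count to 0 instead of inheriting a neighbour's value. The DP equation is:

$$
c[i][j] =
\begin{cases}
0, & i = 0 \text{ or } j = 0 \\
c[i-1][j-1] + 1, & i, j > 0 \text{ and } s_1[i] = s_2[j] \\
0, & i, j > 0 \text{ and } s_1[i] \neq s_2[j]
\end{cases}
$$

Unlike the LCS, the answer is not c[m][n] but the largest value anywhere in the table, $\max_{i,j} c[i][j]$.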
Python implementation:
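```python
# -*- coding:utf-8 -*-
# Longest common substring
import numpy as np

def substring(s1, s2):
    # c[i][j] = length of the common substring ending at s1[i-1] and s2[j-1]
    c = np.zeros((len(s1)+1, len(s2)+1))
    longest = 0
    for i in range(0, len(s1)+1):
        for j in range(0, len(s2)+1):
            if i == 0 or j == 0:
                c[i][j] = 0
            elif s1[i-1] == s2[j-1]:
                # match: extend the run ending at the previous diagonal cell
                c[i][j] = c[i-1][j-1] + 1
            else:
                # mismatch breaks contiguity, so the run restarts at 0
                c[i][j] = 0
            # the longest run can end anywhere, so track the maximum over all cells
            if c[i][j] > longest:
                longest = c[i][j]
    return longest, c

s1 = 'ABCBDAB'
s2 = 'BDCABA'
longest, c = substring(s1, s2)
print(longest)
print(c)
```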
Output:

```
2.0
[[0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 1. 0. 1.]
 [0. 1. 0. 0. 0. 2. 0.]
 [0. 0. 0. 1. 0. 0. 0.]
 [0. 1. 0. 0. 0. 1. 0.]
 [0. 0. 2. 0. 0. 0. 0.]
 [0. 0. 0. 0. 1. 0. 1.]
 [0. 1. 0. 0. 0. 2. 0.]]
```
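Each row of the substring table depends only on the row above it, so the full table is not required. The sketch below keeps just the previous and current rows (O(len(s2)) extra space) and also remembers where the best run ends, so it can return the substring itself rather than only its length. `longest_common_substring` is an illustrative name; on ties it returns the first maximal run found in row-major order.

```python
def longest_common_substring(s1, s2):
    # prev[j] holds c[i-1][j] from the previous row; curr[j] holds c[i][j]
    prev = [0] * (len(s2) + 1)
    best_len, best_end = 0, 0  # best_end: index in s1 just past the best run
    for i in range(1, len(s1) + 1):
        curr = [0] * (len(s2) + 1)  # mismatching cells stay 0
        for j in range(1, len(s2) + 1):
            if s1[i - 1] == s2[j - 1]:
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len, best_end = curr[j], i
        prev = curr
    return s1[best_end - best_len:best_end]

print(longest_common_substring('ABCBDAB', 'BDCABA'))  # AB
```

For this input, 'BD' also has length 2, but 'AB' is reached first and the strict `>` comparison keeps it.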