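The Common Substring Problem
In computer science, the longest common substring problem is to find the longest string that is a substring of two or more given strings. It differs from the longest common subsequence problem in that a subsequence need not be contiguous, whereas a substring must be. Source: Baidu Baike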
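To make the substring/subsequence distinction concrete, here is a small sketch. The helper names `is_substring` and `is_subsequence` are illustrative only and are not part of the code later in this post.

```python
def is_substring(s: str, t: str) -> bool:
    """True if s occurs in t as one contiguous block."""
    return s in t


def is_subsequence(s: str, t: str) -> bool:
    """True if the characters of s appear in t in order, gaps allowed."""
    it = iter(t)
    # `ch in it` advances the iterator until it finds ch or runs out,
    # so each character of s must be found after the previous one.
    return all(ch in it for ch in s)


print(is_substring("cde", "absecde"))      # True
print(is_substring("abcde", "absecde"))    # False
print(is_subsequence("abcde", "absecde"))  # True
```

"abcde" is a common subsequence of "abcdefg" and "absecde", but not a common substring, because the characters are not adjacent in "absecde".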
The Feynman Algorithm
The Feynman Algorithm:
- Write down the problem.
- Think real hard.
- Write down the solution.
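Dynamic Programming Approach
Given two strings T1 and T2, let dp(i, j) be the length of the common substring ending at position i of T1 and position j of T2, that is, the number of consecutive matching characters counting back toward position 0.
For example, with "abcdefg" and "absecde":
dp(0,0) = 1, since 'a' == 'a'
dp(1,1) = 2, since 'ab' == 'ab'
dp(1,2) = 0, since 'b' != 's'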
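Summarizing the pattern, for the longest common substring (taking dp(i-1, j-1) as 0 when i = 0 or j = 0):
if T1[i] == T2[j]:
    dp(i,j) = dp(i-1,j-1) + 1
else:
    dp(i,j) = 0
The length of the longest common substring is the maximum of dp(i, j) over all i and j.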
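The substring recurrence can be checked against the example values with a full table. This is a minimal sketch using the 0-based dp(i, j) from the text; the function name `common_suffix_table` is made up for illustration.

```python
def common_suffix_table(t1: str, t2: str) -> list:
    """dp[i][j] = length of the common run ending at t1[i] and t2[j]."""
    dp = [[0] * len(t2) for _ in range(len(t1))]
    for i, a in enumerate(t1):
        for j, b in enumerate(t2):
            if a == b:
                # Treat dp(i-1, j-1) as 0 on the first row or column.
                prev = dp[i - 1][j - 1] if i > 0 and j > 0 else 0
                dp[i][j] = prev + 1
            # Mismatches keep the initial value 0.
    return dp


dp = common_suffix_table("abcdefg", "absecde")
print(dp[0][0], dp[1][1], dp[1][2])  # 1 2 0
print(max(max(row) for row in dp))   # 3, the run "cde"
```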
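For the longest common subsequence, dp(i, j) instead denotes the length of the longest common subsequence of T1[0..i] and T2[0..j] (out-of-range entries count as 0):
if T1[i] == T2[j]:
    dp(i,j) = dp(i-1,j-1) + 1
else:
    dp(i,j) = max(dp(i-1,j), dp(i,j-1))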
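The subsequence recurrence can be sketched the same way. As an assumption for convenience, the table below is padded with an extra row and column of zeros, so `dp[i][j]` here corresponds to dp(i-1, j-1) in the text; `lcs_length` is an illustrative name.

```python
def lcs_length(t1: str, t2: str) -> int:
    """Length of the longest common subsequence of t1 and t2."""
    n, m = len(t1), len(t2)
    # Row 0 and column 0 stand for the empty prefix and stay 0.
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if t1[i - 1] == t2[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[n][m]


print(lcs_length("abcdefg", "absecde"))  # 5, e.g. "abcde"
```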
Code Implementation
def getLcs(t1: str, t2: str) -> str:
    '''
    Return the longest common substring of two given strings.
    '''
    if not t1 or not t2:
        return ""
    max_len = 0
    p = 0  # end position (exclusive) in t1 of the best match so far
    # Each row depends only on the previous one, so two rows are enough
    # instead of a full (len(t1)+1) x (len(t2)+1) table.
    dp = [[0] * (len(t2) + 1) for _ in range(2)]
    for i in range(1, len(t1) + 1):
        ind = i % 2  # row for t1[i-1]; 1 - ind is the previous row
        for j in range(1, len(t2) + 1):
            if t1[i-1] == t2[j-1]:
                dp[ind][j] = dp[1-ind][j-1] + 1
            else:
                dp[ind][j] = 0
            if dp[ind][j] > max_len:
                max_len = dp[ind][j]
                p = i
    print("max_len: %d, pos: %d" % (max_len, p))
    return t1[p-max_len:p]
def getLcss(t1: str, t2: str) -> str:
    '''
    Return one longest common subsequence of two given strings.
    '''
    if not t1 or not t2:
        return ""
    n, m = len(t1), len(t2)
    # The full table is kept: rebuilding the subsequence needs earlier rows.
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if t1[i-1] == t2[j-1]:
                dp[i][j] = dp[i-1][j-1] + 1
            else:
                dp[i][j] = max(dp[i][j-1], dp[i-1][j])
    # Walk back from the bottom-right corner, collecting matched characters.
    res = []
    i, j = n, m
    while i > 0 and j > 0:
        if t1[i-1] == t2[j-1]:
            res.append(t1[i-1])
            i -= 1
            j -= 1
        elif dp[i-1][j] >= dp[i][j-1]:
            i -= 1
        else:
            j -= 1
    print("max len: %d" % dp[n][m])
    return "".join(reversed(res))
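For the substring case, Python's standard library offers an independent cross-check: `difflib.SequenceMatcher.find_longest_match` returns the longest matching contiguous block. The wrapper name `longest_common_substring` below is only an illustration, and this is a sketch for comparing results, not a replacement for the DP version above.

```python
from difflib import SequenceMatcher


def longest_common_substring(t1: str, t2: str) -> str:
    """Longest contiguous block shared by t1 and t2, via difflib."""
    # autojunk=False turns off difflib's heuristic that ignores very
    # frequent elements in long sequences.
    matcher = SequenceMatcher(None, t1, t2, autojunk=False)
    match = matcher.find_longest_match(0, len(t1), 0, len(t2))
    return t1[match.a:match.a + match.size]


print(longest_common_substring("abcdefg", "absecde"))  # cde
```

When there are several blocks of the same maximal length, difflib returns the one that starts earliest in the first string, which may differ from the tie-breaking of a hand-written loop.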