Longest Common Substring and Longest Common Subsequence are two different problems. For example:
X = <a, b, c, f, b, c>
Y = <a, b, f, c, a, b>
The Longest Common Subsequence of X and Y is <a, b, c, b>, with length 4.
The Longest Common Substring of X and Y is <a, b>, with length 2.
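To check the subsequence figure above, here is a minimal sketch of the classic textbook dynamic programming recurrence for the Longest Common Subsequence length. It is not taken from this note; the function name lcs_subsequence_len and the bound SUBSEQ_MAX are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SUBSEQ_MAX 64   /* hypothetical upper bound on input length */

/* Length of the longest common SUBSEQUENCE of x and y (textbook DP).
 * t[i][j] is the answer for the first i chars of x and first j of y. */
int lcs_subsequence_len(const char *x, const char *y)
{
    static int t[SUBSEQ_MAX + 1][SUBSEQ_MAX + 1];
    size_t n = strlen(x), m = strlen(y);
    assert(n <= SUBSEQ_MAX && m <= SUBSEQ_MAX);

    for (size_t i = 0; i <= n; i++) {
        for (size_t j = 0; j <= m; j++) {
            if (i == 0 || j == 0)
                t[i][j] = 0;                       /* empty prefix */
            else if (x[i - 1] == y[j - 1])
                t[i][j] = t[i - 1][j - 1] + 1;     /* extend a match */
            else                                   /* skip one char on either side */
                t[i][j] = t[i - 1][j] > t[i][j - 1] ? t[i - 1][j] : t[i][j - 1];
        }
    }
    return t[n][m];
}
```

With the example above written as strings, lcs_subsequence_len("abcfbc", "abfcab") returns 4.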
In fact, the Substring problem is a special case of the Subsequence problem: it also looks for two increasing index sequences
<i1, i2, ..., ik> and <j1, j2, ..., jk> such that
xi1 == yj1
xi2 == yj2
......
xik == yjk
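Unlike the Subsequence problem, the Substring problem requires not only that the index sequences be increasing, but also that
every step increase by exactly 1, i.e. the two index sequences are: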
<i, i+1, i+2, ..., i+k-1> 和 <j, j+1, j+2, ..., j+k-1>
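The consecutive-index definition can be turned into a brute-force search directly: try every start pair (i, j) and extend while the characters agree. This is only a sketch for checking small examples, not the method this note goes on to use; the name lcs_substring_brute is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Brute force straight from the definition: for every start pair (i, j),
 * extend the run x[i+k] == y[j+k] as far as it goes and keep the best k.
 * Returns the length of the longest common substring. */
size_t lcs_substring_brute(const char *x, const char *y)
{
    assert(x != NULL && y != NULL);
    size_t n = strlen(x), m = strlen(y), best = 0;

    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < m; j++) {
            size_t k = 0;
            /* Both indices advance by exactly 1 at each step. */
            while (i + k < n && j + k < m && x[i + k] == y[j + k])
                k++;
            if (k > best)
                best = k;
        }
    }
    return best;
}
```

For the first example, lcs_substring_brute("abcfbc", "abfcab") returns 2. The triple loop costs O(n * m * min(n, m)) in the worst case, which is what the dynamic programming formulation avoids.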
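By analogy with the dynamic programming solution to the Subsequence problem, the Substring problem can also be solved with dynamic programming. Let
c[i][j] be the length of the longest common substring that ends at X[i] and Y[j] (1-based indices). If X[i] is not equal to Y[j], then c[i][j] is 0. For example: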
X = <y, e, d, f>
Y = <y, e, k, f>
c[1][1] = 1
c[2][2] = 2
c[3][3] = 0
c[4][4] = 1
The state transition equation is:
If xi == yj, then c[i][j] = c[i-1][j-1] + 1
If xi != yj, then c[i][j] = 0
with the boundary c[i][0] = c[0][j] = 0.
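The recurrence can be written almost word for word as a recursive function. This is a sketch for checking individual cells, not an efficient way to fill the whole table; the name c_at is an assumption, and i, j are 1-based as in the text.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* c_at(x, y, i, j): length of the common substring ending at the i-th
 * character of x and the j-th character of y (1-based, as in the text).
 * i == 0 or j == 0 is the boundary case and yields 0. */
int c_at(const char *x, const char *y, size_t i, size_t j)
{
    assert(i <= strlen(x) && j <= strlen(y));
    if (i == 0 || j == 0)
        return 0;                            /* boundary */
    if (x[i - 1] != y[j - 1])
        return 0;                            /* mismatch resets the run */
    return c_at(x, y, i - 1, j - 1) + 1;     /* match extends the diagonal */
}
```

With X = <y, e, d, f> and Y = <y, e, k, f> written as "yedf" and "yekf", c_at gives 1, 2, 0, 1 for the cells (1,1), (2,2), (3,3), (4,4).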
Finally, the length of the Longest Common Substring, with n = |X| and m = |Y|, is
max{c[i][j], 1<=i<=n, 1<=j<=m}
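#include <stdio.h>
#include <string.h>

/* Uncomment to trace every table cell. */
//#define DEBUG
#ifdef DEBUG
#define debug(...) printf(__VA_ARGS__)
#else
#define debug(...)
#endif

#define N 250

/* c[i][j]: length of the common substring ending at s1[i-1] and s2[j-1].
 * Row 0 and column 0 are never written, so they stay 0 (boundary). */
int c[N][N];

/* Print the common run that ends at s1[i] and s2[j], left to right.
 * The bounds check stops the recursion at the start of either string. */
void print_str(char *s1, char *s2, int i, int j)
{
    if (i >= 0 && j >= 0 && s1[i] == s2[j]) {
        print_str(s1, s2, i-1, j-1);
        putchar(s1[i]);
    }
}

/* Fill the table, print one longest common substring, return its length. */
int common_str(char *s1, char *s2)
{
    int i, j, n, m, max_c;
    int x = 0, y = 0;   /* table position where the longest run ends */

    n = strlen(s1);
    m = strlen(s2);
    max_c = 0;
    for (i = 1; i <= n; i++) {
        for (j = 1; j <= m; j++) {
            if (s1[i-1] == s2[j-1]) {
                c[i][j] = c[i-1][j-1] + 1;
            } else {
                c[i][j] = 0;
            }
            if (c[i][j] > max_c) {
                max_c = c[i][j];
                x = i;
                y = j;
            }
            debug("c[%d][%d] = %d\n", i, j, c[i][j]);
        }
    }
    print_str(s1, s2, x-1, y-1);
    printf("\n");
    return max_c;
}

int main()
{
    char s1[N], s2[N];

    /* %249s keeps each word inside its N-byte buffer. */
    while (scanf("%249s%249s", s1, s2) == 2) {
        debug("%s %s\n", s1, s2);
        printf("%d\n", common_str(s1, s2));
    }
    return 0;
}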
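Since c[i][j] depends only on c[i-1][j-1], the full N x N table is not needed when only the length is wanted. The following is a sketch of a single-row variant under that assumption; the name lcs_substring_len and the -1 return on allocation failure are choices made here, not part of the program above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Length of the longest common substring using one row of m+1 ints.
 * Returns -1 if the row cannot be allocated. */
int lcs_substring_len(const char *x, const char *y)
{
    assert(x != NULL && y != NULL);
    size_t n = strlen(x), m = strlen(y);
    int best = 0;
    int *row = calloc(m + 1, sizeof *row);   /* row[0] stays 0: boundary */

    if (row == NULL)
        return -1;

    for (size_t i = 1; i <= n; i++) {
        /* Walk j downwards so row[j-1] still holds c[i-1][j-1]. */
        for (size_t j = m; j >= 1; j--) {
            if (x[i - 1] == y[j - 1]) {
                row[j] = row[j - 1] + 1;
                if (row[j] > best)
                    best = row[j];
            } else {
                row[j] = 0;
            }
        }
    }
    free(row);
    return best;
}
```

This keeps the O(n * m) running time and reduces memory to O(m), at the cost of no longer being able to walk the table back; to print the substring as well, one would also record the end position in x where best was reached.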