Time Limit: 1000MS | Memory Limit: 65536K
Total Submissions: 10082 | Accepted: 3890
Description
Let x and y be two strings over some finite alphabet A. We would like to transform x into y allowing only operations given below:
- Deletion: a letter in x is missing in y at a corresponding position.
- Insertion: a letter in y is missing in x at a corresponding position.
- Change: letters at corresponding positions are distinct.
Certainly, we would like to minimize the number of all possible operations.
Illustration
A G T A A G T * A G G C
| | |       |   |   | |
A G T * C * T G A C G C
Deletion: * in the bottom line
Insertion: * in the top line
Change: when the letters at the top and bottom are distinct
This tells us that to transform x = AGTCTGACGC into y = AGTAAGTAGGC we would be required to perform 5 operations (2 changes, 2 deletions and 1 insertion). If we want to minimize the number of operations, we should do it like
A G T A A G T A G G C
| | |     |   |   | |
A G T C T G * A C G C
and 4 moves would be required (3 changes and 1 deletion).
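To make the counting in the two alignments above concrete, here is a minimal sketch that scores an alignment written as two gapped strings of equal length. The helper name `alignmentCost` and the gapped-string representation are assumptions made for illustration; they are not part of the problem statement.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: counts the operations in an alignment given as two
// gapped strings of equal length, where '*' marks a gap.
// Every column whose two symbols differ costs 1: a gap on one side is an
// insertion or a deletion, two different letters are a change.
int alignmentCost(const std::string& top, const std::string& bottom) {
    assert(top.size() == bottom.size());
    int cost = 0;
    for (std::size_t k = 0; k < top.size(); ++k) {
        if (top[k] != bottom[k]) ++cost;
    }
    return cost;
}
```

Called on the first alignment, `alignmentCost("AGTAAGT*AGGC", "AGT*C*TGACGC")` gives 5, and on the second, `alignmentCost("AGTAAGTAGGC", "AGTCTG*ACGC")` gives 4.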
In this problem we would always consider strings x and y to be fixed, such that the number of letters in x is m and the number of letters in y is n where n ≥ m.
Assign 1 as the cost of an operation performed. Otherwise, assign 0 if there is no operation performed.
Write a program that would minimize the number of possible operations to transform any string x into a string y.
Input
The input consists of the strings x and y prefixed by their respective lengths, which are within 1000.
Output
An integer representing the minimum number of possible operations to transform any string x into a string y.
Sample Input
10 AGTCTGACGC 11 AGTAAGTAGGC
Sample Output
4
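Problem summary:
Given two strings x and y, where x has length m and y has length n, with m <= n.
y can be turned into x using three operations: deleting a letter, inserting a letter, or changing a letter.
Find the minimum number of operations needed.
This is the classic minimum edit distance problem. It is solved with dynamic programming, and the procedure is similar to computing the longest common subsequence (LCS) of two strings.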
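Edit distance, the concept:
Edit distance, also called Levenshtein distance, is the minimum number of edit operations needed to turn one string into another. The permitted edits are replacing one character with another, inserting a character, and deleting a character.
For example, turning kitten into sitting takes three edits:
- sitten (k→s)
- sittin (e→i)
- sitting (→g)
The Russian scientist Vladimir Levenshtein introduced the concept in 1965.
Problem: find the edit distance between two strings, i.e. the minimum number of operations needed to turn string s1 into string s2. There are three operations: inserting a character, deleting a character, and modifying a character.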
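Analysis:
First define a function edit(i, j): the edit distance between the prefix of length i of the first string and the prefix of length j of the second string.
The dynamic programming recurrence is then:
- if i == 0 and j == 0, edit(i, j) = 0
- if i == 0 and j > 0, edit(i, j) = j
- if i > 0 and j == 0, edit(i, j) = i
- if i ≥ 1 and j ≥ 1, edit(i, j) = min{ edit(i-1, j) + 1, edit(i, j-1) + 1, edit(i-1, j-1) + f(i, j) }, where f(i, j) = 1 if the i-th character of the first string differs from the j-th character of the second string, and f(i, j) = 0 otherwise.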
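The four cases above translate almost literally into a recursive function with memoization. The following is a sketch under stated assumptions: the names `editRec` and `editDistanceMemo` and the use of `std::string` are illustrative choices, not something the original notes prescribe.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Top-down evaluation of edit(i, j): the distance between the first i
// characters of s1 and the first j characters of s2.
// memo holds -1 for entries that have not been computed yet.
static int editRec(const std::string& s1, const std::string& s2,
                   int i, int j, std::vector<std::vector<int>>& memo) {
    assert(i >= 0 && j >= 0);
    if (i == 0) return j;   // covers both (0, 0) and (0, j > 0)
    if (j == 0) return i;
    int& cached = memo[i][j];
    if (cached != -1) return cached;
    // f(i, j): the i-th and j-th characters are s1[i-1] and s2[j-1] (0-indexed).
    int f = (s1[i - 1] != s2[j - 1]) ? 1 : 0;
    cached = std::min({editRec(s1, s2, i - 1, j, memo) + 1,
                       editRec(s1, s2, i, j - 1, memo) + 1,
                       editRec(s1, s2, i - 1, j - 1, memo) + f});
    return cached;
}

// Hypothetical wrapper that sets up the memo table and asks for edit(|s1|, |s2|).
int editDistanceMemo(const std::string& s1, const std::string& s2) {
    std::vector<std::vector<int>> memo(
        s1.size() + 1, std::vector<int>(s2.size() + 1, -1));
    return editRec(s1, s2, static_cast<int>(s1.size()),
                   static_cast<int>(s2.size()), memo);
}
```

With this sketch, `editDistanceMemo("kitten", "sitting")` evaluates to 3, matching the three edits listed earlier. The recursion depth is at most the sum of the two lengths, which stays small for strings of length up to 1000.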
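This is similar to the longest common subsequence (LCS) problem.
Let dp[i][j] be the minimum number of operations needed to turn the first i letters of gene sequence a into the first j letters of gene sequence b (a and b are 1-indexed here).
Then:
dp[i][j] = min
{
  if a[i] == b[j]:
    dp[i][j] = dp[i-1][j-1];
  if they differ:
    insert:  dp[i][j] = dp[i][j-1] + 1;
    delete:  dp[i][j] = dp[i-1][j] + 1;
    replace: dp[i][j] = dp[i-1][j-1] + 1;
}
Note the initial values:
dp[0][0] = 0;
dp[i][0] = i;
dp[0][i] = i;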
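Once the table is filled, it can also be walked backwards from dp[m][n] to see which operations make up one optimal solution. The sketch below is an illustration, not part of the original solution: `OpCounts` and `countOperations` are hypothetical names, the operation labels follow the dp recurrence (insert from dp[i][j-1], delete from dp[i-1][j], replace from dp[i-1][j-1]), and when several optimal paths exist it simply prefers the diagonal step. The problem statement's illustration names gaps by the line in which `*` appears, so a per-type breakdown may be labeled differently there, while the total stays the same.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Operation counts recovered from the dp table (hypothetical helper type).
struct OpCounts {
    int inserts = 0;
    int removes = 0;
    int replaces = 0;
    int total() const { return inserts + removes + replaces; }
};

OpCounts countOperations(const std::string& a, const std::string& b) {
    const int m = static_cast<int>(a.size());
    const int n = static_cast<int>(b.size());
    std::vector<std::vector<int>> dp(m + 1, std::vector<int>(n + 1, 0));
    for (int i = 0; i <= m; ++i) dp[i][0] = i;
    for (int j = 0; j <= n; ++j) dp[0][j] = j;
    for (int i = 1; i <= m; ++i) {
        for (int j = 1; j <= n; ++j) {
            int cost = (a[i - 1] == b[j - 1]) ? 0 : 1;
            dp[i][j] = std::min({dp[i - 1][j - 1] + cost,
                                 dp[i - 1][j] + 1,
                                 dp[i][j - 1] + 1});
        }
    }

    // Walk back from dp[m][n] to dp[0][0], following a predecessor that
    // explains the current value.
    OpCounts ops;
    int i = m, j = n;
    while (i > 0 || j > 0) {
        if (i > 0 && j > 0 &&
            dp[i][j] == dp[i - 1][j - 1] + (a[i - 1] == b[j - 1] ? 0 : 1)) {
            if (a[i - 1] != b[j - 1]) ++ops.replaces;  // a match costs nothing
            --i; --j;
        } else if (j > 0 && dp[i][j] == dp[i][j - 1] + 1) {
            ++ops.inserts;
            --j;
        } else {
            assert(i > 0 && dp[i][j] == dp[i - 1][j] + 1);
            ++ops.removes;
            --i;
        }
    }
    assert(ops.total() == dp[m][n]);
    return ops;
}
```

For the sample strings the total is 4, and the number of inserts minus the number of deletes always equals n - m, since every step consumes one letter of a, one letter of b, or both.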
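#include <cstdio>
#include <cstring>
#include <algorithm>
using namespace std;

// dp[i][j]: minimum number of operations needed to turn the first i letters
// of a into the first j letters of b
int m, n, dp[1010][1010];

int main()
{
    char a[1010], b[1010];
    // Each test case is "m x n y"; keep reading until the input runs out.
    while (scanf("%d%s%d%s", &m, a, &n, b) == 4)
    {
        memset(dp, 0, sizeof(dp));
        // Boundary values: turning a prefix into the empty string (or the
        // reverse) costs one operation per letter.
        for (int i = 0; i <= m; i++)
        {
            dp[i][0] = i;
        }
        for (int j = 0; j <= n; j++)
        {
            dp[0][j] = j;
        }
        for (int i = 1; i <= m; i++)
        {
            for (int j = 1; j <= n; j++)
            {
                // The arrays are 0-indexed, so the i-th letter of a is a[i-1].
                int cost = (a[i-1] == b[j-1]) ? 0 : 1;
                // Always take the smallest of replace/match, delete, insert.
                dp[i][j] = min(min(dp[i-1][j-1] + cost, dp[i-1][j] + 1), dp[i][j-1] + 1);
            }
        }
        printf("%d\n", dp[m][n]);
    }
    return 0;
}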
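Since each row of dp depends only on the row before it, the full 1010 x 1010 table is not strictly necessary. The following variant is a sketch that is not part of the original solution: it keeps just two rows, and the function name `editDistanceTwoRows` is an assumption made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Same recurrence as the full table, but only row i-1 (prev) and row i (cur)
// are kept in memory, so the extra space is O(n) instead of O(m * n).
int editDistanceTwoRows(const std::string& a, const std::string& b) {
    const int m = static_cast<int>(a.size());
    const int n = static_cast<int>(b.size());
    std::vector<int> prev(n + 1), cur(n + 1);
    for (int j = 0; j <= n; ++j) prev[j] = j;   // row 0: dp[0][j] = j
    for (int i = 1; i <= m; ++i) {
        cur[0] = i;                             // dp[i][0] = i
        for (int j = 1; j <= n; ++j) {
            int cost = (a[i - 1] == b[j - 1]) ? 0 : 1;
            cur[j] = std::min({prev[j - 1] + cost,   // replace or match
                               prev[j] + 1,          // delete
                               cur[j - 1] + 1});     // insert
        }
        std::swap(prev, cur);                   // row i becomes the previous row
    }
    // After the last swap, prev holds row m. The distance never exceeds the
    // length of the longer string.
    assert(prev[n] <= std::max(m, n));
    return prev[n];
}
```

The trade-off is that the operations themselves can no longer be recovered by walking the table backwards, which does not matter here because the problem only asks for the count.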