String DP

小小小小葱

Article link: https://blog.csdn.net/corncsd/article/details/9567105

I have been working on DP problems for the past few days, so here is a summary, starting with the string problems.

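The simplest case: given two strings a and b, find their longest common subsequence (LCS). Let dp[i][j] be the length of the LCS of the first i characters of a and the first j characters of b, with both strings indexed from 1 and dp[0][j]=dp[i][0]=0. If a[i]==b[j], then dp[i][j]=dp[i-1][j-1]+1; otherwise dp[i][j]=max(dp[i-1][j],dp[i][j-1]).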
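The recurrence above can be sketched as a small C function. This is only an illustration: the name lcs_length and the bound MAXN are assumptions, not something from the original problems, and because C strings are 0-indexed the comparison becomes a[i-1]==b[j-1].

```c
#include <assert.h>
#include <string.h>

#define MAXN 1000  /* assumed upper bound on string length */

static int dp[MAXN + 1][MAXN + 1];

/* Length of the longest common subsequence of a and b.
   dp[i][j] covers the first i characters of a and the first j of b. */
int lcs_length(const char *a, const char *b)
{
    int n = (int)strlen(a), m = (int)strlen(b);
    int i, j;
    assert(n <= MAXN && m <= MAXN);
    for (i = 0; i <= n; i++) dp[i][0] = 0;  /* empty prefix of b */
    for (j = 0; j <= m; j++) dp[0][j] = 0;  /* empty prefix of a */
    for (i = 1; i <= n; i++)
        for (j = 1; j <= m; j++) {
            if (a[i - 1] == b[j - 1])
                dp[i][j] = dp[i - 1][j - 1] + 1;   /* extend the common subsequence */
            else
                dp[i][j] = dp[i - 1][j] > dp[i][j - 1] ? dp[i - 1][j] : dp[i][j - 1];
        }
    return dp[n][m];
}
```

The table takes O(N*M) time and memory, which is fine for strings of a few thousand characters.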

Many of the later problems are variations on this one, and the idea stays much the same.

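One variation gives you a single string and asks for the minimum number of characters that must be inserted to turn it into a palindrome. For example, Ab3bd needs at least 2 insertions, giving dAb3bAd or Adb3bdA. This is really the same as the LCS problem above: store a reversed into b, let N be the length of the string, compute dp[N][N], and the answer is N-dp[N][N], because the characters outside the LCS are exactly the ones that need a mirrored partner. (You can also skip the copy and run the DP backwards on the same string, but that is not as simple.)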
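A minimal sketch of this reduction in C, assuming a hypothetical function name min_insertions_palindrome and an assumed length bound MAXN:

```c
#include <assert.h>
#include <string.h>

#define MAXN 1000  /* assumed upper bound on string length */

static int dp[MAXN + 1][MAXN + 1];

/* Fewest characters to insert into s so that it becomes a palindrome:
   N minus the LCS length of s and its reverse. */
int min_insertions_palindrome(const char *s)
{
    static char rev[MAXN + 1];
    int n = (int)strlen(s);
    int i, j;
    assert(n <= MAXN);
    for (i = 0; i < n; i++) rev[i] = s[n - 1 - i];  /* reversed copy */
    rev[n] = '\0';
    for (i = 0; i <= n; i++) dp[i][0] = dp[0][i] = 0;
    for (i = 1; i <= n; i++)
        for (j = 1; j <= n; j++) {
            if (s[i - 1] == rev[j - 1])
                dp[i][j] = dp[i - 1][j - 1] + 1;
            else
                dp[i][j] = dp[i - 1][j] > dp[i][j - 1] ? dp[i - 1][j] : dp[i][j - 1];
        }
    return n - dp[n][n];
}
```

For Ab3bd the reversed string is db3bA, the LCS is b3b with length 3, and the function returns 5-3=2, matching the example in the text.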

C - The Cow Lexicon

Time Limit:2000MS Memory Limit:65536KB 64bit IO Format:%I64d & %I64u

Description

Few know that the cows have their own dictionary with W (1 ≤ W ≤ 600) words, each containing no more than 25 of the characters 'a'..'z'. Their cowmunication system, based on mooing, is not very accurate; sometimes they hear words that do not make any sense. For instance, Bessie once received a message that said "browndcodw". As it turns out, the intended message was "browncow" and the two letter "d"s were noise from other parts of the barnyard.

The cows want you to help them decipher a received message (also containing only characters in the range 'a'..'z') of length L (2 ≤ L ≤ 300) characters that is a bit garbled. In particular, they know that the message has some extra letters, and they want you to determine the smallest number of letters that must be removed to make the message a sequence of words from the dictionary.

Input

Line 1: Two space-separated integers, respectively: W and L
Line 2: L characters (followed by a newline, of course): the received message
Lines 3.. W+2: The cows' dictionary, one word per line

Output

Line 1: a single integer that is the smallest number of characters that need to be removed to make the message a sequence of dictionary words.

Sample Input

6 10
browndcodw
cow
milk
white
black
brown
farmer

Sample Output

2

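This problem asks for the minimum number of letters to delete so that the remaining letters form a sequence of dictionary words. dp[i] is the minimum number of letters that must be deleted from the first i letters of the message. With N the length of the message, loop i from 1 to N and try every dictionary word at each step to get the best dp[i]: either letter i is deleted (dp[i-1]+1), or some word ends at letter i and occupies the last l letters as a subsequence, which costs dp[i-l] plus l minus the length of the word.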

Code:

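#include<stdio.h>
#include<string.h>
char a[610];        // received message
char s[610][35];    // dictionary words
int dp[310];        // dp[i]: fewest deletions within the first i letters
int min(int a,int b)
{
    return a<b?a:b;
}
int main()
{
    int W,L;
    while(scanf("%d%d",&W,&L)==2)
    {
        scanf("%s",a);
        int i,j;
        for(i=0; i<W; i++) scanf("%s",s[i]);
        dp[0]=0;
        for(i=1; i<=L; i++)
        {
            dp[i]=dp[i-1]+1;  // option 1: delete letter i
            for(j=0; j<W; j++)
            {
                int find=0,l=0;
                int len=strlen(s[j]);
                int p2=len-1,p1=i-1;
                // option 2: word j ends at letter i; scan backwards to check
                // whether the preceding letters contain the rest of the word
                if(p2<=p1&&s[j][p2]==a[p1])
                    while(p1>=0)
                    {
                        if(s[j][p2]==a[p1])
                        {
                            p1--;
                            p2--;
                        }
                        else p1--;
                        if(p2==-1)
                        {
                            find=1;
                            l=i-p1-1;  // number of message letters the word spans
                            break;
                        }
                    }
                // the l-len letters inside the span that are not part of the word are deleted
                if(find) dp[i]=min(dp[i],dp[i-l]+l-len);
            }
        }
        printf("%d\n",dp[L]);
    }
    return 0;
}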

H - Human Gene Functions

Time Limit:1000MS Memory Limit:10000KB 64bit IO Format:%I64d & %I64u

Description

It is well known that a human gene can be considered as a sequence, consisting of four nucleotides, which are simply denoted by four letters, A, C, G, and T. Biologists have been interested in identifying human genes and determining their functions, because these can be used to diagnose human diseases and to design new drugs for them.

A human gene can be identified through a series of time-consuming biological experiments, often with the help of computer programs. Once a sequence of a gene is obtained, the next job is to determine its function.
One of the methods for biologists to use in determining the function of a new gene sequence that they have just identified is to search a database with the new gene as a query. The database to be searched stores many gene sequences and their functions – many researchers have been submitting their genes and functions to the database and the database is freely accessible through the Internet.

A database search will return a list of gene sequences from the database that are similar to the query gene.
Biologists assume that sequence similarity often implies functional similarity. So, the function of the new gene might be one of the functions that the genes from the list have. To exactly determine which one is the right one another series of biological experiments will be needed.

Your job is to make a program that compares two genes and determines their similarity as explained below. Your program may be used as a part of the database search if you can provide an efficient one.
Given two genes AGTGATG and GTTAG, how similar are they? One of the methods to measure the similarity
of two genes is called alignment. In an alignment, spaces are inserted, if necessary, in appropriate positions of
the genes to make them equally long and score the resulting genes according to a scoring matrix.

For example, one space is inserted into AGTGATG to result in AGTGAT-G, and three spaces are inserted into GTTAG to result in -GT--TAG. A space is denoted by a minus sign (-). The two genes are now of equal
length. These two strings are aligned:

AGTGAT-G
-GT--TAG

In this alignment, there are four matches, namely, G in the second position, T in the third, T in the sixth, and G in the eighth. Each pair of aligned characters is assigned a score according to the following scoring matrix.

      A    C    G    T    -
A     5   -1   -2   -1   -3
C    -1    5   -3   -2   -4
G    -2   -3    5   -2   -2
T    -1   -2   -2    5   -1
-    -3   -4   -2   -1    *

* denotes that a space-space match is not allowed. The score of the alignment above is (-3)+5+5+(-2)+(-3)+5+(-3)+5=9.

Of course, many other alignments are possible. One is shown below (a different number of spaces are inserted into different positions):

AGTGATG
-GTTA-G

This alignment gives a score of (-3)+5+5+(-2)+5+(-1) +5=14. So, this one is better than the previous one. As a matter of fact, this one is optimal since no other alignment can have a higher score. So, it is said that the
similarity of the two genes is 14.

Input

The input consists of T test cases. The number of test cases (T) is given in the first line of the input file. Each test case consists of two lines: each line contains an integer, the length of a gene, followed by a gene sequence. The length of each gene sequence is at least one and does not exceed 100.

Output

The output should print the similarity of each test case, one per line.

Sample Input

2 
7 AGTGATG 
5 GTTAG 
7 AGCTATT 
9 AGCTTTAAA

Sample Output

14
21

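This one is the same story: it looked hard at first, but it is really just the same idea. Let dp[i][j] be the best score for aligning the first i letters of a with the first j letters of b, and let p be the scoring matrix. The last column of the alignment pairs a[i-1] with b[j-1], or b[j-1] with a space, or a[i-1] with a space, which gives the state transition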

dp[i][j]=max(dp[i-1][j-1]+p[a[i-1]][b[j-1]],dp[i][j-1]+p['-'][b[j-1]],dp[i-1][j]+p[a[i-1]]['-']);

Code:

#include<stdio.h>
#include<string.h>
char a[110],b[110];
int dp[110][110];   // dp[i][j]: best score aligning the first i letters of a with the first j of b
int p[120][120];    // scoring matrix indexed by character code; '-' stands for a space
int max(int a,int b,int c)
{
    int m=a>b?a:b;
    return m>c?m:c;
}
int main()
{
    int T,N,M;
    scanf("%d",&T);
    memset(p,0,sizeof(p));
    p['A']['A']=5;
    p['A']['C']=p['C']['A']=-1;
    p['A']['G']=p['G']['A']=-2;
    p['A']['T']=p['T']['A']=-1;
    p['A']['-']=p['-']['A']=-3;
    p['C']['C']=5;
    p['C']['G']=p['G']['C']=-3;
    p['C']['T']=p['T']['C']=-2;
    p['C']['-']=p['-']['C']=-4;
    p['G']['G']=5;
    p['G']['T']=p['T']['G']=-2;
    p['G']['-']=p['-']['G']=-2;
    p['T']['T']=5;
    p['T']['-']=p['-']['T']=-1;
    while(T--)
    {
        scanf("%d%s%d%s",&N,a,&M,b);
        int i,j;
        dp[0][0]=0;
        // an empty prefix can only be aligned against spaces
        for(i=1; i<=N; i++) dp[i][0]=dp[i-1][0]+p[a[i-1]]['-'];
        for(i=1; i<=M; i++) dp[0][i]=dp[0][i-1]+p['-'][b[i-1]];
        for(i=1; i<=N; i++)
            for(j=1; j<=M; j++)
                // pair a[i-1] with b[j-1], or pair one of them with a space
                dp[i][j]=max(dp[i-1][j-1]+p[a[i-1]][b[j-1]],dp[i][j-1]+p['-'][b[j-1]],dp[i-1][j]+p[a[i-1]]['-']);
        printf("%d\n",dp[N][M]);
    }
    return 0;
}

K - Magic Number

Time Limit:1000MS Memory Limit:65536KB 64bit IO Format:%I64d & %I64u

Description

There are many magic numbers whose lengths are less than 10. Given some queries, each containing a single number, if the Levenshtein distance (see below) between the number in the query and a magic number is no more than a threshold, we call the magic number a lucky number for that query. Could you find out how many lucky numbers there are for each query?

Levenshtein distance (from Wikipedia http://en.wikipedia.org/wiki/Levenshtein_distance):
In information theory and computer science, the Levenshtein distance is a string metric for measuring the amount of difference between two sequences. The term edit distance is often used to refer specifically to Levenshtein distance.
The Levenshtein distance between two strings is defined as the minimum number of edits needed to transform one string into the other, with the allowable edit operations being insertion, deletion, or substitution of a single character. It is named after Vladimir Levenshtein, who considered this distance in 1965.
For example, the Levenshtein distance between "kitten" and "sitting" is 3, since the following three edits change one into the other, and there is no way to do it with fewer than three edits:
1.kitten → sitten (substitution of 's' for 'k')
2.sitten → sittin (substitution of 'i' for 'e')
3.sittin → sitting (insertion of 'g' at the end).

Input

There are several test cases. The first line contains a single number T, the number of test cases. For each test case, there are 2 numbers in the first line: n (n <= 1500) m (m <= 1000) where n is the number of magic numbers and m is the number of queries.
In the next n lines, each line has a magic number. You can assume that each magic number is distinctive.
In the next m lines, each line has a query and a threshold. The length of each query is no more than 10 and the threshold is no more than 3.

Output

For each test case, the first line is "Case #id:", where id is the case number. Then output m lines. Each line contains a single number, the answer to the corresponding query.

Sample Input

Sample Output

Case #1:
1
0

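The question asks, for each query, how many magic numbers can be turned into the query number within the given number of insertions, deletions, and substitutions. The idea is still the same: dp[i][j] stores the edit distance between the first i characters of a and the first j characters of b, with dp[i][0]=i and dp[0][j]=j. Every cell can be reached by one of three operations (the first two terms are deletion and insertion, the third is substitution):

dp[i][j]=min(dp[i-1][j]+1,dp[i][j-1]+1,dp[i-1][j-1]+1);

If in addition a[i-1]==b[j-1], the last characters already match and cost nothing, so

dp[i][j]=min(dp[i-1][j-1],dp[i][j])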
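As a standalone illustration of the edit-distance recurrence, here is a small C sketch. The function name edit_distance and the bound MAXLEN are assumptions made for this example; the match and substitution cases are folded into a single cost of 0 or 1.

```c
#include <assert.h>
#include <string.h>

#define MAXLEN 20  /* assumed upper bound on string length */

/* Levenshtein distance between a and b.
   dp[i][j] is the distance between the first i characters of a
   and the first j characters of b. */
int edit_distance(const char *a, const char *b)
{
    int dp[MAXLEN + 1][MAXLEN + 1];
    int n = (int)strlen(a), m = (int)strlen(b);
    int i, j;
    assert(n <= MAXLEN && m <= MAXLEN);
    for (i = 0; i <= n; i++) dp[i][0] = i;  /* delete all i characters */
    for (j = 0; j <= m; j++) dp[0][j] = j;  /* insert all j characters */
    for (i = 1; i <= n; i++)
        for (j = 1; j <= m; j++) {
            int cost = (a[i - 1] == b[j - 1]) ? 0 : 1;  /* 0 on a match, 1 on a substitution */
            int best = dp[i - 1][j] + 1;                /* deletion */
            if (dp[i][j - 1] + 1 < best) best = dp[i][j - 1] + 1;             /* insertion */
            if (dp[i - 1][j - 1] + cost < best) best = dp[i - 1][j - 1] + cost;
            dp[i][j] = best;
        }
    return dp[n][m];
}
```

With the example from the problem statement, edit_distance("kitten", "sitting") evaluates to 3.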

Code:

#include<stdio.h>
#include<stdlib.h>
#include<string.h>
int N=0;   // case counter
int min(int a,int b)
{
    return a<b?a:b;
}
char a[1510][21],b[21];   // magic numbers and the current query
int dp[21][21];           // dp[i][j]: edit distance between the first i characters of b and the first j of a[k]
int main()
{
    int T;
    scanf("%d",&T);
    while(T--)
    {
        printf("Case #%d:\n",++N);
        int n,m,i,j,k,t;
        scanf("%d%d",&n,&m);
        for(i=0; i<n; i++) scanf("%s",a[i]);
        while(m--)
        {
            int ans=0,l1,l2;
            scanf("%s%d",b,&t);
            l2=strlen(b);
            for(k=0; k<n; k++)
            {
                l1=strlen(a[k]);
                // the distance is at least the difference in length
                if(abs(l1-l2)>t) continue;
                for(i=0; i<=l2; i++) dp[i][0]=i;
                for(i=0; i<=l1; i++) dp[0][i]=i;
                for(i=1; i<=l2; i++)
                    for(j=1; j<=l1; j++)
                    {
                        dp[i][j]=min(dp[i-1][j]+1,dp[i][j-1]+1);   // deletion, insertion
                        if(a[k][j-1]==b[i-1]) dp[i][j]=min(dp[i-1][j-1],dp[i][j]);   // match
                        else dp[i][j]=min(dp[i-1][j-1]+1,dp[i][j]);   // substitution
                    }
                if(dp[l2][l1]<=t) ans++;
            }
            printf("%d\n",ans);
        }
    }
    return 0;
}

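In short, string DP problems tend to fall into a few patterns.

1. Two nested loops, where dp[i][j] is the result after matching one string up to position i and the other up to position j. The answer is dp[N][M] for lengths N and M (dp[N][N] when both strings have length N).

2. dp[i] is the result for positions 1 to i. The answer is dp[N].

3. dp[i][j] is the result for the substring from position i to position j. The answer is dp[1][N].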
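The third pattern has no worked problem in this post, so here is a hedged sketch of it applied to the palindrome insertion problem from earlier, done directly on one string without a reversed copy. The function name min_insertions_interval and the bound MAXN are assumptions for illustration. Indices are 0-based in the code, so the answer sits in dp[0][n-1] rather than dp[1][N].

```c
#include <assert.h>
#include <string.h>

#define MAXN 1000  /* assumed upper bound on string length */

static int dp[MAXN][MAXN];

/* dp[i][j]: fewest insertions that turn s[i..j] (inclusive) into a palindrome. */
int min_insertions_interval(const char *s)
{
    int n = (int)strlen(s);
    int len, i, j;
    assert(n <= MAXN);
    if (n == 0) return 0;
    for (i = 0; i < n; i++) dp[i][i] = 0;  /* a single character is a palindrome */
    for (len = 2; len <= n; len++)         /* shorter intervals are solved first */
        for (i = 0; i + len - 1 < n; i++) {
            j = i + len - 1;
            if (s[i] == s[j])
                /* the two ends pair up; only the inside matters */
                dp[i][j] = (len == 2) ? 0 : dp[i + 1][j - 1];
            else
                /* mirror one of the two ends with an inserted character */
                dp[i][j] = 1 + (dp[i + 1][j] < dp[i][j - 1] ? dp[i + 1][j] : dp[i][j - 1]);
        }
    return dp[0][n - 1];
}
```

The outer loop runs over interval length so that every interval a cell depends on has already been filled in. For Ab3bd this returns 2, the same value the text derives from N-dp[N][N].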
