Lanqiao Cup: DNA Alignment

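Copyright notice: This is an original article by the blogger, released under the CC 4.0 BY-SA license. When reposting, please include a link to the original source and this notice.
Article link: https://blog.csdn.net/qq_40339331/article/details/79974678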
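    Deoxyribonucleic acid, commonly known as DNA, is a class of biological macromolecules that carry genetic information. It is built from four main deoxynucleotides (dAMP, dGMP, dCMP, and dTMP) joined by phosphodiester bonds. These four nucleotides are denoted A, G, C, and T respectively.

    The genetic information carried by DNA can be represented by a string such as AGGTCGACTCCA.... Random errors may occur while DNA is transcribed and replicated, and these errors ultimately gave rise to the diversity of life.

    To simplify the problem, assume that the following errors can occur when DNA is replicated (in theory, an error can occur at any base as it is copied):

    1. Omission: a deoxynucleotide is dropped. For example, AGGT is copied as AGT.

    2. Substitution: for example, AGGT is copied as AGCT.

    3. Repetition: for example, AGGT is copied as AAGGT.

    If DNA string a needs at least n errors to become DNA string b, the distance between the two strings is said to be n.

    For example, the distance between AGGTCATATTCC and CGGTCATATTC is 2.

    Your task: write a program that finds the distance between two DNA strings.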
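The definition above can be checked on short strings with a small full-table sketch. It assumes, as the solution in this post also does, that the three errors are modeled as the deletion, substitution, and insertion operations of ordinary edit distance (so a repetition counts as a general insertion). The name `dnaDistance` is hypothetical and not part of the original post.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not from the original post): distance between a and b,
// counting one omitted, substituted, or inserted base as one error.
// It stores the full table, so it is only meant for short strings.
int dnaDistance(const std::string& a, const std::string& b) {
    std::vector<std::vector<int>> d(a.size() + 1, std::vector<int>(b.size() + 1, 0));
    // Turning a prefix into the empty string (or back) costs its length.
    for (std::size_t i = 0; i <= a.size(); ++i) d[i][0] = static_cast<int>(i);
    for (std::size_t j = 0; j <= b.size(); ++j) d[0][j] = static_cast<int>(j);
    for (std::size_t i = 1; i <= a.size(); ++i) {
        for (std::size_t j = 1; j <= b.size(); ++j) {
            // Matching last characters cost nothing; otherwise one substitution.
            int sub = d[i - 1][j - 1] + (a[i - 1] != b[j - 1] ? 1 : 0);
            int del = d[i - 1][j] + 1;  // a[i-1] was dropped
            int ins = d[i][j - 1] + 1;  // b[j-1] is an extra base
            d[i][j] = std::min({sub, del, ins});
        }
    }
    return d[a.size()][b.size()];
}
```

Under these assumptions, `dnaDistance("AGGTCATATTCC", "CGGTCATATTC")` evaluates to 2 (one substitution at the first base and one omission at the end), which agrees with the example above.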

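[Input and Output Format]

    The user first enters an integer n (n<100), indicating that 2n lines of data follow.

    Each pair of the following 2n lines is one group of DNA strings to compare. (Each line has length <10000.)

    The program then outputs n lines, giving the distance for each of the n groups.

    For example, if the user enters:
3
AGCTAAGGCCTT
AGCTAAGGCCT
AGCTAAGGCCTT
AGGCTAAGGCCTT
AGCTAAGGCCTT
AGCTTAAGGCTT

    then the program should output:
1
1
2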

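Idea: write the state transition equation from the four cases (match, substitution, omission, repetition). Let dp[i][j] be the distance between the first i characters of s1 and the first j characters of s2.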
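Written out, the transition can be stated as follows. The symbols $a$ and $b$ for the two strings (1-indexed) and the reading of $dp[i][j]$ as the distance between the first $i$ characters of $a$ and the first $j$ characters of $b$ are notation assumed here to match the `dp` array in the code.

```latex
\begin{aligned}
dp[i][0] &= i, \qquad dp[0][j] = j, \\
dp[i][j] &=
\begin{cases}
dp[i-1][j-1], & a_i = b_j, \\
1 + \min\bigl(dp[i-1][j],\; dp[i][j-1],\; dp[i-1][j-1]\bigr), & a_i \neq b_j.
\end{cases}
\end{aligned}
```

The three terms inside the minimum correspond to an omission ($dp[i-1][j]$), a repetition treated as an insertion ($dp[i][j-1]$), and a substitution ($dp[i-1][j-1]$).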

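Code:

#include <algorithm>
#include <iostream>
#include <string>
#include <vector>


using namespace std;


int n;
string s1, s2;


// Edit distance between s1 and s2. Only two rows of the table are kept:
// a full 10050 x 10050 int table would need about 400 MB.
int fun(int len1, int len2)
{
    vector<int> prev(len2 + 1), cur(len2 + 1);
    for(int j = 0;j <= len2;j++) prev[j] = j; // empty prefix of s1: j insertions
    for(int i = 1;i <= len1;i++)
    {
        cur[0] = i; // empty prefix of s2: i deletions
        for(int j = 1;j <= len2;j++)
        {
            // strings are 0-indexed, so the i-th character is s1[i-1]
            if(s1[i-1] == s2[j-1]) cur[j] = prev[j-1]; // last letters match: no change needed
            else cur[j] = min(min(prev[j] + 1, cur[j-1] + 1), prev[j-1] + 1);
            // otherwise apply one of the three errors (omission, repetition, substitution)
        }
        swap(prev, cur); // the row just computed becomes the previous row
    }
    return prev[len2];
}


int main()
{
    while(cin >> n)
    {
        while(n--)
        {
            cin >> s1 >> s2;
            cout << fun(s1.size(), s2.size()) << endl;
        }
    }
    return 0;
}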
