UVa 1392 DNA Regions Solution Report (Binary Search)

1392 - DNA Regions

Time limit: 3.000 seconds

A DNA sequence or genetic sequence is a succession of letters representing the primary structure of a real or hypothetical DNA molecule or strand, with the capacity to carry information. The possible letters are A, C, G, and T, representing the four nucleotide subunits of a DNA strand: adenine, cytosine, guanine and thymine bases covalently linked to phospho-backbone.

DNA sequences undergo mutations during the evolution of species, which means that some letters are randomly replaced with others. Therefore, the DNA sequences of two closely related species are very similar, and the difference increases as the distance between the species increases. The mutations do not occur with uniform frequency throughout the sequence; typically there are fewer mutations at the biologically important parts, since even a single mutation can be lethal at such a place. On the other hand, if a part of the sequence does not carry any biologically relevant information, then mutations on this part have no effect. It follows that if we compare the DNA sequences of two species and a particular region of the sequence contains fewer than the average number of mutations, then most probably this part of the sequence plays an important biological role. Therefore, it is of crucial importance to identify such regions. More precisely, a conserved region is a consecutive interval of the DNA sequence such that in this region at most p percent of the letters are different in the two sequences. Your task is to write a program that, given two DNA sequences, finds the longest conserved region.

Input 

The input contains several blocks of test cases. Each case begins with a line containing two integers: $1 \le n \le 150000$, the length of the genetic sequences, and $1 \le p \le 99$, the maximum percentage of mutated letters allowed in a conserved region. This is followed by two lines, each containing a DNA sequence of length n. The sequence contains only the letters 'A', 'C', 'G', and 'T'.

The input is terminated by a test case with n = 0.

Output 

For each test case, you have to output a line containing a single integer: the length of the longest conserved region between the two sequences. If there are no conserved regions in the input, then output 'No solution.' (without quotes).

Sample Input 

14 25
ACCGGTAACGTGAA
ACTGGATACGTAAA
14 24
ACCGGTAACGTGAA
ACTGGATACGTAAA
8 1
AAAAAAAA
CCCCCCCC
8 33
AAACAAAA
CCCCCCCC 
0 0

Sample Output 

8
7
No solution.
1

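    Solution report: at first glance I kept thinking this was a slope-optimization style problem, but I could not make that work. In the end I had to read someone else's solution report...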

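    First, let sum[i] denote the number of positions among 1..i at which the two sequences differ. The segment from j+1 to i is then a conserved region exactly when the following inequality holds: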

    ( i - j ) * p >= ( sum[i] - sum[j] ) * 100 ... (1)

    Moving the terms in i to the left and the terms in j to the right gives

    i * p - sum[i] * 100 >= j * p - sum[j] * 100 ... (2)

    that is,

    sum[i] * 100 - i*p <= sum[j] * 100 - j * p ... (3)
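    As a quick check of the rearrangement, take the first sample (n = 14, p = 25). The two sequences differ at positions 3, 6, 7 and 12, so sum[6] = 2 and sum[14] = 4. Choosing i = 14 and j = 6 (the region 7..14) as an illustrative pair, the three forms evaluate to:

```latex
\begin{align*}
(14 - 6)\cdot 25 = 200 &\ge (4 - 2)\cdot 100 = 200 && (1)\\
14\cdot 25 - 4\cdot 100 = -50 &\ge 6\cdot 25 - 2\cdot 100 = -50 && (2)\\
4\cdot 100 - 14\cdot 25 = 50 &\le 2\cdot 100 - 6\cdot 25 = 50 && (3)
\end{align*}
```

    All three hold with equality, which is consistent with the expected answer of 8 for this case. With p = 24 the left side of (1) drops to 192 < 200, so the same region is no longer conserved.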

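    We can compute f(i) = sum[i] * 100 - i * p for every position i, with f(0) = 0. By (3), for each i we want the smallest j whose f(j) is greater than or equal to the current f(i); the region j+1..i then has length i - j.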
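    The condition on f can be turned directly into a quadratic reference search. The following is only a sketch for checking small inputs, not the method used in the solution; the function name longestConservedBrute and its string-based interface are assumptions made for illustration.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Brute-force reference (hypothetical helper): for every i, scan j = 0..i-1
// and stop at the first, i.e. smallest, j with f(j) >= f(i).
// Runs in O(n^2); returns 0 when no conserved region exists.
int longestConservedBrute(const std::string& a, const std::string& b, int p) {
    const int n = static_cast<int>(a.size());
    std::vector<int> f(n + 1, 0);  // f[i] = sum[i] * 100 - i * p, f[0] = 0
    int sum = 0;                   // mismatches among positions 1..i
    for (int i = 1; i <= n; ++i) {
        if (a[i - 1] != b[i - 1]) ++sum;
        f[i] = sum * 100 - i * p;
    }
    int best = 0;
    for (int i = 1; i <= n; ++i) {
        for (int j = 0; j < i; ++j) {
            if (f[j] >= f[i]) {              // region j+1..i is conserved
                best = std::max(best, i - j);
                break;                       // smallest j gives the longest region
            }
        }
    }
    return best;
}
```

    A return value of 0 corresponds to the "No solution." case. This scan is far too slow for n up to 150000, but it is handy for cross-checking a faster version on small inputs.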

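    To answer these queries quickly, we can maintain a monotonically increasing array of f values: the running prefix maxima together with the positions where they first occur. A position whose f value does not exceed an earlier one can never be the smallest valid j, so it can be discarded. Each query is then a binary search over this array. The code is as follows: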

#include <iostream>
#include <cstdio>
#include <algorithm>
#include <cstring>
#include <cmath>
#include <vector>
#include <queue>
#include <map>
#include <set>
#include <string>
using namespace std;

#define ff(i, n) for(int i=0;i<(n);i++)
#define fff(i, n, m) for(int i=(n);i<=(m);i++)
#define dff(i, n, m) for(int i=(n);i>=(m);i--)
typedef long long LL;
typedef unsigned long long ULL;
void work();

int main()
{
#ifdef ACM
    freopen("in.txt", "r", stdin);
//    freopen("in.txt", "w", stdout);
#endif // ACM

    work();
}

/*****************************************/

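char a[222222];   // first sequence, 1-based (read into a+1)
char b[222222];   // second sequence, 1-based (read into b+1)
int arr[222222];  // strictly increasing prefix maxima of f
int idx[222222];  // idx[k]: smallest position j with f(j) == arr[k]

void work()
{
    int n, p;
    // The input is terminated by a test case with n = 0.
    while(scanf("%d%d", &n, &p) == 2 && n)
    {
        scanf("%s%s", a+1, b+1);

        int tot = 0;  // number of entries stored in arr/idx
        int sum = 0;  // mismatches among positions 1..i
        int ans = 0;

        // f(0) = 0 is the first prefix maximum.
        arr[tot] = 0;
        idx[tot] = 0;
        tot++;

        fff(i, 1, n)
        {
            if(a[i] != b[i]) sum++;

            // f(i) = sum[i] * 100 - i * p; at most 15,000,000 in magnitude, so int suffices.
            int val = sum*100 - i*p;
            if(val > arr[tot-1])
            {
                // New strict maximum: record it with its position.
                arr[tot] = val;
                idx[tot] = i;
                tot++;
            }

            // First stored value >= f(i); it always exists because arr[tot-1] >= val here.
            int pos = lower_bound(arr, arr + tot, val) - arr;
            ans = max(ans, i-idx[pos]);
        }

        if(ans)
            printf("%d\n", ans);
        else
            puts("No solution.");
    }
}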
    The mathematical derivation is quite clever, and it is the part most worth learning from.
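    To try the idea on the sample cases without going through stdin, the same prefix-maxima and binary-search logic can be wrapped in a function. This is a minimal sketch under assumptions: the name longestConserved, the 0-based std::string inputs and the vector names best and at are illustrative choices, not part of the original solution.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical function form of the approach: returns the length of the
// longest conserved region, or 0 if there is none ("No solution.").
int longestConserved(const std::string& a, const std::string& b, int p) {
    const int n = static_cast<int>(a.size());
    std::vector<int> best;  // strictly increasing prefix maxima of f
    std::vector<int> at;    // at[k]: smallest position j with f(j) == best[k]
    best.push_back(0);      // f(0) = 0
    at.push_back(0);
    int sum = 0;            // mismatches among positions 1..i
    int ans = 0;
    for (int i = 1; i <= n; ++i) {
        if (a[i - 1] != b[i - 1]) ++sum;
        const int val = sum * 100 - i * p;  // f(i)
        if (val > best.back()) {            // new strict maximum
            best.push_back(val);
            at.push_back(i);
        }
        // First recorded maximum >= f(i); it exists because best.back() >= val.
        const auto it = std::lower_bound(best.begin(), best.end(), val);
        ans = std::max(ans, i - at[it - best.begin()]);
    }
    return ans;
}
```

    Each position costs one binary search, so the whole pass is O(n log n). On the four sample cases this function should give 8, 7, 0 and 1, where 0 stands for "No solution.".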
