uva1392 - DNA Regions: maintaining a decreasing sequence + binary search

A DNA sequence or genetic sequence is a succession of letters representing the primary structure of a real or hypothetical DNA molecule or strand, with the capacity to carry information. The possible letters are A, C, G, and T, representing the four nucleotide subunits of a DNA strand: adenine, cytosine, guanine and thymine bases covalently linked to a phosphate backbone.

DNA sequences undergo mutations during the evolution of species, which means that some letters are randomly replaced with others. Therefore, the DNA sequences of two closely related species are very similar, and the difference increases as the distance between the species increases. The mutations do not occur with uniform frequency throughout the sequence; typically there are fewer mutations at the biologically important parts, since even a single mutation can be lethal at such a place. On the other hand, if a part of the sequence does not carry any biologically relevant information, then mutations on this part have no effect. It follows that if we compare the DNA sequences of two species and a particular region of the sequence contains fewer than the average number of mutations, then most probably this part of the sequence plays an important biological role. Therefore, it is of crucial importance to identify such regions. More precisely, a conserved region is a consecutive interval of the DNA sequence such that in this region at most p percent of the letters are different in the two sequences. Your task is to write a program that, given two DNA sequences, finds the longest conserved region.

Input 

The input contains several blocks of test cases. Each case begins with a line containing two integers: $1 \le n \le 150000$, the length of the genetic sequences, and $1 \le p \le 99$, the maximum percentage of mutated letters allowed in a conserved region. This is followed by two lines, each containing a DNA sequence of length $n$. The sequence contains only the letters `A', `C', `G', and `T'.

The input is terminated by a test case with n = 0.

Output 

For each test case, you have to output a line containing a single integer: the length of the longest conserved region between the two sequences. If there are no conserved regions in the input, then output `No solution.' (without quotes).

Sample Input 

14 25
ACCGGTAACGTGAA
ACTGGATACGTAAA
14 24
ACCGGTAACGTGAA
ACTGGATACGTAAA
8 1
AAAAAAAA
CCCCCCCC
8 33
AAACAAAA
CCCCCCCC 
0 0

Sample Output 

8
7
No solution.
1

  Find the length of the longest region whose mutation rate is at most p%.

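  This problem is quite similar to the other DNA problem solved with a monotonic queue: both can be turned into a slope problem, but the techniques differ. Let sum[i] be the number of mutations among the first i letters. For each end position b we want the smallest a such that (sum[b]-sum[a])/(b-a) <= p/100, which is equivalent to b*p-100*sum[b] >= a*p-100*sum[a]. Define key[i] = i*p-100*sum[i] and scan from left to right, keeping in array c the indices of a strictly decreasing sequence of key values. If the current key is smaller than the key of the last entry in c, then no earlier index has a key this small, so no conserved region ends here; append the current index to c. Otherwise, binary-search c for the first entry whose key is <= the current key, i.e. the earliest valid start, which gives the longest region ending here. In that case the current index does not need to be added to c: c already contains an index with a key no larger than it, and that earlier index is always the better start.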
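The equivalence used above can be written out step by step, with $\text{sum}_i$ standing for `sum[i]` and $\text{key}_i$ for the key value. Multiplying through by $100\,(b-a)$ keeps the direction of the inequality because $b > a$:

$$
\begin{aligned}
\frac{\text{sum}_b - \text{sum}_a}{b - a} \le \frac{p}{100}
&\iff 100\,(\text{sum}_b - \text{sum}_a) \le p\,(b - a) \\
&\iff p\,b - 100\,\text{sum}_b \ge p\,a - 100\,\text{sum}_a \\
&\iff \text{key}_b \ge \text{key}_a, \qquad \text{key}_i = p\,i - 100\,\text{sum}_i .
\end{aligned}
$$

As a worked check on the fourth sample (p = 33), the keys for i = 0..8 are 0, -67, -134, -201, -168, -235, -302, -369, -436. Only key_4 = -168 is not a new minimum; the first entry of the decreasing sequence with key <= -168 is index 3 (key -201), so the longest region has length 4 - 3 = 1, which matches the expected output.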

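  Note that c[0] = 0 (index 0, whose key is 0) must be set: if i*p-100*sum[i] >= 0, i.e. sum[i]/i <= p/100, then the whole prefix from position 1 to i is a conserved region, and this sentinel lets the binary search find it.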
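The approach described above can also be packaged as a single reusable function. This is a minimal sketch, not the post's own code: the name `longestConserved` and the use of `std::string`/`std::vector` are assumptions for illustration. It keeps the indices of strictly decreasing keys (the prefix minima, starting from the index-0 sentinel) and binary-searches them for the earliest valid start.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper: length of the longest interval in which at most p percent
// of positions differ between s and t (same length), or 0 if none exists.
int longestConserved(const std::string& s, const std::string& t, int p) {
    const int n = static_cast<int>(s.size());
    std::vector<long long> key(n + 1, 0);  // key[0] = 0 is the sentinel for prefixes
    std::vector<int> mins{0};              // indices whose keys strictly decrease
    int mismatches = 0, best = 0;
    for (int i = 1; i <= n; ++i) {
        mismatches += (s[i - 1] != t[i - 1]);
        key[i] = 1LL * i * p - 100LL * mismatches;
        if (key[i] < key[mins.back()]) {
            // New minimum: every earlier key is larger, so no region ends at i.
            mins.push_back(i);
            continue;
        }
        // Keys along mins are strictly decreasing, so "key <= key[i]" is monotone:
        // find the first stored index satisfying it (the earliest valid start).
        int lo = 0, hi = static_cast<int>(mins.size()) - 1;
        while (lo < hi) {
            int mid = (lo + hi) / 2;
            if (key[mins[mid]] <= key[i]) hi = mid;
            else lo = mid + 1;
        }
        best = std::max(best, i - mins[lo]);
    }
    return best;
}
```

A caller prints `No solution.` when the function returns 0. Storing the keys as `long long` is only a defensive choice; for n <= 150000 they also fit in `int`, as the original program assumes.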

#include<cstdio>
#include<algorithm>
using namespace std;
#define MAXN 150010
int N,P;
char a[MAXN],b[MAXN];
int c[MAXN];            // c[0..cnt]: indices whose keys form a strictly decreasing sequence
struct DNA{
    int id,key,sum;     // id = position i, sum = mismatches in the first i letters, key = i*P-100*sum
}d[MAXN];
// Return the first position m in c[L..R] with d[c[m]].key <= v.
// The keys along c are strictly decreasing, so this predicate is monotone.
int firstNotGreater(int L,int R,int v){
    int m;
    while(L<R){
        m=(L+R)/2;
        if(d[c[m]].key<=v) R=m;
        else L=m+1;
    }
    return L;
}
int main(){
    // The input ends with a case where n = 0 (the sample ends with "0 0").
    while(scanf("%d%d",&N,&P)==2&&N){
        scanf("%s%s",a,b);
        d[0].id=d[0].sum=d[0].key=0;   // sentinel: the empty prefix has key 0
        c[0]=0;
        int cnt=0,ans=0;
        for(int i=1;i<=N;i++){
            d[i].sum=d[i-1].sum+(a[i-1]!=b[i-1]);
            d[i].id=i;
            d[i].key=i*P-100*d[i].sum;   // |key| <= 1.5e7, so int is enough
            if(d[i].key<d[c[cnt]].key) c[++cnt]=i;   // new minimum: no earlier start works for i
            else{
                // The earliest start a with key[a] <= key[i] gives the longest region ending at i.
                int s=firstNotGreater(0,cnt,d[i].key);
                s=d[c[s]].id;
                ans=max(ans,i-s);
            }
        }
        if(!ans) printf("No solution.\n");
        else printf("%d\n",ans);
    }
    return 0;
}
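For testing on small inputs, the program above can be compared against a straightforward O(n^2) checker. The helper below is hypothetical and not part of the original post: it tries every interval and evaluates the conserved-region condition directly as 100 * mismatches <= p * length, so no floating point is involved.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical O(n^2) reference: tries every interval (a, b] and checks the
// condition 100 * mismatches <= p * length using integer arithmetic only.
int bruteForceLongest(const std::string& s, const std::string& t, int p) {
    const int n = static_cast<int>(s.size());
    std::vector<int> pre(n + 1, 0);  // pre[i] = mismatches among the first i positions
    for (int i = 1; i <= n; ++i) pre[i] = pre[i - 1] + (s[i - 1] != t[i - 1]);
    int best = 0;
    for (int a = 0; a < n; ++a)
        for (int b = a + 1; b <= n; ++b)
            if (100 * (pre[b] - pre[a]) <= p * (b - a)) best = std::max(best, b - a);
    return best;  // 0 means "No solution."
}
```

Running both on random short sequences over {A, C, G, T} with random p in [1, 99] is a cheap way to catch off-by-one errors in the binary search.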

