A DNA sequence consists of four letters: A, C, G, and T. The GC-ratio of a DNA sequence is the number of Cs and Gs in the sequence divided by the length of the sequence. GC-ratio is important in gene finding because DNA sequences with relatively high GC-ratios might be good candidates for the starting parts of genes. Given a very long DNA sequence, researchers are usually interested in locating a subsequence whose GC-ratio is maximum over all subsequences of the sequence. Since short subsequences with high GC-ratios are sometimes meaningless in gene finding, a length lower bound is given to ensure that a long subsequence with high GC-ratio can be found. If, in a DNA sequence, a 0 is assigned to every A and T and a 1 to every C and G, the DNA sequence is transformed into a binary sequence of the same length. GC-ratios in the DNA sequence are now equivalent to averages in the binary sequence.
Index | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 |
Sequence | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 1 | 1 | 0 | 1 | 1 | 0 | 1 | 0 |
For the binary sequence above, if the length lower bound is 7, the maximum average is 6/8, which occurs in the subsequence [7,14]. Its length is 8, which is greater than the length lower bound 7. If the length lower bound is 5, then the subsequence [7,11] gives the maximum average 4/5. Its length is 5, which is equal to the length lower bound. For the subsequence [7,11], 7 is its starting index and 11 is its ending index.
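The 0/1 encoding and the two averages quoted above can be checked with a few lines of C++. This is only an illustrative sketch: the helper names `to_binary` and `ones_over_length` are hypothetical and are not part of the solution given later.

```cpp
#include <string>
#include <utility>

// Hypothetical helper: map A/T to '0' and C/G to '1'.
std::string to_binary(const std::string& dna) {
    std::string bits;
    bits.reserve(dna.size());
    for (char base : dna) {
        bits.push_back((base == 'C' || base == 'G') ? '1' : '0');
    }
    return bits;
}

// Hypothetical helper: returns {number of ones, length} for the
// 1-based inclusive range [lo, hi]. Keeping the ratio as two integers
// avoids floating-point rounding.
std::pair<int, int> ones_over_length(const std::string& bits, int lo, int hi) {
    int ones = 0;
    for (int k = lo; k <= hi; ++k) {
        ones += bits[k - 1] - '0';
    }
    return {ones, hi - lo + 1};
}
```

For the 17-character sequence above, `ones_over_length(bits, 7, 14)` gives the pair (6, 8) and `ones_over_length(bits, 7, 11)` gives (4, 5), matching the averages 6/8 and 4/5.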
Given a binary sequence and a length lower bound L, write a program to find a subsequence of the binary sequence whose length is at least L and whose average is maximum over all subsequences of the binary sequence. If two or more subsequences have the maximum average, then find the shortest one; and if two or more shortest subsequences with the maximum average exist, then find the one with the smallest starting index.
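Before optimizing, it can help to pin down the tie-breaking rules with a brute-force reference. The sketch below is not the intended solution: it is quadratic in n, assumes 1 ≤ L ≤ n, and the name `brute_force` is hypothetical. It compares averages by cross-multiplication, so no floating point is involved.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical reference solver, O(n^2): suitable for small inputs only.
// Returns the 1-based {start, end} of the best subsequence of length >= L.
std::pair<int, int> brute_force(const std::string& bits, int L) {
    const int n = static_cast<int>(bits.size());
    std::vector<long long> sum(n + 1, 0);  // sum[k] = number of ones in bits[1..k]
    for (int i = 1; i <= n; ++i) {
        sum[i] = sum[i - 1] + (bits[i - 1] - '0');
    }

    int bestL = 1, bestR = L;
    for (int lo = 1; lo + L - 1 <= n; ++lo) {
        for (int hi = lo + L - 1; hi <= n; ++hi) {
            // avg(lo..hi) > avg(bestL..bestR)  <=>  lhs > rhs (lengths are positive)
            long long lhs = (sum[hi] - sum[lo - 1]) * (bestR - bestL + 1);
            long long rhs = (sum[bestR] - sum[bestL - 1]) * (hi - lo + 1);
            bool better = lhs > rhs;
            // On equal averages prefer the strictly shorter one; because lo
            // only increases, the smallest start wins among equal lengths.
            bool shorter_tie = (lhs == rhs) && (hi - lo) < (bestR - bestL);
            if (better || shorter_tie) {
                bestL = lo;
                bestR = hi;
            }
        }
    }
    return {bestL, bestR};
}
```

On the sequence `00101011011011010` this returns (7, 11) for L = 5 and (7, 14) for L = 7. A checker like this is mainly useful for comparing against a faster solution on small random inputs.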
Your program is to read from standard input. The input consists of T test cases. The number of test cases T is given in the first line of the input. Each test case starts with a line containing two integers n (1 ≤ n ≤ 100,000) and L (1 ≤ L ≤ 1,000), which are the length of a binary sequence and a length lower bound, respectively. In the next line, the binary sequence is given as a string of length n.
Your program is to write to standard output. Print the starting and ending index of the subsequence.
The following shows sample input and output for two test cases. The first five lines are the input; the last two lines are the output.
2
17 5
00101011011011010
20 4
11100111100111110000
7 11
6 9
Code:
#include <cstdio>
using namespace std;

const int maxn = 100000 + 5;
int n, L;
char s[maxn];
int sum[maxn], p[maxn]; // average of i~j is (sum[j]-sum[i-1])/(j-i+1)

// compare average of x1~x2 and x3~x4; returns >0, 0 or <0
// the products can exceed the int range, so they are computed in long long
int compare_average(int x1, int x2, int x3, int x4)
{
    long long d = (long long)(sum[x2]-sum[x1-1]) * (x4-x3+1)
                - (long long)(sum[x4]-sum[x3-1]) * (x2-x1+1);
    return (d > 0) - (d < 0);
}

int main()
{
    int T;
    scanf("%d", &T);
    while (T--)
    {
        scanf("%d%d%s", &n, &L, s+1);
        sum[0] = 0;
        for (int i = 1; i <= n; i++)
        {
            sum[i] = sum[i-1] + s[i] - '0';
        }
        int ansL = 1, ansR = L;
        // p[i..j) is the sequence of candidate start points
        int i = 0, j = 0;
        for (int t = L; t <= n; t++) // end point
        {
            while (j-i > 1 && compare_average(p[j-2], t-L, p[j-1], t-L) >= 0)
            {
                j--; // remove concave points
            }
            p[j++] = t-L+1; // new candidate
            while (j-i > 1 && compare_average(p[i], t, p[i+1], t) <= 0)
            {
                i++; // update tangent point
            }
            // compare and update solution: higher average wins, then shorter length
            int c = compare_average(p[i], t, ansL, ansR);
            if (c > 0 || (c == 0 && t - p[i] < ansR - ansL))
            {
                ansL = p[i];
                ansR = t;
            }
        }
        printf("%d %d\n", ansL, ansR);
    }
    return 0;
}
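The comment on `sum` and the body of `compare_average` rest on two small identities, written out here as a sketch. The notation $S_k$ (the number of ones among the first $k$ characters, i.e. `sum[k]`) and $\operatorname{avg}$ is introduced only for this note. The first identity says that an average is the slope of the line through two prefix-sum points, which is why the code speaks of a "tangent point"; the second is the integer cross-multiplication that `compare_average` evaluates.

```latex
\[
\operatorname{avg}(i,j) \;=\; \frac{S_j - S_{i-1}}{j - i + 1}
\;=\; \text{slope of the line through } (i-1,\; S_{i-1}) \text{ and } (j,\; S_j).
\]
\[
\operatorname{avg}(x_1,x_2) > \operatorname{avg}(x_3,x_4)
\;\Longleftrightarrow\;
(S_{x_2} - S_{x_1-1})\,(x_4 - x_3 + 1) \;-\; (S_{x_4} - S_{x_3-1})\,(x_2 - x_1 + 1) \;>\; 0 ,
\]
\[
\text{since both lengths } x_2 - x_1 + 1 \text{ and } x_4 - x_3 + 1 \text{ are positive.}
\]
```

Because only the sign of the difference is needed, the comparison stays exact in integer arithmetic, as long as the products are held in a type wide enough for values around $10^{10}$.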