UVA Problem 11512 - GATTACA (suffix array: the longest repeated substring and its number of occurrences)

The Institute of Bioinformatics and Medicine (IBM) of your country has been studying the DNA
sequences of several organisms, including humans. Before analyzing the DNA of an organism,
the investigators must extract the DNA from its cells and decode it with a process
called “sequencing”.
One technique used to decode a DNA sequence is “shotgun sequencing”. It decodes long DNA
strands by randomly cutting many copies of the same strand into smaller fragments, which are
sequenced by reading their DNA bases (A, C, G and T) with a special machine and then
reassembled with a special algorithm to build the entire sequence.
Normally, a DNA strand has many segments that repeat two or more times over the sequence (these
segments are called “repetitions”). The shotgun method cannot fully identify repetitions, because
the reassembly process cannot distinguish two identical fragments that are substrings of two
distinct repetitions.
The scientists of the institute successfully decoded the DNA sequences of numerous bacteria from
the same family with another sequencing method (much more expensive than the shotgun process)
that avoids the problem of repetitions. The biologists wonder whether using that method was a
waste of money, because they believe there is no large repeated fragment in the DNA of the
bacteria of the studied family.
The biologists have asked you to write a program that, given a DNA strand, finds the longest
substring that is repeated two or more times in the sequence.
Input
The first line of the input contains an integer T specifying the number of test cases (1 ≤ T ≤ 100). Each
test case consists of a single line of text that represents a DNA sequence S of length n (1 ≤ n ≤ 1000).
You may assume that each sequence S contains only the letters ‘A’, ‘C’, ‘G’ and ‘T’.
Output
For each sequence in the input, print a single line containing the longest substring of S that
appears two or more times in S, followed by a space and the number of occurrences of that substring
in S (overlapping occurrences are counted, as in GAGAGAG → GAGAG 2).
If two or more repeated substrings have the maximal length, choose the lexicographically smallest one.
If there is no repetition in S, print ‘No repetitions found!’.
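
To check a faster solution on small inputs, the output rules above can be restated as a brute force. The sketch below is not the author's code, and `bruteForce` is a hypothetical helper name. It tries lengths from n-1 down to 1 and counts every substring of that length, overlaps included, in a `std::map`; because the map iterates its keys in sorted order, the first key with a count of at least 2 is the lexicographically smallest candidate. It is roughly O(n³) in the worst case, so treat it as a reference for short strings, not as a submission.

```cpp
#include <map>
#include <string>

// Reference brute force (hypothetical helper, not the accepted solution).
// Tries lengths from n-1 down to 1; the first length that has a repeated
// substring is the answer length. Roughly O(n^3) in the worst case.
std::string bruteForce(const std::string& s) {
    int n = static_cast<int>(s.size());
    for (int len = n - 1; len >= 1; --len) {
        std::map<std::string, int> count;    // keys kept in sorted order
        for (int i = 0; i + len <= n; ++i)
            ++count[s.substr(i, len)];        // overlapping occurrences all count
        for (const auto& kv : count)         // first repeated key = smallest
            if (kv.second >= 2)
                return kv.first + " " + std::to_string(kv.second);
    }
    return "No repetitions found!";
}
```

Calling it on each of the six sample strings returns the six sample output lines.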
Sample Input
6
GATTACA
GAGAGAG
GATTACAGATTACA
TGAC
TGTAC
TTGGAACC
Sample Output
A 3
GAGAG 2
GATTACA 2
No repetitions found!
T 2
A 2
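
The idea behind the accepted code can be seen with plain strings. Any substring that occurs at least twice is a common prefix of two suffixes, and the longest such prefix always shows up as the longest common prefix (LCP) of two neighbours once the suffixes are sorted. All suffixes that start with the answer form one contiguous run in sorted order, so the run length is the occurrence count. The sketch below is only an illustration under these observations, not the author's implementation; `viaSortedSuffixes` is a hypothetical name, and it costs O(n² log n) time and O(n²) memory.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Illustration only: sort all suffixes as strings, then read the answer off
// the LCPs of adjacent suffixes. O(n^2 log n) time, O(n^2) memory.
std::string viaSortedSuffixes(const std::string& s) {
    std::size_t n = s.size();
    std::vector<std::string> suf;
    for (std::size_t i = 0; i < n; ++i)
        suf.push_back(s.substr(i));
    std::sort(suf.begin(), suf.end());

    // h[i] = length of the common prefix of suf[i-1] and suf[i]; h[0] = 0.
    std::vector<std::size_t> h(n, 0);
    for (std::size_t i = 1; i < n; ++i) {
        std::size_t m = std::min(suf[i - 1].size(), suf[i].size());
        while (h[i] < m && suf[i - 1][h[i]] == suf[i][h[i]])
            ++h[i];
    }

    std::size_t best = n == 0 ? 0 : *std::max_element(h.begin(), h.end());
    if (best == 0)
        return "No repetitions found!";

    // First index reaching the maximum = lexicographically smallest answer;
    // the run of h >= best starting there gives the number of occurrences.
    std::size_t i = 1;
    while (h[i] < best) ++i;
    std::size_t count = 1;
    for (std::size_t j = i; j < n && h[j] >= best; ++j) ++count;
    return suf[i].substr(0, best) + " " + std::to_string(count);
}
```

For GAGAGAG the sorted suffixes are AG, AGAG, AGAGAG, G, GAG, GAGAG, GAGAGAG, with adjacent LCPs 2, 4, 0, 1, 3, 5, so the answer is GAGAG with 2 occurrences.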

Accepted code (suffix array + height array):

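#include<stdio.h>
#include<string.h>
#include<algorithm>
using namespace std;

// n <= 1000, plus one slot for the sentinel and one for '\0'
char str[1010];
int sa[1010],Rank[1010],rank2[1010],height[1010],c[1010],*x,*y,s[1010];

// Counting-sort step: stably sort the positions listed in y by key x[],
// writing the result into sa. sz = number of distinct keys.
void cmp(int n,int sz)
{
    int i;
    memset(c,0,sizeof(c));
    for(i=0;i<n;i++)
        c[x[y[i]]]++;
    for(i=1;i<sz;i++)
        c[i]+=c[i-1];
    for(i=n-1;i>=0;i--)
        sa[--c[x[y[i]]]]=y[i];
}

// Prefix-doubling suffix array construction, O(n log n).
// s must end with a unique smallest sentinel; sz is the alphabet size.
void build_sa(int *s,int n,int sz)
{
    x=Rank,y=rank2;
    int i;
    for(i=0;i<n;i++)
        x[i]=s[i],y[i]=i;
    cmp(n,sz);                      // sort by the first character
    for(int len=1;len<n;len<<=1)
    {
        // y = positions ordered by the second key (rank of suffix i+len):
        // suffixes without a second half first, then the rest in sa order
        int yid=0;
        for(i=n-len;i<n;i++)
            y[yid++]=i;
        for(i=0;i<n;i++)
            if(sa[i]>=len)
                y[yid++]=sa[i]-len;
        cmp(n,sz);                  // stable sort by the first key x[]
        // recompute the ranks of the first 2*len characters into x
        swap(x,y);
        x[sa[0]]=yid=0;
        for(i=1;i<n;i++)
        {
            if(y[sa[i-1]]==y[sa[i]]&&sa[i-1]+len<n&&sa[i]+len<n&&y[sa[i-1]+len]==y[sa[i]+len])
                x[sa[i]]=yid;
            else
                x[sa[i]]=++yid;
        }
        sz=yid+1;
        if(sz>=n)                   // all ranks distinct: fully sorted
            break;
    }
    for(i=0;i<n;i++)
        Rank[i]=x[i];
}

// height[r] = longest common prefix of suffixes sa[r-1] and sa[r]
// (Kasai: moving from suffix i to i+1, the LCP shrinks by at most 1).
void getHeight(int *s,int n)
{
    int k=0;
    for(int i=0;i<n;i++)
    {
        if(Rank[i]==0)
            continue;
        k=max(0,k-1);
        int j=sa[Rank[i]-1];
        while(s[i+k]==s[j+k])       // stops at the unique sentinel s[n]=0
            k++;
        height[Rank[i]]=k;
    }
}

int main()
{
    int t;
    scanf("%d",&t);
    while(t--)
    {
        int i,j;
        scanf("%s",str);
        int n=strlen(str);
        for(i=0;i<n;i++)
            s[i]=str[i]-'A'+1;      // 'A'..'T' -> 1..20
        s[n]=0;                     // unique smallest sentinel
        build_sa(s,n+1,26);         // sa[0] is the sentinel suffix
        getHeight(s,n);
        // length of the longest repeated substring = maximum height
        int ans=0;
        for(i=1;i<=n;i++)
            if(height[i]>ans)
                ans=height[i];
        if(ans==0)
        {
            printf("No repetitions found!\n");
            continue;
        }
        // the first rank reaching ans gives the lexicographically smallest answer
        for(i=1;i<=n;i++)
            if(height[i]>=ans)
                break;
        // consecutive suffixes with height >= ans all start with the answer
        int k=1;
        for(j=i;j<=n&&height[j]>=ans;j++)
            k++;
        for(j=0;j<ans;j++)
            printf("%c",str[sa[i]+j]);
        printf(" %d\n",k);
    }
    return 0;
}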
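
For comparison, the same pipeline can be sketched with simpler building blocks, assuming O(n log² n) per test case is fast enough for n ≤ 1000. Prefix doubling here re-sorts with `std::sort` on the pair (rank[i], rank[i+k]) instead of the counting sort in `cmp`, and the LCP array is computed with Kasai's algorithm, which is what `getHeight` implements. Bounds checks replace the appended sentinel. The names `suffixArray`, `lcpArray` and `solve` are hypothetical; this is a sketch, not the author's code.

```cpp
#include <algorithm>
#include <string>
#include <utility>
#include <vector>

// Prefix doubling, O(n log^2 n): after the round with step k the suffixes are
// sorted by their first 2k characters, keyed by (rank[i], rank[i + k]).
std::vector<int> suffixArray(const std::string& s) {
    int n = static_cast<int>(s.size());
    std::vector<int> sa(n), rank(n), tmp(n);
    if (n == 0) return sa;
    for (int i = 0; i < n; ++i) {
        sa[i] = i;
        rank[i] = static_cast<unsigned char>(s[i]);
    }
    for (int k = 1;; k <<= 1) {
        // A missing second half gets -1, so a suffix sorts before any longer
        // suffix it is a prefix of (this replaces the sentinel character).
        auto key = [&](int i) {
            return std::make_pair(rank[i], i + k < n ? rank[i + k] : -1);
        };
        std::sort(sa.begin(), sa.end(),
                  [&](int a, int b) { return key(a) < key(b); });
        tmp[sa[0]] = 0;
        for (int i = 1; i < n; ++i)
            tmp[sa[i]] = tmp[sa[i - 1]] + (key(sa[i - 1]) < key(sa[i]) ? 1 : 0);
        rank = tmp;
        if (rank[sa[n - 1]] == n - 1) break;  // all ranks distinct: fully sorted
    }
    return sa;
}

// Kasai's algorithm: lcp[r] = LCP of suffixes sa[r-1] and sa[r], lcp[0] = 0.
// Visiting suffixes in text order, the match length drops by at most 1.
std::vector<int> lcpArray(const std::string& s, const std::vector<int>& sa) {
    int n = static_cast<int>(s.size());
    std::vector<int> rank(n), lcp(n, 0);
    for (int r = 0; r < n; ++r) rank[sa[r]] = r;
    int h = 0;
    for (int i = 0; i < n; ++i) {
        if (rank[i] == 0) { h = 0; continue; }
        int j = sa[rank[i] - 1];
        while (i + h < n && j + h < n && s[i + h] == s[j + h]) ++h;
        lcp[rank[i]] = h;
        if (h > 0) --h;
    }
    return lcp;
}

// Same answer extraction as main() in the accepted code.
std::string solve(const std::string& s) {
    int n = static_cast<int>(s.size());
    std::vector<int> sa = suffixArray(s);
    std::vector<int> lcp = lcpArray(s, sa);
    int best = 0;
    for (int r = 1; r < n; ++r) best = std::max(best, lcp[r]);
    if (best == 0) return "No repetitions found!";
    int r = 1;
    while (lcp[r] < best) ++r;          // first rank reaching the maximum
    int count = 1;
    for (int q = r; q < n && lcp[q] >= best; ++q) ++count;
    return s.substr(sa[r], best) + " " + std::to_string(count);
}
```

On "banana" this gives sa = {5, 3, 1, 0, 4, 2} and LCP = {0, 1, 3, 0, 0, 2}. Sorting pairs with `std::sort` is simpler to get right than the two-pass counting sort, at the cost of an extra log factor.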