CSU 1632 Repeated Substrings

Original post, 2016-08-28 16:47:43

Description

String analysis often arises in applications from biology and chemistry, such as the study of DNA and protein molecules. One interesting problem is to find how many substrings are repeated (at least twice) in a long string. In this problem, you will write a program to find the total number of repeated substrings in a string of at most 100 000 alphabetic characters. Any unique substring that occurs more than once is counted. As an example, if the string is “aabaab”, there are 5 repeated substrings: “a”, “aa”, “aab”, “ab”, “b”. If the string is “aaaaa”, the repeated substrings are “a”, “aa”, “aaa”, “aaaa”. Note that repeated occurrences of a substring may overlap (e.g. “aaaa” in the second case).
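To make the definition concrete, here is a minimal brute-force sketch that enumerates every substring and counts the distinct ones seen at least twice. It is not part of the original solution; the names `countRepeatedBrute` and `checkStatementExamples` are made up for illustration, and the approach is only practical for very short strings, far below the 100 000-character limit.

```cpp
#include <cassert>
#include <map>
#include <string>

// Brute-force reference: count distinct substrings that occur at least twice.
// Overlapping occurrences count, because every start position is enumerated.
// Roughly cubic time and heavy on memory, so only suitable for tiny inputs.
int countRepeatedBrute(const std::string& s) {
    std::map<std::string, int> occurrences;
    for (std::size_t i = 0; i < s.size(); ++i)
        for (std::size_t len = 1; i + len <= s.size(); ++len)
            ++occurrences[s.substr(i, len)];

    int repeated = 0;
    for (const auto& entry : occurrences)
        if (entry.second >= 2) ++repeated;
    return repeated;
}

// Checks the two examples given in the problem statement.
void checkStatementExamples() {
    assert(countRepeatedBrute("aabaab") == 5);  // a, aa, aab, ab, b
    assert(countRepeatedBrute("aaaaa") == 4);   // a, aa, aaa, aaaa
}
```

A checker like this is mainly useful for validating a faster solution on small random strings.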

Input

The input consists of at most 10 cases. The first line contains a positive integer, specifying the number of
cases to follow. Each of the following lines contains a nonempty string of up to 100 000 alphabetic characters.

Output

For each line of input, output one line containing the number of unique substrings that are repeated. You
may assume that the correct answer fits in a signed 32-bit integer.

Sample Input

3
aabaab
aaaaa
AaAaA

Sample Output

5
4
5

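The task is to count the distinct substrings that occur two or more times. Build the suffix array and count directly from the h (height) array: for consecutive ranks, add max(0, h[i] - h[i-1]).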
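The counting rule can be sketched in a compact form. In sorted suffix order, the suffix at rank i shares its first h[i] characters with its predecessor, so all of those prefixes are repeated; the ones of length up to h[i-1] were already counted at the previous rank, which leaves max(0, h[i] - h[i-1]) new ones. The sketch below is an illustration under stated assumptions, not the author's code: it builds the suffix array with a plain comparison sort (too slow for worst-case inputs of length 100 000) and uses Kasai's algorithm for the height array. The names `countRepeated` and `checkSampleCases` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <string>
#include <vector>

// Counts distinct substrings occurring at least twice via the height array.
// Assumption of this sketch: the suffix array is built by directly sorting
// suffixes, which is simple but slow on long, highly repetitive strings.
long long countRepeated(const std::string& s) {
    const int n = static_cast<int>(s.size());
    std::vector<int> sa(n), rank(n), height(n, 0);
    std::iota(sa.begin(), sa.end(), 0);
    std::sort(sa.begin(), sa.end(), [&](int a, int b) {
        return s.compare(a, std::string::npos, s, b, std::string::npos) < 0;
    });
    for (int i = 0; i < n; ++i) rank[sa[i]] = i;

    // Kasai's algorithm: height[r] = LCP of the suffixes ranked r-1 and r.
    for (int i = 0, k = 0; i < n; ++i) {
        if (rank[i] == 0) { k = 0; continue; }
        const int j = sa[rank[i] - 1];
        while (i + k < n && j + k < n && s[i + k] == s[j + k]) ++k;
        height[rank[i]] = k;
        if (k > 0) --k;
    }

    // Each rise in height contributes that many newly seen repeated substrings.
    long long answer = 0;
    for (int r = 1; r < n; ++r)
        answer += std::max(0, height[r] - height[r - 1]);
    return answer;
}

// Checks the three sample cases from the problem statement.
void checkSampleCases() {
    assert(countRepeated("aabaab") == 5);
    assert(countRepeated("aaaaa") == 4);
    assert(countRepeated("AaAaA") == 5);
}
```

For "aabaab" the height values in rank order are 0, 3, 1, 2, 0, 1, so the rises are 3 + 1 + 1 = 5, matching the sample output. The full solution that follows uses prefix doubling with radix sort instead of a comparison sort, so it stays within the time limit on the largest inputs.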

#include<set>
#include<map>
#include<ctime>
#include<cmath>
#include<stack>
#include<queue>
#include<bitset>
#include<cstdio>
#include<string>
#include<cstring>
#include<iostream>
#include<algorithm>
#include<functional>
#define rep(i,j,k) for (int i = j; i <= k; i++)
#define per(i,j,k) for (int i = j; i >= k; i--)
#define loop(i,j,k) for (int i = j;i != -1; i = k[i])
#define lson x << 1, l, mid
#define rson x << 1 | 1, mid + 1, r
#define fi first
#define se second
#define mp(i,j) make_pair(i,j)
#define pii pair<string,string>
using namespace std;
typedef long long LL;
const int low(int x) { return x&-x; }
const double eps = 1e-8;
const int INF = 0x7FFFFFFF;
const int mod = 1e8;
const int N = 5e5 + 10;
const int read()
{
	char ch = getchar();
	while (ch<'0' || ch>'9') ch = getchar();
	int x = ch - '0';
	while ((ch = getchar()) >= '0'&&ch <= '9') x = x * 10 + ch - '0';
	return x;
}
int T;

struct Sa
{
	char s[N];
	int rk[2][N], sa[N], h[N], w[N], now, n;
	int rmq[N][20], lg[N];

	bool GetS()
	{
		scanf("%s", s + 1);
		return true;
	}

	void getsa(int z, int &m)
	{
		int x = now, y = now ^= 1;
		rep(i, 1, z) rk[y][i] = n - i + 1;
		for (int i = 1, j = z; i <= n; i++)
			if (sa[i] > z) rk[y][++j] = sa[i] - z;

		rep(i, 1, m) w[i] = 0;
		rep(i, 1, n) w[rk[x][rk[y][i]]]++;
		rep(i, 1, m) w[i] += w[i - 1];
		per(i, n, 1) sa[w[rk[x][rk[y][i]]]--] = rk[y][i];
		for (int i = m = 1; i <= n; i++)
		{
			int *a = rk[x] + sa[i], *b = rk[x] + sa[i - 1];
			rk[y][sa[i]] = *a == *b&&*(a + z) == *(b + z) ? m - 1 : m++;
		}
	}

	void getsa(int m)
	{
		n = strlen(s + 1);
		rk[1][0] = now = sa[0] = s[0] = 0;
		rep(i, 1, m) w[i] = 0;
		rep(i, 1, n) w[s[i]]++;
		rep(i, 1, m) rk[1][i] = rk[1][i - 1] + (bool)w[i];
		rep(i, 1, m) w[i] += w[i - 1];
		rep(i, 1, n) rk[0][i] = rk[1][s[i]];
		rep(i, 1, n) sa[w[s[i]]--] = i;

		rk[1][n + 1] = rk[0][n + 1] = 0;	// reset the sentinel; with multiple test cases stale values here easily cause bugs
		for (int x = 1, y = rk[1][m]; x <= n && y <= n; x <<= 1) getsa(x, y);
		for (int i = 1, j = 0; i <= n; h[rk[now][i++]] = j ? j-- : j)
		{
			if (rk[now][i] == 1) continue;
			int k = n - max(sa[rk[now][i] - 1], i);
			while (j <= k && s[sa[rk[now][i] - 1] + j] == s[i + j]) ++j;
		}
	}

	void getrmq()
	{
		h[n + 1] = h[1] = lg[1] = 0;
		rep(i, 2, n) rmq[i][0] = h[i], lg[i] = lg[i >> 1] + 1;
		for (int i = 1; (1 << i) <= n; i++)
		{
			rep(j, 2, n)
			{
				if (j + (1 << i) > n + 1) break;
				rmq[j][i] = min(rmq[j][i - 1], rmq[j + (1 << i - 1)][i - 1]);
			}
		}
	}

	int lcp(int x, int y)
	{
		int l = min(rk[now][x], rk[now][y]) + 1, r = max(rk[now][x], rk[now][y]);
		return min(rmq[l][lg[r - l + 1]], rmq[r - (1 << lg[r - l + 1]) + 1][lg[r - l + 1]]);
	}

	void work()
	{
		GetS();	getsa(256);
		int ans = 0;
		// a rise in h between adjacent ranks adds h[i] - h[i-1] repeated substrings not counted before
		rep(i, 2, n) ans += max(0, h[i] - h[i - 1]);
		printf("%d\n", ans);
	}
}sa;

int main()
{
	T = read(); while (T--) sa.work();
	return 0;
}


Copyright notice: this is an original article by the blogger and may not be reproduced without the blogger's permission.
