The French author Georges Perec (1936–1982) once wrote a book, La disparition, without the letter 'e'. He was a member of the Oulipo group. A quote from the book:
Tout avait l’air normal, mais tout s’affirmait faux. Tout avait l’air normal, d’abord, puis surgissait l’inhumain, l’affolant. Il aurait voulu savoir où s’articulait l’association qui l’unissait au roman : sur son tapis, assaillant à tout instant son imagination, l’intuition d’un tabou, la vision d’un mal obscur, d’un quoi vacant, d’un non-dit : la vision, l’avision d’un oubli commandant tout, où s’abolissait la raison : tout avait l’air normal mais…
Perec would probably have scored high (or rather, low) in the following contest. People are asked to write a perhaps even meaningful text on some subject with as few occurrences of a given “word” as possible. Our task is to provide the jury with a program that counts these occurrences, in order to obtain a ranking of the competitors. These competitors often write very long texts with nonsense meaning; a sequence of 500,000 consecutive 'T's is not unusual. And they never use spaces.
So we want to quickly find out how often a word, i.e., a given string, occurs in a text. More formally: given the alphabet {'A', 'B', 'C', …, 'Z'} and two finite strings over that alphabet, a word W and a text T, count the number of occurrences of W in T. All the consecutive characters of W must exactly match consecutive characters of T. Occurrences may overlap.
Input
The first line of the input file contains a single number: the number of test cases to follow. Each test case has the following format:
One line with the word W, a string over {'A', 'B', 'C', …, 'Z'}, with 1 ≤ |W| ≤ 10,000 (here |W| denotes the length of the string W).
One line with the text T, a string over {'A', 'B', 'C', …, 'Z'}, with |W| ≤ |T| ≤ 1,000,000.
Output
For every test case in the input file, the output should contain a single number, on a single line: the number of occurrences of the word W in the text T.
Sample Input
3
BAPC
BAPC
AZA
AZAZAZA
VERDI
AVERDXIVYERDIAN
Sample Output
1
3
0
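Problem summary: the input has n test cases, each consisting of two strings s1 and s2. Count how many times s1 occurs as a substring of s2 (occurrences may overlap).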
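As a baseline for checking either solution on small inputs, here is a minimal brute-force sketch. The name countOccurrencesNaive is a hypothetical helper, not part of the original code, and its O(|W| * |T|) worst case is too slow for the full limits.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not from the original post): counts occurrences of w
// in t, overlaps included, by testing every start position.
int countOccurrencesNaive(const std::string& w, const std::string& t) {
    assert(!w.empty());                      // the problem guarantees 1 <= |W|
    if (w.size() > t.size()) return 0;
    int count = 0;
    for (std::size_t start = 0; start + w.size() <= t.size(); ++start) {
        // compare() returns 0 when t[start, start + |W|) equals w
        if (t.compare(start, w.size(), w) == 0) ++count;
    }
    return count;
}
```

On the three sample cases this returns 1, 3 and 0, matching the sample output.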
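I solved this problem before with KMP. It was the first problem I wrote when I was just learning KMP, a pure template problem, and looking at it now it seems almost trivially easy.
I have been studying hashing lately, so I also wrote a hash solution. It is easy to come up with and not hard to write, and it cleared up a small question I kept running into while learning hashing.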
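First, one point worth covering.
For example, s1 : 12303 , s2 : 212303
With hashing, we first compute the hash of s1 and the hash of the first len1 characters of s2 and compare them (here, the hashes of 12303 and 21230). Next we compare the 12303 of s1 with the 12303 inside s2 (this step matches starting from the second character of s2). The question is how to obtain the hash of that 12303 in s2.
We fix a base, seed (the code below uses 133), and update the hash with _Hash2 = _Hash2 * seed + s2[i + len1] - p * s2[i] ; The hash reads the window as a number in base seed, and p = seed^len1 is the place value that the outgoing character s2[i] has after the multiplication by seed, so subtracting p * s2[i] removes it. Every one-position shift of the window uses this same update.
Put simply: starting from the current window, drop its first character, append one character at the end, and you have the hash of the new window.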
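The 12303 / 212303 example can be checked directly. The sketch below uses hypothetical helpers (seedPower, windowHash, slide) that are not in the original code; it assumes the same base of 133 and unsigned 64-bit arithmetic, which wraps modulo 2^64.

```cpp
#include <cassert>
#include <string>

typedef unsigned long long ull;

const ull seed = 133;   // assumed base, matching the solution code

// seed^len, wrapping modulo 2^64.
ull seedPower(int len) {
    ull p = 1;
    for (int i = 0; i < len; i++) p *= seed;
    return p;
}

// Hash of s[start, start + len), read as a base-`seed` number.
ull windowHash(const std::string& s, int start, int len) {
    assert(start >= 0 && start + len <= (int)s.size());
    ull h = 0;
    for (int i = 0; i < len; i++) h = h * seed + (ull)s[start + i];
    return h;
}

// Slide a window of length len one step right: drop `out`, append `in`.
// p must equal seed^len.
ull slide(ull h, char out, char in, ull p) {
    return h * seed + (ull)in - p * (ull)out;
}
```

With s2 = "212303" and p = seedPower(5), slide(windowHash(s2, 0, 5), s2[0], s2[5], p) gives the same value as windowHash(s2, 1, 5), which is also the hash of s1 = "12303", so the second window is reported as a match without rescanning it.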
KMP solution:
/*
@Author: Top_Spirit
@Language: C++
*/
#include <bits/stdc++.h>
using namespace std ;
const int Maxn = 1e4 + 10 ;
string s1, s2 ;
int len1, len2 ;
int Next[Maxn] ;
// Build the failure table: for j >= 1, Next[j] is the length of the longest
// proper prefix of s1[0..j-1] that is also its suffix; Next[0] = -1 is a sentinel.
void Get_Next() {
    Next[0] = -1 ;
    int k = -1, j = 0 ;
    while (j < len1){
        if (k == -1 || s1[j] == s1[k]) Next[++j] = ++k ;
        else k = Next[k] ;
    }
}
// Count the occurrences of s1 in s2, overlaps included.
int KMP (){
    int i = 0, j = 0 ;
    int ans = 0 ;
    while (i < len2){
        if (j == -1 || s1[j] == s2[i]) {
            i++ ;
            j++ ;
        }
        else j = Next[j] ;
        if (j == len1) {
            ans++ ;
            // fall back right away so the next comparison never reads s1[len1]
            j = Next[j] ;
        }
    }
    return ans ;
}
int main (){
    ios_base::sync_with_stdio(false) ;
    cin.tie(0) ;
    int n ;
    cin >> n ;
    while (n--){
        cin >> s1 >> s2 ;
        len1 = s1.size() ;
        len2 = s2.size() ;
        Get_Next() ;
        cout << KMP() << '\n' ;
    }
    return 0 ;
}
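To make the Next table in the KMP solution above concrete, the sketch below builds it for a single word and returns it as a vector. buildNext is a hypothetical helper that follows the same convention as Get_Next (Next[0] = -1), not a function from the original code.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: returns the failure table of w with |w| + 1 entries.
// next[0] = -1; for j >= 1, next[j] is the length of the longest proper
// prefix of w[0..j-1] that is also a suffix of it.
std::vector<int> buildNext(const std::string& w) {
    int len = (int)w.size();
    std::vector<int> next(len + 1, 0);
    next[0] = -1;
    int k = -1, j = 0;
    while (j < len) {
        if (k == -1 || w[j] == w[k]) {
            ++j;
            ++k;
            next[j] = k;      // prefix of length k matches the suffix ending at j - 1
        } else {
            k = next[k];      // try the next shorter border
        }
    }
    return next;
}
```

For the sample word AZA the table is -1, 0, 0, 1. After a full match j falls back to Next[3] = 1, so the trailing A is reused as the first character of the next occurrence, which is how AZAZAZA yields 3 overlapping matches.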
Hash solution:
/*
@Author: Top_Spirit
@Language: C++
*/
#include <bits/stdc++.h>
using namespace std ;
typedef unsigned long long ull ;
const ull seed = 133 ;
int n ;
string s1, s2 ;
int len1, len2 ;
int main (){
    ios_base::sync_with_stdio(false) ;
    cin.tie(0) ;
    cin >> n ;
    while (n--){
        cin >> s1 >> s2 ;
        len1 = s1.size() ;
        len2 = s2.size() ;
        if (len1 > len2 ) {
            cout << 0 << '\n' ;
            continue ;
        }
        // p = seed^len1 ; unsigned arithmetic wraps modulo 2^64, which is well
        // defined, whereas overflowing a signed int is undefined behavior
        ull p = 1 ;
        for (int i = 1; i <= len1; i++) p *= seed ;
        // hash of s1 and of the first len1 characters of s2
        ull _Hash1 = 0, _Hash2 = 0 ;
        for (int i = 0; i < len1; i++){
            _Hash1 = _Hash1 * seed + s1[i] ;
            _Hash2 = _Hash2 * seed + s2[i] ;
        }
        int ans = 0 ;
        for (int i = 0; i + len1 <= len2; i++){
            // equal hashes are counted as a match; collisions are possible but unlikely
            if (_Hash1 == _Hash2) ans++;
            // slide the window; on the last pass s2[len2] is '\0' and the result is unused
            _Hash2 = _Hash2 * seed + s2[i + len1] - p * s2[i] ;
        }
        cout << ans << '\n' ;
    }
    return 0 ;
}