Oulipo HDU 1686 (Hashing or KMP)


The French author Georges Perec (1936–1982) once wrote a book, La disparition, without the letter 'e'. He was a member of the Oulipo group. A quote from the book: 

Tout avait l’air normal, mais tout s’affirmait faux. Tout avait l’air normal, d’abord, puis surgissait l’inhumain, l’affolant. Il aurait voulu savoir où s’articulait l’association qui l’unissait au roman : sur son tapis, assaillant à tout instant son imagination, l’intuition d’un tabou, la vision d’un mal obscur, d’un quoi vacant, d’un non-dit : la vision, l’avision d’un oubli commandant tout, où s’abolissait la raison : tout avait l’air normal mais… 

Perec would probably have scored high (or rather, low) in the following contest. People are asked to write a perhaps even meaningful text on some subject with as few occurrences of a given “word” as possible. Our task is to provide the jury with a program that counts these occurrences, in order to obtain a ranking of the competitors. These competitors often write very long texts with nonsense meaning; a sequence of 500,000 consecutive 'T's is not unusual. And they never use spaces. 

So we want to quickly find out how often a word, i.e., a given string, occurs in a text. More formally: given the alphabet {'A', 'B', 'C', …, 'Z'} and two finite strings over that alphabet, a word W and a text T, count the number of occurrences of W in T. All the consecutive characters of W must exactly match consecutive characters of T. Occurrences may overlap. 
 

Input

The first line of the input file contains a single number: the number of test cases to follow. Each test case has the following format: 

One line with the word W, a string over {'A', 'B', 'C', …, 'Z'}, with 1 ≤ |W| ≤ 10,000 (here |W| denotes the length of the string W). 
One line with the text T, a string over {'A', 'B', 'C', …, 'Z'}, with |W| ≤ |T| ≤ 1,000,000. 

Output

For every test case in the input file, the output should contain a single number, on a single line: the number of occurrences of the word W in the text T. 
 

Sample Input

3
BAPC
BAPC
AZA
AZAZAZA
VERDI
AVERDXIVYERDIAN

Sample Output

1
3
0

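Problem summary: the input contains n test cases. Each test case gives two strings, s1 and s2; count how many times s1 occurs as a substring of s2 (occurrences may overlap).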
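
As a baseline for checking the faster solutions, the counting rule can be sketched with a plain search. The helper name `countNaive` is hypothetical and is not part of the original post. It only illustrates what "occurrences may overlap" means; its worst case is roughly |W| * |T| character comparisons, so it is probably too slow for the stated limits.

```cpp
#include <cstddef>
#include <string>

// Hypothetical reference helper (not from the original post): counts how many
// times `word` occurs in `text`, overlapping occurrences included.
int countNaive(const std::string& word, const std::string& text) {
    if (word.empty()) return 0;
    int count = 0;
    std::size_t pos = text.find(word);
    while (pos != std::string::npos) {
        ++count;
        // Restart one character later (not word.size() later) so that
        // overlapping occurrences are also found.
        pos = text.find(word, pos + 1);
    }
    return count;
}
```

On the second sample, `countNaive("AZA", "AZAZAZA")` finds matches at positions 0, 2 and 4, which gives the expected answer of 3.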

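I solved this problem once before. Back then I used KMP, and it was the first problem I wrote when I had just started learning KMP. It is a pure template problem, and looking at it now it feels almost trivially easy...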

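I have been studying hashing recently, so I wrote a hashing solution as well. It is easy to come up with and not hard to write. It also cleared up a small question I had been carrying around ever since I started learning hashing.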

First, one key point.

For example,    s1 : 12303 ,     s2 : 212303

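With hashing, we first compute the hash of s1 and the hash of the first len1 characters of s2, and compare them (here, the hashes of 12303 and 21230). The next step compares 12303 from the first string with 12303 from the second string, because this step of the matching starts at the second character of s2. So the question is how to get the hash of that 12303 window of s2.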

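We fix a base, seed (for example 233; the code below uses 133). The hash treats a string as a number written in base seed, so each time the window moves one position to the right its hash is updated with _Hash2 = _Hash2 * seed + s2[i + len1] - p * s2[i] ; Here p = seed^len1 is just a place value: after the multiplication by seed, it is the weight carried by the outgoing character s2[i], which is why p * s2[i] is subtracted.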
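
To see where the update comes from, here is a short derivation. The symbols b (for seed), m (for len1) and H_i (the hash of the window of s2 starting at index i) are shorthand introduced only for this sketch, and the arithmetic is assumed to be exact or to wrap around consistently, as with an unsigned 64-bit type.

```latex
\[
\begin{aligned}
H_i     &= \sum_{k=0}^{m-1} s_2[i+k]\, b^{\,m-1-k},\\
b\,H_i  &= s_2[i]\, b^{m} + \sum_{k=1}^{m-1} s_2[i+k]\, b^{\,m-k},\\
H_{i+1} &= \sum_{k=0}^{m-1} s_2[i+1+k]\, b^{\,m-1-k}
         = b\,H_i + s_2[i+m] - p\, s_2[i], \qquad p = b^{m}.
\end{aligned}
\]
```

For the windows 21230 and 12303 of the example, multiplying by b shifts every digit one place to the left, adding s2[i + m] = 3 fills the lowest place, and subtracting p * 2 removes the leading 2 that was pushed out of the window.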

Put simply: starting from the current window, drop its first character, append one character at the end, and take the hash of the result.
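
The sliding step can be sketched as a small standalone function. The name `countByHash` and its default `seed` parameter are assumptions made for this illustration. It uses unsigned 64-bit arithmetic so that overflow wraps modulo 2^64, and it treats equal hashes as a match without re-checking the characters, so a collision could in principle overcount.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper (not from the original post): counts occurrences of
// `word` in `text`, overlaps included, by comparing rolling hashes.
// Equal hashes are counted as matches; collisions are not re-checked.
int countByHash(const std::string& word, const std::string& text,
                std::uint64_t seed = 133) {
    const std::size_t m = word.size(), n = text.size();
    if (m == 0 || m > n) return 0;

    std::uint64_t p = 1;            // becomes seed^m
    std::uint64_t hw = 0, ht = 0;   // hash of word, hash of current window
    for (std::size_t i = 0; i < m; ++i) {
        p *= seed;
        hw = hw * seed + static_cast<unsigned char>(word[i]);
        ht = ht * seed + static_cast<unsigned char>(text[i]);
    }

    int count = 0;
    for (std::size_t i = 0; ; ++i) {
        if (hw == ht) ++count;
        if (i + m >= n) break;      // that was the last window
        // Drop text[i] from the front, append text[i + m] at the back.
        ht = ht * seed + static_cast<unsigned char>(text[i + m])
             - p * static_cast<unsigned char>(text[i]);
    }
    return count;
}
```

With the example above, `countByHash("12303", "212303")` compares the windows 21230 and 12303 and returns 1. The inputs here are short enough that no wraparound occurs, so the hashes are exact.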

KMP solution:

/*
@Author: Top_Spirit
@Language: C++
*/
#include <bits/stdc++.h>
using namespace std ;
const int Maxn = 1e4 + 10 ;   // |W| <= 10,000, and Next needs len1 + 1 entries

string s1, s2 ;               // s1 is the word W, s2 is the text T
int len1, len2 ;
int Next[Maxn] ;

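// Build the failure table for s1. For j >= 1, Next[j] is the length of the
// longest proper prefix of s1[0..j-1] that is also its suffix.
// Next[0] = -1 is a sentinel meaning "no prefix left to fall back to".
void Get_Next() {
    Next[0] = -1 ;
    int k = -1, j = 0 ;
    while (j < len1){
        if (k == -1 || s1[j] == s1[k]) Next[++j] = ++k ;
        else k = Next[k] ;
    }
}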

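// Count occurrences of s1 in s2, overlapping ones included.
int KMP (){
    int i = 0, j = 0 ;
    int ans = 0 ;
    while (i < len2){
        if (j == -1 || s1[j] == s2[i]) {
            i++ ;
            j++ ;
        }
        else j = Next[j] ;
        if (j == len1) {
            ans++ ;
            // Fall back explicitly instead of relying on s1[len1] being '\0',
            // so the next comparison stays inside s1 and overlaps still count.
            j = Next[j] ;
        }
    }
    return ans ;
}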

int main (){
    ios_base::sync_with_stdio(false) ;
    cin.tie(0) ;
    cout.tie(0) ;
    int n ;
    cin >> n ;
    while (n--){
        cin >> s1 >> s2 ;
        len1 = s1.size() ;
        len2 = s2.size() ;
        Get_Next() ;
        cout << KMP() << '\n' ;
    }
    return 0 ;
}
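
To make the Next table more concrete, here is a standalone sketch of the same computation that returns the table as a vector. The name `buildNext` is hypothetical; the convention (a -1 sentinel at index 0 and one extra entry at index len) follows Get_Next above.

```cpp
#include <string>
#include <vector>

// Hypothetical standalone version of Get_Next (same convention):
// table[0] = -1, and for j >= 1 table[j] is the length of the longest
// proper prefix of w[0..j-1] that is also a suffix of it.
std::vector<int> buildNext(const std::string& w) {
    const int m = static_cast<int>(w.size());
    std::vector<int> table(m + 1);
    table[0] = -1;
    int k = -1, j = 0;
    while (j < m) {
        if (k == -1 || w[j] == w[k]) table[++j] = ++k;
        else k = table[k];
    }
    return table;
}
```

For the word AZA this gives {-1, 0, 0, 1}. The last entry is what makes overlapping matches work: after a full match, the search continues with j = 1, so the final A of one occurrence is reused as the first A of the next.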

Hashing solution:

/*
@Author: Top_Spirit
@Language: C++
*/
#include <bits/stdc++.h>
using namespace std ;
typedef unsigned long long ull ;
const int seed = 133 ;        // base of the polynomial hash

int n ;
string s1, s2 ;
int len1, len2 ;

int main (){
    ios_base::sync_with_stdio(false) ;
    cin.tie(0) ;
    cout.tie(0) ;
    cin >> n ;
    while (n--){
        cin >> s1 >> s2 ;
        len1 = s1.size() ;
        len2 = s2.size() ;
        if (len1 > len2 ) {
            cout << 0 << endl ;
            continue ;
        }
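        // Use unsigned 64-bit values so that overflow wraps modulo 2^64;
        // overflowing a signed int is undefined behavior.
        ull p = 1 ;                     // p = seed^len1, weight of the outgoing character
        for (int i = 1; i <= len1; i++) p *= seed ;
        ull _Hash1 = 0, _Hash2 = 0 ;    // hash of s1, hash of the current window of s2
        for (int i = 0; i < len1; i++){
            _Hash1 = _Hash1 * seed + s1[i] ;
            _Hash2 = _Hash2 * seed + s2[i] ;
        }
        int ans = 0 ;
        for (int i = 0; i + len1 <= len2; i++){
            // Equal hashes are counted as a match; collisions are not re-checked.
            if (_Hash1 == _Hash2) ans++;
            if (i + len1 == len2) break ;   // last window, nothing left to slide in
            _Hash2 = _Hash2 * seed + s2[i + len1] - p * s2[i] ;
        }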
        cout << ans << endl ;
    }
    return 0 ;
}

Amid falling blossoms, one stands alone;

in the light rain, swallows fly in pairs.
