Oulipo HDU 1686 (Hash or KMP)

Top_Spirit

Categories: KMP, Hash. Tags: acm, hash, KMP

Article link: https://blog.csdn.net/weixin_41190227/article/details/86503452

The French author Georges Perec (1936–1982) once wrote a book, La disparition, without the letter 'e'. He was a member of the Oulipo group. A quote from the book:

Tout avait l’air normal, mais tout s’affirmait faux. Tout avait l’air normal, d’abord, puis surgissait l’inhumain, l’affolant. Il aurait voulu savoir où s’articulait l’association qui l’unissait au roman : sur son tapis, assaillant à tout instant son imagination, l’intuition d’un tabou, la vision d’un mal obscur, d’un quoi vacant, d’un non-dit : la vision, l’avision d’un oubli commandant tout, où s’abolissait la raison : tout avait l’air normal mais…

Perec would probably have scored high (or rather, low) in the following contest. People are asked to write a perhaps even meaningful text on some subject with as few occurrences of a given “word” as possible. Our task is to provide the jury with a program that counts these occurrences, in order to obtain a ranking of the competitors. These competitors often write very long texts with nonsense meaning; a sequence of 500,000 consecutive 'T's is not unusual. And they never use spaces.

So we want to quickly find out how often a word, i.e., a given string, occurs in a text. More formally: given the alphabet {'A', 'B', 'C', …, 'Z'} and two finite strings over that alphabet, a word W and a text T, count the number of occurrences of W in T. All the consecutive characters of W must exactly match consecutive characters of T. Occurrences may overlap.

Input

The first line of the input file contains a single number: the number of test cases to follow. Each test case has the following format:

One line with the word W, a string over {'A', 'B', 'C', …, 'Z'}, with 1 ≤ |W| ≤ 10,000 (here |W| denotes the length of the string W).
One line with the text T, a string over {'A', 'B', 'C', …, 'Z'}, with |W| ≤ |T| ≤ 1,000,000.

Output

For every test case in the input file, the output should contain a single number, on a single line: the number of occurrences of the word W in the text T.

Sample Input
3
BAPC
BAPC
AZA
AZAZAZA
VERDI
AVERDXIVYERDIAN
Sample Output
1
3
0

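Problem summary: the input contains n test cases. Each test case gives two strings, s1 and s2; count how many times s1 occurs as a substring of s2 (occurrences may overlap).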
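As a baseline for checking the faster solutions, here is a minimal brute-force sketch. The helper name `countOccurrencesNaive` is hypothetical and not part of the original post. It relies on `std::string::find` and restarts the search one character after each hit, so that overlapping occurrences are counted. With inputs such as a long run of identical letters this approach may do work proportional to |W|·|T|, which is why the post turns to KMP and hashing.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not from the original post): counts occurrences of
// word in text, overlaps included, by restarting the search one character
// past the start of each match.
int countOccurrencesNaive(const std::string& word, const std::string& text) {
    int count = 0;
    std::size_t pos = text.find(word);
    while (pos != std::string::npos) {
        ++count;
        // Resume at pos + 1, not pos + word.size(), so overlapping matches count.
        pos = text.find(word, pos + 1);
    }
    return count;
}
```

On the three sample cases this returns 1, 3, and 0, matching the sample output.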

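I solved this problem once before with KMP. It was the first problem I wrote when I had just started learning KMP, a pure template problem, and looking at it now it is almost trivially easy.

I have been studying hashing lately, so I also wrote a hash solution. It is easy to come up with and not hard to write, and it cleared up a small question I had carried around the whole time I was learning hashing.

First, one key point.

For example, s1: 12303, s2: 212303

With hashing, we first compute the hash of s1 and the hash of the first len1 characters of s2, and compare them (here, the hashes of 12303 and 21230). Next we compare the 12303 of s1 with the 12303 inside s2, because at this step the match starts at the second character of s2. So the question is how to obtain the hash of that 12303 in s2.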

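We pick a fixed base, seed (this text uses 233 as the example value; the code below uses 133), and treat the string as a number written in base seed. Sliding the window one character to the right then gives _Hash2 = _Hash2 * seed + s2[i + len1] - p * s2[i] ; Here p = seed^len1 is only a place value: it is the weight that the window's first character s2[i] carries after the multiplication by seed, so subtracting p * s2[i] removes exactly that character's contribution. Every one-character shift uses this same update.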
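The update can be checked term by term. The notation here is an assumption introduced for illustration: $b$ stands for seed, $m$ for len1, and $H_i$ for the hash of the window of s2 that starts at index $i$.

$$
H_i = \sum_{k=0}^{m-1} s_2[i+k]\, b^{\,m-1-k}
$$

$$
b\,H_i = s_2[i]\, b^{\,m} + \sum_{k=1}^{m-1} s_2[i+k]\, b^{\,m-k}
\quad\Longrightarrow\quad
H_{i+1} = b\,H_i + s_2[i+m] - b^{\,m}\, s_2[i]
$$

With $p = b^{m}$ this is the same update as in the text. Reading the digits of the example as values in base $b = 10$ with $m = 5$: the first window of 212303 is $21230$, and $21230 \cdot 10 + 3 - 10^{5} \cdot 2 = 12303$, which is the next window.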

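Put simply: take the current window, drop its first character, append one character at the end, and compute the hash of the resulting string.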
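The sliding step can be sketched as a standalone function. This is a minimal illustration, not the author's listing: the name `countByRollingHash` and the default base of 133 are assumptions made for the example. It uses unsigned 64-bit arithmetic so that overflow wraps modulo 2^64, and it treats equal hashes as a match without comparing the strings, so a hash collision could in principle inflate the count.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper (not from the original post). Counts the windows of
// text whose polynomial hash equals the hash of word. All arithmetic is on
// unsigned 64-bit integers, so it wraps modulo 2^64.
int countByRollingHash(const std::string& word, const std::string& text,
                       std::uint64_t seed = 133) {
    const std::size_t m = word.size(), n = text.size();
    if (m == 0 || m > n) return 0;

    std::uint64_t p = 1;  // becomes seed^m after the loop
    std::uint64_t hashWord = 0, hashWindow = 0;
    for (std::size_t i = 0; i < m; ++i) {
        p *= seed;
        hashWord = hashWord * seed + static_cast<unsigned char>(word[i]);
        hashWindow = hashWindow * seed + static_cast<unsigned char>(text[i]);
    }

    int count = 0;
    for (std::size_t i = 0; ; ++i) {
        if (hashWord == hashWindow) ++count;  // equal hashes are taken as a match
        if (i + m >= n) break;                // that was the last window
        // Slide by one: append text[i + m], drop text[i].
        hashWindow = hashWindow * seed
                   + static_cast<unsigned char>(text[i + m])
                   - p * static_cast<unsigned char>(text[i]);
    }
    return count;
}
```

For the example above, `countByRollingHash("12303", "212303")` finds the single match in the second window. If collisions are a concern, one option is to confirm each hash hit with a direct substring comparison.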

KMP solution:

/*
@Author: Top_Spirit
@Language: C++
*/
#include <bits/stdc++.h>
using namespace std ;
typedef unsigned long long ull ;
typedef long long ll ;
const int Maxn = 1e4 + 10 ;
const int INF = 0x3f3f3f3f ;
const double PI = acos(-1.0) ;
const int seed = 133 ;

int n ;
string s1, s2 ;
int len1, len2 ;
int Next[Maxn] ;

void Get_Next() {
    Next[0] = -1 ;
    int k = -1, j = 0 ;
    while (j < len1){
        if (k == -1 || s1[j] == s1[k]) Next[++j] = ++k ;
        else k = Next[k] ;
    }
}

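// Count the occurrences of s1 in s2, overlaps included.
int KMP (){
    int i = 0, j = 0 ;
    int ans = 0 ;
    while (i < len2){
        if (j == -1 || s1[j] == s2[i]) {
            i++ ;
            j++ ;
        }
        else j = Next[j] ;
        if (j == len1) {
            ans++ ;
            // Fall back explicitly instead of relying on s1[len1] mismatching next round
            j = Next[j] ;
        }
    }
    return ans ;
}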

int main (){
    ios_base::sync_with_stdio(false) ;
    cin.tie(0) ;
    cout.tie(0) ;
    int n ;
    cin >> n ;
    while (n--){
        cin >> s1 >> s2 ;
        len1 = s1.size() ;
        len2 = s2.size() ;
        Get_Next() ;
        int ans = KMP() ;
        cout << ans << endl ;
    }
    return 0 ;
}
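To make the role of the Next table concrete, here is a small self-contained sketch. The names `buildNext` and `countByKmp` are hypothetical; the table follows the same convention as Get_Next above (Next[0] = -1, and Next[j] is the length of the longest proper prefix of the first j characters that is also their suffix). For the word AZA the table is -1, 0, 0, 1, so after a full match the search falls back to j = 1 and the trailing A is reused as the start of the next occurrence. That fallback is how overlapping matches get counted.

```cpp
#include <string>
#include <vector>

// Hypothetical helper mirroring Get_Next, but returning the table
// (size word.size() + 1) instead of filling a global array.
std::vector<int> buildNext(const std::string& word) {
    const int m = static_cast<int>(word.size());
    std::vector<int> next(m + 1);
    next[0] = -1;
    int k = -1, j = 0;
    while (j < m) {
        if (k == -1 || word[j] == word[k]) next[++j] = ++k;
        else k = next[k];
    }
    return next;
}

// Hypothetical helper: counts occurrences of word in text, overlaps included.
int countByKmp(const std::string& word, const std::string& text) {
    if (word.empty()) return 0;
    const std::vector<int> next = buildNext(word);
    const int m = static_cast<int>(word.size());
    const int n = static_cast<int>(text.size());
    int count = 0, j = 0;
    for (int i = 0; i < n; ) {
        if (j == -1 || word[j] == text[i]) { ++i; ++j; }
        else j = next[j];
        if (j == m) {      // full match ending just before text[i]
            ++count;
            j = next[j];   // keep the longest border so overlaps are found
        }
    }
    return count;
}
```

Using a `std::vector` sized from the word avoids depending on a fixed array bound such as Maxn.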

Hash solution:

/*
@Author: Top_Spirit
@Language: C++
*/
#include <bits/stdc++.h>
using namespace std ;
typedef unsigned long long ull ;
typedef long long ll ;
const int Maxn = 1e4 + 10 ;
const int INF = 0x3f3f3f3f ;
const double PI = acos(-1.0) ;
const int seed = 133 ;

int n ;
string s1, s2 ;
int len1, len2 ;

int main (){
    ios_base::sync_with_stdio(false) ;
    cin.tie(0) ;
    cout.tie(0) ;
    cin >> n ;
    while (n--){
        cin >> s1 >> s2 ;
        len1 = s1.size() ;
        len2 = s2.size() ;
        if (len1 > len2 ) {
            cout << 0 << endl ;
            continue ;
        }
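        // p = seed^len1: the weight of the window's first character after one more multiply.
        // ull is used so the arithmetic wraps mod 2^64 instead of overflowing a signed int.
        ull p = 1 ;
        for (int i = 1; i <= len1; i++) p *= seed ;
        ull _Hash1 = 0, _Hash2 = 0 ;
        for (int i = 0; i < len1; i++){
            _Hash1 = _Hash1 * seed + s1[i] ;
            _Hash2 = _Hash2 * seed + s2[i] ;
        }
        int ans = 0 ;
        for (int i = 0; i + len1 <= len2; i++){
            if (_Hash1 == _Hash2) ans++;
            // Slide the window: append s2[i + len1], drop s2[i].
            // On the last pass s2[len2] is the string's terminating '\0' and the result is unused.
            _Hash2 = _Hash2 * seed + s2[i + len1] - p * s2[i] ;
        }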
        cout << ans << endl ;
    }
    return 0 ;
}
