2018-08-09 Trie trees & string hashing

zm_zsy

Category: Data Structures

Article link: https://blog.csdn.net/zm_zsy/article/details/81540625

A -- Can you find it?
Description

Given two integer sequences a[1..N] and b[1..M], find the smallest index K such that a[K..K+M-1] is identical to b[1..M]

Input

The first line contains the number of test cases T. Each test case consists of three lines: the first holds N and M, the second holds the N integers of a, and the third holds the M integers of b

Output

For each test case, output the smallest such K (counted from 1) on its own line, or -1 if b does not occur in a

Sample Input

2

13 5

1 2 1 2 3 1 2 3 1 3 2 1 2

1 2 3 1 3

13 5

1 2 1 2 3 1 2 3 1 3 2 1 2

1 2 3 2 1

Sample Output

6

-1

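Problem analysis

This is a standard $kmp$ template problem. It can also be solved with string $hash$: map each string, whose length may be as large as $maxn$, to a value in a $hash$ table using bit operations, then compare the values to decide whether two strings are equal. If the values grow too large, a well-chosen modulus is needed to get AC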

#include<cstdio>
const int maxn=1000005;
const int maxm=10005;
int a[maxn],b[maxm],next[maxm];
void getNext(int len)
{
    next[0] = -1;
    int i=0,k = -1;
    while(i < len)
    {
        while( k!=-1&&b[i]!=b[k])
        {
            k=next[k];
        }
        if(k==-1||b[i]==b[k])
        {
            ++i,++k;
            next[i] = k;
        }
    }
    return ;
}
int main()
{
    int t,n,m;
    scanf("%d",&t);
    while(t--){
        scanf("%d%d",&n,&m);
        for(int i=0;i<n;++i)
            scanf("%d",&a[i]);
        for(int i=0;i<m;++i)
            scanf("%d",&b[i]);
        getNext(m);
        //for(int i=0;i<m;++i)
          //  printf("%d\n",next[i]);

        int i=0,k=0;
        while(i<n)
        {
           while(k!=-1&&a[i]!=b[k])
           {
                k=next[k];
           }
           if(k==-1||a[i]==b[k]) // always true once the inner loop exits, so this check could be dropped
               ++i,++k;
           if(k==m)
                break;
         }
         if(k==m)
           printf("%d\n",i-m+1);
         else
           printf("-1\n");
     }
     return 0;
}
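
The note above says this problem can also be solved with hashing. Here is a minimal sketch of that idea using a polynomial rolling hash (Rabin-Karp), which is my choice of technique and not the author's code. The name `findFirst`, the base and the modulus are all assumptions picked for illustration; every hash hit is verified element by element, so a collision cannot produce a wrong answer.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper, not part of the original solution.
// Returns the 1-based start of the first occurrence of b in a, or -1.
int findFirst(const std::vector<int>& a, const std::vector<int>& b) {
    assert(!b.empty());
    const std::uint64_t MOD = 1000000007ULL;  // assumed modulus
    const std::uint64_t BASE = 1000003ULL;    // assumed base
    const std::size_t n = a.size(), m = b.size();
    if (m > n) return -1;

    // Reduce a possibly negative element into [0, MOD).
    auto norm = [MOD](int v) {
        long long r = v % (long long)MOD;
        if (r < 0) r += (long long)MOD;
        return (std::uint64_t)r;
    };

    std::uint64_t hb = 0, ha = 0, top = 1;  // top becomes BASE^(m-1) mod MOD
    for (std::size_t i = 0; i < m; ++i) {
        hb = (hb * BASE + norm(b[i])) % MOD;
        ha = (ha * BASE + norm(a[i])) % MOD;
        if (i + 1 < m) top = top * BASE % MOD;
    }
    for (std::size_t i = 0; ; ++i) {
        // Equal hashes are only a hint; confirm with a direct comparison.
        if (ha == hb && std::equal(b.begin(), b.end(), a.begin() + i))
            return (int)i + 1;
        if (i + m >= n) break;
        // Slide the window: drop a[i], then append a[i+m].
        ha = (ha + MOD - norm(a[i]) * top % MOD) % MOD;
        ha = (ha * BASE + norm(a[i + m])) % MOD;
    }
    return -1;
}
```

Called on the two sample cases, `findFirst` should agree with the KMP solution. KMP is still the safer choice here because its running time does not depend on how the hash behaves.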

B -- Can you find it?
Description

Many people like to solve hard puzzles some of which may lead them to madness. One such puzzle could be finding a hidden prime number in a given text. Such number could be the number of different substrings of a given size that exist in the text. As you soon will discover, you really need the help of a computer and a good algorithm to solve such a puzzle
Your task is to write a program that given the size, N, of the substring, the number of different characters that may occur in the text, NC, and the text itself, determines the number of different substrings of size N that appear in the text
As an example, consider N=3, NC=4 and the text "daababac". The different substrings of size 3 that can be found in this text are: "daa"; "aab"; "aba"; "bab"; "bac". Therefore, the answer should be 5

Input

The first line of input consists of two numbers, N and NC, separated by exactly one space. This is followed by the text where the search takes place. You may assume that the maximum number of substrings formed by the possible set of characters does not exceed 16 Millions

Output

The program should output just an integer corresponding to the number of different substrings of size N found in the given text

Sample Input

3 4

daababac

Sample Output

5

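Problem analysis

The $hash$ approach here is similar to the previous problem. Because the statement bounds the number of possible substrings, the $hash$ table never exceeds the storage range, so no modulus is needed. One trick: first map every letter that appears to a number in an int array, then compute directly with those numbers (range $[0,nc-1]$), so that each substring is read as an N-digit number in base $nc$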

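//Hash records which substrings have been seen, playing the role of a map
#include<cstdio>
#include<cstring>
const int MAXN=16000005;
char str[MAXN];
int a[256];      //a[c] = base-NC digit assigned to character c, -1 if not seen yet
bool Hash[MAXN]; //Hash[v] is true once the substring with value v has been seen
int main()
{
    memset(a,-1,sizeof(a));
    memset(Hash,0,sizeof(Hash));
    int n,nc,num,cnt,sum;
    scanf("%d%d",&n,&nc);
    scanf("%s",str);
    int len=strlen(str);num=0;
    for(int i=0;i<len;i++)
        if(a[(unsigned char)str[i]]<0)//assign digits 0..NC-1 in order of first appearance
            a[(unsigned char)str[i]]=num++;
    cnt=0;
    for(int i=0;i<len-n+1;i++)
    {
        sum=0;
        for(int j=i;j<i+n;j++)
            sum=sum*nc+a[(unsigned char)str[j]];//value of the substring as a base-NC number, always below NC^N
        if(!Hash[sum])
        {
            cnt++;
            Hash[sum]=true;
        }
    }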
    printf("%d\n",cnt);
    return 0;

}
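
The solution above rebuilds the base-NC value of every window from scratch. As a hedged sketch of a possible refinement, the value can instead be updated in O(1) per shift by dropping the leading digit and appending the new one. The function name `countDistinct` and the use of `std::vector<bool>` are my own choices, and the sketch assumes, as the statement does, that NC^N is small enough to index a table.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: counts the distinct substrings of length n in text,
// assuming text uses at most nc distinct characters and nc^n fits in memory.
int countDistinct(const std::string& text, int n, int nc) {
    assert(n > 0 && nc > 0);
    if ((int)text.size() < n) return 0;

    // Assign digits 0..nc-1 to characters in order of first appearance.
    std::vector<int> id(256, -1);
    int used = 0;
    for (unsigned char c : text)
        if (id[c] < 0) id[c] = used++;
    assert(used <= nc);

    long long top = 1;  // nc^(n-1), the weight of the leading digit
    for (int i = 1; i < n; ++i) top *= nc;
    std::vector<bool> seen((std::size_t)(top * nc), false);

    long long value = 0;
    int count = 0;
    for (std::size_t i = 0; i < text.size(); ++i) {
        // Once the window is full, remove the digit that slides out.
        if (i >= (std::size_t)n)
            value -= id[(unsigned char)text[i - n]] * top;
        value = value * nc + id[(unsigned char)text[i]];
        if (i + 1 >= (std::size_t)n && !seen[value]) {
            seen[value] = true;
            ++count;
        }
    }
    return count;
}
```

For the example in the statement, `countDistinct("daababac", 3, 4)` counts "daa", "aab", "aba", "bab" and "bac".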

C -- Can you find it?
Description

As is known to all, in many cases, a word has two meanings. Such as “hehe”, which not only means “hehe”, but also means “excuse me”
Today, ?? is chatting with MeiZi online. MeiZi sends a sentence A to ??. ?? is so smart that he knows the word B in the sentence has two meanings. He wants to know how many kinds of meanings MeiZi can express

Input

The first line of the input gives the number of test cases T; T test cases follow
Each test case contains two strings A and B. A is the sentence MeiZi sends to ??, and B is the word which has two meanings. Each string contains only lowercase letters
Limits
T <= 30
|A| <= 100000
|B| <= |A|

Output

For each test case, output one line containing “Case #x: y” (without quotes), where x is the test case number (starting from 1) and y is the number of different meanings this sentence may have. Since this number may be quite large, you should output the answer modulo 1000000007

Sample Input

4

hehehe

hehe

woquxizaolehehe

woquxizaole

hehehehe

hehe

owoadiuhzgneninougur

iehiehieh

Sample Output

Case #1: 3

Case #2: 2

Case #3: 5

Case #4: 1

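Problem analysis

Because the word has two meanings, first run a $kmp$ search that allows overlapping matches and mark the end position of every occurrence of the ambiguous word, then run a DP. dp(i) is the number of meanings of the prefix that ends at position i. If an occurrence of the word ends at i, the state can be reached in two ways and the transition is dp(i)=dp(i-1)+dp(i-len), where the second term counts the readings in which that whole occurrence takes its other meaning; otherwise dp(i)=dp(i-1)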

#include<cstdio>
#include<cstring>
#define maxn 100100
#define mod 1000000007
using namespace std;
char str[maxn],s[maxn];
int _next[maxn],vis[maxn],dp[maxn];
void getNext()
{
    int len=strlen(s);
    _next[0]=-1;
    int i=0,k=-1;
    while(i<len)
    {
        while(k!=-1&&s[i]!=s[k])
        {
            k=_next[k];
        }
        if(k==-1||s[i]==s[k])
        {
            ++i,++k;
            _next[i]=k;
        }
    }
    return ;
}

int main()
{
    int t;
    scanf("%d",&t);
    for(int cas=1;cas<=t;++cas)
    {
        memset(vis,0,sizeof(vis));
        memset(dp,0,sizeof(dp));
        scanf("%s",str);
        //printf("%s\n",s);
        scanf("%s",s);
        getNext();
        int strLen=strlen(str);
        int sLen=strlen(s);
        int i=0,k=0;
        while(i<strLen)
        {
           while(k!=-1&&str[i]!=s[k])
           {
                k=_next[k];
           }
           if(k==-1||str[i]==s[k])
               ++i,++k;
           if(k==sLen){
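                vis[i-1]=1;//an occurrence of s ends at str[i-1]
                i--;
                k=_next[k-1];//together with the i-- above this is equivalent to k=_next[k], so overlapping matches are found
            }
        }
        dp[0]=1;//dp[i]: number of meanings of str[0..i]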
        if(vis[0]) dp[0]=2;
        for(int i=1;i<strLen;++i){
            if(vis[i]){
                //printf("%d\n",i);
                if(i-sLen<0)dp[i]=(dp[i-1]+1)%mod;
                else dp[i]=(dp[i-1]+dp[i-sLen])%mod;
            }
            else dp[i]=dp[i-1];
        }
        printf("Case #%d: %d\n",cas,dp[strLen-1]);
    }
    return 0;

}
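
The recurrence dp(i)=dp(i-1)+dp(i-len) is easier to check in isolation. Below is a small sketch that shifts the index by one (dp[i] covers the first i characters) so the empty prefix gives a clean base case. It marks occurrences with `std::string::find` instead of KMP, which is a simplification I am assuming is acceptable for illustration only; the name `countMeanings` is hypothetical.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: number of readings of sentence a when each occurrence
// of word b may take either of its two meanings, modulo 1e9+7.
int countMeanings(const std::string& a, const std::string& b) {
    assert(!b.empty() && b.size() <= a.size());
    const int MOD = 1000000007;
    const std::size_t n = a.size(), m = b.size();

    // endsHere[i] is true when an occurrence of b ends just before index i.
    // Searching from pos + 1 keeps overlapping occurrences.
    std::vector<bool> endsHere(n + 1, false);
    for (std::size_t pos = a.find(b); pos != std::string::npos;
         pos = a.find(b, pos + 1))
        endsHere[pos + m] = true;

    // dp[i] = number of readings of the first i characters.
    std::vector<int> dp(n + 1, 0);
    dp[0] = 1;
    for (std::size_t i = 1; i <= n; ++i) {
        dp[i] = dp[i - 1];  // a[i-1] read literally
        if (endsHere[i])    // or the whole occurrence read in its other meaning
            dp[i] = (dp[i] + dp[i - m]) % MOD;
    }
    return dp[n];
}
```

The four sample cases give 3, 2, 5 and 1 with this sketch. For the real limits the KMP marking in the solution above is the better fit, since repeated `find` calls are not guaranteed to be linear.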

D -- Can you find it?
Description

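Ignatius has recently run into a problem: his teacher gave him many words (consisting only of lowercase letters, with no word repeated) and now wants him to count the words that have a given string as a prefix (a word is also a prefix of itself)

Input

The first part of the input is a word list, one word per line, each no longer than 10 characters; these are the words the teacher gave Ignatius to count. A blank line marks the end of the word list. The second part is a series of queries, one per line, each of which is a string
Note: there is only one test case; process until end of file

Output

For each query, output the number of words that have that string as a prefix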

Sample Input

banana

band

bee

absolute

acm

band

abc

Sample Output

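Problem analysis

We need prefix lookups over many strings, so we use a trie for efficiency. Build the tree from the words being searched. Because every character may be the end of some prefix, each character node carries a counter, which is incremented every time an insertion passes through that node. A prefix query then only has to output the counter stored at the node of the query's last character; if a null node is reached on the way, return 0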

//walk the trie with a pointer and malloc every new node; do not copy nodes by value, or the links break
#include<iostream>
#include<cstdio>
#include<cstdlib>
#include<cstring>
using namespace std;
struct node{
   int cnt;
   node* next[26];//child pointers, one per letter
};
node root;
//insert one word, updating the counters along its path
void build_tree(char* s){
   node* p=&root;
   int len=strlen(s);
   for(int i=0;i<len;++i){
     int index=s[i]-'a';
     if(p->next[index]==NULL){
        node* q=(node*)malloc(sizeof(node));
        for(int j=0;j<26;++j)
            q->next[j]=NULL;
        q->cnt=1;//first word through this node: initialise to 1, do not increment
        p->next[index]=q;
        p=q;
     }else{
        p=p->next[index];
        (p->cnt)++;//another word passes through this node
     }
   }
}
int query(char* s){
    int len=strlen(s);
    node* p=&root;
    for(int i=0;i<len;++i){
        int index=s[i]-'a';
        if(p->next[index]==NULL)
            return 0;
        p=p->next[index];//descend to the child for this character
    }
    return p->cnt;
}
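char str[15];
int main()
{
    for(int i=0;i<26;i++)
	  root.next[i]=NULL;//initialise the root
    //gets() was removed in C++14, so read each line with fgets and strip the newline
    while(fgets(str,sizeof(str),stdin)){
        str[strcspn(str,"\r\n")]='\0';
        if(str[0]=='\0') break;//a blank line ends the word list
        build_tree(str);
    }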
    while(~scanf("%s",str)){
        printf("%d\n",query(str));
    }
    return 0;
}
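
The counting trie described above can also be written without malloc by keeping the nodes in vectors and using indices instead of pointers. This is only a sketch of that alternative layout, not the author's code; `PrefixTrie`, `insert` and `countPrefix` are hypothetical names, and index 0 doubles as "no child" because the root can never be anyone's child.

```cpp
#include <array>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical array-backed trie for lowercase words.
struct PrefixTrie {
    std::vector<std::array<int, 26>> child;  // child[p][c] = node index, 0 = absent
    std::vector<int> cnt;                    // cnt[p] = words passing through node p

    PrefixTrie() : child(1, std::array<int, 26>{}), cnt(1, 0) {}

    void insert(const std::string& word) {
        int p = 0;
        for (char ch : word) {
            int c = ch - 'a';
            assert(0 <= c && c < 26);
            if (child[p][c] == 0) {
                // Take the new index first: push_back may reallocate child.
                int q = (int)child.size();
                child.push_back(std::array<int, 26>{});
                cnt.push_back(0);
                child[p][c] = q;
            }
            p = child[p][c];
            ++cnt[p];
        }
    }

    // Number of inserted words that start with prefix (0 if the path is missing).
    int countPrefix(const std::string& prefix) const {
        int p = 0;
        for (char ch : prefix) {
            int c = ch - 'a';
            if (c < 0 || c >= 26 || child[p][c] == 0) return 0;
            p = child[p][c];
        }
        return cnt[p];
    }
};
```

With the five sample words inserted, `countPrefix("band")` is 1 and `countPrefix("abc")` is 0. Nothing has to be freed by hand, which avoids the dangling-pointer pitfalls of the malloc version.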

F -- Can you find it?
Description

Neal is very curious about combinatorial problems, and now here comes a problem about words. Knowing that Ray has a photographic memory and this may not trouble him, Neal gives it to Jiejie. Since Jiejie can’t remember numbers clearly, he just uses sticks to help himself. Allowing for Jiejie’s only 20071027 sticks, he can only record the remainders of the numbers divided by total amount of sticks. The problem is as follows: a word needs to be divided into small pieces in such a way that each piece is from some given set of words. Given a word and the set of words, Jiejie should calculate the number of ways the given word can be divided, using the words in the set

Input

The input file contains multiple test cases. For each test case: the first line contains the given word whose length is no more than 300 000. The second line contains an integer S, 1 ≤ S ≤ 4000. Each of the following S lines contains one word from the set. Each word will be at most 100 characters long. There will be no two identical words and all letters in the words will be lowercase. There is a blank line between consecutive test cases. You should proceed to the end of file

Output

For each test case, output the number, as described above, from the task description modulo 20071027

Sample Input

abcd

Sample Output

Case 1: 2

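Problem analysis

This is really about concatenating strings, a bit like a simplified knapsack problem because there are no weights. The important part is the state transition. A trie traverses prefixes efficiently, so let dp(i) be the number of ways to split the suffix that starts at position i. During the transition, for every dictionary word that is a prefix of that suffix, add the value at the position just past the word to dp(i). Since there are no weights, it is enough to set the value at the end of the string to 1; the number of ways this 1 can be passed along to dp(i) is the number of ways to split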

//walk the trie with a pointer and malloc every new node; do not copy nodes by value, or the links break
#include<iostream>
#include<cstdio>
#include<cstdlib>
#include<cstring>
using namespace std;
struct node{
   bool ed;
   node* next[26];//child pointers, one per letter
};
node root;
const int maxn=300005;
const int mod=20071027;
int dp[maxn],n,cas=0;
char str[maxn],s[105];
//insert one word and mark its last node as the end of a word
void build_tree(char* s){
   node* p=&root;
   int len=strlen(s);
   for(int i=0;i<len;++i){
     int index=s[i]-'a';
     if(p->next[index]==NULL){
        node* q=(node*)malloc(sizeof(node));
        q->ed=false;
        for(int j=0;j<26;++j)
            q->next[j]=NULL;
        p->next[index]=q;
        p=q;
     }else{
        p=p->next[index];
     }
   }
   p->ed=true;
}
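void query(int cur,char* s){
    node* p=&root;
    //stop at the terminator instead of calling strlen, which would rescan the whole suffix on every call
    for(int i=0;s[i]!='\0';++i){
        int index=s[i]-'a';
        if(p->next[index]==NULL)//no dictionary word continues this prefix
            return;
        p=p->next[index];//descend to the child for this character
        if(p->ed){
            dp[cur]=(dp[cur]+dp[cur+i+1])%mod;//cur+i+1 never exceeds len because the loop stops at the terminator
        }
    }
    return ;
}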
void rfree(node* root){
    for(int i=0;i<26;++i){
        if(root->next[i]!=NULL)
            rfree(root->next[i]);
    }
    free(root);
    return ;
}
void reset(){
    for(int i=0;i<26;++i){
        if(root.next[i]!=NULL)
           rfree(root.next[i]);
        root.next[i]=NULL;//the child was freed, so clear the stale pointer, otherwise WA
    }
    return ;
}

int main()
{
    for(int i=0;i<26;i++)
	  root.next[i]=NULL;//initialise the root
    while(~scanf("%s",str)){//read test cases until end of file
        scanf("%d",&n);
        for(int i=0;i<n;++i){
            scanf("%s",s);
            build_tree(s);
        }
        memset(dp,0,sizeof(dp));
        int len=strlen(str);
        dp[len]=1;
        for(int i=len-1;i>=0;--i){
            //printf("%d\n",dp[i]);
            query(i,str+i);
        }
        printf("Case %d: %d\n",++cas,dp[0]);
        reset();
    }
    return 0;
}
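
To see the suffix DP and the trie walk together in one place, here is a compact sketch that takes the text and the dictionary as arguments. It is an illustration under my own assumptions (an index-based trie in vectors, the hypothetical name `countWays`), not a replacement for the solution above.

```cpp
#include <array>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: number of ways to split text into dictionary words,
// modulo 20071027. Words and text are assumed to be lowercase.
int countWays(const std::string& text, const std::vector<std::string>& words) {
    const int MOD = 20071027;
    // Index-based trie: child[p][c] == 0 means "no child"; node 0 is the root.
    std::vector<std::array<int, 26>> child(1, std::array<int, 26>{});
    std::vector<bool> isEnd(1, false);
    for (const std::string& w : words) {
        int p = 0;
        for (char ch : w) {
            int c = ch - 'a';
            assert(0 <= c && c < 26);
            if (child[p][c] == 0) {
                int q = (int)child.size();
                child.push_back(std::array<int, 26>{});
                isEnd.push_back(false);
                child[p][c] = q;
            }
            p = child[p][c];
        }
        isEnd[p] = true;
    }

    const std::size_t n = text.size();
    std::vector<int> dp(n + 1, 0);  // dp[i] = ways to split text[i..n)
    dp[n] = 1;                      // the empty suffix has exactly one split
    for (std::size_t i = n; i-- > 0;) {
        int p = 0;
        for (std::size_t j = i; j < n; ++j) {
            int c = text[j] - 'a';
            assert(0 <= c && c < 26);
            p = child[p][c];
            if (p == 0) break;  // no word continues along this path
            if (isEnd[p]) dp[i] = (dp[i] + dp[j + 1]) % MOD;
        }
    }
    return dp[0];
}
```

For the text "abcd" with the words "a", "b", "cd" and "ab", the two splits are a|b|cd and ab|cd. The inner loop runs at most as far as the longest dictionary word, which is what keeps the DP fast on long texts.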

G -- Can you find it?
Description

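Given some numbers, find the largest XOR value of any two of them

Input

Multiple test cases. The first line is the number of values n, 1 <= n <= 10 ^ 5. Each of the next n lines contains one non-negative 32-bit signed integer

Output

The maximum XOR value of any two numbers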

Sample Input

Sample Output

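Problem analysis

Before solving this one I was stuck on whether to build the tree from the high bits or from the low bits, and neither attempt worked well. After reading someone else's solution I found that every number has to be normalised to 31 bits and the tree built from the highest bit; padding with leading zeros does not change the result when no number has a 1 in those positions. While traversing, compare bit by bit: if a child with the opposite bit exists, the current bit of the XOR can be 1 (a greedy choice made while walking the tree from the high bits); otherwise follow the child with the same bit. So while walking the 31 bits we only need to collect the bits that come out as 1 and add up the weights of their positions to obtain the maximum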

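//0^0 is 0 in the high bits, so every input can be normalised to a 31-bit 0/1 string; the search greedily looks for the opposite bit
#include<algorithm>
#include<cstdio>
#include<cstdlib>
#include<cstring>
#include<cmath>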
using namespace std;
const int maxn=2;
struct node{
   node* next[maxn];//child pointers for bit 0 and bit 1
};
node root;
//insert one number, bit by bit from the highest bit
void build_tree(int x){
   node* p=&root;
   for(int i=30;i>=0;--i){
     int index=((1<<i)&x)?1:0;//(1<<i)&x is a masked 32-bit value, so reduce it to 0 or 1
     if(p->next[index]==NULL){
        node* q=(node*)malloc(sizeof(node));
        for(int j=0;j<maxn;++j)
            q->next[j]=NULL;
        //q->k=i;//would record the exponent of the bit where the two paths differ
        p->next[index]=q;
        p=q;
     }else{
        p=p->next[index];
     }
   }
}
int query(int x){
    node* p=&root;
    int sum=0;
    for(int i=30;i>=0;--i){
        int index=((1<<i)&x)?0:1;//the opposite of bit i of x
        if(p->next[index]!=NULL){//a number with the opposite bit exists, so this bit of the XOR is 1
            sum+=1<<i;
            p=p->next[index];
        }else{
            p=p->next[index^1];
        }
    }
    return sum;
}
void reset(node* root){
    for(int i=0;i<maxn;++i){
        if(root->next[i]!=NULL)
            reset(root->next[i]);
    }
    free(root);
}
int main()
{
    /*int a,b;
    a=0,b=1<<30;
    int ans=a^b;
    while(ans){
        printf("%d",ans&1);
        ans=ans>>1;
    }*/
    //int is a signed 32-bit type
    int n,x,ans;
    while(~scanf("%d",&n)){
        ans=-1;
        while(n--){
            scanf("%d",&x);
            build_tree(x);//insert before querying: otherwise the first query walks an empty trie and dereferences NULL
            ans=max(ans,query(x));
            //printf("%d\n",ans);
        }
        printf("%d\n",ans);
        for(int i=0;i<maxn;++i){
          if(root.next[i]!=NULL)
            reset(root.next[i]);
          root.next[i]=NULL;//the child was freed, so clear the stale pointer, otherwise WA
        }
    }
    return 0;
}
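
The greedy walk from the highest bit can be condensed into one function. The sketch below keeps the insert-then-query order of the solution above but stores the binary trie in a vector of index pairs instead of malloc'ed nodes; that layout and the name `maxPairXor` are my assumptions, and inputs are assumed to be non-negative values that fit in 31 bits.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <vector>

// Hypothetical helper: largest x ^ y over all pairs taken from nums
// (0 when there are fewer than two values).
int maxPairXor(const std::vector<int>& nums) {
    // child[p][b] = index of the child for bit b, 0 = absent; node 0 is the root.
    std::vector<std::array<int, 2>> child(1, std::array<int, 2>{{0, 0}});
    int best = 0;
    for (int x : nums) {
        assert(x >= 0);
        // Insert x first, so the query below always finds a complete path.
        int p = 0;
        for (int i = 30; i >= 0; --i) {
            int b = (x >> i) & 1;
            if (child[p][b] == 0) {
                int q = (int)child.size();
                child.push_back(std::array<int, 2>{{0, 0}});
                child[p][b] = q;
            }
            p = child[p][b];
        }
        // Greedy query: prefer the opposite bit at every level.
        p = 0;
        int cur = 0;
        for (int i = 30; i >= 0; --i) {
            int want = ((x >> i) & 1) ^ 1;
            if (child[p][want] != 0) {
                cur |= 1 << i;  // this bit of the XOR is 1
                p = child[p][want];
            } else {
                p = child[p][want ^ 1];
            }
        }
        best = std::max(best, cur);
    }
    return best;
}
```

As a quick check, for the values 3, 10, 5, 25, 2, 8 the best pair is 5 and 25, whose XOR is 28. Each value is compared only against values inserted before it (and itself), which still covers every unordered pair.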

H -- Can you find it?
Description

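Zeus and Prometheus played a game. Prometheus gives Zeus a set containing N positive integers and then asks Zeus M queries. Each query contains a positive integer S, and Zeus must find a positive integer K in the set such that the XOR of K and S is as large as possible. To let Zeus see the greatness of mankind, Prometheus agreed that Zeus may ask humans for help. Can you prove the wisdom of mankind?

Input

The input contains several test cases, each consisting of several lines
The first line of the input is an integer T (T < 10), the number of test cases
The first line of each test case contains two positive integers N and M (1 <= N, M <= 100000). The next line contains N positive integers, the set Zeus received. Then follow M lines, each with one positive integer S, a query from Prometheus. No integer exceeds 2^32

Output

For each test case, first output a separate line "Case #?:", where the question mark is replaced by the number of the current test case, counted from 1
For each query, output a positive integer K such that the XOR of K and S is maximal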

Sample Input

3 2

3 4 5

4 1

4 6 5 6

Sample Output

Case #1:

Case #2:

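Problem analysis

The idea is exactly the same as in the previous problem. The only difference is that the output is not the maximum XOR value but the number in the tree that produces it. After each query, XOR the result with the query value to recover that number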

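//0^0 is 0 in the high bits, so every input can be normalised to a 31-bit 0/1 string; the search greedily looks for the opposite bit
#include<algorithm>
#include<cstdio>
#include<cstdlib>
#include<cstring>
#include<cmath>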
using namespace std;
const int maxn=2;
struct node{
   node* next[maxn];//child pointers for bit 0 and bit 1
};
node root;
//insert one number, bit by bit from the highest bit
void build_tree(int x){
   node* p=&root;
   for(int i=30;i>=0;--i){
     int index=((1<<i)&x)?1:0;//(1<<i)&x is a masked 32-bit value, so reduce it to 0 or 1
     if(p->next[index]==NULL){
        node* q=(node*)malloc(sizeof(node));
        for(int j=0;j<maxn;++j)
            q->next[j]=NULL;
        //q->k=i;//would record the exponent of the bit where the two paths differ
        p->next[index]=q;
        p=q;
     }else{
        p=p->next[index];
     }
   }
}
int query(int x){
    node* p=&root;
    int sum=0;
    for(int i=30;i>=0;--i){
        int index=((1<<i)&x)?0:1;//the opposite of bit i of x
        if(p->next[index]!=NULL){//a number with the opposite bit exists, so this bit of the XOR is 1
            sum+=1<<i;
            p=p->next[index];
        }else{
            p=p->next[index^1];
        }
    }
    return sum;
}
void rfree(node* root){
    for(int i=0;i<maxn;++i){
        if(root->next[i]!=NULL)
            rfree(root->next[i]);
    }
    free(root);
    return ;
}
void reset(){
    for(int i=0;i<maxn;++i){
        if(root.next[i]!=NULL)
           rfree(root.next[i]);
        root.next[i]=NULL;//the child was freed, so clear the stale pointer, otherwise WA
    }
    return ;
}

int main()
{
    /*int a,b;
    a=0,b=1<<30;
    int ans=a^b;
    while(ans){
        printf("%d",ans&1);
        ans=ans>>1;
    }*/
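    //int is a signed 32-bit type, so this code assumes every input is below 2^31;
    //values up to 2^32, which the statement allows, would need long long and bits 32..0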
    int t,n,m,x,ans;
    scanf("%d",&t);
    for(int cas=1;cas<=t;++cas){
        scanf("%d%d",&n,&m);
        for(int i=0;i<n;++i){
            scanf("%d",&x);
            build_tree(x);
        }
        printf("Case #%d:\n",cas);
        for(int i=0;i<m;++i){
            scanf("%d",&x);
            ans=query(x);
            printf("%d\n",ans^x);
        }
        reset();
    }
    return 0;
}
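
Instead of computing the maximum XOR and then XOR-ing it with the query, the walk can also assemble K directly from the bits it follows. The sketch below does that and, as an assumption on my part, uses `long long` with bits 32 down to 0 so that values up to 2^32 fit; `XorTrie`, `insert` and `bestPartner` are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <vector>

// Hypothetical index-based binary trie over bits 32..0.
struct XorTrie {
    static const int TOP_BIT = 32;  // values up to 2^32 need bits 32..0
    std::vector<std::array<int, 2>> child;  // 0 = absent; node 0 is the root

    XorTrie() : child(1, std::array<int, 2>{{0, 0}}) {}

    void insert(long long x) {
        assert(x >= 0);
        int p = 0;
        for (int i = TOP_BIT; i >= 0; --i) {
            int b = (int)((x >> i) & 1);
            if (child[p][b] == 0) {
                int q = (int)child.size();
                child.push_back(std::array<int, 2>{{0, 0}});
                child[p][b] = q;
            }
            p = child[p][b];
        }
    }

    // Returns the stored value K that maximises K ^ s.
    long long bestPartner(long long s) const {
        assert(child.size() > 1);  // at least one value must be inserted
        int p = 0;
        long long k = 0;
        for (int i = TOP_BIT; i >= 0; --i) {
            int want = (int)((s >> i) & 1) ^ 1;             // opposite bit preferred
            int b = child[p][want] != 0 ? want : want ^ 1;  // fall back to same bit
            k |= (long long)b << i;                         // record the bit of K
            p = child[p][b];
        }
        return k;
    }
};
```

With the set {3, 4, 5} inserted, `bestPartner(1)` picks 4 (4 XOR 1 = 5) and `bestPartner(5)` picks 3 (3 XOR 5 = 6). Building K on the way down gives the same answer as `query(x)^x` in the solution above.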