2019 ICPC Shanghai B Prefix Code (hashing or trie)

Aloof__

Article link: https://blog.csdn.net/weixin_43872728/article/details/104573205

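Before getting to the solution, let me first tell the farcical story of what happened to us on site.

Back then the three of us were really weak. How weak? Problem B was the giveaway problem, yet it took us four hours to get it accepted; honestly, we wasted our slot at the Shanghai regional. I no longer know how we wrote B at the time; some haphazard code got through in the end. We had no idea what a trie or string hashing was (embarrassing to admit, sigh...). Just understanding the statement took us four hours...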

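Link: https://ac.nowcoder.com/acm/contest/4370/B
Source: Nowcoder

Time limit: 5 seconds for C/C++, 10 seconds for other languages
Memory limit: 262144K for C/C++, 524288K for other languages
64bit IO Format: %lld

Problem description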

A prefix code is a type of code system distinguished by its possession of the "prefix property", which requires that there is no whole code word in the system that is a prefix (initial segment) of any other code word in the system. It is trivially true for fixed-length code, so only a point of consideration in variable-length code.
For example, a code with code words $\{9, 55\}$ has the prefix property; a code consisting of $\{9, 5, 59, 55\}$ does not, because "5" is a prefix of "59" and also of "55". A prefix code is a uniquely decodable code: given a complete and accurate sequence, a receiver can identify each word without requiring a special marker between words. However, there are uniquely decodable codes that are not prefix codes; for instance, the reverse of a prefix code is still uniquely decodable (it is a suffix code), but it is not necessarily a prefix code.
Prefix codes are also known as prefix-free codes, prefix condition codes and instantaneous codes. Although Huffman coding is just one of many algorithms for deriving prefix codes, prefix codes are also widely referred to as "Huffman codes", even when the code was not produced by a Huffman algorithm. The term comma-free code is sometimes also applied as a synonym for prefix-free codes but in most mathematical books and articles a comma-free code is used to mean a self-synchronizing code, a subclass of prefix codes.
Using prefix codes, a message can be transmitted as a sequence of concatenated code words, without any out-of-band markers or (alternatively) special markers between words to frame the words in the message. The recipient can decode the message unambiguously, by repeatedly finding and removing sequences that form valid code words. This is not generally possible with codes that lack the prefix property, for example $\{0, 1, 10, 11\}$: a receiver reading a "1" at the start of a code word would not know whether that was the complete code word "1", or merely the prefix of the code word "10" or "11"; so the string "10" could be interpreted either as a single codeword or as the concatenation of the words "1" then "0".
The variable-length Huffman codes, country calling codes, the country and publisher parts of ISBNs, the Secondary Synchronization Codes used in the UMTS W-CDMA 3G Wireless Standard, and the instruction sets (machine language) of most computer microarchitectures are prefix codes.
Prefix codes are not error-correcting codes. In practice, a message might first be compressed with a prefix code, and then encoded again with channel coding (including error correction) before transmission.
For any uniquely decodable code there is a prefix code that has the same code word lengths. Kraft's inequality characterizes the sets of code word lengths that are possible in a uniquely decodable code.

In this problem, you can give a code with $\mathbf{N}$ code words. Each word contains only numbers $\{0, 1, 2, 3, 4, 5, 6, 7, 8, 9\}$. You need to check whether the code can meet the prefix property: there is no whole code word in the system that is a prefix of any other code word.

Input description:

The first line of the input gives the numbers of test cases, $\mathbf{T}$ ($1 \leq \mathbf{T} \leq 100$). $\mathbf{T}$ test cases follow.

Each test case consists of one line with one integer $\mathbf{N}$ ($1 \leq \mathbf{N} \leq 10,000$), the number of code words.
Then follows $\mathbf{N}$ lines with one code word on each line. A code word is a sequence of at most ten digits.

Output description:

For each test case, output one line containing "Case #x: y", where x is the test case number (starting from 1) and y is "Yes" if the code is a prefix code, or "No" if not.

Sample 1

Input

Output

Case #1: Yes
Case #2: No
Case #3: No

The task: given n strings, decide whether any one of them is a prefix of another. If such a string exists, output No; otherwise output Yes.
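As a quick cross-check of the task itself, here is a minimal sketch of a third, sorting-based approach. It is not one of the two solutions in this post; the function name `isPrefixCodeBySorting` is made up for illustration. It relies on the fact that after lexicographic sorting, all words that start with a word `a` sit directly after `a`, so comparing each word with its right neighbour is enough. Like both solutions below, it treats a duplicated word as a violation.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: returns true when no word is a prefix of another word.
// A duplicated word counts as a violation (a word is a prefix of its own copy).
bool isPrefixCodeBySorting(std::vector<std::string> words) {
    // Assumption taken from the statement: every code word has at least one digit.
    for (const std::string& w : words) assert(!w.empty());

    std::sort(words.begin(), words.end());
    for (std::size_t i = 0; i + 1 < words.size(); ++i) {
        const std::string& a = words[i];
        const std::string& b = words[i + 1];
        // compare() looks at the first a.size() characters of b (fewer if b is shorter).
        if (b.compare(0, a.size(), a) == 0) return false;
    }
    return true;
}
```

Sorting costs O(N log N) string comparisons per test case, which should be comfortable for N up to 10,000 words of at most ten digits.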

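With a trie, count how often each node is visited during insert. A prefix relation exists if, while inserting a word, we pass through a node where an earlier word ends, or if the last node of the word being inserted has already been visited by an earlier word.
With string hashing, compute the hash of every prefix of every word and count the values in a map. If the hash of some complete word occurs at least twice, that word is (barring hash collisions) a prefix of another word.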
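For reference, the prefix hash used by the hashing solution further down can be written as a recurrence. Here $s_j$ is the character code of the $j$-th character of a word $s$, $P = 13331$ is the base used in that code, and the arithmetic wraps modulo $2^{64}$ because the values are stored as `unsigned long long`:

$$h_0 = s_0, \qquad h_j = \left(h_{j-1} \cdot P + s_j\right) \bmod 2^{64}, \quad 1 \le j < |s|.$$

If a word $a$ is a prefix of a word $b$, then the full hash of $a$, $h^{(a)}_{|a|-1}$, equals the prefix hash $h^{(b)}_{|a|-1}$ of $b$, so that value is counted at least twice. The converse holds only under the usual assumption that distinct strings do not collide.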

Trie:

#include <bits/stdc++.h>
using namespace std;
typedef unsigned long long ll;
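const int N=10005*15;
// Code words contain only the digits 0-9, so each node needs 10 child slots.
int son[N][10], cnt[N],vis[N],idx,flag;
// Node 0 is both the root and the "null" node.
// son[][] stores the children of every node in the trie.
// cnt[] stores how many words end at each node.
// vis[] stores how many times each node has been visited during insertion.
 
// Insert a string; sets flag when a prefix relation is detected.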
void insert(char *str)
{
    int p = 0;
    for (int i = 0; str[i]; i ++ )
    {
        int u = str[i] - '0';
        if (!son[p][u]) son[p][u] = ++ idx;
        p = son[p][u];
        if(cnt[p]) flag=1;
        vis[p]++;
    }
    if(vis[p]>1) flag=1;
    cnt[p] ++ ;
}

int main()
{
    int t,n;
    cin >>t;
    int id=1;
    while(t--)
    {
        memset(vis,0,sizeof vis);
        memset(son,0,sizeof son);
        memset(cnt,0,sizeof cnt);
        flag=0,idx=0;
        char s[20];
        cin >>n;
        while(n--)
        {
            cin >>s;
            insert(s);
        }
        if(flag) printf("Case #%d: No\n",id++);
        else printf("Case #%d: Yes\n",id++);
    }
}
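The solution above clears the whole `son`, `cnt` and `vis` arrays for every test case. A possible variation, sketched below under my own naming (`child`, `isEnd`, `newNode` and `isPrefixCodeByTrie` are illustrative, not from the original code), clears a node only when it is allocated. It also replaces the `vis` counter with a "did this word create a new node" flag, which detects the same situation: a word that creates no node lies entirely on an existing path, so it is a prefix of (or equal to) an earlier word.

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <vector>

const int MAX_NODES = 10000 * 10 + 5;  // at most 10 new nodes per word, plus the root
int child[MAX_NODES][10];              // child[p][d]: node reached from p via digit d, 0 = none
bool isEnd[MAX_NODES];                 // isEnd[p]: some word ends at node p
int nodeCount;                         // index of the last allocated node; node 0 is the root

// Allocates a fresh node and clears only that node's slots.
int newNode() {
    assert(nodeCount + 1 < MAX_NODES);
    ++nodeCount;
    std::memset(child[nodeCount], 0, sizeof child[nodeCount]);
    isEnd[nodeCount] = false;
    return nodeCount;
}

// Returns true when no word is a prefix of another word (duplicates count as a violation).
bool isPrefixCodeByTrie(const std::vector<std::string>& words) {
    nodeCount = 0;
    std::memset(child[0], 0, sizeof child[0]);  // reset the root only
    isEnd[0] = false;
    bool ok = true;
    for (const std::string& w : words) {
        int p = 0;
        bool createdNode = false;
        for (char c : w) {
            assert(c >= '0' && c <= '9');  // the statement guarantees digits only
            int d = c - '0';
            if (!child[p][d]) { child[p][d] = newNode(); createdNode = true; }
            p = child[p][d];
            if (isEnd[p]) ok = false;      // an earlier word ends on this path
        }
        if (!createdNode) ok = false;      // w is a prefix of (or equal to) an earlier word
        isEnd[p] = true;
    }
    return ok;
}
```

A `main` in the style of the solution above would read each test case into a `std::vector<std::string>` and print `Case #x: Yes` or `Case #x: No` depending on the return value.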

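String hashing:

Note the difference between unordered_map and map here: insertion, deletion, update and lookup take O(1) on average with the former and O(log n) with the latter.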

#include <bits/stdc++.h>
using namespace std;
typedef unsigned long long ll;
string s[10050];
const ll pp=13331;
ll h[10050][20],p[10050][20];
unordered_map<ll,int> mp;
ll get(int i,int r)
{
    return h[i][r];
}
int main()
{
    int t;
    cin >>t;
    int id=1;
    while(t--)
    {
        mp.clear();
        int n;
        cin >>n;
        for(int i=0;i<=n;i++)
            for(int j=0;j<15;j++) h[i][j]=0,p[i][j]=0;
        for(int i=0;i<n;i++) cin >>s[i];
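        for(int i=0;i<n;i++)
        {
            // h[i][j] is the hash of the first j+1 characters of s[i];
            // every prefix hash is counted in mp.
            h[i][0]=s[i][0];
            mp[h[i][0]]++;
            for(int j=1;j<(int)s[i].size();j++)
            {
                h[i][j]=h[i][j-1]*pp+s[i][j];
                mp[h[i][j]]++;
            }
        }
        bool flag=false;
        for(int i=0;i<n;i++)
        {
            // If the hash of the whole word occurs at least twice,
            // the word is also a prefix of some other word.
            int y=s[i].size()-1;
            ll x=h[i][y];
            if(mp[x]>=2)
            {
                flag=true;
                break;
            }
        }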
        if(flag) printf("Case #%d: No\n",id++);
        else printf("Case #%d: Yes\n",id++);
    }
}
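To make the map versus unordered_map remark concrete, the same hashing idea can be sketched as a function templated on the counter type, so either container can be plugged in without touching the logic. The names `isPrefixCodeByHash`, `HashCounter` and `TreeCounter` are illustrative only; the base 13331 and the wrap-around `unsigned long long` arithmetic are taken from the solution above. As there, correctness assumes that distinct strings do not collide.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <unordered_map>
#include <vector>

typedef unsigned long long ull;
const ull BASE = 13331;  // same base as the solution above

typedef std::unordered_map<ull, int> HashCounter;  // average O(1) per operation
typedef std::map<ull, int> TreeCounter;            // O(log n) per operation

// Counter must offer operator[] like std::map<ull, int>.
// Returns true when no word is a prefix of another word (up to hash collisions).
template <class Counter>
bool isPrefixCodeByHash(const std::vector<std::string>& words) {
    Counter seen;                          // prefix hash -> number of occurrences
    std::vector<ull> full(words.size());   // hash of each complete word
    for (std::size_t i = 0; i < words.size(); ++i) {
        assert(!words[i].empty());         // the statement implies non-empty words
        ull h = 0;
        for (char c : words[i]) {
            h = h * BASE + (unsigned char)c;  // wraps modulo 2^64
            ++seen[h];
        }
        full[i] = h;
    }
    for (ull h : full)
        if (seen[h] >= 2) return false;    // full hash also occurs as another prefix hash
    return true;
}
```

Calling `isPrefixCodeByHash<HashCounter>(words)` and `isPrefixCodeByHash<TreeCounter>(words)` should give the same answers; only the running time differs.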

Speed comparison of the three approaches: