【LeetCode】311d：字符串的前缀分数和_leetcode字符串分数-CSDN博客

本文链接：https://blog.csdn.net/whq___/article/details/126976038

311d：字符串的前缀分数和

题目大意

给你一个长度为 n 的数组 words ，该数组由非空字符串组成。
定义字符串 word 的分数等于以 word 作为前缀的 words[i] 的数目。
例如，如果 words = ["a", "ab", "abc", "cab"] ，那么 "ab"的分数是 2 ，因为 "ab" 是 "ab" 和 "abc" 的一个前缀。
返回一个长度为 n 的数组 answer ，其中answer[i]是 words[i] 的每个非空前缀的分数总和。
注意：字符串视作它自身的一个前缀。

示例 1：

输入：words = ["abc","ab","bc","b"]
输出：[5,4,3,2]
解释：对应每个字符串的答案如下：
- "abc" 有 3 个前缀："a"、"ab" 和 "abc" 。
- 2 个字符串的前缀为 "a" ，2 个字符串的前缀为 "ab" ，1 个字符串的前缀为 "abc" 。
总计 answer[0] = 2 + 2 + 1 = 5 。
- "ab" 有 2 个前缀："a" 和 "ab" 。
- 2 个字符串的前缀为 "a" ，2 个字符串的前缀为 "ab" 。
总计 answer[1] = 2 + 2 = 4 。
- "bc" 有 2 个前缀："b" 和 "bc" 。
- 2 个字符串的前缀为 "b" ，1 个字符串的前缀为 "bc" 。 
总计 answer[2] = 2 + 1 = 3 。
- "b" 有 1 个前缀："b"。
- 2 个字符串的前缀为 "b" 。
总计 answer[3] = 2 。

示例 2：

输入：words = ["abcd"]
输出：[4]
解释：
"abcd" 有 4 个前缀 "a"、"ab"、"abc" 和 "abcd"。
每个前缀的分数都是 1 ，总计 answer[0] = 1 + 1 + 1 + 1 = 4 。

提示：

1 <= words.length <= 1000
1 <= words[i].length <= 1000
words[i] 由小写英文字母组成

思路：字典树(trier树)

第一想法是用哈希unordered_map，结果几发TLE。基本思路是遍历字符串数组，对每个字符串，在map中累加每个前缀的数量。最后，对每个字符串累加其所有前缀在map中的计数即得其答案。因为所有的前缀个数为1000*1000，则该方法表面上的时间复杂度为 $O (1 e 3 * 1 e 3)$ ，但实际上每个字符串操作的时间也要算上(取子串，包括存入map)，因此其时间复杂度为 $O (1 e 3 * 1 e 3 * 1 e 3) = O (1 e 9)$ ，会超时。

故而采用trier树方法。该方法适用于字符串插入与查找，并且通常字符串只有小写。由题意得，所有的字符总数不超过1e6，而1e6个int的空间为4M，取N为1e6，则tr[N][26]为100M左右，空间上是满足的。
首先，每个字符串可理解为trier树上的一条路径，每个字符为trier上的一条边，头结点tr[0]为空节点。与二叉树的每个节点只有left与right两个分支不同，trier树用在小写字符串时有26个分支。对一个字符串的插入，会依次比对每个字符在树中是否存在，若存在则往下走，若不存在则开一个新的分支。例如字符串“ad”，初始时从根节点开始走，p = 0，首先对字符’a’，若tr[0][‘a’-'a']未走过(初始值为0)，则为其分配一个节点的标号++ idx，若走过，则跳转到tr[0][0]，即p=tr[0][0]，继续对字符’d’，判断tr[p]['d'-'a']，依次类推。
当最后一个字符遍历完后，可用cnt数组来记录以该结点为结尾的字符串数量，最后一个节点为p，则cnt[p] ++;，在本题中因为每个前缀都要插入，因此遍历每个字符时都需要cnt[p] ++;。
idx可理解为对每个节点的编号。若tr[p][u]大于0，说明指向的下一结点存在，即当前子串是存在的。

代码

const int N = 1e6 + 10;

int tr[N][26], cnt[N], idx;

class Solution {
public:
    void insert(string& word){
        int p = 0;
        for(auto c : word){
            int u = c - 'a';
            if(!tr[p][u]) tr[p][u] = ++ idx;
            p = tr[p][u];
            cnt[p] ++;
        }
    }
    int query(string& word){
        int p = 0, res = 0;
        for(auto c : word){
            int u = c - 'a';
            if(!tr[p][u]) return 0;
            p = tr[p][u];
            res += cnt[p];
        }
        return res;
    }

    vector<int> sumPrefixScores(vector<string>& words) {
        idx = 0;
        memset(tr, 0, sizeof tr);
        memset(cnt, 0, sizeof cnt);

        for(auto& word : words) insert(word);
        vector<int> res;
        for(auto& word : words){
            res.push_back(query(word));
        }

        return res;
    }
};

代码(结构体版本)

class Solution {
public:
    vector<int> sumPrefixScores(vector<string> &words) {
        struct Node {
            Node *son[26]{};
            int score = 0;
        };
        Node *root = new Node();
        for (auto &word : words) {
            auto cur = root;
            for (char c : word) {
                c -= 'a';
                if (cur->son[c] == nullptr) cur->son[c] = new Node();
                cur = cur->son[c];
                ++cur->score; // 更新所有前缀的分数
            }
        }

        int n = words.size();
        vector<int> ans(n);
        for (int i = 0; i < n; ++i) {
            auto cur = root;
            for (char c : words[i]) {
                cur = cur->son[c - 'a'];
                ans[i] += cur->score; // 累加分数，即可得到答案
            }
        }
        return ans;
    }
};

时间复杂度

本题用trier树算法的时间复杂度为 $O (1 e 6)$ ，不会超时。哈希表超时的原因在于存在重复操作，例如对于字符串"abcd"，则前缀“abc”和“abcd”都将遍历一次，每个字符被重复计算，而trier树中，每个字符串中的每个字符都只被计算一次。

附录(哈希思路)

哈希TLE代码O(n^3)

class Solution {
public:
    vector<int> sumPrefixScores(vector<string>& words) {
        unordered_map<string, int> hash;
        for(auto &word : words){
            for(int i = 1; i <= word.size(); i ++ ){
                hash[word.substr(0, i)] ++ ;
            }
        }
        vector<int> res;
        for(auto &word : words){
            int s = 0;
            for(int i = 1; i <= word.size(); i ++ ){
                string str = word.substr(0, i);
                if(hash.count(str)){
                    s += hash[str];
                }
            }
            res.push_back(s);
        }
        
        return res;
    }
};

字符串哈希优化：将复杂度降为O(n^2)

代码引自夜阑听雨大佬

class Solution {
public:
    typedef unsigned long long ULL;
    static const int P = 131;
    ULL hash;
    vector<int> sumPrefixScores(vector<string>& words) {
        unordered_map<ULL, int> mp;
        for (auto &s : words) {
            hash = 0;
            for (int i = 0; i < s.size(); i ++ ){
                hash = hash * P + s[i];
                mp[hash] ++;
            }
        }
        vector<int> ans (words.size(), 0);
        for (int i = 0; i < words.size(); i ++) {
            string& s = words[i];
            hash = 0;
            for (int j = 0; j < s.size(); j ++ ){
                hash = hash * P + s[j];
                ans[i] += mp[hash];
            }
        }
        return ans;
    }
};