LC-792. Number of Matching Subsequences (Binary Search, Trie + DFS)

Miraclo_acc

Last modified: 2022-11-23 23:44:45


Category: Algorithm Practice Log. Tags: algorithms, data structures, leetcode

First published: 2022-11-23 23:12:55

Article link: https://blog.csdn.net/qq_42958831/article/details/128009764


792. Number of Matching Subsequences

Difficulty: Medium

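Given a string s and an array of strings words, return the number of words[i] that are a subsequence of s.

A subsequence of a string is a new string generated from the original string by deleting some characters (possibly none) without changing the relative order of the remaining characters.

For example, "ace" is a subsequence of "abcde".

Example 1:

Input: s = "abcde", words = ["a","bb","acd","ace"]
Output: 3
Explanation: Three of the words are subsequences of s: "a", "acd", "ace".

Example 2:

Input: s = "dsahjpjauf", words = ["ahjpjau","ja","ahbwzgqnuk","tnmlanowax"]
Output: 2

Constraints:

1 <= s.length <= 5 * 10^4
1 <= words.length <= 5000
1 <= words[i].length <= 50
words[i] and s consist only of lowercase English letters.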

Brute force (time limit exceeded)

class Solution {
    public int numMatchingSubseq(String s, String[] words) {
        int res = 0;
        for(String word : words){
            int i = 0,j = 0;
            while(i < s.length()  && j < word.length()){
                if(s.charAt(i) == word.charAt(j)){
                    i++;j++;
                }else{
                    i++;
                }
            }
            if(j == word.length()) res++;
        }
        return res;
    }
}
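
A rough worst-case estimate, assuming only the limits listed in the constraints above, suggests why this version is reported as exceeding the time limit. Each word may scan all of s, so the number of character comparisons is bounded by

```latex
\text{words.length} \times \text{s.length} \;\le\; 5000 \times 5 \times 10^{4} \;=\; 2.5 \times 10^{8}
```

This is only an upper bound on the work, not a measured running time.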

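Solution 1: Binary search (index hashing)

Checking a word means matching its characters against s from left to right. Take acd as an example: find a in s at some index i, then look for c starting from i+1, and continue until every character of acd has been found, in which case the word is a subsequence. Scanning s linearly costs O(n) per word, so instead we locate the next occurrence of each character with binary search.

Use a hash table to store, for each character, the sorted list of indices at which it occurs in s, and binary-search that list. For example, when looking up c in acd, if a was matched at index i in s, the index of c must be at least i+1.

Time complexity: $O(m \log n)$, where m is the total length of all words in words and n is the length of s; every character needs one binary search.
Space complexity: $O(n)$, for storing the indices at which each letter occurs in s.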

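class Solution {
    // cnt[k][0] is the number of occurrences of letter k in s;
    // cnt[k][1..cnt[k][0]] are the indices of those occurrences, in increasing order.
    int[][] cnt = new int[26][50005];
    public int numMatchingSubseq(String s, String[] words) {
        int ans = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            cnt[c - 'a'][++cnt[c - 'a'][0]] = i;
        }
        for (int i = 0; i < words.length; i++) {
            int cur = -1, j; // cur: smallest index of s that the next character may match
            for (j = 0; j < words[i].length(); j++) {
                char c = words[i].charAt(j);
                int t = search(c - 'a', cur);
                if (t == -1) break;
                cur = t + 1;
            }
            if (j == words[i].length()) ans++;
        }
        return ans;
    }
    // Returns the smallest index >= cur at which letter x occurs in s, or -1 if there is none.
    int search(int x, int cur) {
        int l = 1, r = cnt[x][0];
        // Letter x never occurs in s. Without this guard, cnt[x][0] (the count, 0)
        // would be read as an index and a word starting with x would be accepted.
        if (r == 0) return -1;
        while (l < r) {
            int mid = (l + r) / 2;
            if (cnt[x][mid] >= cur) r = mid;
            else l = mid + 1;
        }
        return cnt[x][r] >= cur ? cnt[x][r] : -1;
    }
}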
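
The index-list idea described in this solution can also be sketched with one growable list per letter instead of a fixed-size array, so the storage is proportional to the length of s. This is a minimal sketch, not the author's code: the class name `IndexListSketch` and the helper `firstAtLeast` are hypothetical names introduced for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class IndexListSketch {
    // Counts the words that are subsequences of s using per-letter index lists.
    static int countMatching(String s, String[] words) {
        // positions.get(k) holds, in increasing order, every index of letter ('a' + k) in s.
        List<List<Integer>> positions = new ArrayList<>();
        for (int k = 0; k < 26; k++) positions.add(new ArrayList<>());
        for (int i = 0; i < s.length(); i++) positions.get(s.charAt(i) - 'a').add(i);

        int count = 0;
        for (String word : words) {
            int from = 0;      // smallest index of s that the next letter may use
            boolean ok = true;
            for (int j = 0; j < word.length() && ok; j++) {
                int idx = firstAtLeast(positions.get(word.charAt(j) - 'a'), from);
                if (idx == -1) ok = false;   // no occurrence left: not a subsequence
                else from = idx + 1;         // the next letter must come strictly later
            }
            if (ok) count++;
        }
        return count;
    }

    // Lower-bound binary search: smallest stored index >= from, or -1 if there is none.
    static int firstAtLeast(List<Integer> list, int from) {
        int lo = 0, hi = list.size();        // search window is [lo, hi)
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (list.get(mid) >= from) hi = mid;
            else lo = mid + 1;
        }
        return lo < list.size() ? list.get(lo) : -1;
    }
}
```

Using a half-open window `[lo, hi)` means an empty list falls out of the loop immediately and returns -1, so a letter that never occurs in s needs no special case.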

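Solution 2: Multi-pointer optimization

In Solution 1, every string in words is compared against s separately. We can turn this around: move along s once, and give every word its own pointer, initially at the word's first character. If the current character of s is c, every pointer that currently points at c advances by one; when a pointer moves past the end of its word, that word is a subsequence.

Following this idea, the code below is straightforward to write. (Time limit exceeded.)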

class Solution { 
    public int numMatchingSubseq(String s, String[] words) {
        int ans = 0;
        int[] p = new int[words.length]; // p[i]: pointer into words[i]
        for (char c : s.toCharArray()) { // O(n) characters of s
            // O(words.length) per character: every pointer is checked
            for (int i = 0; i < p.length; i++) if (p[i] < words[i].length() && words[i].charAt(p[i]) == c) p[i]++;
        }
        for (int i = 0; i < p.length; i++) if (p[i] == words[i].length()) ans++;
        return ans;
    }
}

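However, finding the pointers that point at c scans the whole words array for every character of s, which is expensive. Pointers that have already reached the end of their word do not need to be checked again either, so this is where we can optimize.

Keep one queue per letter holding all pointers currently waiting for that letter. Each step then only touches the pointers queued under the current character c, which saves a lot of scanning.
Time complexity: $O(n + m)$, where n is the length of s and m is the total length of all words in words
Space complexity: $O(\text{len}(words))$, the number of pointers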

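import java.util.LinkedList;
import java.util.Queue;

class Solution {
    public int numMatchingSubseq(String s, String[] words) {
        int ans = 0;
        // q[k] holds {word index, pointer} pairs whose pointer is at letter k
        Queue<int[]>[] q = new Queue[26];
        for (int i = 0; i < 26; i++) q[i] = new LinkedList<>();
        for (int i = 0; i < words.length; i++) q[words[i].charAt(0) - 'a'].add(new int[]{i, 0});
        for (char c : s.toCharArray()) { // O(n)
            // advance every word whose pointer is currently at letter c;
            // size is fixed up front so entries re-added under c wait for a later c
            int size = q[c - 'a'].size();
            while (size-- > 0) {
                int[] tem = q[c - 'a'].poll();
                int i = tem[0], len = tem[1];
                if (len + 1 == words[i].length()) {
                    ans++; continue; // pointer passed the last character: words[i] matches
                }
                char t = words[i].charAt(len + 1);
                q[t - 'a'].add(new int[]{i, len + 1}); // move the pointer forward
            }
        }
        return ans;
    }
}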
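
One detail of the per-letter queues is easy to miss: a word such as "bb" goes back into the same queue it was just taken from, and it must not be advanced twice by a single b in s. The variant below is a sketch of the same technique that makes this explicit by detaching the queue before processing it. The class name `BucketSketch` and the use of `ArrayDeque` are choices made for this illustration, not part of the original solution.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

class BucketSketch {
    static int countMatching(String s, String[] words) {
        // buckets.get(k) holds {wordIndex, position} pairs whose pending letter is ('a' + k).
        List<Deque<int[]>> buckets = new ArrayList<>();
        for (int k = 0; k < 26; k++) buckets.add(new ArrayDeque<>());
        for (int i = 0; i < words.length; i++) {
            buckets.get(words[i].charAt(0) - 'a').add(new int[]{i, 0});
        }

        int count = 0;
        for (int p = 0; p < s.length(); p++) {
            int k = s.charAt(p) - 'a';
            // Detach the bucket first, so a word re-queued under the same letter
            // (for example "bb") waits for a later occurrence of that letter in s.
            Deque<int[]> waiting = buckets.get(k);
            buckets.set(k, new ArrayDeque<>());
            for (int[] entry : waiting) {
                String w = words[entry[0]];
                int next = entry[1] + 1;
                if (next == w.length()) {
                    count++;                 // every letter of w has been matched
                } else {
                    buckets.get(w.charAt(next) - 'a').add(new int[]{entry[0], next});
                }
            }
        }
        return count;
    }
}
```

Swapping in a fresh deque costs one small allocation per character of s; capturing the queue size up front, as the solution above does, avoids that allocation and has the same effect.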

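Solution 3: Trie

Once the trie is built, traverse it depth-first and check, for each node, whether the character it represents occurs in the remaining part of the string.

Take the test case "abcde", ["a","bb","acd","ace"] as an example:

On the first call, the node is root and the start position in the string is 0.

The root's e is 0, so nothing is added to the result.

The root has two children, a and b.

a occurs in the string, at index 0.

Call search recursively with node a and start position 1 (0+1).

Node a has e equal to 1, so 1 is added to the result.

Node a has one child, c.

When checking whether c occurs in the string, the search has to start after the position of a;

this is what the start position parameter is for.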
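
The role of the start position in the walkthrough above can be traced with plain `String.indexOf(int ch, int fromIndex)` calls, one per trie edge. This is a small illustration, assuming the same test case; the class name `StartIndexTrace` is hypothetical and the method only replays the lookups the depth-first traversal would make.

```java
class StartIndexTrace {
    // Replays the lookups made along the trie paths a -> c -> d, a -> c -> e and b -> b.
    static int[] trace(String s) {
        int a = s.indexOf('a', 0);       // child a of root: search from position 0
        int c = s.indexOf('c', a + 1);   // child c of a: search only after a
        int d = s.indexOf('d', c + 1);   // child d of c: search only after c
        int e = s.indexOf('e', c + 1);   // child e of c: same start, the prefix "ac" is shared
        int b = s.indexOf('b', 0);       // child b of root: search from position 0 again
        int b2 = s.indexOf('b', b + 1);  // second b of "bb": must come after the first b
        return new int[]{a, c, d, e, b, b2};
    }
}
```

For s = "abcde" the trace yields 0, 2, 3, 4, 1, -1: the final -1 is why "bb" is not counted, and the lookups for a and c are done once for both "acd" and "ace".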

class Solution {
        public int numMatchingSubseq(String s, String[] words) {
            Trie trie = new Trie();
            for (String word : words) {
                trie.insert(word);
            }
            return trie.search(s);
        }

        class Node {
            int e;
            Node[] nexts = new Node[26];
        }

        class Trie {
            Node root;

            public Trie() {
                root = new Node();
            }

            public void insert(String word) {
                Node cur = root;
                int index;
                for (char c : word.toCharArray()) {
                    index = c - 'a';
                    if (cur.nexts[index] == null) {
                        cur.nexts[index] = new Node();
                    }
                    cur = cur.nexts[index];
                }
                cur.e++;

            }

            int result;

            public int search(String word) {
                search(word, 0, root);
                return result;
            }

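            /**
             * Trie + depth-first traversal.
             * The field e stores how many words in the trie end at this node.
             * <p>
             * Build the trie first, then traverse it depth-first.
             * If the current node has e greater than 0, add e to result.
             * Then visit the children of the current node:
             * for each non-null child, check whether its character occurs in the string,
             * and recurse if it does.
             * <p>
             * The recursion passes along the position from which to look for the character,
             * i.e. the next character must occur after the current one to qualify.
             * <p>
             * Runtime: 94 ms, beats 38.52% of Java submissions
             * Memory: 53.3 MB, beats 5.79% of Java submissions
             *
             * @param word  the string s being searched
             * @param index the position in word from which to start searching
             * @param node  the current trie node
             */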
            public void search(String word, int index, Node node) {
                if (node.e > 0) {
                    result += node.e;
                }
                Node next;
                int indexOf;
                for (int i = 0; i < node.nexts.length; i++) {
                    next = node.nexts[i];
                    if (next != null) {
                        indexOf = word.indexOf(i + 'a', index);
                        if (indexOf != -1) {
                            search(word, indexOf + 1, next);
                        }
                    }
                }
            }
        }
    }
