30. Substring with Concatenation of All Words

劲蜡鸡腿堡

Published 2020-01-11 13:27:52

Category: Leetcode

Article link: https://blog.csdn.net/qq_37654704/article/details/103935738

Link to the problem

Problem analysis

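Per the problem statement, we need to find the start positions of certain special substrings. Such a substring is built from words: every string in words must appear in it, and each one is used exactly once.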

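Approach 1:

Enumerate the permutations of words to build every candidate substring substr. There are words.size()! such orderings in total, and we search s for each of them in turn.
This is the easiest version to code, but it has to handle words.size()! orderings. A factorial grows extremely fast, and each string search costs at least O(n) on top of that. With only 10 words there are already 10! = 3,628,800 orderings, which is completely unacceptable.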
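To make the cost of Approach 1 concrete, here is a minimal sketch of it, assuming std::next_permutation is used to enumerate the orderings. The function name findSubstringBruteForce is hypothetical and not part of the original solution; it is only practical for a handful of words.

```cpp
#include <algorithm>
#include <set>
#include <string>
#include <vector>

// Brute-force sketch of Approach 1 (findSubstringBruteForce is a hypothetical name).
// It enumerates every distinct ordering of words and searches s for each concatenation.
std::vector<int> findSubstringBruteForce(const std::string& s, std::vector<std::string> words) {
    if (words.empty()) return {};
    std::set<int> starts;                        // keeps start indices sorted and unique
    std::sort(words.begin(), words.end());       // next_permutation must start from sorted order
    do {
        std::string target;
        for (const std::string& w : words) target += w;
        // Record every occurrence of this concatenation, including overlapping ones.
        for (std::size_t pos = s.find(target); pos != std::string::npos;
             pos = s.find(target, pos + 1))
            starts.insert(static_cast<int>(pos));
    } while (std::next_permutation(words.begin(), words.end()));
    return std::vector<int>(starts.begin(), starts.end());
}
```

For s = "barfoothefoobarman" and words = ["foo","bar"] this returns 0 and 9, but the loop body runs once per distinct ordering, which is why the approach does not scale.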

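Approach 2:

Instead, we scan the string in the conventional way and keep a tally of the words we encounter. The procedure is:
1) Find a position where any string from words{} occurs.
2) Check whether the next piece is also a string from words{}, and whether it has already been used (each one may be used only once).
3) If every string in words{} has been found, we have a valid substring. Continue the search from the next position.
4) If a string from words{} is used too many times, or we hit something that is not in words{}, restart from the next position.
This procedure is easy to come up with; the idea is the same as naive string matching.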
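The four steps above can be sketched as follows. This is an illustrative version, not the author's final code: the name findSubstringNaive is hypothetical, and it assumes a hash map is used to tally how often each word has been consumed at the current start position.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Unoptimized sketch of Approach 2 (findSubstringNaive is a hypothetical name).
std::vector<int> findSubstringNaive(const std::string& s, const std::vector<std::string>& words) {
    std::vector<int> result;
    if (s.empty() || words.empty() || words[0].empty()) return result;
    const std::size_t len = words[0].size();
    const std::size_t total = len * words.size();
    if (s.size() < total) return result;

    // How many times each word may be used.
    std::unordered_map<std::string, int> need;
    for (const std::string& w : words) ++need[w];

    for (std::size_t start = 0; start + total <= s.size(); ++start) {
        std::unordered_map<std::string, int> seen;   // uses so far for this start
        std::size_t matched = 0;
        for (; matched < words.size(); ++matched) {
            const std::string piece = s.substr(start + matched * len, len);
            auto it = need.find(piece);
            // Stop on a non-word (step 4) or on a word used too often (steps 2 and 4).
            if (it == need.end() || ++seen[piece] > it->second) break;
        }
        if (matched == words.size()) result.push_back(static_cast<int>(start));  // step 3
    }
    return result;
}
```

Every start position re-checks up to words.size() pieces from scratch, which is the repeated work the optimization in the rest of the post tries to avoid.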
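Optimizing this procedure
The problem gives one extra condition: every string in words{} has the same length. This is the key to the optimization.
The basic idea is as follows:
Mark the position of every occurrence of a string from words{}. For s = "wordgoodgoodgoodbestword" and words = ["word","good","best","word"], the occurrences start at indices 0, 4, 8, 12, 16, 20. This shows that the word start indices inside a valid substr always form an arithmetic progression with common difference words[0].size(). Such a progression is not necessarily a substring we are looking for, though: we still have to check that each word is used the right number of times. The problem is thereby reduced to finding arithmetic progressions among the indices.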
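As a small illustration of the marking step, the sketch below collects the start index of every occurrence of any word in s. The helper name wordPositions is an assumption made for this example; it relies only on std::string::find.

```cpp
#include <set>
#include <string>
#include <vector>

// Collects the sorted start indices of all occurrences of any word in s
// (wordPositions is a hypothetical helper, not part of the original solution).
std::vector<int> wordPositions(const std::string& s, const std::vector<std::string>& words) {
    std::set<int> positions;
    for (const std::string& w : words) {
        if (w.empty()) continue;
        // Advance by one character after each hit so overlapping occurrences are found too.
        for (std::size_t pos = s.find(w); pos != std::string::npos; pos = s.find(w, pos + 1))
            positions.insert(static_cast<int>(pos));
    }
    return std::vector<int>(positions.begin(), positions.end());
}
```

For the example in the text, wordPositions("wordgoodgoodgoodbestword", {"word","good","best","word"}) yields 0, 4, 8, 12, 16, 20.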
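When searching for the arithmetic progressions, the following trick speeds things up:
Suppose the index sequence is 0 2 4 5 7 8 9 12 and we want the arithmetic progressions with common difference 4.
We can split the indices into groups by residue, using index = val % 4 as the group number.
The original sequence is then split into
0 4 8 12
5 9
2
7
Now the valid progressions are easy to find, because only neighbouring entries of the same group can belong to one progression. Since we also need to remember which string sits at each index, we end up with vector<vector<pair<int, int>>> indexmap, which stores a {position, word id} pair for every occurrence.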
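The grouping by residue can be sketched in a few lines. The function name partitionByResidue is hypothetical; in the actual solution the groups hold {position, word id} pairs rather than plain integers.

```cpp
#include <vector>

// Splits non-negative values into `step` groups, where group r holds the values
// with value % step == r (partitionByResidue is a hypothetical helper).
std::vector<std::vector<int>> partitionByResidue(const std::vector<int>& values, int step) {
    std::vector<std::vector<int>> buckets(step > 0 ? step : 0);
    for (int v : values)
        if (step > 0 && v >= 0)
            buckets[v % step].push_back(v);   // input order is preserved inside each group
    return buckets;
}
```

Calling partitionByResidue({0, 2, 4, 5, 7, 8, 9, 12}, 4) gives the four groups listed above: {0, 4, 8, 12}, {5, 9}, {2} and {7}.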
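This algorithm cannot handle the same string appearing twice in the input words, so we deduplicate words by hand and record how many times each string occurs. That is what vector<pair<string, int>> words_ceil; is for.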
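A minimal sketch of this deduplication step is shown below. The name countWords is hypothetical, and std::map is swapped in for the unordered_map used in the solution purely so that the output order is deterministic.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// Returns each distinct word together with the number of times it occurs in words
// (countWords is a hypothetical helper; std::map keeps the result sorted by word).
std::vector<std::pair<std::string, int>> countWords(const std::vector<std::string>& words) {
    std::map<std::string, int> counts;
    for (const std::string& w : words) ++counts[w];
    return std::vector<std::pair<std::string, int>>(counts.begin(), counts.end());
}
```

For words = ["word","good","best","word"] the result is {"best", 1}, {"good", 1}, {"word", 2}, so the index of an entry can serve as a compact word id.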

Code

#include <algorithm>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>
using namespace std;

class Solution {
public:
    vector<int> findSubstring(string s, vector<string>& words) {
        if(s.empty() || words.empty() || words[0].empty())
            return {};
        const int wordLen = words[0].size();
        unordered_map<string, int> set_map;
        vector<pair<string, int>> words_ceil;              // {distinct word, required count}
        vector<vector<pair<int, int>>> indexmap(wordLen);  // group r: {position, word id}, position % wordLen == r
        vector<int> result;

        /* Deduplicate words and record how often each one occurs. */
        for(const string& w : words)
            set_map[w]++;
        for(const auto& it : set_map)
            words_ceil.push_back({it.first, it.second});

        /* Find every occurrence of every distinct word and group it by position % wordLen. */
        for(int i = 0; i < (int)words_ceil.size(); i++){
            size_t pos = s.find(words_ceil[i].first);
            while(pos != string::npos){
                indexmap[pos % wordLen].push_back({(int)pos, i});
                pos = s.find(words_ceil[i].first, pos + 1);
            }
        }

        /* Sort each group by position. */
        for(int i = 0; i < wordLen; i++)
            sort(indexmap[i].begin(), indexmap[i].end(),
                 [](const pair<int, int>& a, const pair<int, int>& b){ return a.first < b.first; });

        /* Slide a window over each group. */
        for(int i = 0; i < wordLen; i++){
            const vector<pair<int, int>>& bucket = indexmap[i];
            if(bucket.empty())
                continue;
            vector<int> wordscount(words_ceil.size(), 0);  // uses of each word inside the window
            int head = 0, count = 0;                       // window start (index into bucket), words in window
            for(int j = 0; j < (int)bucket.size(); j++){
                const int id = bucket[j].second;
                /* A gap larger than one word breaks the progression: restart the window here. */
                if(j != 0 && bucket[j].first > bucket[j - 1].first + wordLen){
                    fill(wordscount.begin(), wordscount.end(), 0);
                    head = j;
                    count = 0;
                }
                /* This word is already used up: drop words from the front until it fits again. */
                while(wordscount[id] >= words_ceil[id].second){
                    wordscount[bucket[head++].second]--;
                    count--;
                }
                wordscount[id]++;
                count++;
                /* All words are placed: record the start, then advance the window by one word. */
                if(count == (int)words.size()){
                    result.push_back(bucket[head].first);
                    wordscount[bucket[head++].second]--;
                    count--;
                }
            }
        }
        return result;
    }
};