LintCode String Search (Brute Force + KMP)

thinkerleo7798

Article link: https://blog.csdn.net/thinkerleo1997/article/details/78170242

URL: http://www.lintcode.com/zh-cn/problem/strstr/
Given a source string and a target string, find the first position (counting from 0) at which target occurs in source. If target does not occur in source, return -1.
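To make the expected behaviour concrete before writing any search by hand, here is a minimal reference sketch built on the standard library's std::string::find. The names referenceStrStr and checkReferenceStrStr are hypothetical and are not part of the LintCode template; the sketch only pins down the return values the problem asks for.

```cpp
#include <cassert>
#include <string>

// Hypothetical reference helper (not part of the LintCode template):
// returns the 0-based index of the first occurrence of target in source,
// or -1 if target does not occur.
int referenceStrStr(const std::string &source, const std::string &target) {
    std::size_t pos = source.find(target);
    return pos == std::string::npos ? -1 : static_cast<int>(pos);
}

// A few sample inputs with the answers the problem statement implies.
void checkReferenceStrStr() {
    assert(referenceStrStr("abcsde", "a") == 0);        // found at the start
    assert(referenceStrStr("abcdabcdefg", "bcd") == 1); // first of two matches
    assert(referenceStrStr("source", "target") == -1);  // not present
    assert(referenceStrStr("abc", "") == 0);            // empty target
}
```

A helper like this is handy as an oracle when testing the hand-written versions on the same inputs.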

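Brute force:
Start comparing at the first character of the text. If the pattern matches there, return that position; otherwise move on and start comparing from the next character of the text. Accepted (AC) code: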

#include <cstring>
#include <iostream>

class Solution {
public:
    /**
     * Returns the index of the first occurrence of target in source,
     * or -1 if target is not part of source.
     * @param source string to be scanned.
     * @param target string containing the sequence of characters to match.
     */
    int strStr(const char *source, const char *target) {
        // Null pointers cannot be searched.
        if (!(source && target))
            return -1;
        int len_source = (int) strlen(source);
        int len_target = (int) strlen(target);
        // An empty target matches at position 0.
        if (len_target == 0)
            return 0;
        if (len_source == 0)
            return -1;
        // Try every start position i in source.
        for (int i = 0; i < len_source; i++) {
            // Compare target against source[i..] one character at a time.
            for (int j = 0, s = i; j < len_target && s < len_source; j++, s++) {
                if (source[s] != target[j])
                    break;
                else {
                    // Reached the last character of target: full match at i.
                    if (j == len_target - 1)
                        return i;
                }
            }
        }
        return -1;
    }
};

int main() {
    char source[] = "abcsde";
    char target[] = "a";
    Solution so;
    std::cout << so.strStr(source, target) << std::endl;  // prints 0
    return 0;
}

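KMP algorithm:
The KMP algorithm, developed by D. E. Knuth, J. H. Morris, and V. R. Pratt, brings the time complexity of string matching down to linear $O(m+n)$, where $m$ and $n$ are the lengths of the pattern and the text.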

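In KMP, we first precompute the internal matching information of the pattern. When a mismatch occurs, this information lets us shift the pattern as far as possible, which reduces the number of comparisons.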
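To see what "reducing the number of comparisons" can mean in practice, the following sketch counts character comparisons for both approaches on an input that is unfavourable for brute force (a run of a's searched for "aaab"). The helper names bruteForceCounted, kmpCounted and checkComparisonCounts are hypothetical, and the KMP half is a compact std::string variant written only for this measurement, assuming the next-array convention with next[0] = -1.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: brute-force search that counts character comparisons.
int bruteForceCounted(const std::string &text, const std::string &pattern,
                      long long &comparisons) {
    comparisons = 0;
    int n = (int)text.size();
    int m = (int)pattern.size();
    if (m == 0) return 0;
    for (int i = 0; i + m <= n; ++i) {
        int j = 0;
        while (j < m) {
            ++comparisons;
            if (text[i + j] != pattern[j]) break;
            ++j;
        }
        if (j == m) return i;
    }
    return -1;
}

// Hypothetical helper: KMP search that counts the character comparisons made
// while scanning the text (building the next array is not counted).
int kmpCounted(const std::string &text, const std::string &pattern,
               long long &comparisons) {
    comparisons = 0;
    int n = (int)text.size();
    int m = (int)pattern.size();
    if (m == 0) return 0;

    // next[0] = -1; next[j] = length of the longest proper prefix of
    // pattern[0..j-1] that is also a suffix of it.
    std::vector<int> next(m);
    next[0] = -1;
    int j = 0, k = -1;
    while (j < m - 1) {
        if (k == -1 || pattern[j] == pattern[k]) {
            ++j;
            ++k;
            next[j] = k;
        } else {
            k = next[k];
        }
    }

    int i = 0;
    j = 0;
    while (i < n) {
        if (j == -1) {
            ++i;  // nothing to compare: restart the pattern at the next text character
            ++j;
        } else {
            ++comparisons;
            if (text[i] == pattern[j]) {
                ++i;
                ++j;
            } else {
                j = next[j];  // shift the pattern, keep i in place
            }
        }
        if (j == m) return i - m;
    }
    return -1;
}

// 20 a's searched for "aaab": brute force retries all 4 characters at each
// of the 17 start positions, 68 comparisons in total.
void checkComparisonCounts() {
    std::string text(20, 'a');
    long long brute = 0, kmp = 0;
    assert(bruteForceCounted(text, "aaab", brute) == -1);
    assert(kmpCounted(text, "aaab", kmp) == -1);
    assert(brute == 68);
    assert(kmp <= 2 * 20);  // stays linear in the text length
}
```

The brute-force count grows with the product of the two lengths on inputs like this, while the KMP scan never moves backwards in the text.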

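Take the string ABAABCABA as an example:
We need to build the next array, which stores, for each position in the pattern, the length of the longest equal prefix and suffix of the part of the pattern before that position.
What are these prefixes and suffixes? Take the sixth element, C, as an example: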

Prefix	Suffix
A	B
AB	AB
ABA	AAB
ABAA	BAAB
ABAAB	ABAAB

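These are all the prefixes and suffixes of ABAAB, the part of the pattern before the sixth element C. The last row (the whole string itself) does not count, so the longest equal prefix and suffix is AB, of length 2, and the entry for C in the next array is 2.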
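The prefix/suffix table for C can be reproduced mechanically. The sketch below is only an illustration of the definition, not the way the next array is built in the accepted code; properPrefixSuffixPairs, longestEqualLength and checkPrefixSuffixExample are hypothetical names.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: lists the (prefix, suffix) pairs of equal length for s,
// shortest first, excluding the pair where both are the whole string.
std::vector<std::pair<std::string, std::string>>
properPrefixSuffixPairs(const std::string &s) {
    std::vector<std::pair<std::string, std::string>> pairs;
    for (std::size_t len = 1; len < s.size(); ++len)
        pairs.emplace_back(s.substr(0, len), s.substr(s.size() - len));
    return pairs;
}

// Length of the longest pair whose prefix equals its suffix (0 if none).
int longestEqualLength(const std::string &s) {
    int best = 0;
    for (const auto &p : properPrefixSuffixPairs(s))
        if (p.first == p.second)
            best = (int)p.first.size();
    return best;
}

// The part of the pattern before C is ABAAB.
void checkPrefixSuffixExample() {
    auto pairs = properPrefixSuffixPairs("ABAAB");
    assert(pairs.size() == 4);  // A/B, AB/AB, ABA/AAB, ABAA/BAAB
    assert(pairs[1].first == "AB" && pairs[1].second == "AB");
    assert(longestEqualLength("ABAAB") == 2);
}
```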
The next array for this pattern is:

s:	A	B	A	A	B	C	A	B	A
next:	-1	0	0	1	1	2	0	1	2
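The whole table can be checked by applying the definition at every position. This is a deliberately slow sketch meant only to confirm the values above; nextByDefinition and checkNextTable are hypothetical names, and the convention next[0] = -1 is taken from the table.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: builds the next array straight from the definition.
// next[0] = -1; for j >= 1, next[j] is the length of the longest proper
// prefix of p[0..j-1] that is also a suffix of p[0..j-1].
std::vector<int> nextByDefinition(const std::string &p) {
    std::vector<int> next(p.size(), 0);
    if (p.empty()) return next;
    next[0] = -1;
    for (std::size_t j = 1; j < p.size(); ++j) {
        // Try every proper length of the part before position j.
        for (std::size_t len = 1; len < j; ++len) {
            // Compare p[0..len) with p[j-len..j).
            if (p.compare(0, len, p, j - len, len) == 0)
                next[j] = (int)len;
        }
    }
    return next;
}

// Reproduces the table for ABAABCABA.
void checkNextTable() {
    std::vector<int> expected = {-1, 0, 0, 1, 1, 2, 0, 1, 2};
    assert(nextByDefinition("ABAABCABA") == expected);
}
```

This version does far more work than necessary; it is useful mainly as a cross-check for a faster construction.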

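Once the next array is known, suppose matching fails at C (the sixth element, index 5, of the pattern). The pattern is then slid so that its prefix AB lands where the already matched suffix AB was, and matching resumes from there: in the code, j becomes next[5] = 2 while the position in the text does not move back.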
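The slide can be observed by recording every fallback during a search. In this sketch, kmpWithTrace, Fallback and checkSlideAtC are hypothetical names, and the text ABAABAABCABA is an assumed example chosen so that the first mismatch happens exactly at C; the next array is passed in as given in the table above.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical record of one fallback: where in the text it happened and
// how the pattern index changed.
struct Fallback {
    int textIndex;
    int fromJ;
    int toJ;
};

// Hypothetical tracing variant of the matching loop. next must have one
// entry per pattern character, with next[0] == -1.
int kmpWithTrace(const std::string &text, const std::string &pattern,
                 const std::vector<int> &next, std::vector<Fallback> &trace) {
    assert(next.size() == pattern.size());
    int n = (int)text.size();
    int m = (int)pattern.size();
    if (m == 0) return 0;
    int i = 0, j = 0;
    while (i < n) {
        if (j == -1 || text[i] == pattern[j]) {
            ++i;
            ++j;
        } else {
            trace.push_back({i, j, next[j]});  // record the slide
            j = next[j];                       // i is left unchanged
        }
        if (j == m) return i - m;
    }
    return -1;
}

// ABAAB matches, then text[5] = 'A' meets pattern[5] = 'C': j falls back to 2.
void checkSlideAtC() {
    std::vector<int> next = {-1, 0, 0, 1, 1, 2, 0, 1, 2};
    std::vector<Fallback> trace;
    int pos = kmpWithTrace("ABAABAABCABA", "ABAABCABA", next, trace);
    assert(pos == 3);
    assert(trace.size() == 1);
    assert(trace[0].textIndex == 5 && trace[0].fromJ == 5 && trace[0].toJ == 2);
}
```

After the fallback, the two characters AB that were already matched are not compared again; the search continues with pattern[2] against text[5].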
Accepted (AC) code:

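#include <cstring>
#include <vector>

class Solution {
public:
    /**
     * Fills next for pattern p: next[0] = -1 and, for j >= 1, next[j] is the
     * length of the longest proper prefix of p[0..j-1] that is also its suffix.
     * p must be non-empty and next must hold at least strlen(p) elements.
     */
    void getNext(const char *p, std::vector<int> &next) {
        int nLen = (int) strlen(p);
        next[0] = -1;
        int k = -1;  // length of the prefix currently being extended, or -1
        int j = 0;
        while (j < nLen - 1) {
            if (k == -1 || p[j] == p[k]) {
                ++j;
                ++k;
                next[j] = k;
            } else {
                k = next[k];  // fall back to a shorter candidate prefix
            }
        }
    }

    // Returns the first index at which target occurs in source, or -1.
    int kmp(const char *source, const char *target) {
        int pattern_len = (int) strlen(target);
        int n = (int) strlen(source);
        if (pattern_len == 0)
            return 0;
        // Sized from the pattern, so a long pattern cannot overflow the table
        // (the fixed 1000-element array would).
        std::vector<int> target_nexts(pattern_len);
        getNext(target, target_nexts);
        int ans = -1;
        int i = 0;  // position in source
        int j = 0;  // position in target
        while (i < n) {
            if (j == -1 || source[i] == target[j]) {
                ++j;
                ++i;
            } else {
                j = target_nexts[j];  // slide the pattern; i does not move back
            }
            if (j == pattern_len) {
                ans = i - pattern_len;
                break;
            }
        }
        return ans;
    }

    /**
     * Returns the index of the first occurrence of target in source,
     * or -1 if target is not part of source.
     * @param source string to be scanned.
     * @param target string containing the sequence of characters to match.
     */
    int strStr(const char *source, const char *target) {
        // Null pointers cannot be searched.
        if (!(source && target))
            return -1;
        int len_source = (int) strlen(source);
        int len_target = (int) strlen(target);
        // An empty target matches at position 0.
        if (len_target == 0)
            return 0;
        if (len_source == 0)
            return -1;
        return kmp(source, target);
    }
};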