LeetCode 459. Repeated Substring Pattern | KMP

Description

Given a non-empty string, check whether it can be constructed by taking a substring of it and appending multiple copies of that substring together. You may assume the given string consists of lowercase English letters only and that its length does not exceed 10000.

Example 1:
Input: "abab"
Output: True
Explanation: It's the substring "ab" twice.

Example 2:
Input: "aba"
Output: False

Example 3:
Input: "abcabcabcabc"
Output: True
Explanation: It's the substring "abc" four times. (And the substring "abcabc" twice.)

Solution

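The discussion board has many clever approaches. The shorter the code, the more it leans on built-in functions, but in essence they all come down to string comparison.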

29 ms CPP simple solution. No KMP.

class Solution {
public:
    bool repeatedSubstringPattern(string str) {
        string nextStr = str;
        int len = str.length();
        if(len < 1) return false;
        for(int i = 1; i <= len / 2; i++){
            if(len % i == 0){
                nextStr = leftShift(str, i);
                if(nextStr == str) return true;
            }
        }
        return false;
    }
    
    string leftShift(string &str, int l){
        string ret = str.substr(l);
        ret += str.substr(0, l);
        return ret;
    }
};

// My imitation below is slightly slower: it does not check whether i divides n, so it tries every shift
class Solution {
public:
    bool repeatedSubstringPattern(string s) {
        int n = s.size();
        string head, rear,news;
        for (int i = 1; i < n; ++i) {
            head = s.substr(0, i);
            rear = s.substr(i, s.size());
            news = rear+head;
            if(s==news) return true;
        }
        return false;
    }
};
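
Both shift-based versions above build a new string for every candidate length. As a minimal sketch (not from the original solutions), the same comparison can be done in place with modular indexing, keeping the divisibility check that the second version omits. The names isRotationInvariant and repeatedSubstringPatternInPlace are hypothetical.

```cpp
#include <string>

// Hypothetical helper: true if shifting s left by k positions reproduces s.
// The shifted string has s[(i + k) % n] at position i, so no copy is needed.
bool isRotationInvariant(const std::string& s, int k) {
    const int n = static_cast<int>(s.size());
    for (int i = 0; i < n; ++i) {
        if (s[i] != s[(i + k) % n]) return false;
    }
    return true;
}

bool repeatedSubstringPatternInPlace(const std::string& s) {
    const int n = static_cast<int>(s.size());
    for (int k = 1; k <= n / 2; ++k) {
        // Only lengths that divide n can tile the whole string.
        if (n % k == 0 && isRotationInvariant(s, k)) return true;
    }
    return false;
}
```

The worst case is still quadratic in the length of the string, but each candidate length costs no allocation.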

Easy Python solution with explanation

Basic idea:

The first char of the input string is the first char of the repeated substring.
The last char of the input string is the last char of the repeated substring.
Let S1 = S + S (where S is the input string).
Remove the first and last chars of S1. Let this be S2.
If S exists in S2, return true; otherwise return false.
Let i be the index in S2 where S starts; then the repeated substring has length i + 1 and is S[0:i+1].

def repeatedSubstringPattern(self, str):

        """
        :type str: str
        :rtype: bool
        """
        if not str:
            return False
            
        ss = (str + str)[1:-1]
        return ss.find(str) != -1
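
For comparison with the C++ solutions, here is a rough C++ sketch of the same doubled-string idea that also recovers the repeated substring described in the last step of the basic idea. The function name repeatedUnit and the choice to return an empty string for "no repetition" are assumptions made for illustration.

```cpp
#include <string>

// Returns the repeating unit of s, or an empty string if s is not built
// from two or more copies of a substring. (Illustrative name.)
std::string repeatedUnit(const std::string& s) {
    if (s.size() < 2) return "";
    // S2 = (S + S) with the first and last characters removed.
    const std::string doubled = s + s;
    const std::string s2 = doubled.substr(1, doubled.size() - 2);
    const std::size_t i = s2.find(s);
    if (i == std::string::npos) return "";
    // Index i in S2 is index i + 1 in S + S, so the unit has length i + 1.
    return s.substr(0, i + 1);
}
```

Because find returns the first occurrence, the unit returned is the shortest one: for "abcabcabcabc" it is "abc" rather than "abcabc".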

C++ O(n) using KMP, 32ms, 8 lines of code with brief explanation.

First, we build the KMP table.

Roughly speaking, dp[i+1] stores the length of the longest proper prefix of str[0..i] that is also a suffix of it, i.e. how many characters of itself the string has repeated up to position i.
Therefore, if a string repeats a length-5 substring 4 times, the last entry has value 15.
To check whether the string repeats itself, we just need the last entry to be non-zero and (str.size() - last entry) to divide str.size(). The code tests dp[n] % (n - dp[n]) == 0, which is equivalent because n = dp[n] + (n - dp[n]).

    bool repeatedSubstringPattern(string str) {
        int i = 1, j = 0, n = str.size();
        vector<int> dp(n+1,0);
        while( i < str.size() ){
            if( str[i] == str[j] ) dp[++i]=++j;
            else if( j == 0 ) i++;
            else j = dp[j];
        }
        return dp[n]&&dp[n]%(n-dp[n])==0;
    }
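
To make the divisibility test concrete, the sketch below splits the solution into two steps: build the table, then derive the length of the repeating unit as n minus the last entry. The helper names borderTable and repeatingUnitLength are hypothetical, and returning 0 for "no repetition" is an assumption made here.

```cpp
#include <string>
#include <vector>

// Same table as in the solution above: dp[k] is the length of the longest
// proper prefix of the first k characters that is also their suffix.
std::vector<int> borderTable(const std::string& str) {
    const int n = static_cast<int>(str.size());
    std::vector<int> dp(n + 1, 0);
    int i = 1, j = 0;
    while (i < n) {
        if (str[i] == str[j]) dp[++i] = ++j;
        else if (j == 0) ++i;
        else j = dp[j];
    }
    return dp;
}

// Hypothetical helper: length of the repeating unit, or 0 if there is none.
int repeatingUnitLength(const std::string& str) {
    const int n = static_cast<int>(str.size());
    if (n == 0) return 0;
    const int last = borderTable(str)[n];
    const int period = n - last;  // candidate unit length
    return (last > 0 && n % period == 0) ? period : 0;
}
```

For four copies of a length-5 substring the last entry is 15 and the unit length is 20 - 15 = 5, matching the explanation above. For "aba" the last entry is 1, the candidate length 2 does not divide 3, and the result is 0.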

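Understanding how KMP's π (or next) table is built
Take s = "ababaaa" as an example. It is easy to see that π = [0,0,1,2,3,1,1]; each entry is the length of the longest proper prefix that is also a suffix. The core idea of KMP is to record how the pattern overlaps with itself and to use the next array to decide which position of the pattern to jump to after a mismatch. Typically next = [-1, π] (or with the -1 replaced by 0), so next[i+1] reads as: for the pattern's substring from 0 to i, the longest common prefix and suffix has next[i+1] characters. Once this view sinks in, the way KMP works is easy to follow.
Continuing with the same s:

π

Looking at s,
a -> 0 by definition
ab -> 00 the substring "ab" (the pattern's substring from 0 to i) has no common prefix and suffix
aba -> 001 the leading and trailing "a" of "aba" match as prefix and suffix
abab -> 0012 extending the previous step by one character, "ab" is now the common prefix and suffix, with 2 characters
ababa -> 00123 aba** and **aba also match as prefix and suffix; the number of matching characters (the prefix or suffix length) is 3
ababaa -> 001231 extending "aba", the prefix would have to be "abab" while the suffix is "abaa": not equal! The search therefore falls back from extending "aba" to extending "a" (the longest common prefix and suffix of "aba"), which would need the suffix "aa" to match the prefix "ab". It does not (the string ends in "aa" but does not start with "aa"), so the search falls back again to matching the last character alone against the leading "a". The first and last letters are both "a", so the common prefix and suffix is "a" and the entry is 1.
ababaaa -> 0012311 the longest common prefix and suffix is still "a"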
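
The walkthrough above can be reproduced with the textbook prefix-function loop. This is a minimal sketch with its own indexing, pi[i] for the substring s[0..i], which differs from the dp[i+1] indexing used in the earlier solution. The name prefixFunction is an assumption.

```cpp
#include <string>
#include <vector>

// pi[i] = length of the longest proper prefix of s[0..i] that is also a
// suffix of s[0..i].
std::vector<int> prefixFunction(const std::string& s) {
    const int n = static_cast<int>(s.size());
    std::vector<int> pi(n, 0);
    for (int i = 1; i < n; ++i) {
        int j = pi[i - 1];  // try to extend the previous match
        // On a mismatch, fall back to the next shorter candidate.
        while (j > 0 && s[i] != s[j]) j = pi[j - 1];
        if (s[i] == s[j]) ++j;
        pi[i] = j;
    }
    return pi;
}
```

For s = "ababaaa" this yields 0, 0, 1, 2, 3, 1, 1. At i = 5 the inner loop falls back from 3 to 1 to 0 before the final comparison with s[0] succeeds, which is the "aba", then "a", then single-character sequence described above.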

next

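Prepending a 0 to π (or -1, as the code below does) gives the next array. The next array guides the matching process. The original post illustrated how it is built and how it guides character matching with a figure; the key point is to understand why the fallback iterates as next[next[j]].
[Figure: how the next array is formed]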

next code

  • As in the approach above
int n = str.size();
int i = 1, j = 0;
vector<int> next(n+1,0);
while( i < n ){
    if( str[i] == str[j] ) next[++i]=++j;    // store at i+1 the longest common prefix/suffix length of str[0..i]
    else if( j == 0 ) i++;    // no shorter prefix to fall back to, just advance i
    else j = next[j];  // keep i fixed and fall back within the pattern
}
next[0] = -1;
  • Another approach | recommended
int n = str.size();
int j = -1, i = 0;
vector<int> next(n+1);
next[0] = -1;
while (i < n) {
    if (j == -1 || str[i] == str[j]) next[++i] = ++j;
    else j = next[j];
}

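Stepping through both implementations in a debugger shows the difference. The first skips many assignments, since it only writes when str[i] == str[j]; the skipped positions must default to 0, so the array has to be zero-initialized. The second does not depend on zero-initialization: every advance of i goes through next[++i] = ++j, so each entry from next[1] to next[n] is written explicitly.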
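
As a quick way to check that the two constructions agree, the sketch below wraps each in a function. The names buildNextZeroInit and buildNextSentinel are hypothetical, and both are written to return the table with next[0] == -1 so the results can be compared directly.

```cpp
#include <string>
#include <vector>

// First variant: writes only on a match, so the skipped entries rely on the
// vector being zero-filled.
std::vector<int> buildNextZeroInit(const std::string& str) {
    const int n = static_cast<int>(str.size());
    std::vector<int> next(n + 1, 0);
    int i = 1, j = 0;
    while (i < n) {
        if (str[i] == str[j]) next[++i] = ++j;
        else if (j == 0) ++i;
        else j = next[j];
    }
    next[0] = -1;
    return next;
}

// Second variant: the -1 sentinel folds the "j == 0" case into the main
// branch, so every entry next[1..n] is written explicitly.
std::vector<int> buildNextSentinel(const std::string& str) {
    const int n = static_cast<int>(str.size());
    std::vector<int> next(n + 1);
    next[0] = -1;
    int i = 0, j = -1;
    while (i < n) {
        if (j == -1 || str[i] == str[j]) next[++i] = ++j;
        else j = next[j];
    }
    return next;
}
```

For "ababaaa" both return -1, 0, 0, 1, 2, 3, 1, 1, i.e. -1 followed by π.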

KMP

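With the next array we know where j should jump within the pattern after a mismatch. Searching matches the pattern against another string, while building next matches the pattern against itself, so the code below follows the same logic as the next construction.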

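// Returns the index of the first occurrence of s2 in s1, or -1 if there is none.
// Expects next to be built for s2 with next[0] == -1.
int KMP(string &s1,string &s2,vector<int>&next){
    int i=0,j=0,len1=s1.size(),len2=s2.size();
    while(i<len1 && j<len2){
        if(j==-1||s1[i]==s2[j]) {j++;i++;}   // match, or pattern restarted: advance both
        else j=next[j];                       // mismatch: fall back within the pattern
    }
    if(j==len2) return i-len2;
    else return -1;
}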
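
To show how the table and the search fit together, here is a self-contained sketch that builds next with the -1 sentinel and then runs the same matching loop. The wrapper name kmpFind is an assumption, not part of the original post.

```cpp
#include <string>
#include <vector>

// Illustrative wrapper: returns the index of the first occurrence of
// pattern in text, or -1 if there is none.
int kmpFind(const std::string& text, const std::string& pattern) {
    const int len1 = static_cast<int>(text.size());
    const int len2 = static_cast<int>(pattern.size());

    // Build next for the pattern, with the -1 sentinel at index 0.
    std::vector<int> next(len2 + 1);
    next[0] = -1;
    for (int i = 0, j = -1; i < len2;) {
        if (j == -1 || pattern[i] == pattern[j]) next[++i] = ++j;
        else j = next[j];
    }

    // Same loop shape, now matching the pattern against the text.
    int i = 0, j = 0;
    while (i < len1 && j < len2) {
        if (j == -1 || text[i] == pattern[j]) { ++i; ++j; }
        else j = next[j];
    }
    return j == len2 ? i - len2 : -1;
}
```

This also ties back to the problem: searching for "abab" in "bababa" (the doubled string with its first and last characters removed) finds a match at index 1, so "abab" is a repeated pattern with unit length 2.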
