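[Algorithms] Strings

String matching

KMP: a linear-time string matching algorithm

Before looking at KMP, start with the most basic form of string matching.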

#leetcode
28. Implement strStr()
Returns the index of the first occurrence of needle in haystack, or -1 if needle is not part of haystack.
Here haystack is the source and needle is the target.

The naive algorithm uses two nested loops and walks through both source and target one character at a time (step=1). The code:

class Solution {
public:
    int strStr(string haystack, string needle) {
        if (haystack.empty() && needle.empty()) return 0;
        if (haystack.empty()) return -1;
        if (needle.empty()) return 0;
        // size() is unsigned, so guard against underflow in the subtraction below
        if (haystack.size() < needle.size()) return -1;

        for (int i = 0; i < haystack.size() - needle.size() + 1; i++) {
            string::size_type j = 0;
            for (; j < needle.size(); j++) {
                if (haystack[i + j] != needle[j]) break;
            }
            if (j == needle.size()) return i;
        }
        return -1;
    }
};

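If the current characters do not match, the index into source advances by 1 and the index into target resets to zero. After every mismatch, target is matched again from position 0, so the worst-case time complexity is O(N*M).
KMP addresses exactly this reset of target. Once target no longer has to restart from zero, matching can advance with an effective step greater than 1, which is more efficient. KMP does more than enlarge the step, though: by precomputing the next array it removes the nested loop, and the complexity becomes O(M+N).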

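KMP steps:

1) Build the next array
For a string str, declare an array next of size strlen(str) (when str is a const char*), with every entry initialized to -1. Here next[i] is the index of the last character of the longest proper prefix of str[0..i] that is also a suffix of str[0..i], or -1 if no such prefix exists.
Start at i=1: if str[1]==str[0], then next[1]=0.
i++;
Now i==2: if str[2]==str[next[1]+1], then next[2]=next[1]+1.
In general:
If str[i]==str[next[i-1]+1], then next[i]=next[i-1]+1.
Otherwise, look for a shorter prefix of str that is also a suffix of str[0..i-1] and can be extended by str[i]. Let s=next[i-1]; the s+1 characters ending at str[i-1] equal the first s+1 characters of str. If str[s+1]==str[i], then next[i]=s+1. If str[s+1]!=str[i], fall back to the next shorter candidate with s=next[s] and test again. This repeats until a match is found or s drops to -1 (in which case next[i]=0 if str[i]==str[0], and -1 otherwise), so it is implemented as a while loop.
2) Match using the next array
This part is easier to follow. With the next array, the index into source (the long string) never moves backward; after a mismatch only the index into needle (the short string) falls back, which is equivalent to sliding needle forward by a step that may be larger than one. This behaves like a finite state automaton: after a mismatch, the characters already matched still count and are not compared again.
For a more detailed walkthrough, see the blog post:
v_JULY_v: "从头到尾彻底理解KMP" (Understanding KMP thoroughly from start to finish)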

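C++ code:
#include <string>
#include <vector>
using namespace std;

class Solution {
public:
    int strStr(string haystack, string needle) {
        if (haystack.size() < needle.size()) return -1;
        if (needle.empty()) return 0;
        int size = needle.size();
        // next[i]: index of the last character of the longest proper prefix of
        // needle[0..i] that is also a suffix of it, or -1 if there is none.
        vector<int> next(size, -1);
        for (int i = 1; i < size; i++) {
            int prei = next[i - 1];
            while (prei >= 0 && needle[i] != needle[prei + 1]) prei = next[prei];
            next[i] = needle[i] == needle[prei + 1] ? prei + 1 : -1;
        }
        int k = 0;  // index into needle
        for (int j = 0; j < (int)haystack.size();) {
            if (needle[k] == haystack[j]) {
                if (k == size - 1) return j - size + 1;  // full match ends at j
                k++;
                j++;
            } else if (k != 0) {
                k = next[k - 1] + 1;  // fall back in needle; j stays where it is
            } else {
                j++;  // mismatch at needle[0]: advance in haystack
            }
        }
        return -1;
    }
};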
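
To make the next array more concrete, here is a minimal sketch that builds it on its own, using the same -1-based convention as the code above. The function name buildNext is my own choice for illustration, not something from the original problem.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Builds the failure table in the -1-based convention used above:
// next[i] is the index of the last character of the longest proper prefix
// of pattern[0..i] that is also a suffix of it, or -1 if there is none.
// (buildNext is a hypothetical helper name.)
std::vector<int> buildNext(const std::string& pattern) {
    std::vector<int> next(pattern.size(), -1);
    for (std::size_t i = 1; i < pattern.size(); ++i) {
        int k = next[i - 1];
        // Fall back through shorter and shorter borders until one can be
        // extended by pattern[i], or none is left (k == -1).
        while (k >= 0 && pattern[i] != pattern[k + 1]) k = next[k];
        if (pattern[i] == pattern[k + 1]) next[i] = k + 1;
    }
    return next;
}
```

For the pattern "ababaca" this gives -1, -1, 0, 1, 2, -1, 0: at index 4 the prefix "aba" (ending at index 2) is also a suffix of "ababa", while the 'c' at index 5 extends no border at all.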

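Checking whether two strings are anagrams

Method 1: count character frequencies with a hash map;
Method 2: sort both strings and compare them for equality.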
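
A minimal sketch of Method 1 might look like the following. It assumes the strings are plain byte strings, so a fixed 256-entry table can stand in for the hash map; the name isAnagram is my own.

```cpp
#include <array>
#include <string>

// Frequency-count check (Method 1). Assumes byte strings, so a 256-entry
// table plays the role of the hash map. isAnagram is a hypothetical name.
bool isAnagram(const std::string& a, const std::string& b) {
    if (a.size() != b.size()) return false;
    std::array<int, 256> count{};  // zero-initialized counters
    for (unsigned char c : a) ++count[c];
    for (unsigned char c : b) {
        // b uses a character more often than a does.
        if (--count[c] < 0) return false;
    }
    return true;  // equal lengths and no deficit, so the counts agree
}
```

With this sketch, isAnagram("listen", "silent") is true and isAnagram("aab", "abb") is false. It runs in O(n) time, compared with O(n log n) for the sorting approach.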

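A variant is to output every string in an array of strings that has an anagram elsewhere in the array:
Method 1: two traversals
Method 2: sorting plus a hash map; in C++ the STL provides unordered_map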
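
Method 2 could be sketched roughly as below: the sorted form of each string serves as the hash map key, and a string is kept if its key occurs more than once. The function name anagrams and the choice to return results in input order are my own assumptions.

```cpp
#include <algorithm>
#include <string>
#include <unordered_map>
#include <vector>

// Sort + hash map (Method 2). Returns, in input order, every string that
// shares its sorted form with at least one other string in the array.
// The name and the output order are assumptions for illustration.
std::vector<std::string> anagrams(const std::vector<std::string>& strs) {
    std::unordered_map<std::string, int> freq;
    for (const std::string& s : strs) {
        std::string key = s;
        std::sort(key.begin(), key.end());  // anagrams share this key
        ++freq[key];
    }
    std::vector<std::string> result;
    for (const std::string& s : strs) {
        std::string key = s;
        std::sort(key.begin(), key.end());
        if (freq[key] > 1) result.push_back(s);
    }
    return result;
}
```

For the input {"lint", "intl", "inlt", "code"} this returns {"lint", "intl", "inlt"}.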

Longest Common Substring

Rotate String

Question
lintcode: (8) Rotate String
Problem Statement
Given a string and an offset, rotate string by offset. (rotate from left to right)
Example
Given "abcdefg".
offset=0 => "abcdefg"
offset=1 => "gabcdef"
offset=2 => "fgabcde"
offset=3 => "efgabcd"

Approach: split the string at the offset point, reverse the front part and the back part separately, then reverse the whole string.

class Solution {
public:
  /**
     * param A: A string
     * param offset: Rotate string with offset.
     * return: Rotated string.
     */
    string rotateString(string A, int offset) {
        if (A.empty()) {
            return A;
        }

        int len = A.size();
        offset %= len;
        reverse(A, 0, len - offset - 1);
        reverse(A, len - offset, len - 1);
        reverse(A, 0, len - 1);
        return A;
    }

private:
    void reverse(string &str, int start, int end) {
        while (start < end) {
            char temp = str[start];
            str[start] = str[end];
            str[end] = temp;
            start++;
            end--;
        }
    }
};
#Python - immutable string

class Solution:
    """
    param A: A string
    param offset: Rotate string with offset.
    return: Rotated string.
    """
    def rotateString(self, A, offset):
        if A is None or len(A) == 0:
            return A

        offset %= len(A)
        before = A[:len(A) - offset]
        after = A[len(A) - offset:]
        # [::-1] means reverse in Python
        A = before[::-1] + after[::-1]
        A = A[::-1]

        return A
#Python - mutable list

class Solution:
    # @param A: a list of char
    # @param offset: an integer
    # @return: nothing
    def rotateString(self, A, offset):
        if A is None or len(A) == 0:
            return

        offset %= len(A)
        self.reverse(A, 0, len(A)-offset-1)
        self.reverse(A, len(A)-offset, len(A)-1)
        self.reverse(A, 0, len(A)-1)

    def reverse(self, str_l, start, end):
        while start < end:
            str_l[start], str_l[end] = str_l[end], str_l[start]
            start += 1
            end -= 1

Checking whether a string is a palindrome

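The usual two-pointer approach: one pointer at the head and one at the tail. Note that in Python, char.isalnum() tests whether a character is a digit or a letter; in C++ the equivalent is the function isalnum(char).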
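
As a rough sketch of the two-pointer idea in C++: the version below assumes the common form of the problem in which non-alphanumeric characters are skipped and letter case is ignored. That rule, and the name isPalindrome, are assumptions on my part.

```cpp
#include <cctype>
#include <string>

// Two-pointer palindrome check. Assumption: non-alphanumeric characters
// are skipped and comparison is case-insensitive.
bool isPalindrome(const std::string& s) {
    int left = 0;
    int right = static_cast<int>(s.size()) - 1;
    while (left < right) {
        // Cast to unsigned char: the <cctype> functions require a value
        // representable as unsigned char.
        unsigned char a = static_cast<unsigned char>(s[left]);
        unsigned char b = static_cast<unsigned char>(s[right]);
        if (!std::isalnum(a)) { ++left; continue; }
        if (!std::isalnum(b)) { --right; continue; }
        if (std::tolower(a) != std::tolower(b)) return false;
        ++left;
        --right;
    }
    return true;
}
```

Under these assumptions, "A man, a plan, a canal: Panama" counts as a palindrome and "race a car" does not.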

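Longest Palindromic Substring

Method 1: brute force, time complexity O(n^3)
Method 2: treat each scanned character as the center of a palindrome (odd and even lengths need separate handling) and expand outward in both directions to find the longest palindrome; this takes O(n^2) time. My own code is below; it simply handles the odd and even cases with two separate for loops.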
#include <string>
using namespace std;

class Solution {
public:
    string longestPalindrome(string s) {
        if (s.size() <= 1) return s;
        int n = s.size();
        int maxLen = 0;
        string maxStr;
        int l, r;
        // Odd-length palindromes: s[i] is the center.
        for (int i = 0; i < n; i++) {
            l = i - 1, r = i + 1;
            while (l >= 0 && r < n && s[l] == s[r]) {
                l--;
                r++;
            }
            // The palindrome found is s[l+1 .. r-1], of length r - l - 1.
            if (r - l - 1 > maxLen) {
                maxLen = r - l - 1;
                maxStr = s.substr(l + 1, r - l - 1);
            }
        }
        // Even-length palindromes: the center lies between s[i] and s[i+1].
        for (int i = 0; i < n; i++) {
            l = i, r = i + 1;
            while (l >= 0 && r < n && s[l] == s[r]) {
                l--;
                r++;
            }
            if (r - l - 1 > maxLen) {
                maxLen = r - l - 1;
                maxStr = s.substr(l + 1, r - l - 1);
            }
        }
        return maxStr;
    }
};
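Method 3: Manacher's algorithm, with O(n) time complexity and O(n) space complexity.

For the details, see Manacher's algorithm.
There is no need to study it in depth for now.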

Space Replacement

Question
lintcode: (212) Space Replacement
Write a method to replace all spaces in a string with %20. 
The string is given in a characters array, you can assume it has enough space 
for replacement and you are given the true length of the string.
Example
Given "Mr John Smith", length = 13.

The string after replacement should be "Mr%20John%20Smith".

Note
If you are using Java or Python,please use characters array instead of string.

Challenge
Do it in-place.

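Points to note: replacing each space with %20 makes the string longer, so first count the spaces and compute the total length the result needs. The array then has enough room to rewrite the string without any separate shifting step. Next, traverse the string from back to front: because the free space is at the end, writing backward never overwrites a character that has not been read yet. In fact the exact target position of every character is known, so each one can be written straight to its final place.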

Wildcard Matching (LeetCode solution still to do)

Question

leetcode: Wildcard Matching | LeetCode OJ
lintcode: (192) Wildcard Matching
Implement wildcard pattern matching with support for '?' and '*'.

'?' Matches any single character.
'*' Matches any sequence of characters (including the empty sequence).
The matching should cover the entire input string (not partial).

Example
isMatch("aa","a") → false
isMatch("aa","aa") → true
isMatch("aaa","aa") → false
isMatch("aa", "*") → true
isMatch("aa", "a*") → true
isMatch("ab", "?*") → true
isMatch("aab", "c*a*b") → false

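My own understanding: the key is handling '*'. More precisely, the key is handling the string that contains wildcards (assume only one of the two strings contains them; the case where both do is more complicated). The literal substrings of that string, namely those between two '*', between the start and the first '*', and between the last '*' and the end, must each be matched completely in the other string. In the first string below, abc, def and ghi must all be matched completely. If some character fails to match, backtrack to the character immediately after the most recent '*' and retry from the next candidate position in the other string, until the longer string is exhausted.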

abc*def*ghi
abcddeffghimh
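
The backtracking idea described above can be sketched with the usual greedy two-index scan. This is one possible reading of it, not a finished LeetCode submission; the variable names star and mark are my own.

```cpp
#include <cstddef>
#include <string>

// Greedy wildcard matching with backtracking to the most recent '*'.
// s is the plain text, p is the pattern containing '?' and '*'.
bool isMatch(const std::string& s, const std::string& p) {
    const std::size_t npos = std::string::npos;
    std::size_t i = 0, j = 0;
    std::size_t star = npos;  // index of the most recent '*' in p
    std::size_t mark = 0;     // index in s where the text after that '*' is tried
    while (i < s.size()) {
        if (j < p.size() && p[j] == '*') {
            star = j++;  // remember the star; first let it match nothing
            mark = i;
        } else if (j < p.size() && (p[j] == '?' || p[j] == s[i])) {
            ++i;
            ++j;
        } else if (star != npos) {
            // Mismatch: return to just after the star and let the star
            // absorb one more character of s.
            j = star + 1;
            i = ++mark;
        } else {
            return false;
        }
    }
    // Only trailing '*' may remain unmatched in the pattern.
    while (j < p.size() && p[j] == '*') ++j;
    return j == p.size();
}
```

Because the match must cover the entire string, this sketch rejects abc*def*ghi against abcddeffghimh (the text ends in mh rather than ghi), while abc*def*ghi* is accepted. Each mismatch only rewinds to the latest '*', so earlier stars are never revisited.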

Length of last word

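Approach: scanning from the end, first skip past all trailing spaces; then keep scanning backward, counting characters until the first space is reached.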

class Solution {
public:
    int lengthOfLastWord(string s) {
        // skip the trailing spaces
        if(s.size()==0)return 0;
        int end=s.size()-1;
        while(end>=0&&s[end]==' ')
        {
            end--;
        }
        int i=0;
        // count characters backward until the previous space
        while(end>=0&&s[end]!=' ')
        {
            i++;
            end--;
        }
        return i;
    }
};

Count and say

The count-and-say sequence is the sequence of integers beginning as follows:

1, 11, 21, 1211, 111221, ...

1 is read off as "one 1" or 11.

11 is read off as "two 1s" or 21.

21 is read off as "one 2, then one 1" or 1211.

Given an integer n, generate the nth sequence.

Example
Given n = 5, return "111221".

Note
The sequence of integers will be represented as a string.

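This uses an iterative approach. To convert char, int and other types to string, the code relies on C++ stream processing (which turns out to be very handy): an object declared as ostringstream accepts input directly through <<, and whatever the type of the input (char or int), its textual form is appended to the resulting string.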

#include <sstream>
#include <string>
using namespace std;

class Solution {
public:
    // Iterative, not recursive.
    string countAndSay(int n) {
        string s = "1";
        for (int i = 2; i <= n; i++) {
            s = helper(s);
        }
        return s;
    }
    // Reads s off as runs of (count, digit) to produce the next term.
    string helper(const string& s) {
        // Stream insertion is convenient here.
        ostringstream result;
        char temp = s[0];
        int tempCount = 1;
        for (int i = 1; i < (int)s.size(); i++) {
            char curr = s[i];
            if (curr != temp) {
                // Both char and int can be written to the stream directly.
                result << tempCount;
                result << temp;
                temp = curr;
                tempCount = 1;
            } else {
                tempCount++;
            }
        }
        // Write out the final run.
        result << tempCount;
        result << temp;
        return result.str();
    }
};

There is also a recursive version:

class Solution {
public:
    string countAndSay(int n) {
        if (n == 1) return "1";             // base case
        string res, tmp = countAndSay(n - 1);  // recursion
        char c = tmp[0];
        int count = 1;
        for (int i = 1; i < tmp.size(); i++)
            if (tmp[i] == c)
                count++;
            else {
                res += to_string(count);
                res.push_back(c);
                c = tmp[i];
                count = 1;
            }
        res += to_string(count);
        res.push_back(c);
        return res;
    }
};