LeetCode 10. Regular Expression Matching

wenyq7

Article link: https://blog.csdn.net/qq_37333947/article/details/103139919

Problem:

Given an input string (s) and a pattern (p), implement regular expression matching with support for '.' and '*'.

'.' Matches any single character.
'*' Matches zero or more of the preceding element.

The matching should cover the entire input string (not partial).

Note:

s could be empty and contains only lowercase letters a-z.
p could be empty and contains only lowercase letters a-z, and characters like . or *.

Example 1:

Input:
s = "aa"
p = "a"
Output: false
Explanation: "a" does not match the entire string "aa".

Example 2:

Input:
s = "aa"
p = "a*"
Output: true
Explanation: '*' means zero or more of the preceding element, 'a'. Therefore, by repeating 'a' once, it becomes "aa".

Example 3:

Input:
s = "ab"
p = ".*"
Output: true
Explanation: ".*" means "zero or more (*) of any character (.)".

Example 4:

Input:
s = "aab"
p = "c*a*b"
Output: true
Explanation: c can be repeated 0 times, a can be repeated 1 time. Therefore, it matches "aab".

Example 5:

Input:
s = "mississippi"
p = "mis*is*p*."
Output: false

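This problem is a perennial interview question at a certain company. There are two approaches: recursion or DP. I studied the DP solution first:

Let dp[i][j] denote whether s[0, ..., i - 1] matches p[0, ..., j - 1]. Because dp[i][j] covers the prefixes ending at i - 1 and j - 1, the dp matrix has one more row and one more column than the lengths of s and p. The matrix starts out as follows (row 0 and column 0 stand for the empty prefixes; ? marks entries that depend on the pattern):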

        ""   p[0] p[1] p[2] ...
""      1    ?    ?    ?
s[0]    0
s[1]    0
s[2]    0

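Note that dp[0][0] is the case where both s and p are empty, which always matches, so dp[0][0] is true. When p is empty, only an empty s can match, so the rest of the first column is false. When s is empty, p can still match if it consists of pieces such as "a*" or ".*", so the first row has to be computed while traversing the matrix.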
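To make the first-row rule concrete, here is a minimal sketch that computes only row 0 of the table, i.e. which pattern prefixes match the empty string. The function name emptyRow is hypothetical and not part of the original solution, which fills this row inside its main loop instead.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not from the original post): row[j] is true when the
// empty string matches the pattern prefix p[0..j-1].
std::vector<bool> emptyRow(const std::string& p) {
    std::vector<bool> row(p.size() + 1, false);
    row[0] = true;  // empty string vs. empty pattern
    for (std::size_t j = 2; j <= p.size(); ++j) {
        // A prefix ending in "x*" matches "" exactly when the prefix
        // before "x*" already matches "".
        if (p[j - 1] == '*') {
            row[j] = row[j - 2];
        }
    }
    return row;
}
```

For example, with p = "a*b*" the true entries are at j = 0, 2 and 4, while with p = "a" only j = 0 is true.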

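Next we traverse the matrix row by row with two for loops: the outer loop runs over the rows (i, starting from 0 so that the empty-s row is filled in too), and the inner loop runs over the columns (j). Column 0 needs no further work, so j starts from 1. Then comes the case analysis:

1. p[j - 1] != '*':

If p[j - 1] (the last character of the current pattern prefix) is not *, the case is simple: dp[i][j] = dp[i - 1][j - 1] && (p[j - 1] == '.' || s[i - 1] == p[j - 1]). That is, the last characters must match (or p[j - 1] is '.') and everything before them must match as well.

2. p[j - 1] == '*':

If p[j - 1] is *, things get more complicated. The * can stand for one or more occurrences of the preceding character, or for no occurrence at all. That gives two sub-cases: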

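2.1: one or more occurrences

Take s="abc", p="abc*". The prefix of s ending at s[i - 2] ("ab") must match the whole pattern prefix ending at p[j - 1] ("abc*", star included, since the star may still absorb further characters), i.e. dp[i - 1][j] == true. In addition, the last character of s must match the character before the * in p, i.e. s[i - 1] == p[j - 2] or p[j - 2] == '.'.

2.2: zero occurrences

Take s="abc", p="abcd*". We drop "d*" and check whether the pattern prefix that ends just before it matches s, i.e. whether dp[i][j - 2] is true.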
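The cases above can be collected into one recurrence. The shorthand match(i, k) is introduced here only for compactness and does not appear in the original write-up:

```latex
\[
\mathrm{match}(i, k) \;:=\; i \ge 1 \;\land\; \bigl(s[i-1] = p[k-1] \;\lor\; p[k-1] = \texttt{'.'}\bigr)
\]
\[
\mathrm{dp}[i][j] =
\begin{cases}
\mathrm{dp}[i-1][j-1] \land \mathrm{match}(i, j), & p[j-1] \ne \texttt{'*'} \\[4pt]
\mathrm{dp}[i][j-2] \;\lor\; \bigl(\mathrm{dp}[i-1][j] \land \mathrm{match}(i, j-1)\bigr), & p[j-1] = \texttt{'*'}
\end{cases}
\]
```

In the star case the first term is sub-case 2.2 (zero occurrences) and the second term is sub-case 2.1 (one or more occurrences); match(i, j - 1) compares s[i - 1] with p[j - 2], the character before the star.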

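With the cases worked out, the code is straightforward to write. The main thing to watch is the i == 0 case whenever i - 1 appears.

The code is below. The time complexity is O(mn), where m is the length of s and n is the length of p, and the space complexity is O(mn) as well. Runtime 8 ms (64.76%), memory 8.8 MB (69.49%):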

#include <string>
#include <vector>
using namespace std;

class Solution {
public:
    /*
    dp[i][j]: s[0, ..., i - 1] matches p[0, ..., j - 1]
    if (p[j - 1] != '*'):
        dp[i][j] = dp[i-1][j-1] && (s[i - 1] == p[j - 1] || p[j - 1] == '.')
    if (p[j - 1] == '*'):
        if * stands for 0 occurrences:
            dp[i][j] = dp[i][j-2]
        if * stands for 1 or more occurrences:
            dp[i][j] = dp[i-1][j] && (s[i-1] == p[j-2] || p[j-2] == '.')
    Row 0 and column 0 are the empty prefixes of s and p:
    dp[0][0] = true, the rest of column 0 is false.
    */
    bool isMatch(string s, string p) {
        // s.size() + 1 rows
        // p.size() + 1 cols
        vector<vector<bool>> dp(s.size() + 1, vector<bool>(p.size() + 1, false));
        dp[0][0] = true;

        for (size_t i = 0; i < dp.size(); i++) {
            // if p is empty, only empty s can match it (column 0 is done),
            // so traverse j starting from 1 (traverse p for a specific s)
            for (size_t j = 1; j < dp[0].size(); j++) {
                if (p[j - 1] != '*') {
                    dp[i][j] = i > 0 && dp[i - 1][j - 1] && (s[i - 1] == p[j - 1] || p[j - 1] == '.');
                }
                else if (j >= 2) {  // a valid '*' always has a preceding element
                    dp[i][j] = dp[i][j - 2] || (i > 0 && dp[i-1][j] && (s[i-1] == p[j-2] || p[j-2] == '.'));
                }
            }
        }
        return dp[s.size()][p.size()];
    }
};
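Since each row of the table only reads from itself and from the row directly above, the O(mn) space can in principle be reduced to O(n) by keeping two rows. The following is a sketch of that idea under the same recurrence; the name isMatchTwoRows is hypothetical and this variant was not part of the timing figures quoted above.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Sketch: same recurrence as the 2-D table, but only the previous and the
// current row are stored. isMatchTwoRows is a hypothetical name.
bool isMatchTwoRows(const std::string& s, const std::string& p) {
    const std::size_t m = s.size(), n = p.size();
    std::vector<bool> prev(n + 1, false), cur(n + 1, false);
    for (std::size_t i = 0; i <= m; ++i) {
        cur[0] = (i == 0);  // empty pattern matches only the empty string
        for (std::size_t j = 1; j <= n; ++j) {
            if (p[j - 1] != '*') {
                cur[j] = i > 0 && prev[j - 1] &&
                         (s[i - 1] == p[j - 1] || p[j - 1] == '.');
            } else {
                // zero occurrences: drop "x*"; more: consume s[i-1], keep "x*"
                const bool zero = j >= 2 && cur[j - 2];
                const bool more = j >= 2 && i > 0 && prev[j] &&
                                  (s[i - 1] == p[j - 2] || p[j - 2] == '.');
                cur[j] = zero || more;
            }
        }
        prev.swap(cur);  // the row just computed becomes the "row above"
    }
    return prev[n];
}
```

Every entry of cur is overwritten on each pass, so the stale values left behind by the swap are never read.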

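The other approach is recursion. Recursion and DP are quite similar and share the same case analysis; the difference is that DP considers prefixes ending at a given character, while recursion considers suffixes starting at a given character.

For the recursion we branch on the length of p (some cases could be merged, but I find the merged version less intuitive, so I keep them separate):

1. p.size() == 0: return s.size() == 0, since only the empty string can match. This serves as the base case.

2. p.size() == 1: this matches only if s.size() == 1 and p[0] matches s[0] (p[0] == s[0] || p[0] == '.').

3. p.size() > 1: check whether the second character is *.

3.1 p[1] != '*': p[0] and s[0] must match, and the substring of p starting at p[1] must match the substring of s starting at s[1].

3.2 p[1] == '*': again two sub-cases, either the * matches nothing or it matches one or more occurrences:

3.2.1 One or more occurrences: s="abc", p="a*bc". s[0] must match p[0] (one occurrence is consumed), and the substring of s starting at s[1] must keep matching the same p (p stays unchanged because the * may still match further occurrences, or none).

3.2.2 Zero occurrences: s="abc", p="z*abc". s must match the substring of p starting at p[2].

The time and space complexity of the recursion is harder to pin down, so I'll skip it for now; without memoization the same suffix pairs are solved repeatedly, so the worst case is exponential, and every substr call copies the string. The code ran in 208 ms (16.52%) with 15.2 MB of memory (11.87%), so it is thoroughly beaten by DP.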

#include <string>
using namespace std;

class Solution {
public:
    bool isMatch(string s, string p) {
        if (p.empty()) {
            return s.empty();
        }
        // does the first character of s match the first character of p?
        bool first_match = !s.empty() && (p[0] == s[0] || p[0] == '.');
        if (p.size() == 1) {
            return s.size() == 1 && first_match;
        }
        if (p[1] != '*') {
            return first_match && isMatch(s.substr(1), p.substr(1));
        }
        // p[1] == '*': try zero occurrences, then one or more occurrences
        return isMatch(s, p.substr(2)) || (first_match && isMatch(s.substr(1), p));
        // wrong:
        // if (first_match) {
        //   return isMatch(s.substr(1), p);
        // }
        // return isMatch(s, p.substr(2));
    }
};

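Note the commented-out code at the end of the solution above. Written that way, a test case like "aaa" with "a*a" fails. The reason is that the if version is greedy: whenever the first character matches, it commits to consuming it and never tries the zero-occurrence branch. For "aaa" and "a*a" it consumes all three a's, is then left with s = "" against "a*a", skips "a*", and fails on the trailing "a". The two branches have to be joined with ||, so that when one of them fails the other is still tried.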
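On the running-time gap between the recursion and the DP: one common remedy is to keep the recursive case analysis but cache the answer for each pair of start positions, which bounds the work at O(mn) subproblems. The sketch below assumes that approach; MemoMatcher and isMatchMemo are hypothetical names, and it uses indices instead of substr so that no strings are copied during the recursion.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Sketch of a memoized (top-down) variant; all names here are hypothetical.
class MemoMatcher {
public:
    MemoMatcher(const std::string& s, const std::string& p)
        : s_(s), p_(p),
          memo_(s.size() + 1, std::vector<int>(p.size() + 1, -1)) {}

    bool match() { return solve(0, 0); }

private:
    // Does s_[i..] match p_[j..]? memo_: -1 = unknown, 0 = false, 1 = true.
    bool solve(std::size_t i, std::size_t j) {
        if (j == p_.size()) return i == s_.size();
        if (memo_[i][j] != -1) return memo_[i][j] == 1;
        const bool first = i < s_.size() && (p_[j] == s_[i] || p_[j] == '.');
        bool result;
        if (j + 1 < p_.size() && p_[j + 1] == '*') {
            // Both branches are tried: zero occurrences, or consume one
            // character and keep the same pattern position.
            result = solve(i, j + 2) || (first && solve(i + 1, j));
        } else {
            result = first && solve(i + 1, j + 1);
        }
        memo_[i][j] = result ? 1 : 0;
        return result;
    }

    std::string s_;
    std::string p_;
    std::vector<std::vector<int>> memo_;
};

bool isMatchMemo(const std::string& s, const std::string& p) {
    return MemoMatcher(s, p).match();
}
```

Because both star branches are combined with ||, the "aaa" / "a*a" case that breaks the greedy version is handled correctly here.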

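Now for the interview question from that company. It is close to the LeetCode problem above, with the following changes and additions:

1. A match no longer has to cover the whole string; matching any substring of the string is enough.

2. ? (which takes the place of *) means that the preceding character appears zero times or once.

3. ? cannot be the first character of the pattern, and the pattern cannot contain two consecutive ?.

The idea is similar to the one above and also uses recursion, but because the match is partial, we first find the substrings whose first character matches and then run the recursive check on each of them. Since this is interview prep, I'll go through the idea from the beginning.

The general idea: traverse the whole string, find the positions that match the first character of the pattern, and check whether the substring starting at each such position matches. The substring check is done recursively.

First we rule out some special cases, such as s and p both being empty, or p[0] == '?'. Then we traverse s; whenever the current character matches p[0] (equal, or p[0] == '.'), we start the recursive check.

The recursive cases are similar to the ones above, split by the length of p:

0. p.size() == 0 as a base case: everything that had to be matched has been matched, so return true.

1. p.size() == 1 as a base case: s must be non-empty and p[0] must match s[0].

2. p.size() > 1, so a ? may appear:

2.1 p[1] != '?': p[0] and s[0] must match, and the substrings starting at p[1] and s[1] must match too.

2.2 p[1] == '?': a ? appears, so p[0] itself must not be '?' (ruled out by the problem statement).

2.2.1 The ? stands for one occurrence of the preceding character, e.g. s="abc", p="a?bc": s[0] must match p[0], and the substring of s starting at s[1] must match the substring of p starting at p[2].

2.2.2 The ? stands for zero occurrences of the preceding character, e.g. s="abc", p="d?abc": s must match the substring of p starting at p[2].

The code is roughly as follows, and is probably fine: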

#include <iostream>
#include <string>
using namespace std;

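// Warm-up without '?': does p occur in s as a substring, with '.' matching any character?
bool isMatch1(string s, string p) {
  for (size_t i = 0; i < s.size(); i++) {
    if (s[i] == p[0] || p[0] == '.') {
      size_t j;
      for (j = 1; j < p.size(); j++) {
        // stop at the end of s or at the first mismatch
        if (i + j >= s.size() || (s[i + j] != p[j] && p[j] != '.')) {
          break;
        }
      }
      if (j == p.size()) {
        return true;
      }
    }
  }
  return false;
}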


// Does p match a prefix of s?
bool helper(string s, string p) {
  // p.size() == 0: nothing left to match, return true
  // p.size() == 1: s must be non-empty and p[0] == s[0] or p[0] == '.'
  // p.size() > 1, e.g. p: a?bc; s: bc
  //   1. p[1] != '?': first characters match && helper(s.substr(1), p.substr(1))
  //   2. p[1] == '?':
  //      zero occurrences: helper(s, p.substr(2))
  //      one occurrence: first characters match && helper(s.substr(1), p.substr(2))
  if (p.size() == 0) {
    return true;
  }
  // guard against empty s, otherwise "." would match past the end of the string
  bool first_match = !s.empty() && (p[0] == s[0] || p[0] == '.');
  if (p.size() == 1) {
    return first_match;
  }
  if (p[1] != '?') {
    return first_match && helper(s.substr(1), p.substr(1));
  }
  // p[0] == '?' here would mean "??", which the problem rules out
  return p[0] != '?' && ((first_match && helper(s.substr(1), p.substr(2))) || helper(s, p.substr(2)));
}


bool isMatch(string s, string p) {
  if (p.empty()) {
    return s.empty();
  }
  if (p[0] == '?') {
    return false;
  }
  for (size_t i = 0; i < s.size(); i++) {
    // first-character filter: drop this if when "a?" should match "hello world" via zero occurrences
    if (s[i] == p[0] || p[0] == '.') {
      if (helper(s.substr(i), p)) {
        return true;
      }
    }
  }
  return false;   
}


int main() {
  cout<< isMatch("hello world","hello") << endl; //1
  cout<< isMatch("hello world","HELLO") << endl; //0
  cout<< isMatch("hello world","worl") << endl;  //1
  cout<< isMatch("hello world","o wor") << endl; //1
  cout<< isMatch("hello world","o.wor") << endl; //1
  cout<< isMatch("hello world",".ll") << endl;  //1
  cout<< isMatch("hello world","hh?ello") << endl; //1
  cout<< isMatch("hello world","wo?rld") << endl; //1
  cout<< isMatch("hello world",".?ello") << endl; //1
  cout<< isMatch("hello world",".??") << endl; //0
  cout<< isMatch("hello world","?word") << endl; //0
  cout<< isMatch("hello world","hel?o") << endl; //0
  cout<< isMatch("hello world",".. w?o") << endl; //1
  cout<< isMatch("hello world",".?.?.?ld") << endl; //1
  cout<< isMatch("hello world","ee?llo") << endl; //1
  cout<< isMatch("hello world","h?llo") << endl; //0
  cout<< isMatch("hello world","h?") << endl; //1
  cout<< isMatch("hello world","e?") << endl; //1
  cout<< isMatch("hello world"," ?") << endl; //1
  cout<< isMatch("hello world",".?") << endl; //1
  cout<< isMatch("hello world","a?") << endl; // 0 with the first-character filter in isMatch; would be 1 if "a?" may match zero characters
  string s2 = "ab";
  string p6 = ".?c";
  cout << isMatch(s2, p6) << endl;  // 0
  string s3 = "aa";
  string p7 = "a?";
  cout << isMatch(s3, p7) << endl;  // 1 (substring match: "a?" matches the first 'a')
  string s4 = "aaa";
  string p8 = "a?a";
  cout << isMatch(s4, p8) << endl;  // 1
  return 0;
}
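The helper above copies s and p on every call through substr. As a hedged alternative, the same case analysis can be expressed with start indices. The sketch below keeps the behavior of the code above, including the first-character filter and the empty-pattern convention; the names matchHere and searchPattern are hypothetical, and the up-front rejection of a leading '?' or of "??" is this sketch's way of enforcing the pattern rules.

```cpp
#include <cstddef>
#include <string>

// Hypothetical sketch: does p[j..] match some prefix of s[i..]?
// '.' matches any single character; "x?" matches zero or one 'x'.
// Assumes p has already been validated by searchPattern.
bool matchHere(const std::string& s, std::size_t i,
               const std::string& p, std::size_t j) {
    if (j == p.size()) return true;  // pattern exhausted: a prefix matched
    const bool first = i < s.size() && (p[j] == s[i] || p[j] == '.');
    if (j + 1 < p.size() && p[j + 1] == '?') {
        // one occurrence of p[j], or zero occurrences
        return (first && matchHere(s, i + 1, p, j + 2)) ||
               matchHere(s, i, p, j + 2);
    }
    return first && matchHere(s, i + 1, p, j + 1);
}

bool searchPattern(const std::string& s, const std::string& p) {
    if (p.empty()) return s.empty();  // same convention as isMatch above
    // invalid patterns: leading '?' or two consecutive '?'
    if (p[0] == '?' || p.find("??") != std::string::npos) return false;
    for (std::size_t i = 0; i < s.size(); ++i) {
        // same first-character filter as isMatch above
        if ((s[i] == p[0] || p[0] == '.') && matchHere(s, i, p, 0)) {
            return true;
        }
    }
    return false;
}
```

With the filter in place, searchPattern("hello world", "hh?ello") is true while searchPattern("hello world", "hel?o") is false, matching the expectations listed in main above.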
