LeetCode - Hard - 10. Regular Expression Matching

Topic

  • String
  • Dynamic Programming
  • Backtracking

Description

https://leetcode.com/problems/regular-expression-matching/

Given an input string (s) and a pattern (p), implement regular expression matching with support for '.' and '*' where:

  • '.' Matches any single character.
  • '*' Matches zero or more of the preceding element.

The matching should cover the entire input string (not partial).

Example 1:

Input: s = "aa", p = "a"
Output: false
Explanation: "a" does not match the entire string "aa".

Example 2:

Input: s = "aa", p = "a*"
Output: true
Explanation: '*' means zero or more of the preceding element, 'a'. Therefore, by repeating 'a' once, it becomes "aa".

Example 3:

Input: s = "ab", p = ".*"
Output: true
Explanation: ".*" means "zero or more (*) of any character (.)".

Example 4:

Input: s = "aab", p = "c*a*b"
Output: true
Explanation: c can be repeated 0 times, a can be repeated 1 time. Therefore, it matches "aab".

Example 5:

Input: s = "mississippi", p = "mis*is*p*."
Output: false

Constraints:

  • 0 <= s.length <= 20
  • 0 <= p.length <= 30
  • s contains only lowercase English letters.
  • p contains only lowercase English letters, '.', and '*'.
  • It is guaranteed for each appearance of the character '*', there will be a previous valid character to match.

Analysis

Method 1: Dynamic Programming

Consider the following example:

s='aab', p='c*a*b'

      c * a * b
    0 1 2 3 4 5
  0 y
a 1
a 2
b 3

dp[i][j] denotes whether s.substring(0, i) matches the pattern p.substring(0, j). For example, dp[0][0] == true (denoted by y in the matrix) because an empty string matches an empty pattern. If we can derive every dp[i][j] from entries computed earlier, the final answer is dp[s.length()][p.length()].

What about the first column? An empty pattern p="" matches only the empty string s="", and that case is already covered by dp[0][0] == true. Every other dp[i][0] is therefore false.

s='aab', p='c*a*b'

      c * a * b
    0 1 2 3 4 5
  0 y
a 1 n
a 2 n
b 3 n

What about the first row? In other words, which patterns p match the empty string s=""? The answer is either the empty pattern p="" or a pattern that can collapse to an empty string, such as p="a*", p="z*", or, more interestingly, a combination of them as in p="a*b*c*". The for loop below populates dp[0][j]. Note how it reuses a previous state by checking dp[0][j-2]. It starts at j=2 because a one-character pattern can never match the empty string, so dp[0][1] stays false.

for (int j=2; j<=p.length(); j++) {
	dp[0][j] = p.charAt(j-1) == '*' && dp[0][j-2];
}

At this stage the matrix looks as follows. Notice that dp[0][2] and dp[0][4] are both true because p="c*" and p="c*a*" can both match an empty string.

s='aab', p='c*a*b'

      c * a * b
    0 1 2 3 4 5
  0 y n y n y n
a 1 n
a 2 n
b 3 n
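
As a small check of the first-row rule, here is a minimal sketch that computes only row 0 of the table. The class name FirstRowDemo and the method firstRow are hypothetical names used for illustration; the loop itself is the one shown above.

```java
// Hypothetical helper for illustration; not part of the submitted solution.
class FirstRowDemo {
    // Returns row 0 of the DP table:
    // row[j] is true when p.substring(0, j) can match the empty string.
    static boolean[] firstRow(String p) {
        boolean[] row = new boolean[p.length() + 1];
        row[0] = true; // empty pattern matches empty string
        for (int j = 2; j <= p.length(); j++) {
            // "x*" can be dropped, so fall back to the state two columns earlier
            row[j] = p.charAt(j - 1) == '*' && row[j - 2];
        }
        return row;
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        for (boolean b : firstRow("c*a*b")) {
            sb.append(b ? 'y' : 'n').append(' ');
        }
        System.out.println(sb.toString().trim()); // prints "y n y n y n"
    }
}
```

For p="a*b*c*" the same method marks columns 0, 2, 4 and 6 as true, which matches the observation that any chain of "x*" units can match the empty string.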

Now we can start the main iteration. The idea is the same: we iterate over all prefix lengths of s (i) and all prefix lengths of p (j) and express each cell in terms of cells computed earlier. It turns out there are two cases.

  1. If (p.charAt(j-1) == s.charAt(i-1) || p.charAt(j-1) == '.'), that is, the current characters match or the pattern has '.', then the result is determined by the previous state: dp[i][j] = dp[i-1][j-1]. Don't be confused by the -1 offset in charAt(j-1) and charAt(i-1); the dp array is one index larger than the string and pattern lengths so that it can hold the initial state dp[0][0].

  2. If p.charAt(j-1) == '*', then either the "x*" unit matches nothing and the result is dp[i][j] = dp[i][j-2], or the current character of the string matches the character preceding '*' in the pattern, (s.charAt(i-1) == p.charAt(j-2) || p.charAt(j-2) == '.'), and the result is dp[i-1][j]. The cell is true if either option is true.

In every other case the characters differ and dp[i][j] stays false.
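
The two cases can be sketched as a function that computes a single cell. This is only an illustration under the assumption that all cells above and to the left are already filled; the class name CellRule and the method cell are hypothetical. It tests for '*' first, which is equivalent here because s never contains '*'.

```java
// Hypothetical helper for illustration; not part of the submitted solution.
class CellRule {
    // Computes dp[i][j] for i >= 1 and j >= 1, assuming row i-1 and
    // columns 0..j-1 of row i have already been filled in.
    static boolean cell(boolean[][] dp, String s, String p, int i, int j) {
        char pc = p.charAt(j - 1); // current pattern character
        char sc = s.charAt(i - 1); // current string character
        if (pc == '*') {
            char prev = p.charAt(j - 2); // the problem guarantees this exists
            boolean zeroCopies = dp[i][j - 2];
            boolean oneMoreCopy = (prev == sc || prev == '.') && dp[i - 1][j];
            return zeroCopies || oneMoreCopy;
        }
        // Case 1: plain character or '.', otherwise no match
        return (pc == sc || pc == '.') && dp[i - 1][j - 1];
    }
}
```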

So here is the final state of matrix after we evaluate all elements:

s='aab', p='c*a*b'

      c * a * b
    0 1 2 3 4 5
  0 y n y n y n
a 1 n n n y y n
a 2 n n n n y n
b 3 n n n n n y

Time and space complexity are O(p.length() * s.length()).

Try evaluating the matrix yourself if it is still confusing.
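
To check a hand-evaluated matrix, one option is to fill the table and print it in the same y/n notation. The following is a minimal sketch; the class name DpTablePrinter and the method rows are hypothetical names for illustration, and the fill logic follows the two cases described above.

```java
// Hypothetical helper for illustration; not part of the submitted solution.
class DpTablePrinter {
    // Fills the DP table for (s, p) and returns each row as "y n y ..." text.
    static String[] rows(String s, String p) {
        boolean[][] dp = new boolean[s.length() + 1][p.length() + 1];
        dp[0][0] = true;
        for (int j = 2; j <= p.length(); j++) {
            dp[0][j] = p.charAt(j - 1) == '*' && dp[0][j - 2];
        }
        for (int i = 1; i <= s.length(); i++) {
            for (int j = 1; j <= p.length(); j++) {
                char pc = p.charAt(j - 1);
                if (pc == s.charAt(i - 1) || pc == '.') {
                    dp[i][j] = dp[i - 1][j - 1];
                } else if (pc == '*') {
                    char prev = p.charAt(j - 2);
                    dp[i][j] = dp[i][j - 2]
                            || ((prev == s.charAt(i - 1) || prev == '.') && dp[i - 1][j]);
                }
            }
        }
        String[] out = new String[dp.length];
        for (int i = 0; i < dp.length; i++) {
            StringBuilder sb = new StringBuilder();
            for (int j = 0; j < dp[i].length; j++) {
                if (j > 0) sb.append(' ');
                sb.append(dp[i][j] ? 'y' : 'n');
            }
            out[i] = sb.toString();
        }
        return out;
    }

    public static void main(String[] args) {
        // Prints the four rows of the matrix for s='aab', p='c*a*b'
        for (String row : rows("aab", "c*a*b")) {
            System.out.println(row);
        }
    }
}
```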


Method 2: Recursion

There are two cases to consider:

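First, if the second character of p is '*', the first two characters of p form an "x*" unit that can match zero or more copies of x. We try zero copies first: if isMatch(s, p.substring(2)) succeeds, the rest of s matches the rest of the pattern and we are done. Otherwise, we check whether the first character of s matches the first character of p; if it does, we consume that character and keep the pattern unchanged, isMatch(s.substring(1), p).

Second, if the second character is not '*', we match character by character: the first characters must match, and we recurse on isMatch(s.substring(1), p.substring(1)).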
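
The plain recursion can revisit the same pair of suffixes many times. As a variant that is not part of the original submission, the same two cases can be written with indices and a memo table so that each (i, j) pair is evaluated at most once. This is a sketch only; the class name MemoMatch and its method names are hypothetical.

```java
// Hypothetical memoized variant for illustration; not the submitted solution.
class MemoMatch {
    static boolean isMatch(String s, String p) {
        // memo[i][j] caches whether s.substring(i) matches p.substring(j);
        // null means "not computed yet"
        Boolean[][] memo = new Boolean[s.length() + 1][p.length() + 1];
        return match(s, p, 0, 0, memo);
    }

    private static boolean match(String s, String p, int i, int j, Boolean[][] memo) {
        if (memo[i][j] != null) {
            return memo[i][j];
        }
        boolean result;
        if (j == p.length()) {
            // Pattern exhausted: match only if the string is exhausted too
            result = i == s.length();
        } else {
            boolean firstMatches = i < s.length()
                    && (p.charAt(j) == '.' || p.charAt(j) == s.charAt(i));
            if (j + 1 < p.length() && p.charAt(j + 1) == '*') {
                // Case 1: skip "x*" entirely, or consume one character of s
                result = match(s, p, i, j + 2, memo)
                        || (firstMatches && match(s, p, i + 1, j, memo));
            } else {
                // Case 2: match one character and advance both
                result = firstMatches && match(s, p, i + 1, j + 1, memo);
            }
        }
        memo[i][j] = result;
        return result;
    }
}
```

Using indices instead of substring also avoids creating a new String on every call.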

Submission

public class RegularExpressionMatching {
	// Method 1: dynamic programming
	public boolean isMatch1(String s, String p) {
		if (p == null || p.length() == 0)
			return (s == null || s.length() == 0);

		// dp[i][j]: does s.substring(0, i) match p.substring(0, j)?
		boolean[][] dp = new boolean[s.length() + 1][p.length() + 1];
		dp[0][0] = true;
		for (int j = 2; j <= p.length(); j++) {
			dp[0][j] = p.charAt(j - 1) == '*' && dp[0][j - 2];
		}

		for (int j = 1; j <= p.length(); j++) {
			for (int i = 1; i <= s.length(); i++) {
				if (p.charAt(j - 1) == s.charAt(i - 1) || p.charAt(j - 1) == '.')
					dp[i][j] = dp[i - 1][j - 1];
				else if (p.charAt(j - 1) == '*')
					dp[i][j] = dp[i][j - 2]
							|| ((s.charAt(i - 1) == p.charAt(j - 2) || p.charAt(j - 2) == '.') && dp[i - 1][j]);
			}
		}

		return dp[s.length()][p.length()];
	}

	// Method 2: recursion
	public boolean isMatch2(String s, String p) {
		if (p.length() == 0) {
			return s.length() == 0;
		}
		if (p.length() > 1 && p.charAt(1) == '*') { // second char is '*'
			if (isMatch2(s, p.substring(2))) {
				return true;
			}
			if (s.length() > 0 && (p.charAt(0) == '.' || s.charAt(0) == p.charAt(0))) {
				return isMatch2(s.substring(1), p);
			}
			return false;
		} else { // second char is not '*'
			if (s.length() > 0 && (p.charAt(0) == '.' || s.charAt(0) == p.charAt(0))) {
				return isMatch2(s.substring(1), p.substring(1));
			}
			return false;
		}
	}

}

Test

import static org.junit.Assert.*;
import org.junit.Test;

public class RegularExpressionMatchingTest {

	@Test
	public void test() {
		RegularExpressionMatching obj = new RegularExpressionMatching();

		assertFalse(obj.isMatch1("aa", "a"));
		assertTrue(obj.isMatch1("aa", "a*"));
		assertTrue(obj.isMatch1("ab", ".*"));
		assertTrue(obj.isMatch1("aab", "c*a*b"));
		assertFalse(obj.isMatch1("mississippi", "mis*is*p*."));
		
		assertFalse(obj.isMatch2("aa", "a"));
		assertTrue(obj.isMatch2("aa", "a*"));
		assertTrue(obj.isMatch2("ab", ".*"));
		assertTrue(obj.isMatch2("aab", "c*a*b"));
		assertFalse(obj.isMatch2("mississippi", "mis*is*p*."));
		
	}
}