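String matching with wildcards is a very common feature, but I had never taken the time to implement it myself.
Today I started thinking about how to do it. The approaches that came to mind were brute-force traversal, KMP, and BM (Boyer-Moore), but after years of business-logic work I had forgotten most of my algorithms, so I decided to look for a ready-made solution online. I happened to find an expert's brute-force implementation. The idea is the same as the one I had in mind, but I have to admire how well the code is written; it really is on another level.
Original: http://blog.csdn.net/zzran/article/details/8905760
C++ code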
#include <cstdio>
#include <iostream>
using namespace std;

// Returns true if content matches pattern, where '?' matches exactly one
// character and '*' matches any sequence of characters (including none).
bool match(const char *pattern, const char *content) {
    // If we reach the end of both strings, we are done.
    if ('\0' == *pattern && '\0' == *content)
        return true;
    // The content is exhausted: a '*' can only match the empty sequence
    // here, so skip it. This also copes with consecutive '*' and keeps us
    // from reading past the end of content.
    if ('*' == *pattern && '\0' == *content)
        return match(pattern + 1, content);
    // If there is a '*', there are two possibilities:
    //   a) '*' matches nothing more, so we move on in the pattern;
    //   b) '*' absorbs the current character of content.
    if ('*' == *pattern)
        return match(pattern + 1, content) || match(pattern, content + 1);
    // '?' matches any single character (but not the end of content);
    // otherwise the current characters of both strings must be equal.
    if ('\0' != *content && ('?' == *pattern || *pattern == *content))
        return match(pattern + 1, content + 1);
    return false;
}

void test(const char *pattern, const char *content) {
    if (NULL == pattern || NULL == content) {
        puts("no");
        return; // do not pass a null pointer on to match()
    }
    puts(match(pattern, content) ? "yes" : "no");
}

int main(int argc, char *argv[]) {
    test("g*ks", "geeks"); // Yes
    test("ge?ks*", "geeksforgeeks"); // Yes
    test("g*k", "gee"); // No, because 'k' is not in the second string
    test("*pqrs", "pqrst"); // No, because 't' is not in the first string
    test("abc*bcd", "abcdhghgbcd"); // Yes
    test("abc*c?d", "abcd"); // No, because the second string must have 2 instances of 'c'
    test("*c*d", "abcd"); // Yes
    test("*?c*d", "abcd"); // Yes
    cin.get(); // wait for Enter so the console window stays open
    return 0;
}
Java code, fully rewritten for Java
package com.obite.test;

/**
 * Created by peaches on 2016-11-09.
 */
public class WildcardCharacter {
    static boolean match(String pattern, String content, int p, int c) {
        // If we reach the end of both strings, we are done.
        if (pattern.length() == p && content.length() == c)
            return true;
        // The content is exhausted: a '*' can only match the empty sequence
        // here, so skip it. This also copes with consecutive '*' and keeps c
        // from running past the end of content.
        if (pattern.length() > p && '*' == pattern.charAt(p) && content.length() == c)
            return match(pattern, content, p + 1, c);
        // If there is a '*', there are two possibilities:
        //   a) '*' matches nothing more, so we move on in the pattern;
        //   b) '*' absorbs the current character of content.
        if (pattern.length() > p && '*' == pattern.charAt(p))
            return match(pattern, content, p + 1, c) || match(pattern, content, p, c + 1);
        // '?' matches any single character; otherwise the current
        // characters of both strings must be equal.
        if (pattern.length() > p && content.length() > c
                && ('?' == pattern.charAt(p) || pattern.charAt(p) == content.charAt(c)))
            return match(pattern, content, p + 1, c + 1);
        return false;
    }

    static void test(String pattern, String content) {
        if (null == pattern || null == content) {
            System.out.println("no");
            return; // do not pass null on to match()
        }
        System.out.println(match(pattern, content, 0, 0) ? "yes" : "no");
    }

    public static void main(String[] args) {
        test("g*ks", "geeks"); // Yes
        test("ge?ks*", "geeksforgeeks"); // Yes
        test("g*k", "gee"); // No, because 'k' is not in the second string
        test("*pqrs", "pqrst"); // No, because 't' is not in the first string
        test("abc*bcd", "abcdhghgbcd"); // Yes
        test("abc*c?d", "abcd"); // No, because the second string must have 2 instances of 'c'
        test("*c*d", "abcd"); // Yes
        test("*?c*d", "abcd"); // Yes
        test("*?c***d", "abcd"); // Yes
        test("ge?ks**", "geeks"); // Yes
    }
}
Java code modeled on the C++ version
package com.obite.test;

/**
 * Created by peaches on 2016-11-09.
 */
public class WildcardCharacter2 {
    // Both pattern and content must end with a '\0' sentinel (see test()).
    static boolean match(String pattern, String content, int p, int c) {
        // If we reach the end of both strings, we are done.
        if ('\0' == pattern.charAt(p) && '\0' == content.charAt(c))
            return true;
        // The content is exhausted: a '*' can only match the empty sequence
        // here, so skip it. This also copes with consecutive '*' and keeps c
        // from running past the sentinel.
        if ('*' == pattern.charAt(p) && '\0' == content.charAt(c))
            return match(pattern, content, p + 1, c);
        // If there is a '*', there are two possibilities:
        //   a) '*' matches nothing more, so we move on in the pattern;
        //   b) '*' absorbs the current character of content.
        if ('*' == pattern.charAt(p))
            return match(pattern, content, p + 1, c) || match(pattern, content, p, c + 1);
        // '?' matches any single character (but not the sentinel);
        // otherwise the current characters of both strings must be equal.
        if ('\0' != content.charAt(c)
                && ('?' == pattern.charAt(p) || pattern.charAt(p) == content.charAt(c)))
            return match(pattern, content, p + 1, c + 1);
        return false;
    }

    static void test(String pattern, String content) {
        if (null == pattern || null == content) {
            System.out.println("no");
            return; // do not pass null on to match()
        }
        // Append the '\0' sentinel so match() can detect the end as in C.
        System.out.println(match(pattern + '\0', content + '\0', 0, 0) ? "yes" : "no");
    }

    public static void main(String[] args) {
        test("g*ks", "geeks"); // Yes
        test("ge?ks*", "geeksforgeeks"); // Yes
        test("g*k", "gee"); // No, because 'k' is not in the second string
        test("*pqrs", "pqrst"); // No, because 't' is not in the first string
        test("abc*bcd", "abcdhghgbcd"); // Yes
        test("abc*c?d", "abcd"); // No, because the second string must have 2 instances of 'c'
        test("*c*d", "abcd"); // Yes
        test("*?c*d", "abcd"); // Yes
        test("*?c***d", "abcd"); // Yes
        test("ge?ks**", "geeks"); // Yes
    }
}
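The two Java versions above try both branches of every '*' recursively, so the same (p, c) pair can be visited many times on patterns with several '*'. The following is only a sketch of one way to avoid that, by caching the result for each pair. It is not from the original post; the class name WildcardMemo and the memo table are hypothetical names chosen for illustration.

```java
class WildcardMemo {
    // memo[p][c] is null until the result for (p, c) has been computed.
    private static Boolean[][] memo;

    static boolean match(String pattern, String content) {
        memo = new Boolean[pattern.length() + 1][content.length() + 1];
        return match(pattern, content, 0, 0);
    }

    private static boolean match(String pattern, String content, int p, int c) {
        if (memo[p][c] != null)
            return memo[p][c];
        boolean result;
        if (p == pattern.length()) {
            // Pattern exhausted: succeed only if content is exhausted too.
            result = c == content.length();
        } else if (pattern.charAt(p) == '*') {
            // '*' either matches nothing more, or absorbs one character.
            result = match(pattern, content, p + 1, c)
                    || (c < content.length() && match(pattern, content, p, c + 1));
        } else {
            // '?' matches any single character; otherwise compare literally.
            result = c < content.length()
                    && (pattern.charAt(p) == '?' || pattern.charAt(p) == content.charAt(c))
                    && match(pattern, content, p + 1, c + 1);
        }
        memo[p][c] = result;
        return result;
    }

    public static void main(String[] args) {
        System.out.println(match("g*ks", "geeks"));       // true
        System.out.println(match("abc*c?d", "abcd"));     // false
        System.out.println(match("ge?ks**", "geeks"));    // true
        System.out.println(match("a*a*a*a*b", "aaaaaaaaaaaaaaaaaaaaaaaa")); // false
    }
}
```

With the cache, each (p, c) pair is computed at most once, so the work is bounded by the product of the two string lengths. The recursion depth still grows with the input length, so this sketch is meant for short strings such as the test cases above.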
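Note: the original C++ code runs fine on the test cases above, but a few small issues came up when porting it to Java. I record them here:
1. In C/C++ you can read all the way to the end of a string, including its terminator, whereas in Java calling charAt at an index equal to the string's length throws a StringIndexOutOfBoundsException. The Java code therefore has to check the string length at every position to avoid going out of bounds.
2. By design, a C/C++ string ends with a '\0', while a Java string does not. We can, however, append a '\0' to the Java strings to get similar behavior. The character does not have to be '\0': any character that is guaranteed never to appear in the strings will do. The point is simply to keep the implementation concise.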
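The two notes above can be illustrated with a small sketch. It shows that reading one position past the end of a Java string throws, and then the two workarounds: an explicit bounds check and an appended sentinel. The class SentinelDemo and the helper charAtOrNul are hypothetical names introduced only for this example.

```java
class SentinelDemo {
    // Approach 1: check the length at every access and report '\0' past the end.
    static char charAtOrNul(String s, int i) {
        return i < s.length() ? s.charAt(i) : '\0';
    }

    // Approach 2: append a sentinel once, then index freely up to it.
    static String withSentinel(String s) {
        return s + '\0';
    }

    public static void main(String[] args) {
        String s = "abc";

        // Unlike C/C++, there is no terminator to read at index s.length().
        try {
            s.charAt(s.length());
        } catch (StringIndexOutOfBoundsException e) {
            System.out.println("charAt(3) threw StringIndexOutOfBoundsException");
        }

        System.out.println(charAtOrNul(s, 3) == '\0');          // true
        System.out.println(withSentinel(s).charAt(3) == '\0');  // true
        System.out.println(withSentinel(s).length());           // 4
    }
}
```

The sentinel only covers the single position right after the last real character, so the matching code still has to make sure it never advances beyond it.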