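The idea borrows from the KMP algorithm but does not use its partial match table, so performance is somewhat worse. The main complication is that the target string contains HTML tags.
The source string is slid to the right along the target and compared character by character; once one character matches, the next character of the source is taken and matching continues from there.
Because the target string contains tags, any tag encountered during matching has to be skipped.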
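The tag-skipping step on its own can be sketched as follows. This is a minimal illustration, not part of the method below: the class name `TagSkip` and the helper `nextTextIndex` are hypothetical, and the sketch assumes that a tag is anything between a `<` and the next `>`.

```java
final class TagSkip {
    // Hypothetical helper: returns the index of the first character at or after
    // `from` that does not belong to a tag, or -1 if only tags remain.
    static int nextTextIndex(String html, int from) {
        int j = from;
        while (j < html.length()) {
            if (html.charAt(j) != '<') {
                return j; // ordinary text character
            }
            int close = html.indexOf('>', j);
            if (close < 0) {
                return j; // unterminated '<': treat it as ordinary text
            }
            j = close + 1; // resume right after the closing '>'
        }
        return -1;
    }
}
```

Searching for `>` with `indexOf(ch, fromIndex)` avoids creating a substring for every tag and makes the "no closing bracket" case explicit.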
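import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Finds the position of a string within a text (the target string is HTML).
 *
 * @param source the plain text to look for
 * @param target the HTML text to search in
 * @return the start index (inclusive) and end index (exclusive) of the match in target, or -1, -1 if there is no match
 */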
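public static List<Integer> kmpFuzzyMatching(String source, String target) {
    // An empty source cannot be located; this also avoids reading an empty pointList below.
    if (source.isEmpty()) {
        return Arrays.asList(-1, -1);
    }
    char[] sourceChar = source.toCharArray();
    char[] targetChar = target.toCharArray();
    // Index in target at which the scan for the next source character starts.
    int temp = 0;
    // Target indices of the source characters matched so far.
    List<Integer> pointList = new ArrayList<>();
    for (int i = 0; i < sourceChar.length; i++) {
        for (int j = temp; j < targetChar.length; j++) {
            // Is this a tag? If so, jump to its closing '>'.
            if (targetChar[j] == '<') {
                int k = target.indexOf('>', j);
                if (k > j) {
                    j = k;
                    continue;
                }
            }
            if (sourceChar[i] == targetChar[j]) {
                // Matched: record the position and move on to the next source character.
                pointList.add(j);
                temp = j + 1;
                break;
            } else if (targetChar[j] == '\n'
                    || targetChar[j] == ' ') {
                // Line breaks and spaces in the target are ignored.
                continue;
            } else {
                // Is the next character a tag? (bounds-checked to avoid running past the end)
                if (j + 1 < targetChar.length && targetChar[j + 1] == '<') {
                    int k = target.indexOf('>', j + 1);
                    if (k > j) {
                        // A mismatching character directly before a tag is tolerated.
                        j = k;
                        continue;
                    }
                } else {
                    // Mismatch: restart from the source's first character, one position
                    // after the previous start, so that no candidate start is skipped.
                    temp = pointList.isEmpty() ? j + 1 : pointList.get(0) + 1;
                    i = -1;
                    pointList.clear();
                    break;
                }
            }
        }
        if (pointList.size() == source.length()) {
            break;
        }
    }
    if (pointList.size() == source.length()) {
        // Start is inclusive, end is exclusive.
        return Arrays.asList(pointList.get(0), pointList.get(pointList.size() - 1) + 1);
    }
    return Arrays.asList(-1, -1);
}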
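As for the result, the method returns the start index and the end index of the match in the target string (the end is the index of the last matched character plus one); if nothing matches, it returns -1, -1.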
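One way to consume that result is sketched below. The class `MatchRange` and the helper `matchedFragment` are hypothetical names introduced only for illustration; the sketch assumes the two-element list has the shape described above (start inclusive, end exclusive, or -1, -1 when nothing was found).

```java
import java.util.List;
import java.util.Optional;

final class MatchRange {
    // Hypothetical helper: cuts the matched fragment out of the HTML target,
    // or returns an empty Optional when the range signals "no match".
    static Optional<String> matchedFragment(String target, List<Integer> range) {
        int start = range.get(0);
        int end = range.get(1);
        if (start < 0 || end < 0) {
            return Optional.empty(); // the -1, -1 case
        }
        // The fragment may contain tags that sit between matched characters.
        return Optional.of(target.substring(start, end));
    }
}
```

For example, with source `hello` and target `<p>he<b>llo</b></p>` the method returns 3 and 11, and the fragment cut out by this helper is `he<b>llo`. Because the range can cut through tags, wrapping it in another element may produce badly nested HTML and needs extra care.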