黑马程序员 - Algorithms - String Matching: Java's Built-in BF Algorithm and the KMP Algorithm

Latest recommended article published on 2024-06-26 20:12:37

ll14569

Article link: https://blog.csdn.net/ll14569/article/details/8185052

The search algorithm built into Java, also known as the BF (Brute Force) algorithm:

/**
 * Code shared by String and StringBuffer to do searches. The source is the
 * character array being searched, and the target is the string being
 * searched for.
 *
 * @param source
 *            the characters being searched.
 * @param sourceOffset
 *            offset of the source string.
 * @param sourceCount
 *            count of the source string.
 * @param target
 *            the characters being searched for.
 * @param targetOffset
 *            offset of the target string.
 * @param targetCount
 *            count of the target string.
 * @param fromIndex
 *            the index to begin searching from.
 */
static int indexOf(char[] source, int sourceOffset, int sourceCount,
		char[] target, int targetOffset, int targetCount, int fromIndex) {
	if (fromIndex >= sourceCount) {
		return (targetCount == 0 ? sourceCount : -1);
	}
	if (fromIndex < 0) {
		fromIndex = 0;
	}
	if (targetCount == 0) {
		return fromIndex;
	}

	char first = target[targetOffset];
	int max = sourceOffset + (sourceCount - targetCount);

	for (int i = sourceOffset + fromIndex; i <= max; i++) {
		/* Look for first character. */
		if (source[i] != first) {
			while (++i <= max && source[i] != first)
				;
		}

		/* Found first character, now look at the rest of v2 */
		if (i <= max) {
			int j = i + 1;
			int end = j + targetCount - 1;
			for (int k = targetOffset + 1; j < end
					&& source[j] == target[k]; j++, k++)
				;

			if (j == end) {
				/* Found whole string. */
				return i - sourceOffset;
			}
		}
	}
	return -1;
}

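In the JDK source quoted here, this is the search routine that indexOf in Java's String class ultimately calls. It compares the two strings using one pointer into the source string and one pointer into the target string. Whenever a comparison fails, the source pointer moves back to the position just after the start of the current attempt, and the target pointer returns to the start of the target. Because of this backtracking, the worst-case time complexity is O(mn), where n is the length of the source string and m is the length of the target string.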

For example, searching for abcabcd in the string abcabcabcdefgh:

// Start
0 1 2 3 4 5 6 7 8 9 A B C D
a b c a b c a b c d e f g h
a b c a b c d
^
// First mismatch
0 1 2 3 4 5 6 7 8 9 A B C D
a b c a b c a b c d e f g h
a b c a b c d
            ^
// Backtrack
0 1 2 3 4 5 6 7 8 9 A B C D
a b c a b c a b c d e f g h
  a b c a b c d
  ^
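
The backtracking shown above can be sketched in a few lines of Java. This is a minimal illustration rather than the JDK code: the class name BruteForceSearch and its indexOf method are hypothetical, and the sketch works on String values instead of char arrays with offsets.

```java
// Illustrative brute-force (BF) search; names are hypothetical, not JDK API.
final class BruteForceSearch {
	/** Returns the index of the first occurrence of target in source, or -1. */
	static int indexOf(String source, String target) {
		int n = source.length(), m = target.length();
		int i = 0; // pointer into the source string
		int j = 0; // pointer into the target string
		while (i < n && j < m) {
			if (source.charAt(i) == target.charAt(j)) {
				i++;
				j++;
			} else {
				// Backtrack: the source pointer returns to the start of
				// this attempt plus one, the target pointer returns to 0.
				i = i - j + 1;
				j = 0;
			}
		}
		return j == m ? i - j : -1;
	}

	public static void main(String[] args) {
		System.out.println(indexOf("abcabcabcdefgh", "abcabcd")); // prints 3
	}
}
```

The line `i = i - j + 1` is the backtracking step: every character between the start of the attempt and the mismatch is read again on later attempts.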

The KMP algorithm:

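The KMP algorithm is an improved string-matching algorithm discovered by D. E. Knuth, V. R. Pratt, and J. H. Morris, which is why it is called the Knuth-Morris-Pratt algorithm (KMP for short).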

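The core of KMP is eliminating the backtracking of the source-string pointer. It relies on one observation: the characters that backtracking would revisit are already known. For example, when searching for "abcabcd" in "abcabcabcd", the first 6 characters "abcabc" match, and the 7th characters, a and d, do not. A mismatch at the 7th character means the first 6 characters matched, so we already know which source characters we would see after backtracking. In this example we know for certain that the first 6 characters of the source string are "abcabc". This is the foundation of KMP search.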

// Start
0 1 2 3 4 5 6 7 8 9 A B C D
a b c a b c a b c d e f g h
a b c a b c d
^
// First mismatch
0 1 2 3 4 5 6 7 8 9 A B C D
a b c a b c a b c d e f g h
a b c a b c d
            ^
// Next comparison
0 1 2 3 4 5 6 7 8 9 A B C D
a b c a b c a b c d e f g h
      a b c a b c d
            ^

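The key to KMP is therefore computing the next[] array: for each position j in the pattern P, next[j] is the length of the longest proper prefix of P[0..j-1] that is also a suffix of it. There are two ways to compute next[]: recursively, from values that are already known, or directly from the definition.
1. The recursive approach:
By definition next[0] = -1. Suppose next[j] = k, that is, P[0..k-1] == P[j-k..j-1].
1) If P[j] == P[k], then P[0..k] == P[j-k..j], so clearly next[j+1] = next[j] + 1 = k + 1.
2) If P[j] != P[k], treat this as a pattern-matching problem in its own right: when a match fails, where should k move? Clearly k = next[k], and the comparison is repeated with the new k (if k reaches -1, then next[j+1] = 0).
This can be implemented as follows: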

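public class IndexOf {
	/**
	 * Builds the next[] table for the pattern: next[i] is the length of the
	 * longest proper prefix of str[0..i-1] that is also a suffix of it,
	 * with next[0] = -1 as a sentinel. The pattern must not be empty.
	 */
	private static int[] get_Next(String str) {
		char[] c = str.toCharArray();
		int[] next = new int[c.length];
		next[0] = -1;
		int i = 0, j = -1;
		while (i < c.length - 1) {
			if (j == -1 || c[i] == c[j]) {
				// Extend the current prefix/suffix match by one character.
				i++;
				j++;
				next[i] = j;
			} else {
				// Fall back to the next shorter prefix/suffix match.
				j = next[j];
			}
		}
		return next;
	}

	public static int indexOfKMP(String source, String target) {
		if (target == null || target.length() == 0) {
			// An empty target matches at index 0, as with String.indexOf.
			return 0;
		}
		if (source == null || target.length() > source.length()) {
			return -1;
		}
		char[] sChars = source.toCharArray();
		char[] tChars = target.toCharArray();
		int[] next = get_Next(target);
		int i = 0, j = 0;
		while (i < sChars.length && j < tChars.length) {
			if (j == -1 || sChars[i] == tChars[j]) {
				// Match, or the target restarted: advance both pointers.
				++i;
				++j;
			} else {
				// Mismatch: i stays put, the target slides according to next[].
				j = next[j];
			}
		}
		// If j ran past the end of the target, the match starts at i - j.
		return j == tChars.length ? i - j : -1;
	}
}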
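
The second way of computing next[], solving it directly, is mentioned above but not shown. A minimal sketch could look like the following, assuming the same convention as the recursive version (next[0] = -1, next[j] = length of the longest proper prefix of P[0..j-1] that is also its suffix). The class name NextTableDirect and the method nextDirect are hypothetical. It is much slower than the recursive version, so it is useful mainly for checking results on small patterns.

```java
// Illustrative only: computes next[] straight from its definition.
final class NextTableDirect {
	static int[] nextDirect(String p) {
		int[] next = new int[p.length()];
		if (p.length() == 0) {
			return next;
		}
		next[0] = -1; // sentinel, same convention as the recursive version
		for (int j = 1; j < p.length(); j++) {
			// Try the longest candidate length first; k < j keeps the prefix proper.
			for (int k = j - 1; k > 0; k--) {
				if (p.substring(0, k).equals(p.substring(j - k, j))) {
					next[j] = k;
					break;
				}
			}
			// If no candidate matched, next[j] keeps its default value 0.
		}
		return next;
	}

	public static void main(String[] args) {
		System.out.println(java.util.Arrays.toString(nextDirect("abcabcd")));
		// prints [-1, 0, 0, 0, 1, 2, 3]
	}
}
```

For the pattern "abcabcd", next[6] = 3 is exactly the shift in the diagram: after the mismatch at the 7th character, the target pointer resumes at index 3 while the source pointer stays where it is.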

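The cost of KMP:

It trades space for time: an auxiliary int array as long as the target string guides the matching, which reduces the worst-case time complexity from O(mn) to O(m+n).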
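
To make the difference concrete, the sketch below counts character comparisons for both approaches on an input that is unfavorable for brute force (a long run of a's followed by a single b). All names here (ComparisonCount, bruteForce, kmp, comparisons) are hypothetical, and only comparisons made during the search are counted, not those made while building next[].

```java
// Illustrative comparison counter; names are hypothetical.
final class ComparisonCount {
	static long comparisons; // character comparisons made by the last search

	static int bruteForce(String s, String t) {
		comparisons = 0;
		for (int start = 0; start + t.length() <= s.length(); start++) {
			int j = 0;
			while (j < t.length()) {
				comparisons++;
				if (s.charAt(start + j) != t.charAt(j)) {
					break;
				}
				j++;
			}
			if (j == t.length()) {
				return start;
			}
		}
		return -1;
	}

	static int kmp(String s, String t) {
		comparisons = 0;
		if (t.isEmpty()) {
			return 0;
		}
		// Build next[] with the recursive scheme (next[0] = -1).
		int[] next = new int[t.length()];
		next[0] = -1;
		int a = 0, b = -1;
		while (a < t.length() - 1) {
			if (b == -1 || t.charAt(a) == t.charAt(b)) {
				a++;
				b++;
				next[a] = b;
			} else {
				b = next[b];
			}
		}
		int i = 0, j = 0;
		while (i < s.length() && j < t.length()) {
			if (j == -1) { // target restarted: move on without comparing
				i++;
				j++;
				continue;
			}
			comparisons++;
			if (s.charAt(i) == t.charAt(j)) {
				i++;
				j++;
			} else {
				j = next[j]; // the source pointer i never moves backwards
			}
		}
		return j == t.length() ? i - j : -1;
	}

	static String run(char c, int count) {
		StringBuilder sb = new StringBuilder();
		for (int k = 0; k < count; k++) {
			sb.append(c);
		}
		return sb.toString();
	}

	public static void main(String[] args) {
		String source = run('a', 1000) + "b";
		String target = run('a', 50) + "b";
		System.out.println("BF  index " + bruteForce(source, target) + ", comparisons " + comparisons);
		System.out.println("KMP index " + kmp(source, target) + ", comparisons " + comparisons);
	}
}
```

On this input the brute-force count grows roughly with the product of the two lengths, while the KMP count stays within about twice the source length.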

Other blogs that cover the KMP algorithm in more detail:

http://www.cnblogs.com/dolphin0520/archive/2011/08/24/2151846.html

http://www.matrix67.com/blog/archives/115/