Three Kinds of Pattern-Matching Algorithms in Python

brsmsg

Article link: https://blog.csdn.net/brsmsg/article/details/86514732

Brute-Force
code:

def find_brute(T, P):
    # return the lowest index of T at which substring P begins, else -1
    n, m = len(T), len(P)
    for i in range(n - m + 1):              # try every potential starting index in T
        k = 0                               # index into pattern P
        while k < m and T[i + k] == P[k]:   # k-th character of P matches
            k += 1
        if k == m:                          # reached the end of the pattern
            return i                        # substring T[i:i+m] matches P
    return -1                               # no match found

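The worst-case running time of this algorithm is O(nm), where n = len(T) and m = len(P): the outer loop tries up to n-m+1 starting indices, and the inner loop performs up to m comparisons for each of them.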
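To make the O(nm) bound concrete, here is a minimal sketch that counts character comparisons on an input chosen to be bad for brute force. The helper name count_brute_comparisons and the test strings are assumptions made for illustration; the helper repeats the same scan as find_brute with a counter added.

```python
def count_brute_comparisons(T, P):
    """Brute-force scan that also counts character comparisons (hypothetical helper)."""
    n, m = len(T), len(P)
    comparisons = 0
    for i in range(n - m + 1):          # every candidate starting index
        k = 0
        while k < m:
            comparisons += 1            # one comparison of T[i+k] with P[k]
            if T[i + k] != P[k]:
                break
            k += 1
        if k == m:                      # full match at index i
            return i, comparisons
    return -1, comparisons


n, m = 20, 5
T = "a" * n                             # text of identical characters
P = "a" * (m - 1) + "b"                 # pattern that fails only on its last character
print(count_brute_comparisons(T, P))    # (-1, 80), i.e. (n - m + 1) * m comparisons
```

Every alignment matches m-1 characters before failing on the last one, so the count grows as the product of the number of alignments and the pattern length.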

The Boyer-Moore Algorithm
In this algorithm, i is the index of the mismatched character in the text, k is the corresponding index in the pattern, and j is the index of the last occurrence of T[i] within the pattern (or -1 if T[i] does not occur in P).
When a mismatch occurs, there are two cases to handle, depending on whether the last occurrence of T[i] lies before or after the pattern character that was aligned with the mismatch.
If j < k, the pattern is shifted right by k-j positions so that P[j] lines up with T[i]. If j > k, lining them up would move the pattern backward, so the pattern is shifted right by just one position.
Both cases are covered by advancing i by m - min(k, j+1) and resetting k to m-1.
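The two mismatch cases can be checked with a few lines of arithmetic. This is only a sketch: the pattern "abacab" and the helper names last_occurrence and advance are assumptions chosen for illustration, and advance simply evaluates m - min(k, j+1).

```python
def last_occurrence(P):
    """Map each character of P to the index of its last occurrence (hypothetical helper)."""
    last = {}
    for k, ch in enumerate(P):
        last[ch] = k                    # later occurrences overwrite earlier ones
    return last


def advance(m, k, j):
    """Amount added to the text index i after a mismatch at pattern index k."""
    return m - min(k, j + 1)


P = "abacab"
m = len(P)
last = last_occurrence(P)
print(last)                             # {'a': 4, 'b': 5, 'c': 3}

# Case j < k: mismatch at k = 5 against text character 'a' (j = 4).
print(advance(m, 5, last.get("a", -1)))   # 1, the pattern shifts by k - j = 1

# Case j > k: mismatch at k = 2 against text character 'c' (j = 3).
print(advance(m, 2, last.get("c", -1)))   # 4 = m - k, the pattern shifts by one position

# Character absent from the pattern: j = -1, so the pattern jumps past it entirely.
print(advance(m, 5, last.get("z", -1)))   # 6 = m
```

In each case k is reset to m-1 afterwards, so the new value of i marks where the end of the shifted pattern sits in the text.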
code:

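def find_boyer_moore(T, P):
    # return the lowest index of T at which substring P begins, else -1
    n, m = len(T), len(P)
    if m == 0:
        return 0                        # an empty pattern matches at index 0
    last = {}
    for k in range(m):                  # build the hash table that represents the 'last' function
        last[P[k]] = k
    i = m - 1                           # index into T, starting at the end of the first alignment
    k = m - 1                           # index into P, comparing from the end of the pattern
    while i < n:
        if T[i] == P[k]:                # T[i] matches P[k]
            if k == 0:                  # the whole pattern matched
                return i                # i is the index where the match begins
            else:
                k -= 1                  # keep comparing from right to left
                i -= 1
        else:
            j = last.get(T[i], -1)      # last occurrence of T[i] in P, or -1 if absent
            i += m - min(k, j + 1)      # handles both cases, j < k and j > k
            k = m - 1                   # restart at the end of the pattern
    return -1                           # no match found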

The Knuth-Morris-Pratt Algorithm (KMP Algorithm)
In ‘Data Structures and Algorithms in Python’, the KMP algorithm is implemented with the help of a list named fail. It is similar to the ‘next array’ that may be more familiar. The fail list is obtained by matching the pattern P against itself: fail[k] is the length of the longest proper prefix of P[0:k+1] that is also a suffix of it.
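To see what the entries of the fail list mean, the sketch below computes them directly from the definition (longest proper prefix that is also a suffix) rather than with the efficient self-matching loop. The function name fail_by_definition and the sample pattern are assumptions used for illustration; this version is deliberately slow and only meant for checking small cases.

```python
def fail_by_definition(P):
    """For each i, the length of the longest proper prefix of P[:i+1] that is also its suffix."""
    fail = []
    for i in range(len(P)):
        prefix = P[:i + 1]
        best = 0
        for length in range(1, len(prefix)):        # proper prefixes only
            if prefix[:length] == prefix[-length:]:
                best = length                       # keep the longest one seen
        fail.append(best)
    return fail


print(fail_by_definition("amalgamation"))
# [0, 0, 1, 0, 0, 1, 2, 3, 0, 0, 0, 0]
```

For example, the entry at index 7 is 3 because "amalgama" both starts and ends with "ama", so after a mismatch at index 8 the search can resume comparing at P[3].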
code:

def compute_kmp_fail(P):
    m = len(P)
    fail = [0] * m              # initialize the fail list; fail[0] stays 0
    i = 1
    k = 0
    while i < m:
        if P[i] == P[k]:
            fail[i] = k + 1     # k + 1 characters match thus far
            k += 1
            i += 1
        elif k > 0:
            k = fail[k - 1]     # fall back to a shorter matching prefix
        else:                   # no match found ending at i
            i += 1
    return fail

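def KMP(T, P):
    # return the lowest index of T at which substring P begins, else -1
    n, m = len(T), len(P)
    if m == 0:
        return 0                    # an empty pattern matches at index 0
    fail = compute_kmp_fail(P)      # precompute the fail list from the pattern
    i = 0                           # index into T
    k = 0                           # index into P
    while i < n:
        if T[i] == P[k]:
            if k == m - 1:          # match complete
                return i - m + 1
            i += 1
            k += 1
        elif k > 0:                 # P[k] mismatches T[i]
            k = fail[k - 1]         # resume at P[fail[k-1]], reusing the matched suffix of P[0:k]
        else:
            i += 1
    return -1                       # no match found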
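All three functions are meant to return the lowest index at which P begins in T, or -1, which is also what Python's built-in str.find returns. One way to sanity-check an implementation is therefore to compare it with str.find on many small random inputs. The sketch below assumes a hypothetical helper check_matcher and uses a simple stand_in matcher so that the block runs on its own; any of the functions from this article can be passed in its place.

```python
import random


def check_matcher(find, trials=1000, seed=0):
    """Compare find(T, P) with T.find(P) on random strings over a two-letter alphabet."""
    rng = random.Random(seed)
    for _ in range(trials):
        T = "".join(rng.choice("ab") for _ in range(rng.randint(0, 12)))
        P = "".join(rng.choice("ab") for _ in range(rng.randint(1, 4)))
        if find(T, P) != T.find(P):
            return (T, P)               # first counterexample found
    return None                         # no disagreement in the sampled cases


def stand_in(T, P):
    # Hypothetical placeholder matcher; replace it with the function under test.
    for i in range(len(T) - len(P) + 1):
        if T[i:i + len(P)] == P:
            return i
    return -1


print(check_matcher(stand_in))          # None
```

A small alphabet and short strings are used on purpose, since they make repeated prefixes and near-misses common, which is where the shift and fallback logic is most likely to go wrong.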