Three Pattern-Matching Algorithms in Python

  1. Brute-Force
    code:
def find_brute(T, P):
	"""Return the lowest index of T at which substring P begins (or else -1)."""
	n, m = len(T), len(P)
	for i in range(n-m+1):					#try every potential starting index within T
		k = 0								#index into pattern P
		while k < m and T[i+k] == P[k]:		#kth character of P matches
			k += 1
		if k == m:							#reached the end of the pattern
			return i						#substring T[i:i+m] matches P
	return -1								#no match at any starting index

The worst-case running time of this algorithm is O(nm), where n = len(T) and m = len(P).
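
To make the O(nm) bound concrete, here is a minimal sketch that counts character comparisons on an unfavorable input. The helper name count_brute_comparisons and the test strings are illustrative assumptions, not part of the original code.

```python
def count_brute_comparisons(T, P):
    """Run the brute-force scan and return (index, number of character comparisons)."""
    n, m = len(T), len(P)
    comparisons = 0
    for i in range(n - m + 1):          # every candidate starting index
        k = 0
        while k < m:
            comparisons += 1            # one comparison of T[i+k] with P[k]
            if T[i + k] != P[k]:
                break
            k += 1
        if k == m:                      # all m characters matched
            return i, comparisons
    return -1, comparisons


# Text of 20 'a' characters; the pattern only fails on its last character.
T = "a" * 20
P = "a" * 4 + "b"
print(count_brute_comparisons(T, P))    # (-1, 80), i.e. (n - m + 1) * m comparisons
```

Every one of the n - m + 1 = 16 alignments needs all m = 5 comparisons before it fails, which is where the product of n and m comes from.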

  2. The Boyer-Moore Algorithm
    In this algorithm, i is the index of the mismatched character in the text, k is the corresponding index in the pattern, and j is the index of the last occurrence of T[i] within the pattern (or -1 if T[i] does not occur in the pattern).
    The pattern is compared against the text from right to left. When a mismatch occurs, two cases need care, depending on whether the last occurrence of T[i] in the pattern lies before or after the pattern character that was aligned with the mismatch.

    If j < k, the pattern is shifted so that P[j] lines up with T[i]. If j > k, that alignment would move the pattern backwards, so the pattern is shifted forward by one position instead. In both cases i advances by m - min(k, j+1) and k is reset to m-1.
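
The two cases can be checked with a small sketch of the shift rule on its own. The helper names last_occurrence and shift_after_mismatch and the pattern "sushi" are assumptions chosen for illustration; only the expression m - min(k, j+1) comes from the algorithm described here.

```python
def last_occurrence(P):
    """Map each character of P to the index of its last occurrence in P."""
    return {ch: idx for idx, ch in enumerate(P)}   # later indices overwrite earlier ones


def shift_after_mismatch(P, k, text_char):
    """Return how far i advances when text_char mismatches P[k]."""
    m = len(P)
    j = last_occurrence(P).get(text_char, -1)      # -1 when text_char is not in P
    return m - min(k, j + 1)


P = "sushi"                                # last: s -> 2, u -> 1, h -> 3, i -> 4
print(shift_after_mismatch(P, 4, "u"))     # j = 1 < k = 4: 3, aligns P[1] with the text character
print(shift_after_mismatch(P, 1, "h"))     # j = 3 > k = 1: 4, the pattern moves forward by one
print(shift_after_mismatch(P, 4, "x"))     # 'x' not in P: 5, the pattern jumps past the character
```

Note that i always moves forward by at least one pattern alignment, so the search cannot loop forever.
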
code:

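def find_boyer_moore(T, P):
	"""Return the lowest index of T at which substring P begins (or else -1)."""
	n, m = len(T), len(P)
	if m == 0:
		return 0		#trivial search for the empty string
	last = {}
	for k in range(m):		#build the hash table that represents the 'last' function
		last[P[k]] = k		#later occurrences overwrite earlier ones
	i = m-1		#initialize i (index into T) and k (index into P)
	k = m-1
	while i < n:
		if T[i] == P[k]:		#character at index i in text matches index k in pattern
			if k == 0:		#the whole pattern matches
				return i		#the match starts at index i of T
			else:
				k -= 1		#examine the previous character of both T and P
				i -= 1
		else:
			j = last.get(T[i], -1)		#last occurrence of T[i] in P, or -1 if absent
			i += m - min(k, j+1)		#covers both cases: j < k and j > k
			k = m-1		#restart at the end of pattern
	return -1		#pattern not found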
  3. The Knuth-Morris-Pratt Algorithm (KMP Algorithm)
    ‘Data Structures and Algorithms in Python’ uses a list named fail to help implement the KMP algorithm. It is similar to the ‘next array’ that may be more familiar to some readers. The fail list is computed by matching the pattern P against itself.
    code:
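def compute_kmp_fail(P):
	"""Return the fail list for pattern P, computed by matching P against itself."""
	m = len(P)
	fail = [0]*m	#by default, assume an overlap of 0 everywhere
	i = 1
	k = 0
	while i < m:		#compute fail[i] during this pass, if nonzero
		if P[i] == P[k]:
			fail[i] = k+1	#k + 1 characters match thus far
			k += 1
			i += 1
		elif k > 0:		#k follows a matching prefix
			k = fail[k-1]
		else:		#no match found starting at i
			i += 1
	return fail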
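
To see what the fail list means, the sketch below computes it directly from its definition: fail[i] is the length of the longest proper prefix of P[0:i+1] that is also a suffix of P[0:i+1]. The function name fail_by_definition is an assumption for illustration; it is deliberately slow and is only meant as a cross-check.

```python
def fail_by_definition(P):
    """Compute the fail list by brute force, straight from the definition."""
    fail = []
    for i in range(len(P)):
        prefix = P[:i + 1]
        best = 0
        for length in range(1, len(prefix)):       # proper: shorter than the whole prefix
            if prefix[:length] == prefix[-length:]:
                best = length                      # keep the longest overlap found
        fail.append(best)
    return fail


print(fail_by_definition("amalgamation"))
# [0, 0, 1, 0, 0, 1, 2, 3, 0, 0, 0, 0]
```

For example, the entry 3 at index 7 says that "ama" is both a prefix and a suffix of "amalgama", so after a mismatch at index 8 the search can resume comparing at P[3] without re-reading the text.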

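def KMP(T,P):
	"""Return the lowest index of T at which substring P begins (or else -1)."""
	n, m = len(T), len(P)
	if m == 0:
		return 0		#trivial search for the empty string
	fail = compute_kmp_fail(P)		#precompute the fail list from the pattern
	i = 0		#index into text
	k = 0		#index into pattern
	while i < n:
		if T[i] == P[k]:		#P[0:k+1] matches so far
			if k == m-1:		#match complete
				return i-m+1
			i += 1
			k += 1
		elif k > 0:		#P[k] mismatches T[i]
			k = fail[k-1]		#resume at P[fail[k-1]], reusing the matched suffix of P[0:k]
		else:
			i += 1
	return -1		#reached the end of T without a match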
			

This algorithm achieves a running time of O(n+m), which is asymptotically optimal.
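
As a rough check of the linear bound, the sketch below counts the comparisons made by the KMP loop. The names compute_fail and count_kmp_comparisons and the test strings are assumptions for illustration; the logic follows the same fail-list approach described in this section.

```python
def compute_fail(P):
    """fail[i] = length of the longest proper prefix of P[:i+1] that is also its suffix."""
    m = len(P)
    fail = [0] * m
    i, k = 1, 0
    while i < m:
        if P[i] == P[k]:
            fail[i] = k + 1
            i += 1
            k += 1
        elif k > 0:
            k = fail[k - 1]
        else:
            i += 1
    return fail


def count_kmp_comparisons(T, P):
    """Run KMP and return (index, number of text-versus-pattern comparisons)."""
    n, m = len(T), len(P)
    if m == 0:
        return 0, 0
    fail = compute_fail(P)
    comparisons = 0
    i = k = 0
    while i < n:
        comparisons += 1                  # one comparison of T[i] with P[k]
        if T[i] == P[k]:
            if k == m - 1:
                return i - m + 1, comparisons
            i += 1
            k += 1
        elif k > 0:
            k = fail[k - 1]               # slide the pattern, keep i in place
        else:
            i += 1
    return -1, comparisons


T = "a" * 20
P = "a" * 4 + "b"
index, comparisons = count_kmp_comparisons(T, P)
print(index, comparisons)                 # -1 36
assert comparisons <= 2 * len(T)          # the search loop never exceeds 2n comparisons
```

On this input a brute-force scan would need (n - m + 1) * m = 80 comparisons, while the KMP loop stays within 2n = 40 because i never moves backwards.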
