- Brute-Force
code:
def find_brute(T, P):
    # return the lowest index of string T at which substring P begins
    # else return -1
    n, m = len(T), len(P)
    for i in range(n - m + 1):               # try every potential starting index in T
        k = 0                                # index into P (number of characters matched so far)
        while k < m and T[i + k] == P[k]:    # kth character of P matches
            k += 1
        if k == m:                           # reached the end of the pattern
            return i                         # lowest index of T where P begins
    return -1                                # pattern not found
In this algorithm, the worst-case running time is O(nm), since each of the n - m + 1 possible starting indices may require up to m character comparisons.
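To see where the O(nm) bound comes from, the sketch below runs the same brute-force loop but counts every character comparison. The name `count_brute_comparisons` is hypothetical, introduced only for illustration. A text of all `a`s with a pattern of `a`s ending in `b` forces every alignment to scan the whole pattern before failing.

```python
def count_brute_comparisons(T, P):
    """Brute-force search that returns (match index or -1, character comparisons made)."""
    n, m = len(T), len(P)
    comparisons = 0
    for i in range(n - m + 1):          # every potential starting index in T
        k = 0
        while k < m:
            comparisons += 1            # one comparison of T[i + k] against P[k]
            if T[i + k] != P[k]:
                break
            k += 1
        if k == m:
            return i, comparisons
    return -1, comparisons


T = "a" * 20
P = "a" * 4 + "b"
print(count_brute_comparisons(T, P))   # (-1, 80)
```

Here n = 20 and m = 5, so there are n - m + 1 = 16 alignments and each costs m = 5 comparisons, which is exactly the (n - m + 1) * m worst case.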
- The Boyer-Moore Algorithm
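In this algorithm, we use i to represent the index of the mismatched character in the text, j to represent the index of the last occurrence of T[i] within the pattern (j = -1 if T[i] does not occur in P), and k to represent the corresponding index in the pattern, i.e. the pattern character aligned with T[i].
Be careful: when a mismatch occurs there are two cases, depending on whether the last occurrence of T[i] lies before or after the pattern character that was aligned with the mismatched text character, that is, j < k or j > k. We have to shift i and k in different ways:
if j < k, we shift the pattern so that P[j] lines up with T[i], which advances i by m - (j + 1);
if j > k, aligning P[j] with T[i] would move the pattern backwards, so we shift the pattern forward by one position instead, which advances i by m - k.
In both cases k restarts at m - 1, and the two rules combine into i += m - min(k, j + 1).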
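A minimal sketch of the two mismatch cases, using the same i, j, k, m as above. The helpers `build_last` and `next_i` are hypothetical names; `next_i` spells out each case as its own branch so it can be compared with the combined formula m - min(k, j + 1).

```python
def build_last(P):
    """Map each character of P to the index of its last occurrence in P."""
    last = {}
    for k, ch in enumerate(P):
        last[ch] = k          # later occurrences overwrite earlier ones
    return last


def next_i(i, k, j, m):
    """New text index after T[i] mismatches P[k] (hypothetical helper).

    j is the last occurrence of T[i] in P, or -1 if T[i] does not occur in P.
    """
    if j < k:
        # Case j < k: slide the pattern so that P[j] lines up with T[i].
        return i + m - (j + 1)
    # Case j > k: aligning P[j] with T[i] would move the pattern backwards,
    # so slide the pattern forward by one position instead.
    return i + m - k


P = "abacab"
m = len(P)
last = build_last(P)
print(last)                        # {'a': 4, 'b': 5, 'c': 3}
# Pattern starts at T[0]; T[5] = 'c' mismatches P[5] = 'b', so j = 3 < k = 5.
print(next_i(5, 5, last["c"], m))  # 7  (pattern now starts at T[2], with P[3] under T[5])
# Pattern starts at T[0]; T[3] = 'b' mismatches P[3] = 'c', so j = 5 > k = 3.
print(next_i(3, 3, last["b"], m))  # 6  (pattern now starts at T[1])
```

Since P[j] equals T[i] while P[k] does not, j can never equal k, so the two branches cover every mismatch.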
code:
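def find_boyer_moore(T, P):
    # return the lowest index of T at which P begins, else -1
    n, m = len(T), len(P)
    if m == 0:                        # trivial search for an empty pattern
        return 0
    last = {}
    for k in range(m):                # build the hash table that represents the 'last' function
        last[P[k]] = k
    i = m - 1                         # initialize i (index into T) and k (index into P)
    k = m - 1
    while i < n:
        if T[i] == P[k]:              # T[i] matches P[k]
            if k == 0:                # the whole pattern matches
                return i              # i is now the first index of the match
            else:
                k -= 1                # examine the previous character
                i -= 1
        else:
            j = last.get(T[i], -1)    # last occurrence of T[i] in P, or -1 if absent
            i += m - min(k, j + 1)    # shift according to the two cases described above
            k = m - 1                 # restart at the end of the pattern
    return -1                         # pattern not found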
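To watch the skipping in action, the sketch below repeats the Boyer-Moore loop but also records the starting position in T of every alignment it tries (i - k right after each restart). The name `boyer_moore_alignments` and the sample text are illustrative only.

```python
def boyer_moore_alignments(T, P):
    """Boyer-Moore search that also returns every alignment it tries.

    Returns (lowest match index or -1, list of pattern start positions in T).
    """
    n, m = len(T), len(P)
    if m == 0:
        return 0, [0]
    last = {ch: k for k, ch in enumerate(P)}   # last occurrence of each character
    i = k = m - 1
    starts = [0]                               # the first alignment starts at T[0]
    while i < n:
        if T[i] == P[k]:
            if k == 0:
                return i, starts               # whole pattern matched
            i -= 1
            k -= 1
        else:
            j = last.get(T[i], -1)
            i += m - min(k, j + 1)
            k = m - 1
            if i < n:
                starts.append(i - k)           # start of the new alignment
    return -1, starts


T = "abacaabadcabacabaabb"
print(boyer_moore_alignments(T, "abacab"))    # (10, [0, 1, 2, 3, 9, 10])
```

Brute force would try all 11 starting positions 0 through 10 on this input, while only six alignments are examined here. The jump from start 3 to start 9 happens because T[8] = 'd' does not occur in the pattern, so j = -1 and the pattern slides completely past it.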
- The Knuth-Morris-Pratt Algorithm (KMP Algorithm)
In ‘Data Structures and Algorithms in Python’, the authors use a list named fail to implement the KMP algorithm. It is similar to the ‘next array’ we may be more familiar with. The fail list is computed by matching the pattern P against itself: fail[k] is the length of the longest prefix of P that is also a suffix of P[1:k+1].
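As a sanity check on that definition, the sketch below computes the fail list directly from it by brute force (roughly cubic time, with none of the reuse that makes the real construction linear). The name `naive_fail` is hypothetical. It also shows one common convention for the ‘next array’, next[0] = -1 and next[j] = fail[j - 1] for j >= 1; other textbooks shift or 1-index it differently, so treat `fail_to_next` as an assumption about which convention you know.

```python
def naive_fail(P):
    """fail[k] = length of the longest proper prefix of P[0:k+1] that is also its suffix."""
    fail = []
    for k in range(len(P)):
        s = P[:k + 1]
        best = 0
        for length in range(1, k + 1):     # proper prefixes only (length <= k)
            if s[:length] == s[-length:]:
                best = length
        fail.append(best)
    return fail


def fail_to_next(fail):
    """Hypothetical mapping to one common 'next array' convention."""
    return [-1] + fail[:-1]


P = "abacab"
print(naive_fail(P))                 # [0, 0, 1, 0, 1, 2]
print(fail_to_next(naive_fail(P)))   # [-1, 0, 0, 1, 0, 1]
```

For example, fail[5] = 2 because "ab" is both a prefix of "abacab" and a suffix of it, and no longer proper prefix is.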
code:
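def compute_kmp_fail(P):
    # return the fail list for pattern P
    m = len(P)
    fail = [0] * m                # initialize the list fail
    i = 1
    k = 0
    while i < m:
        if P[i] == P[k]:          # k + 1 characters match thus far
            fail[i] = k + 1
            k += 1
            i += 1
        elif k > 0:               # fall back to the next shorter prefix that could still match
            k = fail[k - 1]
        else:                     # no match found starting at i
            i += 1
    return fail

def KMP(T, P):
    # return the lowest index of T at which P begins, else -1
    n, m = len(T), len(P)
    if m == 0:                    # trivial search for an empty pattern
        return 0
    fail = compute_kmp_fail(P)    # build the fail list for P
    i = 0                         # index into T
    k = 0                         # index into P
    while i < n:
        if T[i] == P[k]:          # P[0:k+1] matched thus far
            if k == m - 1:        # match complete
                return i - m + 1
            i += 1
            k += 1
        elif k > 0:               # P[k] mismatches T[i]
            k = fail[k - 1]       # restart k at P[fail[k-1]], reusing the suffix of P[0:k]
        else:
            i += 1
    return -1                     # pattern not found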
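This algorithm achieves a running time of O(n+m), which is asymptotically optimal: building the fail list takes O(m) time, and the search loop takes O(n) time because each iteration either advances i or shifts the pattern forward.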
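The O(n) bound on the search loop is an amortized argument: each iteration either increases i (on a match, or a mismatch with k = 0) or increases i - k (on a mismatch with k > 0, since fail[k - 1] < k), and both quantities stay at most n, so there are at most 2n iterations. The sketch below counts loop iterations to illustrate this; `kmp_fail` and `kmp_iterations` are hypothetical instrumented copies of the code above, not part of the original.

```python
def kmp_fail(P):
    """Same fail-list construction as compute_kmp_fail above."""
    m = len(P)
    fail = [0] * m
    i, k = 1, 0
    while i < m:
        if P[i] == P[k]:
            fail[i] = k + 1
            i += 1
            k += 1
        elif k > 0:
            k = fail[k - 1]
        else:
            i += 1
    return fail


def kmp_iterations(T, P):
    """KMP search on a non-empty P; returns (match index or -1, loop iterations)."""
    n, m = len(T), len(P)
    fail = kmp_fail(P)
    i = k = 0
    steps = 0
    while i < n:
        steps += 1                  # one iteration = one character comparison
        if T[i] == P[k]:
            if k == m - 1:
                return i - m + 1, steps
            i += 1
            k += 1
        elif k > 0:
            k = fail[k - 1]         # i - k grows, i stays put
        else:
            i += 1
    return -1, steps


T = "a" * 20
P = "a" * 4 + "b"
print(kmp_iterations(T, P))   # (-1, 36), within the 2n = 40 bound
```

On the same input where the brute-force search makes 80 comparisons, this loop runs 36 times, because after a mismatch it keeps the already matched "aaa" instead of re-scanning it.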