Over the past couple of days I looked into two common string algorithms.
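KMP algorithm: finds the position of a substring (the pattern) in a main string. I mainly followed an expert's walkthrough of the algorithm. It has two parts: building the next array, and the KMP scan that locates the pattern in the main string. Naive string matching has to keep backtracking in the main string after every mismatch, which costs O(m*n) in the worst case. KMP never moves the main-string pointer backwards, so it traverses the main string only once and the time complexity is O(m+n), where m and n are the lengths of the pattern and the main string respectively.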
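To make the meaning of the next array concrete, here is a small sketch that builds it directly from its definition rather than with the linear-time trick. It assumes the convention where entry 0 is the sentinel -1 and entry i is the length of the longest proper prefix of p[:i] that is also a suffix of p[:i]. The name brute_next is a hypothetical helper for illustration only.

```python
def brute_next(p):
    """Hypothetical helper: build the next table straight from its definition."""
    table = [-1]  # sentinel for the empty prefix
    for i in range(1, len(p) + 1):
        prefix = p[:i]
        best = 0
        # Try the longest proper border first and stop at the first hit.
        for k in range(i - 1, 0, -1):
            if prefix[:k] == prefix[-k:]:
                best = k
                break
        table.append(best)
    return table


print(brute_next("ababc"))  # [-1, 0, 0, 1, 2, 0]
```

For "ababc", entry 4 is 2 because "abab" starts and ends with "ab". After a mismatch at pattern index 4, the scan can resume comparing at pattern index 2 without touching the main-string pointer.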
The Python implementation is as follows:
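def normal(a, b):  # naive string matching; the main-string index has to backtrack
    i = 0
    j = 0
    while i < len(a):
        if j == len(b):
            break
        if a[i] == b[j]:
            i += 1
            j += 1
        else:
            # Move i back to one past the start of the failed attempt.
            # This must use j before j is reset.
            i = i - j + 1
            j = 0
    if j == len(b):
        return i - j
    else:
        return -1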
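def gene_next(p):  # build the next array
    # nxt[i] is the length of the longest proper prefix of p[:i] that is
    # also its suffix; nxt[0] = -1 is a sentinel.
    # Named nxt so that it does not shadow the built-in next().
    nxt = [0 for i in range(len(p) + 1)]
    nxt[0] = -1
    j = -1
    i = 0
    while i < len(p):
        if j == -1 or p[i] == p[j]:
            i += 1
            j += 1
            nxt[i] = j
        else:
            j = nxt[j]
    return nxt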
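def kmp(a, b):
    l = gene_next(b)
    i = 0
    j = 0
    while i < len(a) and j < len(b):
        # Check j == -1 first so that b[-1] is never compared by accident.
        if j == -1 or a[i] == b[j]:
            i += 1
            j += 1
        else:
            # Mismatch: i stays where it is, only the pattern index falls back.
            j = l[j]
    if j == len(b):
        return i - j
    else:
        return -1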
while True:
    try:
        a = input().strip(' ')
        b = input().strip(' ')
        print('Location:', normal(a, b))
        print('Location:', kmp(a, b))
    except EOFError:  # stop at end of input
        break
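One way to gain confidence in a KMP implementation is to compare it with Python's built-in str.find, which also returns the lowest matching index or -1. The sketch below is self-contained, so it uses its own compact variant based on the standard prefix function (no -1 sentinel) rather than the functions above. The names prefix_table, kmp_find and check_against_builtin are hypothetical.

```python
import random


def prefix_table(p):
    # pi[i] = length of the longest proper prefix of p[:i+1] that is also its suffix.
    pi = [0] * len(p)
    k = 0
    for i in range(1, len(p)):
        while k > 0 and p[i] != p[k]:
            k = pi[k - 1]
        if p[i] == p[k]:
            k += 1
        pi[i] = k
    return pi


def kmp_find(text, pattern):
    # Returns the first index of pattern in text, or -1 if it does not occur.
    if not pattern:
        return 0
    pi = prefix_table(pattern)
    k = 0  # number of pattern characters matched so far
    for i, ch in enumerate(text):
        while k > 0 and ch != pattern[k]:
            k = pi[k - 1]
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            return i - k + 1
    return -1


def check_against_builtin(trials=1000, seed=0):
    # Random short strings over a two-letter alphabet produce many partial matches.
    rng = random.Random(seed)
    for _ in range(trials):
        text = ''.join(rng.choice('ab') for _ in range(rng.randint(0, 12)))
        pattern = ''.join(rng.choice('ab') for _ in range(rng.randint(1, 4)))
        if kmp_find(text, pattern) != text.find(pattern):
            return False
    return True


print(check_against_builtin())
```

A tiny alphabet is a deliberate choice here: it forces frequent near-misses, which is exactly where a wrong fallback step would show up.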
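Manacher's algorithm: finds the length of the longest palindromic substring of a string. For the details, see the expert's walkthrough. The core idea is symmetry: a position that lies inside an already-known palindrome can reuse the radius of its mirror position. Because the algorithm only ever extends the rightmost palindrome boundary forward and never backtracks, its time complexity is O(n).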
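Before looking at the linear-time version, it may help to see what the padded string and the radius array look like. The sketch below computes the radii by plain expansion around every centre, so it is O(n^2) and is not Manacher's algorithm itself. It assumes '#' does not occur in the input, and brute_radii is a hypothetical helper name.

```python
def brute_radii(s):
    # Pad so that every palindrome, odd or even, has a single centre character.
    t = '#' + ''.join(ch + '#' for ch in s)
    radii = []
    for i in range(len(t)):
        r = 1  # the radius counts the centre itself
        while i - r >= 0 and i + r < len(t) and t[i - r] == t[i + r]:
            r += 1
        radii.append(r)
    return t, radii


t, radii = brute_radii("abba")
print(t)               # #a#b#b#a#
print(radii)           # [1, 2, 1, 2, 5, 2, 1, 2, 1]
print(max(radii) - 1)  # 4
```

The largest radius, minus one, equals the length of the longest palindrome in the unpadded string. Manacher's algorithm produces the same array but seeds each radius from its mirror position instead of starting from 1 every time.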
The implementation is as follows:
def pad(s):  # pad the string: "abba" becomes "#a#b#b#a#"
    l = '#'
    for i in s:
        l += i
        l += '#'
    return l
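def mxx(s):
    c = 0        # centre of the palindrome that reaches furthest to the right
    max_r = -1   # rightmost index covered by that palindrome
    p = [0 for i in range(len(s))]  # p[i]: radius at i, counting the centre
    for i in range(len(s)):
        if i < max_r:
            # Reuse the mirror position 2*c-i, capped at the known boundary.
            p[i] = min(p[2 * c - i], max_r - i + 1)
        else:
            p[i] = 1
        while (i - p[i] >= 0) and (i + p[i] < len(s)) and s[i - p[i]] == s[i + p[i]]:
            p[i] += 1
        if i + p[i] > max_r:
            c = i
            max_r = i + p[i] - 1
    # In the padded string, radius - 1 is the palindrome length in the original.
    return max(p) - 1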
if __name__ == '__main__':
    while True:
        try:
            s = input().strip(' ')
            s = pad(s)
            print(mxx(s))
        except EOFError:  # stop at end of input
            break
# Longest palindromic substring length in O(n^2) time
def maxx(a):
    if a == a[::-1]:
        return len(a)
    else:
        maxlen = 0
        for i in range(len(a)):
            low = i
            high = i + 1
            while low >= 0 and high < len(a) and a[low] == a[high]:  # even length, e.g. "abba"
                low -= 1
                high += 1
            if high - low - 1 > maxlen:
                maxlen = high - low - 1
            low = i - 1
            high = i + 1
            while low >= 0 and high < len(a) and a[low] == a[high]:  # odd length, e.g. "aba"
                low -= 1
                high += 1
            if high - low - 1 > maxlen:
                maxlen = high - low - 1
        return maxlen