Suppose that pattern P and text T are randomly chosen strings of length m and n, respectively, from the d-ary alphabet $\Sigma_d = \{0, 1, \ldots, d-1\}$, where $d \ge 2$. Show that the expected number of character-to-character comparisons made by the implicit loop in line 4 of the naive algorithm is

$$(n-m+1)\,\frac{1-d^{-m}}{1-d^{-1}} \le 2(n-m+1)$$

over all executions of this loop. (Assume that the naive algorithm stops comparing characters for a given shift once a mismatch is found or the entire pattern is matched.) Thus, for randomly chosen strings, the naive algorithm is quite efficient.
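The Python sketch below shows the naive matcher the exercise refers to, with a counter for character comparisons. The function name `naive_match_count` is hypothetical. Like the exercise, it assumes each shift stops at the first mismatch or after all m characters match. The inner `while` loop stands in for the implicit comparison loop.

```python
from __future__ import annotations


def naive_match_count(text: str, pattern: str) -> tuple[list[int], int]:
    """Naive string matching that also counts character comparisons.

    At each shift s, characters are compared left to right; comparison
    stops at the first mismatch or once all m characters have matched.
    """
    n, m = len(text), len(pattern)
    shifts: list[int] = []
    comparisons = 0
    for s in range(n - m + 1):
        j = 0
        while j < m:
            comparisons += 1              # one character-to-character comparison
            if text[s + j] != pattern[j]:
                break                     # mismatch: stop comparing at this shift
            j += 1
        if j == m:
            shifts.append(s)              # the whole pattern matched at shift s
    return shifts, comparisons
```

For example, `naive_match_count("0110", "11")` returns `([1], 5)`. That is one comparison at shift 0 and two comparisons at each of shifts 1 and 2.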
Proof:
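Let $P$ denote the probability that a single character-to-character comparison is a match. Then $1-P$ is the probability that it is a mismatch. (Here $P$ is a probability, not the pattern.) Every character is drawn uniformly and independently from $\Sigma_d$, so $P = d^{-1}$.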
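Fix a shift. The loop makes exactly $k < m$ comparisons when the first $k-1$ comparisons match and the $k$-th one mismatches. It makes exactly $m$ comparisons when the first $m-1$ comparisons match, regardless of whether the last one then matches:

$$
\begin{array}{c|c|c}
\text{comparisons } k & \text{probability} & k \times \text{probability} \\ \hline
1 & 1-P & 1-P \\
2 & P(1-P) & 2P-2P^2 \\
3 & P^2(1-P) & 3P^2-3P^3 \\
\vdots & \vdots & \vdots \\
m-1 & P^{m-2}(1-P) & (m-1)P^{m-2}-(m-1)P^{m-1} \\
m & P^{m-1}(1-P)+P^{m-1}P & mP^{m-1}-mP^m+mP^m
\end{array}
$$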
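When the last column is summed, the terms telescope. The expected number of comparisons at a single shift is therefore

$$E = \sum_{k=1}^{m} \bigl(k \times \text{probability}\bigr) = 1 + P + P^2 + \cdots + P^{m-1} = \frac{1-P^m}{1-P} = \frac{1-d^{-m}}{1-d^{-1}}.$$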
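As a sanity check on the table and the telescoping sum, this sketch computes the expectation row by row with exact rational arithmetic. It then compares the result with the closed form. The helper names are hypothetical. `fractions.Fraction` is from the Python standard library.

```python
from fractions import Fraction


def expected_comparisons_per_shift(d: int, m: int) -> Fraction:
    """Sum k * Pr[k comparisons] over the rows of the table (one shift)."""
    p = Fraction(1, d)                      # probability one comparison matches
    total = Fraction(0)
    for k in range(1, m):                   # k < m: k-1 matches, then a mismatch
        total += k * p ** (k - 1) * (1 - p)
    total += m * p ** (m - 1)               # k = m: the first m-1 characters matched
    return total


def closed_form(d: int, m: int) -> Fraction:
    """(1 - d^{-m}) / (1 - d^{-1})."""
    p = Fraction(1, d)
    return (1 - p ** m) / (1 - p)
```

For d = 2 and m = 3, both functions return `Fraction(7, 4)`, which equals 1 + 1/2 + 1/4.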
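Let

$$f(x) = \frac{1-x^{-m}}{1-x^{-1}} = \sum_{k=0}^{m-1} x^{-k}.$$

Differentiating gives $f'(x) = -\sum_{k=1}^{m-1} k\,x^{-k-1}$. This is negative for $m > 1$ and $x \ge 2$, so $f$ is strictly decreasing on $x \ge 2$. (For $m = 1$, $f \equiv 1$.) Therefore $E = f(d) \le f(2) = 2(1-2^{-m}) < 2$.

The naive algorithm tries $n-m+1$ shifts. At each shift the pattern is compared with a window of $T$ that is uniformly random and independent of $P$. By linearity of expectation, the expected total number of comparisons is

$$(n-m+1)\,\frac{1-d^{-m}}{1-d^{-1}} \le 2(n-m+1).$$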
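The sketch below is a quick exact check of the monotonicity claim for small parameters. It assumes only the definition of f(x) above. The grid of m and x values is an arbitrary choice.

```python
from fractions import Fraction


def f(x: int, m: int) -> Fraction:
    """f(x) = (1 - x^{-m}) / (1 - x^{-1}), evaluated exactly."""
    inv = Fraction(1, x)
    return (1 - inv ** m) / (1 - inv)


# For small m and alphabet sizes x = 2..11, check that f decreases in x,
# that f(2) = 2(1 - 2^{-m}), and that every value stays below 2.
for m in range(1, 12):
    values = [f(x, m) for x in range(2, 12)]
    assert values[0] == 2 * (1 - Fraction(1, 2 ** m))
    assert all(v < 2 for v in values)
    if m > 1:
        assert all(a > b for a, b in zip(values, values[1:]))  # strictly decreasing
```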
This completes the proof.
Hence, for randomly chosen strings, naive string matching is quite efficient.
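This sketch checks the whole statement rather than a single shift. It enumerates every pattern and text for small d, m and n, and runs a comparison-counting naive matcher on each pair. It then compares the exact average with $(n-m+1)(1-d^{-m})/(1-d^{-1})$. The function names are hypothetical. There are $d^{m+n}$ pairs, so enumeration is only practical for tiny inputs.

```python
from fractions import Fraction
from itertools import product


def count_comparisons(text, pattern) -> int:
    """Comparisons made by the naive matcher; each shift stops at the first mismatch."""
    n, m = len(text), len(pattern)
    total = 0
    for s in range(n - m + 1):
        for j in range(m):
            total += 1
            if text[s + j] != pattern[j]:
                break
    return total


def average_comparisons(d: int, m: int, n: int) -> Fraction:
    """Exact average over all d^m patterns and d^n texts, i.e. the expectation."""
    alphabet = range(d)
    total = 0
    for pattern in product(alphabet, repeat=m):
        for text in product(alphabet, repeat=n):
            total += count_comparisons(text, pattern)
    return Fraction(total, d ** (m + n))


def predicted(d: int, m: int, n: int) -> Fraction:
    """(n - m + 1)(1 - d^{-m}) / (1 - d^{-1})."""
    p = Fraction(1, d)
    return (n - m + 1) * (1 - p ** m) / (1 - p)
```

For d = 2, m = 2 and n = 4, both `average_comparisons` and `predicted` give `Fraction(9, 2)`. That is three shifts at 3/2 comparisons each, well under the bound 2(n - m + 1) = 6.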