Boyer-Moore Algorithm (BM)

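The Boyer-Moore algorithm is an efficient string-matching algorithm. Its preprocessing phase takes O(m+σ) time and O(m+σ) space, where m is the pattern length and σ is the alphabet size. The searching phase has O(mn) time complexity; when searching for a non-periodic pattern it performs at most 3n text character comparisons in the worst case, and its best-case performance is O(n/m). The algorithm scans the text using two precomputed functions, the good-suffix shift and the bad-character shift. In usual applications, Boyer-Moore is considered the most efficient string-matching algorithm.
Last night I picked up 《柔性字符串匹配》 (Flexible Pattern Matching in Strings) again and got to the Horspool algorithm. The book passes over it quickly and I wanted more detail, so I read other people's articles online and came across a good book, 《Handbook of exact String-Matching Algorithms》. While reading about Horspool there, I happened to flip to its description of the BM algorithm, skimmed it, and saw implementation code for BM, which the first book does not have. In fact the first book only describes the idea behind the algorithm, and not in much detail, whereas the second book explains it quite thoroughly. After reading it carefully I found some differences from the first book. (This may get long, but don't run away, my own analysis comes later ;P)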
Handbook of exact String-Matching Algorithms

Boyer-Moore algorithm

  • performs the comparisons from right to left;
  • preprocessing phase in O(m+sigma) time and space complexity;
  • searching phase in O(mn) time complexity;
  • 3n text character comparisons in the worst case when searching for a non periodic pattern;
  • O(n / m) best performance.

The Boyer-Moore algorithm is considered as the most efficient string-matching algorithm in usual applications. A simplified version of it or the entire algorithm is often implemented in text editors for the «search» and «substitute» commands.

The algorithm scans the characters of the pattern from right to left, beginning with the rightmost one. In case of a mismatch (or a complete match of the whole pattern) it uses two precomputed functions to shift the window to the right. These two shift functions are called the good-suffix shift (also called the matching shift) and the bad-character shift (also called the occurrence shift).

Assume that a mismatch occurs between the character x[i]=a of the pattern and the character y[i+j]=b of the text during an attempt at position j.
Then, x[i+1 .. m-1]=y[i+j+1 .. j+m-1]=u and x[i] ≠ y[i+j]. The good-suffix shift consists in aligning the segment y[i+j+1 .. j+m-1]=x[i+1 .. m-1] with its rightmost occurrence in x that is preceded by a character different from x[i] (see figure 13.1).


Figure 13.1. The good-suffix shift, u re-occurs preceded by a character c different from a.

If there exists no such segment, the shift consists in aligning the longest suffix v of y[i+j+1 .. j+m-1] with a matching prefix of x (see figure 13.2).


Figure 13.2. The good-suffix shift, only a suffix of u re-occurs in x.
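The two good-suffix cases above can be checked directly against their definition. The following is only a brute-force sketch for intuition, not the handbook's preprocessing routine: the function name `good_suffix_shift` and the sample pattern are assumptions made for illustration, and the loop is cubic in the worst case rather than O(m).

```python
def good_suffix_shift(x: str, i: int) -> int:
    """Smallest shift s > 0 that is consistent with a mismatch at x[i]
    after the suffix x[i+1 .. m-1] has already matched the text."""
    m = len(x)
    for s in range(1, m + 1):
        # Each matched position k must face an equal pattern character after
        # the shift, unless the shifted pattern no longer covers it (k - s < 0).
        suffix_ok = all(k - s < 0 or x[k - s] == x[k] for k in range(i + 1, m))
        # The character now facing the mismatch must differ from x[i],
        # otherwise the same mismatch would simply happen again.
        differs = i - s < 0 or x[i - s] != x[i]
        if suffix_ok and differs:
            return s
    return m  # not reached: s = m always satisfies both conditions


pattern = "GCAGAGAG"  # illustrative pattern
shifts = [good_suffix_shift(pattern, i) for i in range(len(pattern))]
print(shifts)  # [7, 7, 7, 2, 7, 4, 7, 1]
```

A shift of 2 at i = 3 corresponds to the first case (the matched suffix "AGAG" re-occurs preceded by a different character), while the shifts of 7 correspond to the second case, where only the suffix "G" lines up with a prefix of the pattern.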

The bad-character shift consists in aligning the text character y[i+j] with its rightmost occurrence in x[0 .. m-2] (see figure 13.3).


Figure 13.3. The bad-character shift, a occurs in x.

If y[i+j] does not occur in the pattern x, no occurrence of x in y can include y[i+j], and the left end of the window is aligned with the character immediately after y[i+j], namely y[i+j+1] (see figure 13.4).


Figure 13.4. The bad-character shift, b does not occur in x.
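As a small sketch of the bad-character rule described above, the table below stores, for each character, the distance from its rightmost occurrence in x[0 .. m-2] to the last position of the pattern; characters that do not occur get the value m. The names `bad_char_table` and `bad_char_shift` and the sample pattern are illustrative assumptions, not taken from the book.

```python
def bad_char_table(x: str) -> dict:
    """Distance from the rightmost occurrence of each character in
    x[0 .. m-2] to the last position m-1 of the pattern."""
    m = len(x)
    # Later (more rightward) occurrences overwrite earlier ones.
    return {c: m - 1 - i for i, c in enumerate(x[:-1])}


def bad_char_shift(table: dict, m: int, i: int, c: str) -> int:
    """Shift for a mismatch at pattern position i against text character c.
    Characters absent from the table are treated as having distance m."""
    return table.get(c, m) - (m - 1 - i)


pattern = "GCAGAGAG"  # illustrative pattern
table = bad_char_table(pattern)
print(table)                                          # {'G': 2, 'C': 6, 'A': 1}
print(bad_char_shift(table, len(pattern), 7, "T"))    # 8: 'T' is not in the pattern
print(bad_char_shift(table, len(pattern), 2, "G"))    # -3: a negative shift
```

The last call shows why this rule cannot be used alone: the rightmost 'G' lies to the right of the mismatch position, so aligning it with the text character would move the window backwards.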

Note that the bad-character shift can be negative; thus, to shift the window, the Boyer-Moore algorithm applies the maximum of the good-suffix shift and the bad-character shift. More formally the two shift functions are defined as follows.
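The formal definitions are not reproduced in this post. As a rough sketch of how the two shifts are combined during the search, the code below builds both tables and takes their maximum at every mismatch. It is an assumption-laden illustration: the names `preprocess` and `bm_search` are made up here, and the suffix lengths are computed naively in O(m^2) for readability, so this does not reach the O(m+σ) preprocessing bound quoted earlier.

```python
def preprocess(x: str):
    """Build the bad-character and good-suffix tables for pattern x."""
    m = len(x)
    bm_bc = {c: m - 1 - i for i, c in enumerate(x[:-1])}

    # suff[i]: length of the longest common suffix of x[0 .. i] and x
    # (naive computation, chosen for clarity rather than speed).
    suff = [0] * m
    for i in range(m):
        k = 0
        while k <= i and x[i - k] == x[m - 1 - k]:
            k += 1
        suff[i] = k

    bm_gs = [m] * m
    j = 0
    # Case 2: only a suffix of the matched part aligns with a prefix of x.
    for i in range(m - 1, -1, -1):
        if suff[i] == i + 1:  # x[0 .. i] is also a suffix of x
            while j < m - 1 - i:
                if bm_gs[j] == m:
                    bm_gs[j] = m - 1 - i
                j += 1
    # Case 1: the matched suffix re-occurs preceded by a different character.
    for i in range(m - 1):
        bm_gs[m - 1 - suff[i]] = m - 1 - i
    return bm_bc, bm_gs


def bm_search(x: str, y: str) -> list:
    """Return all positions j where pattern x occurs in text y."""
    m, n = len(x), len(y)
    if m == 0 or m > n:
        return []
    bm_bc, bm_gs = preprocess(x)
    hits, j = [], 0
    while j <= n - m:
        i = m - 1
        while i >= 0 and x[i] == y[j + i]:  # compare right to left
            i -= 1
        if i < 0:
            hits.append(j)
            j += bm_gs[0]
        else:
            # The bad-character shift may be negative; the good-suffix
            # shift is always at least 1, so the window always advances.
            j += max(bm_gs[i], bm_bc.get(y[j + i], m) - (m - 1 - i))
    return hits


print(bm_search("GCAGAGAG", "GCATCGCAGAGAGTATACAGTACG"))  # [5]
```

Because the good-suffix value is never smaller than 1, taking the maximum guarantees progress even when the bad-character value is zero or negative.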
