Boyer-Moore algorithm (BM)

WYlslrt

Article link: https://blog.csdn.net/WYlslrt/article/details/3499916

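The Boyer-Moore algorithm is an efficient string-matching algorithm. Its preprocessing phase takes O(m+σ) time and O(m+σ) space, where σ is the alphabet size. The searching phase takes O(mn) time, performs at most 3n text character comparisons in the worst case when the pattern is non-periodic, and reaches O(n/m) in the best case. The algorithm scans the text using two precomputed functions, the good-suffix shift and the bad-character shift. In usual applications, Boyer-Moore is considered the most efficient string-matching algorithm.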

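Last night I was leafing through 《柔性字符串匹配》 (Flexible Pattern Matching in Strings) again and reached the Horspool algorithm. The book passes over it in a few lines, which did not feel detailed enough, so I looked for other people's write-ups online and came across a good book, "Handbook of Exact String-Matching Algorithms". While reading about Horspool there, I happened on its description of the BM algorithm and skimmed it. The handbook includes an implementation of BM, which the first book lacks: the first book only describes the idea of the algorithm without much detail, whereas the second explains it thoroughly. Reading it carefully, I noticed some differences from the first book. (This may get long, but don't leave: my own analysis comes later ;P)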

Handbook of Exact String-Matching Algorithms

Boyer-Moore algorithm

Main features

performs the comparisons from right to left;
preprocessing phase in O(m+σ) time and space complexity, where σ is the alphabet size;
searching phase in O(mn) time complexity;
3n text character comparisons in the worst case when searching for a non-periodic pattern;
O(n / m) best performance.

Description

The Boyer-Moore algorithm is considered the most efficient string-matching algorithm in usual applications. A simplified version of it, or the entire algorithm, is often implemented in text editors for the «search» and «substitute» commands.

The algorithm scans the characters of the pattern from right to left, beginning with the rightmost one. In case of a mismatch (or a complete match of the whole pattern), it uses two precomputed functions to shift the window to the right. These two shift functions are called the good-suffix shift (also called the matching shift) and the bad-character shift (also called the occurrence shift).
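The right-to-left scan performed during one attempt can be sketched in a few lines of Python. This is a minimal illustration assuming 0-based indexing as in the text; the function name `compare_window` is hypothetical, not taken from the book.

```python
def compare_window(x: str, y: str, j: int) -> int:
    """Compare pattern x with the text window y[j .. j+m-1], right to left.

    Returns the pattern index i of the first mismatch, or -1 when the
    whole pattern matches (the two situations in which a shift is computed).
    """
    i = len(x) - 1
    # Walk leftwards while pattern and text characters agree.
    while i >= 0 and x[i] == y[i + j]:
        i -= 1
    return i
```

For example, with x = "GCAGAGAG" and the sample text y = "GCATCGCAGAGAGTATACAGTACG", `compare_window(x, y, 1)` returns 5 (x[7] and x[6] match, then x[5] = 'G' meets 'C'), and `compare_window(x, y, 5)` returns -1.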

Assume that a mismatch occurs between the character x[i]=a of the pattern and the character y[i+j]=b of the text during an attempt at position j.
Then x[i+1 .. m-1] = y[i+j+1 .. j+m-1] = u and x[i] ≠ y[i+j]. The good-suffix shift consists of aligning the segment y[i+j+1 .. j+m-1] = x[i+1 .. m-1] with its rightmost occurrence in x that is preceded by a character different from x[i] (see figure 13.1).

Figure 13.1. The good-suffix shift: u re-occurs preceded by a character c different from a.
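The first case of the good-suffix rule can be written directly from its definition. Below is a hedged, quadratic-time sketch (hypothetical helper name, 0-based indices) that returns the shift for a mismatch at index i, or None when no suitable re-occurrence of u exists and the second case has to be used.

```python
from typing import Optional


def good_suffix_case1(x: str, i: int) -> Optional[int]:
    """Shift after a mismatch at pattern index i, first case only.

    Looks for the rightmost earlier copy of u = x[i+1 .. m-1] that starts
    at some p >= 1 and is preceded by a character x[p-1] != x[i].
    """
    u = x[i + 1:]
    # Candidate starts p run from right to left, so the first hit is the rightmost.
    for p in range(i, 0, -1):
        if x[p:p + len(u)] == u and x[p - 1] != x[i]:
            # Shifting the window by (i+1) - p puts this copy under the matched u.
            return (i + 1) - p
    return None  # no such copy: fall back to the second case
```

For x = "GCAGAGAG" and a mismatch at i = 5 (u = "AG"), the copy at p = 4 is rejected because it is preceded by 'G', the same character as x[5]; the copy at p = 2, preceded by 'C', gives a shift of 4.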

If no such segment exists, the shift consists of aligning the longest suffix v of y[i+j+1 .. j+m-1] with a matching prefix of x (see figure 13.2).

Figure 13.2. The good-suffix shift: only a suffix of u re-occurs in x.
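Combining both cases gives the whole good-suffix table. This is a deliberately naive sketch that mirrors the two cases described above rather than an efficient preprocessing routine; `good_suffix_table` is an illustrative name. Entry gs[i] is the shift after a mismatch at pattern index i.

```python
def good_suffix_table(x: str) -> list[int]:
    """gs[i] = good-suffix shift after a mismatch at pattern index i (naive)."""
    m = len(x)
    gs = [0] * m
    for i in range(m):
        u = x[i + 1:]
        shift = None
        # Case 1 (figure 13.1): rightmost earlier copy of u preceded by a char != x[i].
        # Such shifts are at most i, so they always beat case-2 shifts (>= i+1).
        for p in range(i, 0, -1):
            if x[p:p + len(u)] == u and x[p - 1] != x[i]:
                shift = (i + 1) - p
                break
        if shift is None:
            # Case 2 (figure 13.2): longest suffix v of u that is also a prefix of x.
            # v may be empty, which yields the full shift m.
            for length in range(len(u), -1, -1):
                if x[:length] == u[len(u) - length:]:
                    shift = m - length
                    break
        gs[i] = shift
    return gs
```

For x = "GCAGAGAG" this produces [7, 7, 7, 2, 7, 4, 7, 1]: only mismatches at indices 3, 5 and 7 find a suitable re-occurrence of u, and every other position falls back to aligning the prefix "G" with the final "G" of the window.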

The bad-character shift consists of aligning the text character y[i+j] with its rightmost occurrence in x[0 .. m-2] (see figure 13.3).

Figure 13.3. The bad-character shift: a occurs in x.
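The bad-character rule only needs, for each character, the distance from its rightmost occurrence in x[0 .. m-2] to the last pattern position. A minimal sketch using a Python dict (an assumption; a fixed-size array indexed by character code is a common alternative for small alphabets):

```python
def bad_character_table(x: str) -> dict[str, int]:
    """bc[c] = m - 1 - (index of the rightmost c in x[0 .. m-2]).

    Characters absent from x[0 .. m-2] are not stored; look them up
    with bc.get(c, m) so that they get the default value m.
    """
    m = len(x)
    bc = {}
    for k in range(m - 1):       # x[m-1] is deliberately excluded
        bc[x[k]] = m - 1 - k     # later (further right) positions overwrite earlier ones
    return bc
```

For x = "GCAGAGAG" this gives {'G': 2, 'C': 6, 'A': 1}, and a character such as 'T' that does not occur gets the default m = 8. After a mismatch at index i against text character b, the proposed shift is `bc.get(b, m) - (m - 1 - i)`.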

If y[i+j] does not occur in the pattern x, no occurrence of x in y can include y[i+j], and the left end of the window is aligned with the character immediately after y[i+j], namely y[i+j+1] (see figure 13.4).

Figure 13.4. The bad-character shift: b does not occur in x.
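Both bad-character cases (the text character occurs in x[0 .. m-2], or it does not) fit in one small function. This is a hedged sketch with a hypothetical name that uses `str.rfind` to find the rightmost occurrence. The result can be zero or negative when that occurrence lies to the right of i.

```python
def bad_character_shift(x: str, i: int, b: str) -> int:
    """Shift proposed by the bad-character rule when x[i] != b = y[i+j]."""
    m = len(x)
    r = x.rfind(b, 0, m - 1)  # rightmost index of b in x[0 .. m-2], or -1
    if r == -1:
        # b does not occur: move the window's left end just past y[i+j].
        return i + 1
    # Bring x[r] under y[i+j] (which currently sits under x[i]).
    return i - r
```

With x = "GCAGAGAG", a mismatch at i = 7 against 'T' gives 8 (the whole window moves past the 'T'), and a mismatch at i = 2 against 'G' gives -3 because the rightmost 'G' in x[0 .. 6] is at index 5.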

Note that the bad-character shift can be negative; therefore, to shift the window, the Boyer-Moore algorithm takes the maximum of the good-suffix shift and the bad-character shift. More formally, the two shift functions are defined as follows.
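Taking the maximum of the two shifts gives a complete search loop. The sketch below is my own Python rendering under stated assumptions, not the book's code: 0-based indices, a dict for the bad-character table, and an O(m) good-suffix preprocessing built on the classic `suffixes` computation, where suff[i] is the length of the longest suffix of x that ends at position i. All function names are illustrative. After a full match it shifts by gs[0], which equals the period of x.

```python
def suffixes(x: str) -> list[int]:
    """suff[i] = length of the longest suffix of x that ends at position i."""
    m = len(x)
    suff = [0] * m
    suff[m - 1] = m
    g = f = m - 1  # f is only read once i > g, i.e. after it has been set below
    for i in range(m - 2, -1, -1):
        if i > g and suff[i + m - 1 - f] < i - g:
            # Inside a block already matched against a suffix: reuse the mirrored value.
            suff[i] = suff[i + m - 1 - f]
        else:
            g = min(g, i)
            f = i
            # Extend the match leftwards by direct comparison.
            while g >= 0 and x[g] == x[g + m - 1 - f]:
                g -= 1
            suff[i] = f - g
    return suff


def good_suffix_shifts(x: str) -> list[int]:
    """gs[i] = good-suffix shift after a mismatch at pattern index i."""
    m = len(x)
    suff = suffixes(x)
    gs = [m] * m
    # Second case: a suffix of u equals a prefix x[0 .. i] of x (longest prefixes first).
    j = 0
    for i in range(m - 1, -1, -1):
        if suff[i] == i + 1:
            while j < m - 1 - i:
                if gs[j] == m:
                    gs[j] = m - 1 - i
                j += 1
    # First case: u re-occurs ending at i, preceded by a different character;
    # larger i overwrite smaller ones, so the rightmost re-occurrence wins.
    for i in range(m - 1):
        gs[m - 1 - suff[i]] = m - 1 - i
    return gs


def boyer_moore(x: str, y: str) -> list[int]:
    """Return every starting index of pattern x in text y."""
    m, n = len(x), len(y)
    if m == 0:
        return []  # assumption: an empty pattern reports no matches
    gs = good_suffix_shifts(x)
    bc = {x[k]: m - 1 - k for k in range(m - 1)}  # rightmost occurrence wins
    matches = []
    j = 0
    while j <= n - m:
        i = m - 1
        while i >= 0 and x[i] == y[i + j]:
            i -= 1
        if i < 0:
            matches.append(j)
            j += gs[0]  # gs[0] is the period of x
        else:
            # The bad-character term may be <= 0; gs[i] >= 1 guarantees progress.
            j += max(gs[i], bc.get(y[i + j], m) - m + 1 + i)
    return matches
```

For example, `boyer_moore("GCAGAGAG", "GCATCGCAGAGAGTATACAGTACG")` returns [5], visiting the windows at j = 0, 1, 5, 12 and 16, and `boyer_moore("AA", "AAAA")` reports the overlapping matches [0, 1, 2].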
