Boyer-Moore String Matching Algorithm, Faster than KMP!!!

Article link: https://blog.csdn.net/qq_43664407/article/details/148082534

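1. Algorithm overview

Boyer-Moore is an efficient string matching algorithm for quickly locating a pattern (Pattern) in a text (Text). Its core ideas are:

- Compare characters from right to left within the pattern (the opposite of the usual left-to-right scan), while the pattern itself still slides from left to right along the text.
- Use the bad character rule and the good suffix rule to skip alignments that cannot possibly match, which greatly reduces the number of comparisons.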

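2. Core rules

(1) Bad character rule

When a character in the text (T) fails to match the pattern (P) character aligned with it, that text character is the bad character:
- If the bad character does not occur in the pattern, shift the pattern completely past it.
- If it does occur, shift the pattern right so that its rightmost occurrence in the pattern lines up with the bad character. When that occurrence lies to the right of the mismatch position, the rule would move the pattern backward, so shift by 1 instead.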
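
The bad character rule can be sketched in a few lines. This is only an illustration under stated assumptions: characters are treated as 8-bit bytes, indices are 0-based, and the names `buildLastOccurrence` and `badCharShift` are made up here rather than taken from any library.

```cpp
#include <algorithm>
#include <array>
#include <string>

// last[c] = index of the rightmost occurrence of byte c in the pattern,
// or -1 if c does not occur at all.
std::array<int, 256> buildLastOccurrence(const std::string& pattern) {
    std::array<int, 256> last;
    last.fill(-1);
    for (int i = 0; i < static_cast<int>(pattern.size()); ++i) {
        last[static_cast<unsigned char>(pattern[i])] = i;
    }
    return last;
}

// Shift suggested by the bad character rule when pattern[j] mismatches the
// text byte `bad`. If the rightmost occurrence lies to the right of j, the
// raw value would be zero or negative, so fall back to a shift of 1.
int badCharShift(const std::array<int, 256>& last, int j, char bad) {
    return std::max(1, j - last[static_cast<unsigned char>(bad)]);
}
```

For the pattern ABD, a mismatch on C at the last position (j = 2) yields a shift of 3 because C never occurs in the pattern, while a mismatch on B at the same position yields a shift of 1.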

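(2) Good suffix rule

When a mismatch occurs after some suffix of the pattern has already matched (the good suffix):
- Look for another occurrence of that suffix further left in the pattern, and shift the pattern so that this occurrence lines up with the matched text.
- If there is none, shift so that the longest prefix of the pattern that is also a suffix of the good suffix lines up with the text; if no such prefix exists either, shift the pattern entirely past the good suffix.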
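
Both cases of the good suffix rule fall out of one definition: take the smallest shift for which every already-matched text character either faces an equal pattern character or ends up left of the pattern's start. The sketch below computes that directly by brute force. It favors readability over speed (real implementations build the table in O(m)), and `buildGoodSuffixShift` is a hypothetical name used only for this illustration.

```cpp
#include <string>
#include <vector>

// shift[j] = how far to move the pattern when pattern[j] mismatches after
// pattern[j+1 .. m-1] has already matched. Brute force, for clarity only.
std::vector<int> buildGoodSuffixShift(const std::string& p) {
    const int m = static_cast<int>(p.size());
    std::vector<int> shift(m, m);
    for (int j = 0; j < m; ++j) {
        for (int s = 1; s <= m; ++s) {
            bool ok = true;
            // After moving right by s, the text character that faced p[k]
            // now faces p[k - s]. It must be equal, unless k - s < 0, which
            // means the character is now left of the pattern's start.
            for (int k = j + 1; k < m && ok; ++k) {
                if (k - s >= 0 && p[k - s] != p[k]) ok = false;
            }
            if (ok) {
                shift[j] = s;  // smallest valid shift wins
                break;
            }
        }
    }
    return shift;
}
```

For the pattern CABAB the table is {5, 5, 2, 2, 1}: after matching the suffix AB (mismatch at index 2), the earlier AB at indices 1 and 2 allows a shift of 2. For ABXAB the table is {3, 3, 3, 3, 1}: the suffix XAB occurs nowhere else, but the prefix AB matches the end of it, so the shift is 3 rather than the full length 5.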

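3. Algorithm steps

Preprocessing:
- Build the bad character table for the pattern (the last position at which each character occurs).
- Build the good suffix table (the shift to apply for each possible matched suffix).
Matching:
- At each alignment, compare characters from right to left; on a mismatch, shift by the larger of the two rules' suggestions so that as many positions as possible are skipped.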
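
The steps above can be sketched as a single function. Treat it as an illustrative sketch rather than a reference implementation: `boyerMooreSearch` is a made-up name, characters are assumed to be 8-bit bytes, and the good suffix table uses the common border-based construction of the strong variant of the rule (a re-occurrence of the suffix only counts if the character before it differs from the one that just mismatched).

```cpp
#include <algorithm>
#include <array>
#include <string>
#include <vector>

// Returns the 0-based start index of every occurrence of pat in text.
std::vector<int> boyerMooreSearch(const std::string& text, const std::string& pat) {
    std::vector<int> hits;
    const int n = static_cast<int>(text.size());
    const int m = static_cast<int>(pat.size());
    if (m == 0 || m > n) return hits;

    // Preprocessing 1: bad character table (rightmost index of each byte).
    std::array<int, 256> last;
    last.fill(-1);
    for (int i = 0; i < m; ++i) last[static_cast<unsigned char>(pat[i])] = i;

    // Preprocessing 2: good suffix table. shift[k + 1] is used when pat[k]
    // mismatches; shift[0] is used after a full match.
    std::vector<int> shift(m + 1, 0), border(m + 1, 0);
    int i = m, j = m + 1;
    border[i] = j;
    while (i > 0) {  // case 1: the suffix re-occurs inside the pattern
        while (j <= m && pat[i - 1] != pat[j - 1]) {
            if (shift[j] == 0) shift[j] = j - i;
            j = border[j];
        }
        --i;
        --j;
        border[i] = j;
    }
    j = border[0];
    for (i = 0; i <= m; ++i) {  // case 2: only a prefix of the pattern fits
        if (shift[i] == 0) shift[i] = j;
        if (i == j) j = border[j];
    }

    // Matching: compare right to left, then take the larger shift.
    int s = 0;  // current alignment of pat[0] in the text
    while (s <= n - m) {
        int k = m - 1;
        while (k >= 0 && pat[k] == text[s + k]) --k;
        if (k < 0) {
            hits.push_back(s);
            s += shift[0];
        } else {
            const int bad = k - last[static_cast<unsigned char>(text[s + k])];
            s += std::max(shift[k + 1], bad);  // shift[k + 1] is always >= 1
        }
    }
    return hits;
}
```

Calling `boyerMooreSearch("GCTTCTGCTAC", "TCTG")` returns a vector holding the single index 3. Overlapping occurrences are reported as well, because the shift after a full match comes from the good suffix table rather than being the full pattern length.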

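4. Worked examples

Example 1: bad character rule dominates

Text (T): ABCABCDAB
Pattern (P): ABD
Matching process:
1. Align the pattern with the start of the text and compare from right to left:
```
ABCABCDAB
ABD
  ^ mismatch (C vs D)
```
2. Bad character rule: C does not occur in the pattern, so shift right by 3:
```
ABCABCDAB
   ABD
     ^ mismatch (C vs D)
```
3. C is the bad character again, so shift right by another 3:
```
ABCABCDAB
      ABD
        ^ mismatch (B vs D)
```
4. B occurs in the pattern one position to the left of the mismatch, so shift right by 1. The pattern would now run past the end of the text, so the search stops: ABD does not occur in this text, and only three character comparisons were needed to establish that.

Example 2: mismatch at the last character

Text (T): ABABABAC
Pattern (P): ABABAC
Matching process:
1. Align the pattern with the start of the text and compare from right to left:
```
ABABABAC
ABABAC
     ^ mismatch (B vs C)
```
2. The very first comparison fails, so no suffix has matched yet and the good suffix rule has nothing to work with; it only comes into play once at least one character has matched. The bad character rule decides: the rightmost B in the pattern is at index 3, two positions left of the mismatch, so shift right by 2:
```
ABABABAC
  ABABAC
```
3. All six characters match (index 2, counting from 0).

Example 3: several shifts in a row

Text (T): GCTTCTGCTAC
Pattern (P): TCTG
Matching process:
1. Initial alignment, comparing from right to left:
```
GCTTCTGCTAC
TCTG
   ^ mismatch (T vs G)
```
2. Bad character rule: the rightmost T in the pattern is at index 2, one position left of the mismatch, so shift right by 1:
```
GCTTCTGCTAC
 TCTG
    ^ mismatch (C vs G)
```
3. Bad character rule: the rightmost C in the pattern is at index 1, two positions left of the mismatch, so shift right by 2:
```
GCTTCTGCTAC
   TCTG
```
4. Comparing from right to left, all four characters match (index 3, counting from 0).

5. Algorithm complexity

Worst case: O(n*m) for the basic algorithm (n is the text length, m is the pattern length), for example when the text and the pattern both consist of one repeated character. Adding the Galil rule brings the worst case down to O(n+m).
Best case: O(n/m), for example when the bad character rule skips the whole pattern length at every step.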
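
The gap between the two cases is easy to see by counting comparisons. The sketch below is a simplified variant that uses only the bad character rule, which is enough to reproduce both extremes; `countComparisons` is a hypothetical helper written for this illustration, not part of any library.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <string>

// Counts the character comparisons a bad-character-only Boyer-Moore scan
// performs over the whole text (it keeps going after each match).
std::size_t countComparisons(const std::string& text, const std::string& pat) {
    const int n = static_cast<int>(text.size());
    const int m = static_cast<int>(pat.size());
    if (m == 0 || m > n) return 0;

    std::array<int, 256> last;
    last.fill(-1);
    for (int i = 0; i < m; ++i) last[static_cast<unsigned char>(pat[i])] = i;

    std::size_t comparisons = 0;
    int s = 0;
    while (s <= n - m) {
        int k = m - 1;
        while (k >= 0) {
            ++comparisons;
            if (pat[k] != text[s + k]) break;
            --k;
        }
        if (k < 0) {
            s += 1;  // full match: advance by one so overlaps are not missed
        } else {
            s += std::max(1, k - last[static_cast<unsigned char>(text[s + k])]);
        }
    }
    return comparisons;
}
```

With a text of 1000 A's, a pattern of ten B's costs 100 comparisons (n/m: one comparison per alignment, then a jump of 10), whereas a pattern of ten A's costs 9910 comparisons (roughly n*m: every alignment matches in full and the pattern then advances by one).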

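6. `std::boyer_moore_searcher`, introduced in C++17

If C++17 is available, strings can be searched with `std::boyer_moore_searcher` (declared in `<functional>`), which is based on the Boyer-Moore algorithm and is often faster than KMP in practice, especially for long patterns: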

```cpp
#include <algorithm>   // std::search
#include <functional>  // std::boyer_moore_searcher (C++17)
#include <iostream>
#include <string>

int main() {
    std::string text = "ABABDABACDABABCABAB";
    std::string pattern = "ABABCABAB";

    // The searcher preprocesses the pattern once and can be reused afterwards.
    auto searcher = std::boyer_moore_searcher(pattern.begin(), pattern.end());
    auto it = std::search(text.begin(), text.end(), searcher);

    if (it != text.end()) {
        std::cout << "Pattern found at index: " << (it - text.begin()) << '\n';
    } else {
        std::cout << "Pattern not found.\n";
    }
    return 0;
}
```

7. Finding all match positions with `std::boyer_moore_searcher`

```cpp
#include <algorithm>   // std::search
#include <cstddef>     // std::size_t
#include <functional>  // std::boyer_moore_searcher (C++17)
#include <iostream>
#include <string>
#include <vector>

std::vector<std::size_t> findAllMatches(const std::string& text, const std::string& pattern) {
    std::vector<std::size_t> matches;
    if (pattern.empty()) return matches;

    // Build the searcher once and reuse it for every search call.
    auto searcher = std::boyer_moore_searcher(pattern.begin(), pattern.end());
    auto it = text.begin();
    while (it != text.end()) {
        auto match = std::search(it, text.end(), searcher);
        if (match == text.end()) break;  // no more matches

        matches.push_back(static_cast<std::size_t>(match - text.begin()));
        it = match + 1;  // resume one past the match start, so overlapping matches are found too
    }

    return matches;
}

int main() {
    std::string text = "ABABDABACDABABCABAB";
    std::string pattern = "ABAB";
    std::vector<std::size_t> matches = findAllMatches(text, pattern);

    for (std::size_t pos : matches) {
        std::cout << "Found at index: " << pos << '\n';
    }
    return 0;
}
```

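Summary of string matching approaches:

| Method | Time complexity | Use case |
| --- | --- | --- |
| `std::string::find()` | O(n*m) worst case in typical implementations | Simple searches, suited to short strings |
| KMP algorithm | O(n+m) | When a guaranteed linear bound is needed (hand-written) |
| `std::boyer_moore_searcher` (C++17) | O(n/m) best case, up to O(n*m) worst case | Long texts and long patterns, often the fastest in practice |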
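
Which row of the table wins for a given workload is best settled by measuring. The helpers below are a rough timing sketch, not a rigorous benchmark: `Timing`, `timeFind`, and `timeBoyerMoore` are names invented for this example, each search is timed once with `std::chrono::steady_clock`, and the Boyer-Moore figure deliberately includes the searcher's preprocessing.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <functional>
#include <string>

struct Timing {
    std::size_t index;       // offset of the first match, or std::string::npos
    long long microseconds;  // wall-clock time of the call
};

Timing timeFind(const std::string& text, const std::string& pattern) {
    const auto start = std::chrono::steady_clock::now();
    const std::size_t idx = text.find(pattern);
    const auto stop = std::chrono::steady_clock::now();
    return {idx, std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count()};
}

Timing timeBoyerMoore(const std::string& text, const std::string& pattern) {
    const auto start = std::chrono::steady_clock::now();
    // Building the searcher is part of the measured cost.
    const auto searcher = std::boyer_moore_searcher(pattern.begin(), pattern.end());
    const auto it = std::search(text.begin(), text.end(), searcher);
    const auto stop = std::chrono::steady_clock::now();
    const std::size_t idx = (it == text.end())
        ? std::string::npos
        : static_cast<std::size_t>(it - text.begin());
    return {idx, std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count()};
}
```

Both helpers should report the same `index` for the same inputs; only the timings differ. For short texts or short patterns the preprocessing cost can make the Boyer-Moore searcher slower than `std::string::find()`, so it is worth repeating the measurement on realistic data and with optimizations enabled.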