Notes on the Aho-Corasick (AC) Automaton Algorithm

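  The AC algorithm is a classic multi-pattern matching algorithm by Alfred V. Aho (a co-author of the "Dragon Book", Compilers: Principles, Techniques, and Tools) and Margaret J. Corasick, published in 1975, in the same era as the KMP algorithm. Given a text of length n and a pattern set P = {p1, p2, ..., pm}, once the automaton has been built from the patterns it finds every occurrence of every pattern in a single O(n) scan of the text (plus the time needed to report the matches), independent of the number of patterns m.
  To some extent, the AC algorithm can be seen as an extension of the KMP algorithm to the multi-pattern setting.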

A brief overview of the KMP algorithm

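  A prefix of the pattern may also occur inside the pattern as a non-prefix substring. KMP looks for the longest such prefix; note that a single non-prefix substring may itself contain several shorter prefixes.
  KMP keeps an array called the prefix array (also known as the next array). When a mismatch occurs, this array gives the position in the pattern from which matching against the target string should resume, that is, how far the pattern can shift forward. It also measures how "symmetric" each prefix of the pattern is, meaning how long a prefix reappears as a suffix: the higher the symmetry, the larger the value, and the more of the already matched characters can be reused.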
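The description above can be made concrete with a small Python sketch. It assumes the convention that next[i] is the length of the longest proper prefix of pattern[0..i] that is also a suffix of it; the function names `build_next` and `kmp_search` are chosen here for illustration only, and other texts shift the array by one position.

```python
def build_next(pattern):
    """next[i] = length of the longest proper prefix of pattern[:i+1]
    that is also a suffix of it (the convention assumed in this sketch)."""
    nxt = [0] * len(pattern)
    k = 0  # length of the prefix matched so far
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = nxt[k - 1]  # fall back to the next shorter candidate
        if pattern[i] == pattern[k]:
            k += 1
        nxt[i] = k
    return nxt


def kmp_search(text, pattern):
    """Return the start indices of all occurrences of pattern in text."""
    if not pattern:
        return []
    nxt = build_next(pattern)
    hits = []
    k = 0  # number of pattern characters currently matched
    for i, ch in enumerate(text):
        while k > 0 and ch != pattern[k]:
            k = nxt[k - 1]  # mismatch: shift the pattern, never re-read the text
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            hits.append(i - k + 1)
            k = nxt[k - 1]  # continue, so overlapping matches are found too
    return hits


print(build_next("abcabcacab"))                   # [0, 0, 0, 1, 2, 3, 4, 0, 1, 2]
print(kmp_search("abcabcabcacab", "abcabcacab"))  # [3]
```

In the search call, the mismatch happens after seven matched characters; the pattern resumes at position next[6] = 4 instead of restarting from 0, and the text index never moves backwards.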

Example 1

index    0  1  2  3  4  5  6  7  8  9
pattern  a  b  c  a  b  c  a  c  a  b
next     0  0  0  1  2  3  4  0  1  2

Example 2

index    0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
pattern  a  g  c  t  a  g  c  a  g  c  t  a  g  c  t  g
next     0  0  0  0  1  2  3  1  2  3  4  5  6  7  4  0

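In Example 2, the substring a g c t a g c at indices 7 to 13 contains two prefixes of the pattern as suffixes: agctagc (length 7) and agc (length 3). The t at index 14 cannot extend the length-7 match, because pattern[7] is a, so it falls back to the shorter prefix agc and gets next = 4, which is smaller than the next value 7 of the c before it.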
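The fallback in Example 2 can be checked with a short brute-force sketch in Python. The helper `borders` is a name introduced here for illustration; it simply lists every length k for which the first k characters equal the last k characters, which is slower than the real KMP preprocessing but easy to verify by eye.

```python
def borders(s):
    """All lengths k (0 < k < len(s)) with s[:k] == s[-k:], longest first."""
    return [k for k in range(len(s) - 1, 0, -1) if s[:k] == s[-k:]]


pattern = "agctagcagctagctg"
prefix = pattern[:14]    # "agctagcagctagc", the part ending at index 13
cands = borders(prefix)  # prefixes of the pattern that are also suffixes here
ch = pattern[14]         # the 't' at index 14

# next[14] is one more than the longest candidate that 't' can extend;
# the empty prefix (length 0) is always the last candidate to try.
next_14 = next((k + 1 for k in cands + [0] if pattern[k] == ch), 0)

print(cands)    # [7, 3]
print(next_14)  # 4
```

The candidate of length 7 fails because pattern[7] is a, while the candidate of length 3 succeeds because pattern[3] is t, which gives the value 4 shown in the table.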

The AC automaton algorithm

  An AC automaton is determined by three functions: the goto function, the failure function, and the output function.

Keyword Tree

A keyword tree (or trie) for a set of patterns P is a rooted tree K such that

  1. each edge of K is labeled by a character,
  2. any two edges out of a node have different labels,
    (Define the label of a node v as the concatenation of the edge labels on the path from the root to v, and denote it by L(v).)
  3. for each p ∈ P there is a node v with L(v) = p, and
  4. the label L(v) of any leaf v equals some p ∈ P.

Figure: a keyword tree for P = {he, she, his, hers}
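As a rough illustration of the definition, the following Python sketch builds a keyword tree for P = {he, she, his, hers}. The representation (a list of dictionaries indexed by state number) and the names `build_keyword_tree`, `children` and `labels` are assumptions made for this example, not part of the original definition.

```python
def build_keyword_tree(patterns):
    """Build a trie where children[q][a] is the child of node q along edge a.
    Node 0 is the root; labels[q] is L(q), the string spelled from the root."""
    children = [{}]
    labels = [""]
    for p in patterns:
        q = 0
        for a in p:
            if a not in children[q]:
                # Create a new node; dict keys keep edge labels distinct per node.
                children.append({})
                labels.append(labels[q] + a)
                children[q][a] = len(children) - 1
            q = children[q][a]
    return children, labels


patterns = ["he", "she", "his", "hers"]
children, labels = build_keyword_tree(patterns)
leaves = [labels[q] for q in range(len(children)) if not children[q]]

print(len(children))   # 10 nodes, including the root
print(sorted(leaves))  # ['hers', 'his', 'she']
```

Every pattern appears as the label of some node, and every leaf label is a pattern; he is an internal node because hers passes through it.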

goto function

States: the nodes of the keyword tree
Initial state: 0 = the root
The goto function g(q; a) gives the state entered from the current state q by matching the target character a.

  1. if edge (q; v) is labeled by a, then g(q; a) = v;