A Recap of String Algorithms~

wuzhuangtai00

Categories: KMP, suffix automaton, RMQ, 2014, AC automaton, minimal representation, suffix tree, LCA, suffix array, Tarjan SCC condensation, extended KMP

Article link: https://blog.csdn.net/wuzhuangtai00/article/details/38619907

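I had completely forgotten about this until I saw someone else writing their diary and realised I was six days behind on mine (oops),
so here I am catching up.
The lecture notes I keep on Baidu Cloud are really good; they are how I came to understand all of this. I strongly recommend the one on suffix trees and suffix automata, and I have to complain about WJMZBMR's slides, which a beginner like me simply could not follow.
Anyway, I have learned a lot over the past few days, with practice aimed at both depth and breadth of thinking. My breadth is still far too weak, so I have put depth aside for now.
I will get back to mock contests when I have time; right now I really don't.
Since I have picked up so much recently, this post serves as my own summary and recap~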
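1. Minimal representation
So what is the minimal representation?
Bend the string into a ring, then cut it at some position so that the resulting string is lexicographically smallest.
This can be done in O(N).
Keep two pointers i and j plus an offset k:
i and j are two candidate starting positions,
and k is the number of characters on which the two candidates already agree.
Then:
If a[i+k]>a[j+k], no start in i..i+k can be the minimum, so i jumps to i+k+1 and k resets to 0.
If a[i+k]<a[j+k], likewise j jumps to j+k+1 and k resets to 0.
If a[i+k]=a[j+k], increment k and keep comparing until one of the two cases above occurs.
If i and j ever coincide, advance j by one; at the end the answer is the smaller of i and j.
See the code for details (my writing is poor and I doubt words alone make it clear).
Code: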
procedure main;
begin
  i:=1; j:=2; k:=0;
  while (i<=len) and (j<=len) and (k<len) do
  begin
    ni:=(i+k-1) mod len+1; // wrap around the ring (1-based)
    nj:=(j+k-1) mod len+1;
    if a[ni]=a[nj] then inc(k)
    else begin
      if a[ni]>a[nj] then inc(i,k+1) // starts i..i+k cannot be minimal
      else inc(j,k+1);               // starts j..j+k cannot be minimal
      if i=j then inc(j);
      k:=0;
    end;
  end;
  if j<i then i:=j; // i is now the start of the minimal rotation
end;
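
The two-pointer scan described above can also be sketched in Python for quick experiments. This is only an illustration and not the original code: the name min_rotation and the 0-based indexing are my own assumptions.

```python
def min_rotation(s):
    """Return the 0-based start index of a lexicographically smallest rotation of s."""
    n = len(s)
    i, j, k = 0, 1, 0          # two candidate starts and the matched length
    while i < n and j < n and k < n:
        x, y = s[(i + k) % n], s[(j + k) % n]
        if x == y:
            k += 1             # still tied, compare the next character
            continue
        if x > y:
            i += k + 1         # starts i..i+k cannot be minimal
        else:
            j += k + 1         # starts j..j+k cannot be minimal
        if i == j:
            j += 1             # keep the two candidates distinct
        k = 0
    return min(i, j)
```

For example, min_rotation("bcab") points at the rotation "abbc".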

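2. KMP
After these past few days it suddenly does not seem that hard any more, at least compared with suffix arrays, suffix trees, and suffix automata.
Brute-force matching is O(MN),
but a lot of the work it does is wasted, and we should put that information to use.
That is the essence of many string algorithms.
Keep two pointers i and j,
meaning that string a has been matched up to a[i] and string b up to b[j],
i.e. a[i-j+1..i]=b[1..j].
A next array then tells us where to fall back to when the match fails after b[j]:
next[j] is the largest k<j such that b[1..k]=b[j-k+1..j].
This brings the complexity down to O(M+N).
Computing the next array is really just matching b against itself.
See the code for details (again, words alone do not make it very clear).
Code: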
procedure KMP; // n = length of a (the text), len = length of b (the pattern)
begin
  next[1]:=0;
  j:=0;
  for i:=2 to len do // match b against itself
  begin
    while (j>0) and (b[i]<>b[j+1]) do j:=next[j];
    if b[i]=b[j+1] then inc(j);
    next[i]:=j;
  end;
  j:=0;
  for i:=1 to n do // match a against b
  begin
    while (j>0) and (a[i]<>b[j+1]) do j:=next[j];
    if a[i]=b[j+1] then inc(j);
    if j=len then begin
      inc(ans); // ans is the number of occurrences
      j:=next[j];
    end;
  end;
end;
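
For comparison, here is a minimal 0-based Python sketch of the same two passes (build the fallback table, then scan the text). The function names prefix_function and count_occurrences are made up for this illustration, and the table is shifted by one relative to the next array above.

```python
def prefix_function(p):
    """nxt[i] = length of the longest proper prefix of p[:i+1] that is also its suffix."""
    nxt = [0] * len(p)
    j = 0
    for i in range(1, len(p)):
        while j > 0 and p[i] != p[j]:
            j = nxt[j - 1]          # fall back to the next shorter border
        if p[i] == p[j]:
            j += 1
        nxt[i] = j
    return nxt


def count_occurrences(text, pattern):
    """Count (possibly overlapping) occurrences of pattern in text."""
    if not pattern:
        return 0
    nxt = prefix_function(pattern)
    j = count = 0
    for c in text:
        while j > 0 and c != pattern[j]:
            j = nxt[j - 1]
        if c == pattern[j]:
            j += 1
        if j == len(pattern):       # full match ending at this character
            count += 1
            j = nxt[j - 1]
    return count
```

Overlapping matches are counted, so "aa" occurs four times in "aaaaa".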
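3. Extended KMP
An extension of KMP: a is the text and b is the pattern, and expand[i] is the length of the longest common prefix of a[i..len(a)] and b.
Since it is an extension, the idea is much the same.
Let next[i] be the largest k such that b[1..k]=b[i..i+k-1] (that is, b matched against itself).
Suppose next[1..i-1] is already known and we now want next[i].
Let p be the farthest position reached by any earlier match, i.e. the maximum of j+next[j]-1 over j in [2..i-1],
and let k be the index that attains it,
so k+next[k]-1=p.
Since b[k..p]=b[1..p-k+1], position i corresponds to position i-k+1; let L=next[i-k+1].
If i+L-1<p, then next[i]=L (the lecture notes explain exactly why).
Otherwise, keep comparing character by character from just past p (or from i itself if p<i) until a mismatch.
That is the matching of b against itself.
Matching a against b works the same way, with expand taking the place of next on the a side, so I will not repeat it.
Code: (I don't think I had actually written this before)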
procedure expandKMP; // next needs room for index lenb+1
begin
  j:=0;
  while (j+2<=lenb) and (b[j+1]=b[j+2]) do inc(j);
  next[1]:=lenb; next[2]:=j; k:=2;
  for i:=3 to lenb do
  begin
    p:=k+next[k]-1; L:=next[i-k+1];
    if L<p-i+1 then next[i]:=L
    else begin
      j:=max(0,p-i+1);
      while (i+j<=lenb) and (b[i+j]=b[1+j]) do inc(j); // bounds check first
      next[i]:=j;
      k:=i;
    end;
  end;
  j:=0;
  while (j+1<=lenb) and (j+1<=lena) and (a[j+1]=b[j+1]) do inc(j);
  expand[1]:=j; k:=1;
  for i:=2 to lena do
  begin
    p:=k+expand[k]-1; L:=next[i-k+1];
    if L<p-i+1 then expand[i]:=L
    else begin
      j:=max(0,p-i+1);
      while (1+j<=lenb) and (i+j<=lena) and (a[i+j]=b[1+j]) do inc(j);
      expand[i]:=j;
      k:=i;
    end;
  end;
end;
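Important: < and <= are not interchangeable here.
In if L<p-i+1 then next[i]:=L; I originally thought <= would be better, and my derivation on paper seemed to confirm it, but there is a test case that breaks it exactly:
a=11111
b=11
The reason: when L=p-i+1 the copied value reaches exactly to p, and nothing beyond p has been compared yet, so the real match may be longer. Here next[2]=1, yet the match of a[2..] against b is actually 2 characters long, because more matching characters follow.
In this situation all we can do is take a step back and continue comparing explicitly to get the right answer.
So it is worth memorising this code a little~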
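
As a cross-check for the < versus <= pitfall, here is a small Python sketch of extended KMP (often called the Z-algorithm). It is a simplified variant rather than a line-by-line port: it copies min(r-i, z[i-l]) and then always tries to extend, which avoids the boundary case entirely. The names z_function and extend, and the 0-based half-open window [l, r), are assumptions of this sketch.

```python
def z_function(b):
    """z[i] = length of the longest common prefix of b and b[i:] (z[0] = len(b))."""
    n = len(b)
    z = [0] * n
    if n == 0:
        return z
    z[0] = n
    l = r = 0                       # b[l:r] is the rightmost window equal to a prefix of b
    for i in range(1, n):
        if i < r:
            z[i] = min(r - i, z[i - l])
        while i + z[i] < n and b[z[i]] == b[i + z[i]]:
            z[i] += 1               # extend by explicit comparison
        if i + z[i] > r:
            l, r = i, i + z[i]
    return z


def extend(a, b):
    """ext[i] = length of the longest common prefix of a[i:] and b."""
    z = z_function(b)
    m, n = len(a), len(b)
    ext = [0] * m
    l = r = 0                       # a[l:r] equals b[:r-l]
    for i in range(m):
        if i < r:
            ext[i] = min(r - i, z[i - l])
        while i + ext[i] < m and ext[i] < n and a[i + ext[i]] == b[ext[i]]:
            ext[i] += 1
        if i + ext[i] > r:
            l, r = i, i + ext[i]
    return ext
```

On the tricky input a=11111, b=11 this yields 2 for every position except the last, which is what the strict < in the Pascal code is protecting.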
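4. AC automaton
Mm, what a grand-sounding name.
Sadly it will not automatically get you Accepted = =!
Ahem, the full name is
Aho-Corasick automaton (thanks, Baidu).
The setting is one text and many patterns to be matched against it.
Its advantage is that once the patterns are preprocessed, the text only needs to be run through the automaton once.
With KMP the whole text has to be scanned once per pattern, which is very costly when the text is long and there are many patterns.
So what is an automaton?
My own understanding: the patterns have been preprocessed into an automaton, and the text just runs through it once to produce the result.
Oh, and while we are here, let me distinguish the AC automaton from the suffix automaton:
the AC automaton preprocesses the patterns, and the text runs through it;
the suffix automaton preprocesses the text, and the patterns run through it.
(To be honest, I still have not figured out the applications of the suffix automaton...)
As for how exactly it works, see the lecture notes; I really can't be bothered to explain.
It would just be the same old long story again.
The key idea is the fail pointer: it points to the state for the longest proper suffix of the current string that is also a prefix of some pattern.
Code: both a Trie-graph version (first) and an AC-automaton version (second) are given. I botched the AC-automaton version, yet strangely it runs a bit faster.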
type pppp=^node;
node=record
son:array['a'..'z'] of pppp;
last,faill:pppp;
cnt:longint;
end;
var queue:array[1..7000000] of pppp;
s:array[1..1000000] of char;
s1:string;
t,lll,ans,front,finish,len,l,i,j,k,n:longint;
p,q,head:pppp;
ch:char;
procedure neew(var p:pppp);
begin
new(p);
for ch:='a' to 'z' do p^.son[ch]:=nil;
p^.cnt:=0;
p^.last:=nil;
end;
procedure insert(s:string);
begin
p:=head; j:=1;
len:=length(s);
while (j<=len) and (p^.son[s[j]]<>nil) do begin p:=p^.son[s[j]]; inc(j); end;
for l:=j to len do
begin
neew(p^.son[s[l]]);
p:=p^.son[s[l]];
end;
inc(p^.cnt);
end;
procedure build;
begin
neew(head);
head^.faill:=head;
readln(n);
for i:=1 to n do
begin
readln(s1);
insert(s1);
end;
end;
procedure GetFail;
begin
p:=head; front:=1; finish:=0;
for ch:='a' to 'z' do if p^.son[ch]<>nil then begin
inc(finish);
queue[finish]:=p^.son[ch];
p^.son[ch]^.faill:=head;
p^.son[ch]^.last:=nil;
end;
while (front<=finish) do
begin
for ch:='a' to 'z' do if queue[front]^.son[ch]<>nil then begin
inc(finish);
queue[finish]:=queue[front]^.son[ch];
p:=queue[front]^.faill;
while (p<>head) and (p^.son[ch]=nil) do p:=p^.faill;
if p^.son[ch]<>nil then p:=p^.son[ch];
queue[finish]^.faill:=p;
if queue[finish]^.faill^.cnt<>0 then queue[finish]^.last:=queue[finish]^.faill
else queue[finish]^.last:=queue[finish]^.faill^.last;
end
else queue[front]^.son[ch]:=queue[front]^.faill^.son[ch];
inc(front);
end;
end;
procedure main;
begin
build;
GetFail;
ans:=0;
lll:=0;
while not(eoln) do
begin
inc(lll);
read(s[lll]);
end;
p:=head;
for i:=1 to lll do
if p^.son[s[i]]=nil then begin p:=head; continue; end
else begin
p:=p^.son[s[i]];
q:=p;
while q^.last<>nil do
begin
q:=q^.last;
inc(ans);
end;
if p^.cnt<>0 then inc(ans);
end;
writeln(ans);
end;
begin
main;
end.

type
pppp=^node;
node=record
next:array['a'..'z'] of pppp;
faill,father:pppp;
count:longint;
end;
var queue:array[1..700000] of pppp;
i,j,k,n:longint;
head,p,q,r:pppp;
mother:array[1..500000] of char;
ans,len:longint;
s:string;
procedure neew(var p,fa:pppp);
var ch:char;
begin
new(p);
p^.count:=-1;
for ch:='a' to 'z' do p^.next[ch]:=nil;
p^.father:=fa;
end;
procedure setIO;
begin
readln(n);
end;
procedure insert(s:string);
var i,l:longint;
begin
p:=head; i:=1; l:=length(s);
while (i<=l) and (p^.next[s[i]]<>nil) do
begin
p:=p^.next[s[i]];
inc(i);
end;
while (i<=l) do
begin
neew(p^.next[s[i]],p);
p:=p^.next[s[i]];
inc(i);
end;
p^.count:=1;
end;
procedure build;
begin
neew(head,head);
for i:=1 to n do
begin
readln(s);
insert(s);
end;
end;
procedure failure;
var ch:char;
i,p,q:longint;
k:pppp;
begin
head^.faill:=head; p:=1; q:=0;
for ch:='a' to 'z' do
if head^.next[ch]<>nil then begin
inc(q);
queue[q]:=head^.next[ch];
head^.next[ch]^.faill:=head;
end;
while (p<=q) do
begin
for ch:='a' to 'z' do
if queue[p]^.next[ch]<>nil then begin
inc(q);
queue[q]:=queue[p]^.next[ch];
k:=queue[p]^.faill;
while (k<>head) and (k^.next[ch]=nil) do k:=k^.faill;
if k^.next[ch]=nil then queue[q]^.faill:=head
else queue[q]^.faill:=k^.next[ch];
end;
inc(p);
end;
end;
procedure main;
begin
setIO;
build;
failure;
while not(eoln) do
begin
inc(len);
read(mother[len]);
end;
p:=head;
for i:=1 to len do
begin
if p^.next[mother[i]]<>nil then begin r:=p; p:=p^.next[mother[i]]; end
else begin
while (p<>head) and (p^.next[mother[i]]=nil) do p:=p^.faill;
if (p^.next[mother[i]]<>nil) then begin r:=p; p:=p^.next[mother[i]]; end
else continue
end;
if p^.count<>-1 then inc(ans);
q:=p;
p:=r;
while p<>head do
begin
p:=p^.faill;
if p^.next[mother[i]]<>nil then if p^.next[mother[i]]^.count<>-1 then inc(ans);
end;
p:=q;
end;
writeln(ans);
end;
begin
main;
end.
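
For readers who find the pointer-based Pascal hard to follow, the fail-pointer construction can be sketched compactly in Python. This is an illustration under my own assumptions, not a port of either program above: states are list indices, patterns are non-empty, and out[s] counts every pattern occurrence ending at state s (with multiplicity), which differs slightly from the counting in the Pascal programs.

```python
from collections import deque


def build_automaton(patterns):
    """Build the trie, fail links, and per-state match counts for non-empty patterns."""
    goto, fail, out = [{}], [0], [0]        # state 0 is the root
    for p in patterns:
        s = 0
        for c in p:
            if c not in goto[s]:
                goto.append({})
                fail.append(0)
                out.append(0)
                goto[s][c] = len(goto) - 1
            s = goto[s][c]
        out[s] += 1
    queue = deque(goto[0].values())         # depth-1 states fail to the root
    while queue:                            # BFS, so shallower states are finished first
        s = queue.popleft()
        for c, t in goto[s].items():
            f = fail[s]
            while f and c not in goto[f]:
                f = fail[f]
            fail[t] = goto[f].get(c, 0)
            out[t] += out[fail[t]]          # inherit matches that end at proper suffixes
            queue.append(t)
    return goto, fail, out


def count_matches(text, patterns):
    """Total number of pattern occurrences in text, found in a single pass."""
    goto, fail, out = build_automaton(patterns)
    s = total = 0
    for c in text:
        while s and c not in goto[s]:
            s = fail[s]                     # follow fail links until c can be consumed
        s = goto[s].get(c, 0)
        total += out[s]
    return total
```

Accumulating out along the fail links during construction plays the same role as the last pointer in the first Pascal program: no fail chain has to be walked at match time.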

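That wraps up the AC automaton~
Next, a review of LCA, RMQ, and Tarjan SCC condensation.
There are two approaches to LCA.
One is Tarjan's LCA,
an offline algorithm
built on a depth-first search of the tree.
Hmm, I find I don't feel like saying much here > <
The idea is in the code > <
With union-find it is nearly linear, O((N+Q)α(N)) for Q queries, but the recursion on the system stack makes the constant a bit large.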
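procedure TarjanLCA(u:longint);
var i:longint;
begin
  father[u]:=u; // u starts as its own union-find set
  i:=headlist[u];
  while i<>-1 do
  begin
    TarjanLCA(t[i]);
    merge(t[i],u); // merge must keep u as the representative of the joined set
    i:=next[i];
  end;
  visited[u]:=true;
  for i:=1 to ask[u,0] do // ask[u,0] is the number of queries involving u
    if visited[ask[u,i]] then ans[u,i]:=find(ask[u,i]);
end;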
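
The offline idea may be easier to see in a self-contained Python sketch. The names offline_lca, children, and pending are hypothetical; the tree is assumed to be given as child lists, and the recursion is only suitable for small trees because of Python's recursion limit.

```python
def offline_lca(children, root, queries):
    """Answer all (u, v) LCA queries in one DFS using union-find."""
    n = len(children)
    parent = list(range(n))                 # union-find parent, not the tree parent
    visited = [False] * n
    pending = [[] for _ in range(n)]
    for idx, (u, v) in enumerate(queries):  # store each query at both endpoints
        pending[u].append((v, idx))
        pending[v].append((u, idx))
    answer = [None] * len(queries)

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    def dfs(u):
        for c in children[u]:
            dfs(c)
            parent[find(c)] = u             # u stays the representative of its subtree
        visited[u] = True
        for v, idx in pending[u]:
            if visited[v]:                  # v was finished earlier, so find(v) is the LCA
                answer[idx] = find(v)

    dfs(root)
    return answer
```

Each query is answered at whichever endpoint is finished second, which is why all queries must be known before the DFS starts.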
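Of course there is also an online algorithm,
based on binary lifting.
f[i,j] is the node reached
by walking 2^j steps up from node i,
and deep[i] is the depth of i.
Preprocessing takes O(n log n) and each query O(log n).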
procedure dfs(u:longint);
var i:longint; // must be local: the recursion would clobber a global i
begin
  for i:=1 to maxlen do f[u,i]:=f[f[u,i-1],i-1];
  i:=headlist[u];
  while i<>-1 do
  begin
    deep[t[i]]:=deep[u]+1;
    f[t[i],0]:=u; // the parent is the 2^0-th ancestor
    dfs(t[i]);
    i:=next[i];
  end;
end;
function ask(a,b:longint):longint;
var i,del,tmp:longint; // tmp instead of t, which is the edge array
begin
  if deep[a]>deep[b] then begin
    tmp:=a;
    a:=b;
    b:=tmp;
  end;
  del:=deep[b]-deep[a];
  for i:=0 to maxlen do // lift b to the depth of a
    if (del and (1 shl i))<>0 then b:=f[b,i];
  if a<>b then begin
    for i:=maxlen downto 0 do
      if f[b,i]<>f[a,i] then begin
        a:=f[a,i];
        b:=f[b,i];
      end;
    a:=f[a,0];
    b:=f[b,0];
  end;
  exit(a);
end;
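
Here is a rough Python equivalent of the binary-lifting approach, again only as a sketch. The names preprocess and lca are my own, the table is stored as up[j][v] rather than f[v,j], and the root is treated as its own ancestor so that lifting never leaves the tree.

```python
def preprocess(children, root):
    """Return (up, depth) where up[j][v] is the 2^j-th ancestor of v."""
    n = len(children)
    levels = max(1, n.bit_length())         # 2^levels > n - 1 >= any depth
    up = [[root] * n for _ in range(levels)]
    depth = [0] * n
    stack = [root]
    while stack:                            # iterative DFS fills parents and depths
        u = stack.pop()
        for c in children[u]:
            up[0][c] = u
            depth[c] = depth[u] + 1
            stack.append(c)
    for j in range(1, levels):
        for v in range(n):
            up[j][v] = up[j - 1][up[j - 1][v]]
    return up, depth


def lca(up, depth, a, b):
    if depth[a] > depth[b]:
        a, b = b, a
    diff = depth[b] - depth[a]
    for j in range(len(up)):                # lift b to the depth of a
        if (diff >> j) & 1:
            b = up[j][b]
    if a == b:
        return a
    for j in reversed(range(len(up))):      # lift both while their ancestors differ
        if up[j][a] != up[j][b]:
            a, b = up[j][a], up[j][b]
    return up[0][a]
```

Unlike the offline version, queries can arrive one at a time after preprocessing.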
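And that's that~
There is also RMQ,
i.e. range minimum/maximum queries.
It uses the same doubling idea
and achieves O(n log n) preprocessing with O(1) queries~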
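procedure yuchuli; // preprocessing; pow[i] is 2^i (precomputed to save time), f[i,0] is a[i]
var x,i,j:longint;
begin
  x:=trunc(ln(n)/ln(2)); // highest level needed
  for i:=1 to x do
    for j:=1 to n-pow[i]+1 do
      f[j,i]:=max(f[j,i-1],f[j+pow[i-1],i-1]); // f[j,i] = max of a[j..j+2^i-1]
end;
function ask(a,b:longint):longint;
var m:longint;
begin
  m:=trunc(ln(b-a+1)/ln(2));
  exit(max(f[a,m],f[b-pow[m]+1,m])); // two overlapping blocks of length 2^m cover [a,b]
end;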
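
The same sparse table is short enough to sketch in Python. The names build_sparse and range_max are assumptions of this example; it uses 0-based inclusive ranges and takes the level from bit_length instead of floating-point logarithms.

```python
def build_sparse(a):
    """table[j][i] = max of a[i : i + 2**j]."""
    n = len(a)
    table = [list(a)]
    j = 1
    while (1 << j) <= n:
        prev = table[-1]
        half = 1 << (j - 1)
        # a block of length 2^j is the union of two blocks of length 2^(j-1)
        table.append([max(prev[i], prev[i + half]) for i in range(n - (1 << j) + 1)])
        j += 1
    return table


def range_max(table, l, r):
    """Maximum of a[l..r] (inclusive, 0-based, l <= r) in O(1)."""
    m = (r - l + 1).bit_length() - 1        # largest m with 2^m <= r - l + 1
    return max(table[m][l], table[m][r - (1 << m) + 1])
```

The two blocks in range_max may overlap, which is harmless for max or min but would be wrong for sums.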
Next up is Tarjan~
Tarjan's SCC algorithm is usually embedded inside a larger problem, so here is just a template for the DFS part~
procedure dfs(u:longint);
var i,v:longint; // locals, so the recursion does not clobber them
begin
  dfn[u]:=time; inc(time);
  low[u]:=dfn[u];
  push(u); // put u on the stack
  visited[u]:=true;
  instack[u]:=true;
  i:=headlist[u];
  while i<>-1 do
  begin
    if not(visited[t[i]]) then begin
      dfs(t[i]);
      low[u]:=min(low[u],low[t[i]]);
    end
    else if instack[t[i]] then low[u]:=min(low[u],dfn[t[i]]);
    i:=next[i];
  end;
  if dfn[u]=low[u] then begin // u is the root of an SCC
    inc(scc_cnt);
    repeat
      v:=stack[top];
      belong[v]:=scc_cnt;
      pop;
      instack[v]:=false;
    until u=v;
  end;
end;
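
To try the SCC template on a concrete graph, a small Python sketch may help. The function name tarjan_scc and the adjacency-list input are assumptions of this example, and the recursive DFS is only meant for small graphs.

```python
def tarjan_scc(graph):
    """Return (comp, count): comp[v] is the SCC id of v, count is the number of SCCs."""
    n = len(graph)
    dfn = [-1] * n                  # discovery time, -1 means unvisited
    low = [0] * n
    on_stack = [False] * n
    comp = [-1] * n
    stack = []
    timer = 0
    count = 0

    def dfs(u):
        nonlocal timer, count
        dfn[u] = low[u] = timer
        timer += 1
        stack.append(u)
        on_stack[u] = True
        for v in graph[u]:
            if dfn[v] == -1:
                dfs(v)
                low[u] = min(low[u], low[v])
            elif on_stack[v]:       # edge back into the current DFS stack
                low[u] = min(low[u], dfn[v])
        if low[u] == dfn[u]:        # u is the root of an SCC: pop the whole component
            while True:
                v = stack.pop()
                on_stack[v] = False
                comp[v] = count
                if v == u:
                    break
            count += 1

    for u in range(n):
        if dfn[u] == -1:
            dfs(u)
    return comp, count
```

After this pass, contracting each component to a single node (the condensation step) leaves a DAG.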
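The last thing I want to recap is the three suffix siblings: the suffix array, the suffix tree, and the suffix automaton~
Suffix array first~
What is a suffix array?
Take all the suffixes of a string and sort them: sa[i] is the start of the i-th smallest suffix, rank[i] is the rank of the suffix starting at i, and height[i] is the longest common prefix of the suffixes ranked i-1 and i. Lots of interesting things can be done with these~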
var s:array[1..200000] of char;
sum1:array['a'..'z'] of longint;
height,sum:array[0..200000] of longint; // index 0 is needed for sum[i-1] and for rank 0 (second key past the end)
rank,trank,sa,tsa:Array[1..400000] of longint;
p,i,j,n,k:longint;
ch:char;
procedure init;
begin
while not(eoln) do
begin
inc(n);
read(s[n]);
end;
end;
procedure sorting(j:longint);
begin
for i:=1 to n do sum[i]:=0;
for i:=1 to n do inc(sum[rank[i+j]]);
for i:=1 to n do inc(sum[i],sum[i-1]);
for i:=n downto 1 do
begin
tsa[sum[rank[i+j]]]:=i;
dec(sum[rank[i+j]]);
end;
for i:=1 to n do sum[i]:=0;
for i:=1 to n do inc(sum[rank[i]]);
for i:=1 to n do inc(sum[i],sum[i-1]);
for i:=n downto 1 do
begin
sa[sum[rank[tsa[i]]]]:=tsa[i];
dec(sum[rank[tsa[i]]]);
end;
end;
procedure main;
begin
init;
for i:=1 to n do inc(sum1[s[i]]);
for ch:='b' to 'z' do inc(Sum1[ch],sum1[chr(ord(ch)-1)]);
for i:=n downto 1 do
begin
sa[sum1[s[i]]]:=i;
dec(sum1[s[i]]);
end;
rank[sa[1]]:=1;
p:=1;
for i:=2 to n do
begin
if s[sa[i]]<>s[sa[i-1]] then inc(p);
rank[sa[i]]:=p;
end;
j:=1;
while (j<=n) do
begin
sorting(j);
p:=1;
trank[sa[1]]:=1;
for i:=2 to n do
begin
if (rank[sa[i]]<>rank[sa[i-1]]) or (rank[sa[i]+j]<>rank[sa[i-1]+j]) then inc(p);
trank[sa[i]]:=p;
end;
for i:=1 to n do rank[i]:=trank[i];
j:=j shl 1;
end;
for i:=1 to n do write(sa[i],' ');
writeln;
j:=0;
for i:=1 to n do
begin
if rank[i]=1 then begin height[1]:=0; j:=0; continue; end; // the smallest suffix has no predecessor
if j>0 then dec(j);
k:=sa[rank[i]-1];
while (i+j<=n) and (k+j<=n) and (s[k+j]=s[i+j]) do inc(j);
height[rank[i]]:=j;
end;
for i:=1 to n do write(height[i],' ');
end;
begin
main;

end.
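
A compact Python sketch can serve as a reference to check the Pascal output against. It is deliberately simpler than the program above: it uses the built-in sort on (rank, next rank) pairs, so it runs in O(n log^2 n) rather than using radix sort, and the names suffix_array and lcp_array plus the 0-based indexing are my own assumptions.

```python
def suffix_array(s):
    """Prefix doubling: sort suffixes by their first k characters, doubling k each round."""
    n = len(s)
    rank = [ord(c) for c in s]
    sa = list(range(n))
    k = 1
    while True:
        def key(i):
            return (rank[i], rank[i + k] if i + k < n else -1)
        sa.sort(key=key)
        new_rank = [0] * n
        for t in range(1, n):
            new_rank[sa[t]] = new_rank[sa[t - 1]] + (key(sa[t]) != key(sa[t - 1]))
        rank = new_rank
        if n == 0 or rank[sa[-1]] == n - 1:     # all ranks distinct: done
            break
        k <<= 1
    return sa


def lcp_array(s, sa):
    """height[r] = LCP of the suffixes ranked r-1 and r (height[0] = 0)."""
    n = len(s)
    rank = [0] * n
    for pos, suf in enumerate(sa):
        rank[suf] = pos
    height = [0] * n
    h = 0
    for i in range(n):
        if rank[i] == 0:                        # smallest suffix has no predecessor
            h = 0
            continue
        j = sa[rank[i] - 1]
        while i + h < n and j + h < n and s[i + h] == s[j + h]:
            h += 1
        height[rank[i]] = h
        if h > 0:
            h -= 1                              # the next suffix keeps at least h-1
    return height
```

For "banana" the sorted suffixes start at positions 5, 3, 1, 0, 4, 2 (0-based), with height values 0, 1, 3, 0, 0, 2.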

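I have not written code for the other two suffix siblings, but I do understand them.

See my lecture notes for details~

// may the formatting survive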
