Self-Study: The KMP Algorithm

</pre><p><span style="font-size:18px;">KMP is a string-matching algorithm: in essence, it determines whether a string of length M occurs inside a string of length N. It was discovered by D. E. Knuth, J. H. Morris, and V. R. Pratt, which is why it is known as the Knuth-Morris-Pratt algorithm (KMP for short). Let's look at how it works.</span></p>
<div><span style="font-size:18px;">KMP's advantage over naive matching is that it exploits the regularities inside the pattern. On a mismatch it does not start over from the first character of the pattern; instead it uses the next array to jump, reusing the partial-match information it has already gathered. For example, suppose we search for T=<span style="color:blue;">"abcabd"</span> in S=<span style="color:blue;">"abcabcabdabba"</span>. On the first pass, S[0]..S[4] match T[0]..T[4], and then S[5] differs from T[5]. Rather than moving the start of the comparison forward from S[0] to S[1], KMP compares S[5] directly with T[2]; if they are equal, it goes on to the next character of both strings. Why is it valid to compare S[5] with T[2] directly? That depends on the next array of T.</span></div>
<div><p><span style="font-size:18px;">The next array is defined as follows. Its indices start at 1, and next[i] measures how much the substring T[0]..T[i-1] overlaps with itself, that is, the length of its longest proper prefix that is also a suffix. Taking T=<span style="color:blue;">"abcabd"</span> as an example:</span></p>
<p><span style="font-size:18px;">next[1]=0: in <span style="color:blue;">"a"</span> the repeated part has length 0</span></p>
<p><span style="font-size:18px;">next[2]=0: in <span style="color:blue;">"ab"</span> the repeated part has length 0</span></p>
<p><span style="font-size:18px;">next[3]=0: in <span style="color:blue;">"abc"</span> the repeated part has length 0</span></p>
<p><span style="font-size:18px;">next[4]=1: in <span style="color:blue;">"abca"</span> the repeated part has length 1, because T[0]=T[3]=a</span></p>
<p><span style="font-size:18px;">next[5]=2: in <span style="color:blue;">"abcab"</span> the repeated part has length 2, because T[0..1]=T[3..4]=ab</span></p>
<p><span style="font-size:18px;">next[6]=0: in <span style="color:blue;">"abcabd"</span> the repeated part has length 0</span></p>
<p>The code that computes the next array:</p><p><pre name="code" class="cpp">#include <iostream>
#include <string>
using namespace std;

void makeNext(const string &P, int next[]) // next must have room for P.size()+1 ints
{
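     int q, k; // q: length of the current prefix of P; k: length of its longest proper prefix that is also a suffix
     int m = P.size(); // length of the pattern
     next[1] = 0; // a single character has no proper prefix, so its value is 0
     for (q = 2, k = 0; q <= m; ++q) // starting from the second character, compute next[q] for each prefix P[0]..P[q-1]
     {
         while (k > 0 && P[q-1] != P[k]) // fall back to shorter and shorter prefixes until P[q-1] can extend one
             k = next[k];
         if (P[q-1] == P[k]) // if they are equal, the longest common prefix/suffix grows by 1
         {
             k++;
         }
         next[q] = k;
     }
     for (int i = 1; i <= m; i++) // print next[1..m] so the result can be checked
     {
         cout << next[i];
     }
}</pre>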
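To check the table against the hand-computed values above, here is a minimal self-contained sketch. The name `buildNext` and the `std::vector` return type are my own choices for illustration, not part of the original code; the table keeps the 1-based convention used in this post, with `next[0]` left unused.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper (not from the original post): builds the 1-indexed
// next table. next[i] is the length of the longest proper prefix of
// P[0..i-1] that is also a suffix of it. next[0] is unused and stays 0.
std::vector<int> buildNext(const std::string& P) {
    const int m = static_cast<int>(P.size());
    std::vector<int> next(m + 1, 0);
    int k = 0;  // value of next for the prefix processed so far
    for (int q = 2; q <= m; ++q) {
        // P[q-1] is the newly added character; fall back until it fits.
        while (k > 0 && P[q - 1] != P[k]) {
            k = next[k];
        }
        if (P[q - 1] == P[k]) {
            ++k;
        }
        assert(k < q);  // a proper prefix is always shorter than the prefix itself
        next[q] = k;
    }
    return next;
}
```

Calling `buildNext("abcabd")` should give 0, 0, 0, 1, 2, 0 for indices 1 through 6, the same values as the list above.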

        With the next array in hand, the KMP algorithm itself is easy to follow. The code below is used to explain how it works in detail.

     

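<span style="font-size:14px;">int kmp(const string &T, const string &P, int next[]) // T: text, P: pattern; returns the number of matches
 {
     int n, m;
     int i, q; // i: index into T; q: number of pattern characters matched so far
     int count = 0;
     n = T.size();
     m = P.size();
     if (m == 0) return 0; // nothing to search for
     makeNext(P, next); // next must have room for m+1 ints
     for (i = 0, q = 0; i < n; ++i)
     {
         while (q > 0 && P[q] != T[i]) // mismatch: reuse the longest prefix of P that is still matched
             q = next[q];
         if (P[q] == T[i])
         {
             q++;
         }
         if (q == m) // the whole pattern matched, ending at T[i]
         {
             cout << "Pattern occurs with shift: " << i - m + 1 << endl;
             count++;
             q = next[q]; // keep going to find later (possibly overlapping) matches
         }
     }
     return count;
 }</span>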
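For experimenting outside the blog's two functions, the same loop can be packaged so that it returns the match positions instead of printing them. This is only a sketch under my own assumptions: the name `kmpFindAll`, the `std::vector` return type, and the requirement that the pattern be non-empty are not from the original code.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical variant of kmp() (not from the original post): returns every
// shift at which P occurs in T. Assumes P is non-empty.
std::vector<int> kmpFindAll(const std::string& T, const std::string& P) {
    const int n = static_cast<int>(T.size());
    const int m = static_cast<int>(P.size());
    assert(m > 0);  // assumption of this sketch: the pattern is not empty

    // 1-indexed next table, same convention as in the post.
    std::vector<int> next(m + 1, 0);
    for (int q = 2, k = 0; q <= m; ++q) {
        while (k > 0 && P[q - 1] != P[k]) k = next[k];
        if (P[q - 1] == P[k]) ++k;
        next[q] = k;
    }

    std::vector<int> shifts;
    for (int i = 0, q = 0; i < n; ++i) {
        while (q > 0 && P[q] != T[i]) q = next[q];  // fall back on mismatch
        if (P[q] == T[i]) ++q;
        if (q == m) {
            shifts.push_back(i - m + 1);  // match ends at T[i]
            q = next[q];                  // allow overlapping matches
        }
    }
    return shifts;
}
```

With the example from the beginning of the post, `kmpFindAll("abcabcabdabba", "abcabd")` should return a single shift, 3.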

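        Consider what happens when a character fails to match. Suppose P[0]..P[k-1] agree with T[i]..T[i+k-1], but P[k]!=T[i+k]. At this point we look up next[k] and jump by setting k=next[k]: the next array records the longest prefix of P that is also a suffix of the part just matched, so those characters do not need to be compared again. Take T="abcabcabd" and P="abcabd" as an example. First compute the next array of P, then run the matching loop. When T[5]!=P[5] is found, q=next[5]=2, so the next comparison is between T[5] and P[2]. This works because the next array tells us indirectly that P[0]P[1]=T[3]T[4].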
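The jump described above can be observed directly by recording every fallback the matching loop performs. The sketch below is illustrative only: `Jump` and `traceFallbacks` are hypothetical names of my own, and it assumes a non-empty pattern.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical record of one fallback: at text index i, the number of
// matched pattern characters dropped from `from` to `to`.
struct Jump {
    int i;
    int from;
    int to;
};

// Hypothetical tracer (not from the original post): runs the KMP loop and
// logs each q = next[q] jump caused by a mismatch. Assumes P is non-empty.
std::vector<Jump> traceFallbacks(const std::string& T, const std::string& P) {
    const int n = static_cast<int>(T.size());
    const int m = static_cast<int>(P.size());
    assert(m > 0);  // assumption of this sketch: the pattern is not empty

    // 1-indexed next table, same convention as in the post.
    std::vector<int> next(m + 1, 0);
    for (int q = 2, k = 0; q <= m; ++q) {
        while (k > 0 && P[q - 1] != P[k]) k = next[k];
        if (P[q - 1] == P[k]) ++k;
        next[q] = k;
    }

    std::vector<Jump> jumps;
    for (int i = 0, q = 0; i < n; ++i) {
        while (q > 0 && P[q] != T[i]) {
            jumps.push_back({i, q, next[q]});  // log the jump before taking it
            q = next[q];
        }
        if (P[q] == T[i]) ++q;
        if (q == m) q = next[q];  // full match: reset without logging
    }
    return jumps;
}
```

For T="abcabcabd" and P="abcabd" the trace should contain exactly one entry, with i=5, from=5 and to=2, which is the q=next[5]=2 step in the paragraph above.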

      This is my first self-study blog post. Keep going and stick with it!



