String Matching


liuchang631


Tags: algorithms, Linux kernel, Google, programming language testing

Article link: https://blog.csdn.net/liuchang631/article/details/5897239


First, here is a book that gives a systematic introduction to string matching:

http://d.download.csdn.net/down/1951661/ld6886

Take a look if you are interested.

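1. Standard library function
In C/C++ programming, ordinary string searches are usually done with the standard library function strstr(). Because a typical program performs few searches, this normally causes no performance problem.
Its prototype is char* strstr(const char* src, const char* patn). It returns a pointer to the first occurrence of the string patn within the string src, or NULL if patn does not occur (if patn is empty, it returns src).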
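strstr returns a pointer, while the other functions in this article return an index or -1. A minimal sketch of converting between the two conventions; the name strstr_index is hypothetical and only used for illustration:

```c
#include <string.h>

/* Hypothetical helper: 0-based offset of the first occurrence of patn
   in src, or -1 when strstr returns NULL (patn does not occur). */
int strstr_index(const char *src, const char *patn)
{
    const char *hit = strstr(src, patn);
    return hit ? (int)(hit - src) : -1;   /* pointer difference = index */
}
```

For example, strstr_index("it is just a test", "just") gives 6, and an empty patn gives 0 because strstr returns src itself.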

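2. BF (brute-force) algorithm
The idea of BF is to compare the first character of the text S with the first character of the pattern p. If they are equal, compare the following characters; if not, move the starting position in S one character to the right and compare again from the first character of p. This repeats until some contiguous substring of S equals p, in which case the match succeeds and the function returns the index in S of the first character of that substring; otherwise the match fails and it returns -1. In the worst case the time complexity is O(strlen(S)*strlen(p)).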
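#include <string.h>

/* Returns the index of the first occurrence of p in S, or -1 if there is none. */
int match(const char* S, const char* p)
{
    int n = (int)strlen(S), m = (int)strlen(p);  /* compute lengths once, not on every loop test */
    int i = 0, j = 0;
    while (i < n && j < m)
    {
        if (S[i] == p[j])
        {
            i++; j++;
        }
        else
        {
            i = i - j + 1;  /* back up to the character after this attempt's start */
            j = 0;          /* and restart from the beginning of the pattern */
        }
    }
    if (j == m)
        return i - j;
    else
        return -1;
}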
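To see where the O(strlen(S)*strlen(p)) bound comes from, here is a sketch of the same brute-force idea written with an explicit start position plus a comparison counter. The function name bf_count and its counter parameter are hypothetical, added only for this illustration:

```c
#include <string.h>

/* Brute-force search that also counts character comparisons in *cmps.
   Returns the first match index, or -1. */
int bf_count(const char *S, const char *p, long *cmps)
{
    int n = (int)strlen(S), m = (int)strlen(p);
    *cmps = 0;
    for (int start = 0; start + m <= n; start++) {  /* every alignment of p in S */
        int j = 0;
        while (j < m) {
            (*cmps)++;
            if (S[start + j] != p[j])
                break;                              /* mismatch: try the next start */
            j++;
        }
        if (j == m)
            return start;
    }
    return -1;
}
```

With S = "aaaaaaaaab" (nine a's then b) and p = "aaab", each of the 7 alignments costs 4 comparisons before failing or matching, so the call returns 6 after 28 comparisons, exactly (n - m + 1) * m.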

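3. KMP (Knuth-Morris-Pratt) algorithm
KMP is an improved, fast pattern-matching algorithm.
By precomputing, for each position in the pattern, the index to fall back to on a mismatch, KMP avoids the unnecessary backtracking of the text pointer that BF performs, which brings the time complexity down to O(m+n), where n is the length of the text and m the length of the pattern. For more details, asking Google is a good idea.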

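Because KMP stores a fallback index for every pattern character, it needs slightly more space: the auxiliary array next, whose size is sizeof(index type) * length(pattern), i.e. sizeof(int) * patn_len bytes for the int array used below.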

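On the other hand, when n is small, the O(m) time spent building the fallback table may not pay off. These conditions make KMP best suited to cases where both n and m are large and string searches are frequent.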
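The table that kmp_init fills is the classic KMP failure function. Stated precisely, using the names patn and next from the code that follows and writing patn[a..b] for the substring from a to b inclusive (a sketch of the definition, not taken from the kernel source):

```latex
\mathit{next}[i] = \max\{\, k \mid 0 \le k \le i,\ \mathtt{patn}[0..k-1] = \mathtt{patn}[i-k+1..i] \,\}
```

For example, for patn = "ABABCABAB" the table is next = {0, 0, 1, 2, 0, 1, 2, 3, 4}: the whole string ends in "ABAB", which is also its prefix, so next[8] = 4. When j pattern characters have matched and the next comparison fails, the search keeps that border and continues from j = next[j - 1] instead of restarting at 0.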

While studying this algorithm, I brought the source code over from the Linux kernel:

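/* Build the failure table: next[i] is the length of the longest proper
   prefix of patn[0..i] that is also a suffix of it. Assumes len >= 1. */
void kmp_init(const char *patn, int len, int *next)
{
    int i, j;

    next[0] = 0;
    for (i = 1, j = 0; i < len; i++) {
        /* On a mismatch, fall back to the next shorter border */
        while (j > 0 && patn[j] != patn[i])
            j = next[j - 1];
        if (patn[j] == patn[i])
            j++;
        next[i] = j;
    }
}

/* Return the offset of the first occurrence of patn in text, or -1.
   next must have been filled by kmp_init for the same pattern. */
int kmp_find(const char *text, int text_len, const char *patn,
             int patn_len, int *next)
{
    int i, j;

    for (i = 0, j = 0; i < text_len; i++) {
        /* i never moves backwards; only j falls back through next[] */
        while (j > 0 && text[i] != patn[j])
            j = next[j - 1];
        if (text[i] == patn[j])
            j++;
        if (j == patn_len)
            return i + 1 - patn_len;
    }

    return -1;
}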
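kmp_find stops at the first match. A sketch, assuming you want every (possibly overlapping) occurrence: after a full match, set j = next[j - 1] instead of returning, and keep scanning so the text index still never moves backwards. The names build_next and kmp_find_all are hypothetical; build_next repeats the kmp_init recurrence so the block compiles on its own.

```c
#include <stdlib.h>
#include <string.h>

/* Same recurrence as kmp_init: longest proper border of patn[0..i]. */
static void build_next(const char *patn, int len, int *next)
{
    next[0] = 0;
    for (int i = 1, j = 0; i < len; i++) {
        while (j > 0 && patn[j] != patn[i])
            j = next[j - 1];
        if (patn[j] == patn[i])
            j++;
        next[i] = j;
    }
}

/* Stores the start offset of every match in out[] (at most max entries)
   and returns the total number of matches; 0 for an empty pattern,
   -1 if the table cannot be allocated. */
int kmp_find_all(const char *text, const char *patn, int *out, int max)
{
    int n = (int)strlen(text), m = (int)strlen(patn), count = 0;
    if (m == 0 || m > n)
        return 0;
    int *next = malloc((size_t)m * sizeof *next);
    if (next == NULL)
        return -1;
    build_next(patn, m, next);
    for (int i = 0, j = 0; i < n; i++) {
        while (j > 0 && text[i] != patn[j])
            j = next[j - 1];
        if (text[i] == patn[j])
            j++;
        if (j == m) {
            if (count < max)
                out[count] = i + 1 - m;
            count++;
            j = next[j - 1];   /* reuse the matched border to allow overlaps */
        }
    }
    free(next);
    return count;
}
```

For example, kmp_find_all("ABABABA", "ABA", pos, 8) returns 3 and fills pos with 0, 2 and 4.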

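4. BM (Boyer-Moore) algorithm
Like KMP, BM builds an auxiliary function from the pattern to speed up matching. Strictly speaking, the code in this section uses only Boyer-Moore's bad-character rule (the full algorithm also has a good-suffix rule), and its function is even simpler than KMP's: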
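/* next[] needs 256 entries, one per byte value; -1 means "not in p" */
void make_next(const char p[], int next[])
{
    for (int c = 0; c < 256; c++)
        next[c] = -1;
    for (int i = 0; p[i] != '\0'; i++)
        next[(unsigned char)p[i]] = i;   /* later occurrences overwrite earlier ones */
}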
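The size of BM's auxiliary array depends only on the character set, not on the pattern (usually one entry per byte value, 256 in total, which covers ASCII).
If a character occurs more than once in the pattern, the array records the position of its last occurrence.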
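The table only records positions; the shift comes from the bad-character rule. A sketch of that step, assuming the pattern is compared right to left and -1 marks a character absent from the pattern; build_last and bad_char_shift are hypothetical names:

```c
#define ALPHABET 256

/* last[c] = index of the rightmost c in p, or -1 if c is not in p. */
void build_last(const char *p, int last[ALPHABET])
{
    for (int c = 0; c < ALPHABET; c++)
        last[c] = -1;
    for (int i = 0; p[i] != '\0'; i++)
        last[(unsigned char)p[i]] = i;
}

/* Shift after text character c mismatched pattern position i: align the
   last occurrence of c with the mismatch, but always advance at least 1. */
int bad_char_shift(const int last[ALPHABET], unsigned char c, int i)
{
    int shift = i - last[c];
    return shift > 0 ? shift : 1;
}
```

With the pattern "what would", last['t'] is 3, so a text 't' that mismatches the pattern's last position (i = 9) shifts the pattern by 6; a character such as 's' that never appears gives 9 - (-1) = 10, moving the pattern completely past it.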

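#include <string.h>
#include <stdio.h>

/* Size of the auxiliary array: depends on the character set; 8-bit/ASCII by default, 256 entries */
#define LEN 256

/*
Parameters:
const char *s: the text being searched
const char *p: the pattern
int index: offset in s where the current alignment of the pattern starts
const int position[]: auxiliary array, position[c] = last index of c in p, or -1 if c is not in p
Return value: 0 = match at offset index; -1 = no match; > 0 = offset of the next alignment to try
*/
int BMMatcher(const char *s, const char *p, int index, const int position[])
{
    int len = (int)strlen(s);
    int m = (int)strlen(p);
    int i, j, shift;

    if (m == 0)
        return 0;            /* an empty pattern matches immediately */
    if (index + m > len)
        return -1;           /* the pattern no longer fits in the text */

    i = m - 1;               /* last pattern index: strlen excludes the '\0', and indices start at 0 */
    j = index + m - 1;       /* the loop compares from the pattern's end, so start m-1 past index */

    for (; i >= 0; i--, j--)
    {
        if (s[j] != p[i])
            break;
    }

    if (i < 0)
        return 0;            /* match found */

    /* Mismatch between p[i] and s[j]: slide the pattern so that the last occurrence
       of s[j] in p lines up with position j. If s[j] is not in p (position == -1)
       the shift is i + 1, moving the pattern past j. If that occurrence lies to the
       right of i the formula is not positive, so advance by at least 1. */
    shift = i - position[(unsigned char)s[j]];
    if (shift < 1)
        shift = 1;

    if (index + shift > len - m)
        return -1;            /* match failed, no further alignment possible */
    else
        return index + shift; /* match failed here, try the next alignment */
}

/* Test; the text and the pattern are both lowercase */
int main(void)
{
    int position[LEN];                                          /* auxiliary array */
    const char *src = "it is just a test, what would you do?"; /* text */
    const char *patten = "what would";                          /* pattern */
    int i, ret, index = 0;
    int m = (int)strlen(patten);

    for (i = 0; i < LEN; i++)   /* -1: character does not occur in the pattern */
        position[i] = -1;
    for (i = 0; i < m; i++)     /* build the auxiliary array: the key step, but simple */
        position[(unsigned char)patten[i]] = i;

    /* Keep matching until a match is found (0) or matching fails (-1) */
    while ((ret = BMMatcher(src, patten, index, position)) > 0)
        index = ret;

    if (ret == -1)
        printf("Can not find it\n");
    else
        printf("Find it, the index is: %d.\n", index);

    return 0;
}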
