Notes on Learning PCRE

zhangge3663

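The C library on Linux includes a regular-expression library (regex.h) that is available simply by including the header. However, it turns out that this library supports neither the \d shorthand nor non-greedy (lazy) matching. For example: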

str:   1.1.1.1
regex: (\d*.\d*.\d*.\d*)

This pattern uses the \d shorthand to match digits, but the regex.h library fails to match with it.
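With regex.h, the usual workaround is to write \d as a bracket expression such as [0-9] and compile with REG_EXTENDED. The following is a minimal sketch under that assumption; the helper name match_ipv4_like is made up for illustration, and the dots are escaped here so that they match only a literal '.':

```c
#include <regex.h>
#include <string.h>

/* Hypothetical helper: copies the first IPv4-like match into out.
 * Returns 1 on a match, 0 on no match, -1 on error. */
static int match_ipv4_like(const char *subject, char *out, size_t outlen)
{
    /* POSIX ERE has no \d shorthand, so digits are spelled [0-9]. */
    const char *pattern = "([0-9]+\\.[0-9]+\\.[0-9]+\\.[0-9]+)";
    regex_t re;
    regmatch_t m[2];   /* m[0] = whole match, m[1] = first group */
    int found = 0;

    if (outlen == 0)
        return -1;
    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    if (regexec(&re, subject, 2, m, 0) == 0) {
        size_t len = (size_t)(m[1].rm_eo - m[1].rm_so);
        if (len >= outlen)
            len = outlen - 1;   /* truncate rather than overflow */
        memcpy(out, subject + m[1].rm_so, len);
        out[len] = '\0';
        found = 1;
    }
    regfree(&re);
    return found;
}
```

Calling match_ipv4_like("1.1.1.1", buf, sizeof buf) should leave "1.1.1.1" in buf.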

str:    \123\456\
regex:\(.+?)\

This pattern uses a non-greedy quantifier, but the regex.h library matches greedily and captures "123\456".
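Because regex.h has no lazy quantifiers, one common way to get the same result for this input is a negated bracket expression that cannot run past the next backslash. A sketch under that assumption (first_segment is a hypothetical helper name, not part of any library):

```c
#include <regex.h>
#include <string.h>

/* Hypothetical helper: copies the text between the first two backslashes
 * of subject into out. Returns 1 on a match, 0 on no match, -1 on error. */
static int first_segment(const char *subject, char *out, size_t outlen)
{
    /* The bracket expression excludes the backslash, so the group stops
     * at the next backslash, much as the lazy (.+?) would in PCRE. */
    const char *pattern = "\\\\([^\\\\]+)\\\\";
    regex_t re;
    regmatch_t m[2];
    int found = 0;

    if (outlen == 0)
        return -1;
    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    if (regexec(&re, subject, 2, m, 0) == 0) {
        size_t len = (size_t)(m[1].rm_eo - m[1].rm_so);
        if (len >= outlen)
            len = outlen - 1;   /* truncate rather than overflow */
        memcpy(out, subject + m[1].rm_so, len);
        out[len] = '\0';
        found = 1;
    }
    regfree(&re);
    return found;
}
```

This only imitates laziness for patterns whose terminator is a single character; PCRE's +? is more general.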

Below is a small program that tests the PCRE library:

/*************************************************************************
	> File Name: example3.c
	> Author: ge.zhang
	> Mail: 
	> Created Time: Thursday, 2018-03-29 12:07:18
 ************************************************************************/

#include <stdio.h>
#include <pcre.h>
#include <string.h>

#define OVECCOUNT 30 /* should be a multiple of 3 */
#define EBUFLEN 128
#define BUFLEN 1024

int main(void)
{
    pcre *re;
    const char *error;
    int erroffset;
    int ovector[OVECCOUNT];
    int rc, i, j;

    char src[] = "123.123.123.123:80|1.1.1.1:88";
    char pattern[] = "(\\d*.\\d*.\\d*.\\d*):(\\d*)";

    printf("String : %s\n", src);
    printf("Pattern :\"%s\"\n", pattern);

    re = pcre_compile(pattern, 0, &error, &erroffset, NULL);
    if (re == NULL) {
        printf("PCRE compilation failed at offset %d:%s\n", erroffset, error);
        return 1;
    }

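    char *p = src;
    /* Keep searching while pcre_exec reports a match; any negative rc
       (PCRE_ERROR_NOMATCH or a real error) ends the loop. */
    while ((rc = pcre_exec(re, NULL, p, (int)strlen(p), 0, 0, ovector, OVECCOUNT)) >= 0) {
        if (rc == 0) rc = OVECCOUNT / 3; /* ovector was too small for all groups */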
        printf("ovector is {");
        for (j=0; j<OVECCOUNT; j++) {
            printf(" %d, ", ovector[j]);
        }
        printf("}\n");
        printf("\nOK, has matched...\n\n");
        /* Pair i holds the start and end offsets of group i, relative to p. */
        for (i = 0; i < rc; i++) {
            const char *substring_start = p + ovector[2*i];
            int substring_length = ovector[2*i+1] - ovector[2*i];
            /* %.*s prints exactly substring_length bytes, so no temporary buffer is needed. */
            printf("match:%.*s\n", substring_length, substring_start);
        }
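        printf("iamhere p is %s\n", p);
        if (ovector[1] == 0) break; /* empty match at offset 0: stop instead of looping forever */
        p += ovector[1];            /* continue right after the end of this match */
        printf("iamhere p+= is %s\n", p);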
    }

    pcre_free(re);
    return 0;
}

[Screenshot: output printed by the code above]

The code above relies mainly on two functions, pcre_compile and pcre_exec. Their prototypes are as follows:

(1) pcre_compile

pcre *pcre_compile(const char *pattern, int options, const char **errptr, int *erroffset, const unsigned char *tableptr);
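Purpose: compiles the given regular expression.
Parameters:
  pattern, input: the regular expression to compile, as a string
  options, input: option flags that apply at compile time
  errptr, output: receives a pointer to an error message
  erroffset, output: receives the offset in pattern at which the error was found
  tableptr, input: the character tables to use; normally NULL, which selects the default tables
Return value: a pointer to PCRE's internal representation of the compiled expression, or NULL if compilation failed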

(2) pcre_exec

int pcre_exec(const pcre *code, const pcre_extra *extra, const char *subject, int length, int startoffset, int options, int *ovector, int ovecsize);
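Purpose: checks whether a string matches a compiled regular expression.
Parameters:
  code, input: pointer to the structure produced by pcre_compile
  extra, input: pointer to a structure carrying additional data for pcre_exec; may be NULL
  subject, input: the string to match against
  length, input: the length of subject
  startoffset, input: the offset in subject at which matching starts
  options, input: option flags that apply during matching
  ovector, output: array that receives the offsets of the matched substrings
  ovecsize, input: the number of elements in ovector
Return value: non-negative on a successful match, negative if there is no match or an error occurred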

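The ovector parameter deserves a closer look. When pcre_exec finds a match, it writes the start and end offsets of each matched substring into ovector. In the code above, for example, ovector holds the following values: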

$1 = { 0, 18, 0, 15, 16, 18, 11508, 22708, 6, 4096, 2, 1752488, 1756584, 1756584, 240, 240, 6, 4, 4, 372, 372, 372, 68, 68, 4, 4, 7, 1745352, 16, 0}

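Because OVECCOUNT is defined as 30, the array has 30 elements and all 30 are listed here. That is the array size, not the number of matches: PCRE uses the first two thirds for offset pairs and the last third as workspace, so at most 10 substrings can be reported. In this run pcre_exec reported only 3 substrings, and that count is what rc holds; the remaining entries are not meaningful. The start and end offsets of the three results are: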

0,18 = 123.123.123.123:80
0,15 = 123.123.123.123
16,18 = 80

As this shows, the matched substrings can be extracted from the values in ovector.
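The extraction step can be isolated in a small function. This is a sketch only: copy_group is a hypothetical helper written for this note, and it assumes that ovector and rc come from a pcre_exec call like the one in the test program.

```c
#include <string.h>

/* Hypothetical helper: copies capture group `group` out of `subject` using
 * the offset pairs stored in ovector. rc is the value pcre_exec returned.
 * Returns the substring length, or -1 if the group is unavailable or the
 * buffer is too small. */
static int copy_group(const char *subject, const int *ovector, int rc,
                      int group, char *buf, int buflen)
{
    if (rc <= 0 || group < 0 || group >= rc)
        return -1;
    int start = ovector[2 * group];      /* offset of the first byte */
    int end = ovector[2 * group + 1];    /* offset just past the last byte */
    if (start < 0 || end < start)        /* group did not take part in the match */
        return -1;
    int len = end - start;
    if (len >= buflen)
        return -1;
    memcpy(buf, subject + start, (size_t)len);
    buf[len] = '\0';
    return len;
}
```

With the offsets listed above (0,18, 0,15 and 16,18) and rc equal to 3, group 0 yields the whole match and group 2 yields the port. PCRE also ships its own pcre_copy_substring function for this purpose, which is preferable in real code.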

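In addition, the pattern "(\d*.\d*.\d*.\d*):(\d*)" in the code contains two pairs of parentheses. A regular expression saves whatever each pair of parentheses matched as part of the result, which is why this pattern produces three results. If the goal is only to match the IP address together with the port, the parentheses can be dropped, giving "\d*.\d*.\d*.\d*:\d*", which produces a single result. Note also that an unescaped . matches any character; to match only a literal dot, write \. instead.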
