PCRE - Perl-compatible regular expressions

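In the lighttpd source code, the keyvalue data structure uses the PCRE library. I had not come across it before;

a quick Google search showed that it is a regular expression library, so I took the opportunity to learn the basics of using it.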


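For an introduction to the PCRE library and how to use it, see the pcre(3) man page. The PCRE source distribution

also ships a pcredemo.c that is commented in detail; after reading it you should be able to use the library.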


In fact, the simplest usage requires only two functions (see man pcreapi):

       pcre *pcre_compile(const char *pattern, int options,
            const char **errptr, int *erroffset,
            const unsigned char *tableptr);

       int pcre_exec(const pcre *code, const pcre_extra *extra,
            const char *subject, int length, int startoffset,
            int options, int *ovector, int ovecsize);


In addition, the lighttpd source also uses the following two functions:

       pcre_extra *pcre_study(const pcre *code, int options,
            const char **errptr);

       void (*pcre_free)(void *);

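Before PCRE 8.20, pcre_free() was used to release the pcre_extra returned by pcre_study().

The PCRE 8.20 release added the following function, which is recommended for programs written against newer versions: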

void pcre_free_study(pcre_extra *extra);

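For details on the parameters and return values of the functions above, refer to the manual. For simple use,
most parameters can be left at their default values.
Below is a very simple pcre_ls that I wrote with the PCRE library; it prints every file in the current directory
whose name matches a given regular expression: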

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <dirent.h>
#include <pcre.h>

/**
 * List all the files that match the 'regex' in current directory
 *
 * Compile command:
 * gcc -o pcre_ls pcre_ls.c -lpcre
 */

#define OVECCOUNT 30            /* should be a multiple of 3 */

int main(int argc, char *argv[])
{
        DIR           *dp;
        struct dirent *dirp;
        pcre          *re;
        const char    *error;
        int            erroffset;
        int            ovector[OVECCOUNT];
        int            rc;

        if (argc != 2) {
                fprintf(stderr, "usage: %s 'regex'\n", argv[0]);
                exit(EXIT_FAILURE);
        }

        /* compile the regex in the first argument */
        re = pcre_compile(argv[1], 0, &error, &erroffset, NULL);
        if (re == NULL) {
                fprintf(stderr, "PCRE compilation failed at offset %d: %s\n",
                        erroffset, error);
                exit(EXIT_FAILURE);
        }

        /* open current directory */
        if ((dp = opendir("./")) == NULL) {
                perror("opendir");
                exit(EXIT_FAILURE);
        }

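        while ((dirp = readdir(dp)) != NULL) {
                rc = pcre_exec(re, NULL, dirp->d_name, (int)strlen(dirp->d_name),
                               0, 0, ovector, OVECCOUNT);
                /* rc < 0 means no match or an error; rc == 0 is still a
                 * match, it only means ovector was too small for all groups */
                if (rc >= 0)
                        printf("%s\n", dirp->d_name);
        }

        closedir(dp);
        pcre_free(re);          /* release the compiled pattern */

        return 0;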
}

Since I am not familiar with Perl syntax, and lighttpd's use of PCRE is fairly simple, I will stop this brief introduction here.

The PCRE library is a set of functions that implement regular expression pattern matching using the same syntax and semantics as Perl, with just a few differences. Certain features that appeared in Python and PCRE before they appeared in Perl are also available using the Python syntax. There is also some support for certain .NET and Oniguruma syntax items, and there is an option for requesting some minor changes that give better JavaScript compatibility.

The current implementation of PCRE (release 7.x) corresponds approximately with Perl 5.10, including support for UTF-8 encoded strings and Unicode general category properties. However, UTF-8 and Unicode support has to be explicitly enabled; it is not the default. The Unicode tables correspond to Unicode release 5.0.0.

In addition to the Perl-compatible matching function, PCRE contains an alternative matching function that matches the same compiled patterns in a different way. In certain circumstances, the alternative function has some advantages. For a discussion of the two matching algorithms, see the pcrematching page.

PCRE is written in C and released as a C library. A number of people have written wrappers and interfaces of various kinds. In particular, Google Inc. have provided a comprehensive C++ wrapper. This is now included as part of the PCRE distribution. The pcrecpp page has details of this interface. Other people's contributions can be found in the Contrib directory at the primary FTP site, which is:

ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre

Details of exactly which Perl regular expression features are and are not supported by PCRE are given in separate documents. See the pcrepattern and pcrecompat pages. There is a syntax summary in the pcresyntax page.

Some features of PCRE can be included, excluded, or changed when the library is built. The pcre_config() function makes it possible for a client to discover which features are available. The features themselves are described in the pcrebuild page. Documentation about building PCRE for various operating systems can be found in the README file in the source distribution.

The library contains a number of undocumented internal functions and data tables that are used by more than one of the exported external functions, but which are not intended for use by external callers. Their names all begin with "_pcre_", which hopefully will not provoke any name clashes. In some environments, it is possible to control which external symbols are exported when a shared library is built, and in these cases the undocumented symbols are not exported.