Steps for building PCRE on Windows

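Downloading the source

This article uses version 8.35 as an example; other versions work the same way. The 8.35 source code is available at:
http://download.csdn.net/detail/u013344915/7793027

You can of course also download other versions from the official site.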

Officially recommended steps

After unpacking the source, you will find a README file. It contains a section titled "Building PCRE on non-Unix-like systems", which reads:

Building PCRE on non-Unix-like systems

For a non-Unix-like system, please read the comments in the file
NON-AUTOTOOLS-BUILD, though if your system supports the use of “configure” and “make” you may be able to build PCRE using autotools in the same way as for many Unix-like systems.

PCRE can also be configured using the GUI facility provided by CMake’s cmake-gui command. This creates Makefiles, solution files, etc. The file NON-AUTOTOOLS-BUILD has information about CMake.

PCRE has been compiled on many different operating systems. It should be straightforward to build PCRE on any system that has a Standard C compiler and library, because it uses only Standard C functions.

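In other words, for building on Windows, refer to the document "NON-AUTOTOOLS-BUILD".

Normally it would be enough to list one set of steps that gets the job done. To make the procedure easier to trust, though, we copy the original text here and add a short commentary. This may look verbose, but a concise set of steps is laid out afterwards.

When following along, it is best to create a fresh project, then work through the steps below, copying the files into the project and adding them to it. As for the project type (an executable, a dynamic-link library, and so on), detailed examples are given later in this article.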

GENERIC INSTRUCTIONS FOR THE PCRE C LIBRARY

The following are generic instructions for building the PCRE C library “by hand”. If you are going to use CMake, this section does not apply to you; you can skip ahead to the CMake section.

(1) Copy or rename the file config.h.generic as config.h, and edit the macro settings that it contains to whatever is appropriate for your environment.

In particular, you can alter the definition of the NEWLINE macro to specify what character(s) you want to be interpreted as line terminators.

In an EBCDIC environment, you MUST change NEWLINE, because its default value is 10, an ASCII LF. The usual EBCDIC newline character is 21 (0x15, NL), though in some cases it may be 37 (0x25).

When you compile any of the PCRE modules, you must specify -DHAVE_CONFIG_H to your compiler so that config.h is included in the sources.

An alternative approach is not to edit config.h, but to use -D on the compiler command line to make any changes that you need to the configuration options. In this case -DHAVE_CONFIG_H must not be set.

NOTE: There have been occasions when the way in which certain parameters in config.h are used has changed between releases. (In the configure/make world, this is handled automatically.) When upgrading to a new release, you are strongly advised to review config.h.generic before re-using what you had previously.

Step 1: rename config.h.generic to config.h, and remember to define the macro HAVE_CONFIG_H in the project.

(2) Copy or rename the file pcre.h.generic as pcre.h.

Step 2: rename pcre.h.generic to pcre.h.

(3) EITHER:
Copy or rename file pcre_chartables.c.dist as pcre_chartables.c.

Step 3: rename pcre_chartables.c.dist to pcre_chartables.c.

OR:
(The OR branch is omitted here: the option before it works, so I took the shortcut.)

(4) Ensure that you have the following header files:

Step 4: copy the following two files into the project.

   pcre_internal.h
   ucp.h

(5) For an 8-bit library, compile the following source files, setting
-DHAVE_CONFIG_H as a compiler option if you have set up config.h with your configuration, or else use other -D settings to change the configuration as required.

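Step 5: add the macro definition -DHAVE_CONFIG_H. If you only work with char strings, copying the files below into the project is enough. The 16-bit strings discussed later need these .c files as well, plus the pcre16_xxxx.c files, which are covered further down.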

   pcre_byte_order.c
   pcre_chartables.c
   pcre_compile.c
   pcre_config.c
   pcre_dfa_exec.c
   pcre_exec.c
   pcre_fullinfo.c
   pcre_get.c
   pcre_globals.c
   pcre_jit_compile.c
   pcre_maketables.c
   pcre_newline.c
   pcre_ord2utf8.c
   pcre_refcount.c
   pcre_string_utils.c
   pcre_study.c
   pcre_tables.c
   pcre_ucd.c
   pcre_valid_utf8.c
   pcre_version.c
   pcre_xclass.c

(Some content omitted here.)

(6) Now link all the compiled code into an object library in whichever form your system keeps such libraries. This is the basic PCRE C 8-bit library. If your system has static and shared libraries, you may have to do this once for each type.

Step 6: if you only work with char strings, the .h and .c files above are all you need to compile and link.

(7) If you want to build a 16-bit library (as well as, or instead of the 8-bit or 32-bit libraries) repeat steps 5-6 with the following files:

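Step 7: if the strings to be processed are 16-bit, for example wchar_t (2 bytes, i.e. 16 bits, on Windows), the following .c files are also needed. Add them to the project as well.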

   pcre16_byte_order.c
   pcre16_chartables.c
   pcre16_compile.c
   pcre16_config.c
   pcre16_dfa_exec.c
   pcre16_exec.c
   pcre16_fullinfo.c
   pcre16_get.c
   pcre16_globals.c
   pcre16_jit_compile.c
   pcre16_maketables.c
   pcre16_newline.c
   pcre16_ord2utf16.c
   pcre16_refcount.c
   pcre16_string_utils.c
   pcre16_study.c
   pcre16_tables.c
   pcre16_ucd.c
   pcre16_utf16_utils.c
   pcre16_valid_utf16.c
   pcre16_version.c
   pcre16_xclass.c

(8) If you want to build a 32-bit library (as well as, or instead of the 8-bit or 16-bit libraries) repeat steps 5-6 with the following files:

Step 8: 32-bit strings work like step 7. If you do not need them, there is no need to add these files to the project.

   pcre32_byte_order.c
   pcre32_chartables.c
   pcre32_compile.c
   pcre32_config.c
   pcre32_dfa_exec.c
   pcre32_exec.c
   pcre32_fullinfo.c
   pcre32_get.c
   pcre32_globals.c
   pcre32_jit_compile.c
   pcre32_maketables.c
   pcre32_newline.c
   pcre32_ord2utf32.c
   pcre32_refcount.c
   pcre32_string_utils.c
   pcre32_study.c
   pcre32_tables.c
   pcre32_ucd.c
   pcre32_utf32_utils.c
   pcre32_valid_utf32.c
   pcre32_version.c
   pcre32_xclass.c

(9) … The remaining steps are omitted.

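Verifying with actual projects

Next we verify the procedure with a series of projects:

  • a) Treat the PCRE source code as part of a project rather than as a library. This involves small changes to the PCRE source. (Default VC console application, no precompiled header.)
  • b) Same as a), but with a precompiled header, and a discussion of how to handle it.
  • c) Build PCRE as a lib, the approach PCRE recommends by default. Discusses the settings of the pcre lib project and of the exe project that uses it.
  • d) Usage under Unicode.

All code below uses the VC6 environment. Newer Visual Studio versions can open the VC6 .dsw file and will upgrade it automatically.

Test code:
The first code listing in the article "PCRE, a regular-expression library for C" (C语言的一个正则表达式pcre).

The projects below are packaged into a single zip; the download address is given at the end of the article.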

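TheTest1: no lib, direct exe, no precompiled header

In this step PCRE is not built as a lib; its files are compiled as ordinary source code directly into the exe. This is a simple way to get PCRE working and to demonstrate regular-expression matching.

First, create an empty console application in VC: in the wizard, choose the project type "Win32 Console Application" and then "An empty project".

Naturally, the new project contains no files yet. Create test.c and paste in the test code mentioned earlier (the first listing in "PCRE, a regular-expression library for C"; this is not repeated from here on).

Next, copy PCRE's .h and .c files into the project and add them to it. Compiling may then produce the following messages: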

Compiling...
test.c
d:\...\thetest1\test.c(10) : fatal error C1083: Cannot open include file: 'pcre.h': No such file or directory

In that case, change the include of pcre.h in test.c from angle brackets to double quotes:

#include "pcre.h"

Another error:

pcre_byte_order.c
d:\...\thetest1\pcre_internal.h(467) : fatal error C1189: #error :  LINK_SIZE must be either 2, 3, or 4

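The fix is to add the HAVE_CONFIG_H macro as described earlier: in the Project Settings dialog, on the C/C++ tab, add it to the Preprocessor definitions field. Add the macro to both the Debug and Release configurations.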

Compiling and linking now succeed. Running the program prints:

Match succeeded at offset 0
 0: 2014-08-20
 1: 2014
 2: 08
 3: 20

Press any key to continue

By default the project is compiled, linked, and run in Debug mode. After switching to Release, linking produces the following warnings.

Linking...
   Creating library Release/TheTest1.lib and object Release/TheTest1.exp
LINK : warning LNK4049: locally defined symbol "_pcre_free" imported
LINK : warning LNK4049: locally defined symbol "_pcre_exec" imported
LINK : warning LNK4049: locally defined symbol "_pcre_compile" imported

TheTest1.exe - 0 error(s), 0 warning(s)

The cause is easy to see in pcre.h:

/* When an application links to a PCRE DLL in Windows, the symbols that are
imported have to be identified as such. When building PCRE, the appropriate
export setting is defined in pcre_internal.h, which includes this file. So we
don't change existing definitions of PCRE_EXP_DECL and PCRECPP_EXP_DECL. */

#if defined(_WIN32) && !defined(PCRE_STATIC)
#  ifndef PCRE_EXP_DECL
#    define PCRE_EXP_DECL  extern __declspec(dllimport)
#  endif
#  ifdef __cplusplus
#    ifndef PCRECPP_EXP_DECL
#      define PCRECPP_EXP_DECL  extern __declspec(dllimport)
#    endif
#    ifndef PCRECPP_EXP_DEFN
#      define PCRECPP_EXP_DEFN  __declspec(dllimport)
#    endif
#  endif
#endif

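To eliminate the Release warnings, just add one more macro definition, PCRE_STATIC. Without it, test.c sees the dllimport declarations from pcre.h while the PCRE sources, through pcre_internal.h, are in effect built with dllexport. Alternatively, you can comment out the dllimport and dllexport definitions in both headers and define PCRE_EXP_DECL as empty, that is: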

#define PCRE_EXP_DECL

Either way, PCRE is now up and running, and you can modify test.c to try out the various regular-expression features.

TheTest1_16: pcre16

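The example above handles char* strings; now switch to the wchar_t type.

A digression on wchar_t

The stddef.h header

wchar_t is a standard C type for wide characters. The C99 standard specifies that wchar_t is defined in the following header: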

#include <stddef.h>

As the example below shows, however, leaving this header out causes no problem. The reason is that VC's string.h and stdio.h both contain the following:

#ifndef _MAC
#ifndef _WCHAR_T_DEFINED
typedef unsigned short wchar_t;
#define _WCHAR_T_DEFINED
#endif

So even without stddef.h, wchar_t can be used normally as long as stdio.h or string.h is included.

As an experiment, write a small program with no include directives at all that uses wchar_t directly, for example:

int main(int argc, char **argv) {
    wchar_t a;

    return 0;
}

This produces errors:

error C2065: 'wchar_t' : undeclared identifier
error C2146: syntax error : missing ';' before identifier 'a'
error C2065: 'a' : undeclared identifier

Alternatively, wchar.h can be used instead of stddef.h. For the differences between the two headers, see the C standard; C99 has a section on this:

7.24 Extended multibyte and wide character utilities <wchar.h>

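Back to the main topic

Method and steps:

  • Close the TheTest1 project in VC, copy and paste its folder in Explorer, and rename the copy, TheTest1 (2), to TheTest1_16.
  • Change every char in the code to wchar_t, except for const char *error; the declarations of the pcre_xxx functions in pcre.h show why.
  • Now try compiling to see what errors appear.
  • To keep things short, we simply continue with the steps. Change the type pcre to pcre16, pcre_compile to pcre16_compile, and pcre_exec to pcre16_exec.
  • Prefix every string literal with L so that it is declared as a wchar_t string.
  • Change every printf to wprintf.
  • Change strlen to wcslen.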

If some of the changes are incomplete, you get assorted compile errors or warnings, or output that is not what you expect, such as:

Match succeeded at offset 0
 0: 2
 1: 2
 2: 0
 3: 2

Press any key to continue

The modified code is as follows:

/*
 gcc -Wall test-pcre.c -lpcre -o testpcre
 or
 gcc -Wall test-pcre.c -I/usr/local/include -L/usr/local/lib -lpcre -o testpcre

 */

#include <stdio.h>
#include <string.h>
#include <wchar.h>   /* wcslen, wprintf */
#include "pcre.h"

#define OVECCOUNT 30    /* should be a multiple of 3 */

int main(int argc, char **argv) {
    pcre16 *re;
    const char *error;
    wchar_t *pattern;
    wchar_t *date;
    int erroffset;
    int ovector[OVECCOUNT];
    int subject_length;
    int rc, i;

    pattern = L"(\\d+)-(\\d+)-(\\d+)";
    date = L"2014-08-20";
    subject_length = (int) wcslen(date);

    /*************************************************************************
     * Now we are going to compile the regular expression pattern, and handle *
     * any errors that are detected.                                          *
     *************************************************************************/

    re = pcre16_compile(pattern, /* the pattern */
                0, /* default options */
                &error, /* for error message */
                &erroffset, /* for error offset */
                NULL); /* use default character tables */

    /* Compilation failed: print the error message and exit */

    if (re == NULL) {
        /* error is a narrow string; with MSVC's wprintf that needs %hs, not %s */
        wprintf(L"PCRE compilation failed at offset %d: %hs\n", erroffset, error);
        return 1;
    }

    /*************************************************************************
     * If the compilation succeeded, we call PCRE again, in order to do a     *
     * pattern match against the subject string. This does just ONE match. If *
     * further matching is needed, it will be done below.                     *
     *************************************************************************/

    rc = pcre16_exec(re, /* the compiled pattern */
                NULL, /* no extra data - we didn't study the pattern */
                date, /* the subject string */
                subject_length, /* the length of the subject */
                0, /* start at offset 0 in the subject */
                0, /* default options */
                ovector, /* output vector for substring information */
                OVECCOUNT); /* number of elements in the output vector */

    /* Matching failed: handle error cases */

    if (rc < 0) {
        switch (rc) {
        case PCRE_ERROR_NOMATCH:
            wprintf(L"No match\n");
            break;
            /*
             Handle other special cases if you like
             */
        default:
            wprintf(L"Matching error %d\n", rc);
            break;
        }
        pcre16_free(re); /* Release memory used for the compiled pattern */
        return 1;
    }

    /* Match succeeded */

    wprintf(L"\nMatch succeeded at offset %d\n", ovector[0]);

    /*************************************************************************
     * We have found the first match within the subject string. If the output *
     * vector wasn't big enough, say so. Then output any substrings that were *
     * captured.                                                              *
     *************************************************************************/

    /* The output vector wasn't big enough */

    if (rc == 0) {
        rc = OVECCOUNT / 3;
        wprintf(L"ovector only has room for %d captured substrings\n", rc - 1);
    }

    /* Show substrings stored in the output vector by number. Obviously, in a real
     application you might want to do things other than print them. */

    for (i = 0; i < rc; i++) {
        wchar_t *substring_start = date + ovector[2 * i];
        int substring_length = ovector[2 * i + 1] - ovector[2 * i];
        wprintf(L"%2d: %.*s\n", i, substring_length, substring_start);
    }

    wprintf(L"\n");
    pcre16_free(re); /* Release memory used for the compiled pattern */
    return 0;
}

That is all for pcre16_xxx and pcre32_xxx; I have not used their other features yet.

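TheTest2: console program with a precompiled header

Create a console program: "Win32 Console Application", then "A 'Hello, World!' application".

Then, as before, copy the code from test.c (the non-wchar_t version) over the contents of TheTest2.cpp.

Add the PCRE files to the project and define the macros as before. Compiling then fails with the following errors: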

TheTest2.cpp
d:\...\thetest2\thetest2.cpp(109) : fatal error C1010: unexpected end of file while looking for precompiled header directive
Generating Code...
Compiling...
pcre_byte_order.c
d:\...\thetest2\pcre_byte_order.c(320) : fatal error C1010: unexpected end of file while looking for precompiled header directive

Add the following line at the top of every .c and .cpp file:

#include "stdafx.h"

Then compile:

Compiling...
TheTest2.cpp
Generating Code...
Compiling...
pcre_byte_order.c
d:\...\thetest2\pcre_byte_order.c(44) : fatal error C1853: 'Debug/TheTest2.pch' is not a precompiled header file created with this compiler

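Note that TheTest2.cpp compiles fine; only the pcre_xxx.c files fail, because the precompiled header was produced by the C++ compiler and cannot be used for C sources. After renaming a .c file to .cpp (remove it from the project, rename it in Explorer, add it back), the error becomes: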

Generating Code...
Compiling...
pcre_byte_order.cpp
d:\...\thetest2\pcre_internal.h(2340) : error C2371: 'real_pcre' : redefinition; different basic types
        e:\software\pcre-8.35\pcre-8.35-vc6\thetest2\pcre.h(324) : see declaration of 'real_pcre'

This is a forward-declaration problem. For now, rename all the .c files to .cpp.

... This gets troublesome in C++, so we stop pursuing this route.

TheTest3: c lib & cpp exe

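Create a workspace containing two projects: one is the exe, the other is the pcre lib.

First create the workspace, which also creates the default exe project: Win32 Console Application, "A 'Hello, World!' application". Then, as before, copy the code from test.c over the code in the exe project's .cpp file.

Then add a new "Win32 Static Library" project to this workspace:
[Screenshot: new_project_pcre]

In the next wizard step, leave both precompiled header and MFC support unchecked. If the pcre lib were going to be called from an MFC application, MFC support would have to be checked; this example does not involve MFC, so it is left unchecked.

In the pcre project, copy in all the PCRE source files and define three macros: DLL_EXPORT, PCRE_STATIC, HAVE_CONFIG_H. In the exe project, define the macro PCRE_STATIC. Set the exe project as the active project.

In the exe's .cpp file, set up the header and the lib: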

#include "pcre\pcre.h"

#ifdef _DEBUG
#pragma comment(lib, "pcre\\Debug\\pcre.lib")
#else
#pragma comment(lib, "pcre\\Release\\pcre.lib")
#endif

Then verify with both Debug and Release builds: OK.

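TheTest4: creating the pcre lib

The previous example created two projects in one workspace, one of them the pcre lib. You can of course also create a separate workspace that contains only the pcre lib.

Since the steps are a subset of TheTest3, they are not repeated. The code corresponds to the pcre workspace in the attachment. MFC support is checked here.

The test program is TheTest4, a simple MFC dialog application. To make debugging easier, printf has been replaced with a custom function. Part of the code follows: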

TheTest4Dlg.h:

private:
    int TestPcre();
    void MyPrint(const char* format, ...);

private:
    char m_buffer[4096];

TheTest4Dlg.cpp:

#include "..\\pcre\\pcre.h"

#ifdef _DEBUG
#pragma comment(lib, "..\\pcre\\Debug\\pcre.lib")
#else
#pragma comment(lib, "..\\pcre\\Release\\pcre.lib")
#endif

#define OVECCOUNT 30    /* should be a multiple of 3 */


int CTheTest4Dlg::TestPcre()
{
    pcre *re;
    const char *error;
    char *pattern;
    char *date;
    int erroffset;
    int ovector[OVECCOUNT];
    int subject_length;
    int rc, i;

    pattern = "(\\d+)-(\\d+)-(\\d+)";
    date = "2014-08-20";
    subject_length = (int) strlen(date);

    /*************************************************************************
     * Now we are going to compile the regular expression pattern, and handle *
     * any errors that are detected.                                          *
     *************************************************************************/

    re = pcre_compile(pattern, /* the pattern */
                0, /* default options */
                &error, /* for error message */
                &erroffset, /* for error offset */
                NULL); /* use default character tables */

    /* Compilation failed: print the error message and exit */

    if (re == NULL) {
        MyPrint("PCRE compilation failed at offset %d: %s\n", erroffset, error);
        return 1;
    }

    /*************************************************************************
     * If the compilation succeeded, we call PCRE again, in order to do a     *
     * pattern match against the subject string. This does just ONE match. If *
     * further matching is needed, it will be done below.                     *
     *************************************************************************/

    rc = pcre_exec(re, /* the compiled pattern */
                NULL, /* no extra data - we didn't study the pattern */
                date, /* the subject string */
                subject_length, /* the length of the subject */
                0, /* start at offset 0 in the subject */
                0, /* default options */
                ovector, /* output vector for substring information */
                OVECCOUNT); /* number of elements in the output vector */

    /* Matching failed: handle error cases */

    if (rc < 0) {
        switch (rc) {
        case PCRE_ERROR_NOMATCH:
            MyPrint("No match\n");
            break;
            /*
             Handle other special cases if you like
             */
        default:
            MyPrint("Matching error %d\n", rc);
            break;
        }
        pcre_free(re); /* Release memory used for the compiled pattern */
        return 1;
    }

    /* Match succeeded */

    MyPrint("\nMatch succeeded at offset %d\n", ovector[0]);

    /*************************************************************************
     * We have found the first match within the subject string. If the output *
     * vector wasn't big enough, say so. Then output any substrings that were *
     * captured.                                                              *
     *************************************************************************/

    /* The output vector wasn't big enough */

    if (rc == 0) {
        rc = OVECCOUNT / 3;
        MyPrint("ovector only has room for %d captured substrings\n", rc - 1);
    }

    /* Show substrings stored in the output vector by number. Obviously, in a real
     application you might want to do things other than print them. */

    for (i = 0; i < rc; i++) {
        char *substring_start = date + ovector[2 * i];
        int substring_length = ovector[2 * i + 1] - ovector[2 * i];
        MyPrint("%2d: %.*s\n", i, substring_length, substring_start);
    }

    MyPrint("\n");
    pcre_free(re); /* Release memory used for the compiled pattern */
    return 0;
}

void CTheTest4Dlg::OnOK() 
{
    m_buffer[0] = 0;
    TestPcre();
    MessageBox(m_buffer, _T("Test Pcre"), MB_OK);

    CDialog::OnOK();
}


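void CTheTest4Dlg::MyPrint(const char* fmt, ...)
{
    // Append to m_buffer without writing past its end.
    // sizeof works here because m_buffer is an array member.
    size_t used = strlen(m_buffer);
    size_t room = sizeof(m_buffer) - used;
    va_list ap;

    if (room <= 1)
        return;

    va_start(ap, fmt);
    // _vsnprintf (MSVC) does not null-terminate on truncation, so do it by hand.
    _vsnprintf(m_buffer + used, room - 1, fmt, ap);
    va_end(ap);
    m_buffer[sizeof(m_buffer) - 1] = '\0';
}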

Note: if the pcre library was created without MFC support checked, using it from MFC produces link errors:

libcd.lib(crt0dat.obj) : error LNK2005: _exit already defined in msvcrtd.lib(MSVCRTD.dll)
libcd.lib(crt0init.obj) : error LNK2005: ___xc_z already defined in msvcrtd.lib(cinitexe.obj)
libcd.lib(crt0dat.obj) : warning LNK4006: _exit already defined in msvcrtd.lib(MSVCRTD.dll); second definition ignored
LINK : warning LNK4098: defaultlib "msvcrtd.lib" conflicts with use of other libs; use /NODEFAULTLIB:library
libcd.lib(crt0.obj) : error LNK2001: unresolved external symbol _main

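Access violation

Also, if single-stepping in the debugger reports an access violation when the application finally exits, you can change m_buffer from an array to dynamically allocated memory. The attachment at the end of this article does not include this code.

.h file: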

private:
    enum {BUFFER_SIZE = 4096};
    char* m_buffer;

.cpp file:

BOOL CTheTest4Dlg::OnInitDialog()
{
    CDialog::OnInitDialog();

    ...

    // TODO: Add extra initialization here

    m_buffer = (char*)malloc(BUFFER_SIZE);

    return TRUE;  // return TRUE  unless you set the focus to a control
}


BOOL CTheTest4Dlg::DestroyWindow() 
{
    // TODO: Add your specialized code here and/or call the base class
    free(m_buffer);

    return CDialog::DestroyWindow();
}

Earlier PCRE versions

For earlier versions, the pcre library does not need HAVE_CONFIG_H to be defined.

Sample code download link

Sample code for PCRE on Windows
