Perl/PCRE Regular Expressions: Metacharacters, Escape Sequences, Quantifiers, and Matching Modes



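PCRE (Perl-Compatible Regular Expressions) is a regular-expression library widely used on Linux. As the name suggests, it provides regular expressions compatible with those of Perl.

Metacharacters
  • '\' : quotes the next metacharacter
  • '^' : start of line
  • '.' : any character except newline (the '/s' modifier makes '.' match newline as well)
  • '$' : end of line (or just before a newline at the end)
  • '|' : alternation
  • '(' and ')' : grouping
  • '[' and ']' : character class; matches any one character from a set. Inside the brackets, '-' denotes a range, as in [0-9], and a leading '^' complements the set, so [^0-9] matches any character other than 0-9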
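The metacharacters above can be tried out with a short sketch. It uses Python's `re` module rather than Perl or PCRE itself, on the assumption that the basic metacharacters behave the same way there; `re.MULTILINE` and `re.DOTALL` stand in for Perl's `/m` and `/s` modifiers, and the sample strings are made up for illustration.

```python
import re

# '^' and '$' anchor to line boundaries when re.MULTILINE (Perl's /m) is set.
text = "cat\ndog\ncow"
line_starts = re.findall(r"^c\w+$", text, flags=re.MULTILINE)  # ['cat', 'cow']

# '.' does not match a newline unless re.DOTALL (Perl's /s) is given.
dot_default = re.search(r"a.b", "a\nb")                    # None
dot_dotall = re.search(r"a.b", "a\nb", flags=re.DOTALL)    # a match object

# '|' is alternation, '(...)' groups, and '[...]' is a character class.
alternation = re.fullmatch(r"(cat|dog)s?", "dogs")
non_digits = re.findall(r"[^0-9]+", "ab12cd")              # ['ab', 'cd']

# '\' quotes the next metacharacter so that it matches literally.
literal_dot = re.findall(r"\.", "a.b.c")                   # ['.', '.']

print(line_starts, alternation.group(1), non_digits, literal_dot)
```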



Quantifiers
  • '*' : 0 or more times
  • '+' : 1 or more times
  • '?' : 0 or 1 time
  • {n} : exactly n times
  • {n,} : at least n times
  • {n,m} : at least n and at most m times

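Matching modes
  • Greedy: matches as many characters as possible while still allowing the rest of the pattern to match
  • Minimal (non-greedy): matches as few characters as possible; selected by appending '?' to a quantifier
  • Possessive: like greedy, matches as many characters as possible, but never backtracks (it does not give back any of its match even if the rest of the pattern then fails). Selected by appending '+' to a quantifier.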
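The difference between greedy and minimal matching is easiest to see on a concrete string. The following is a minimal sketch in Python's `re` module, assumed here to follow the same greedy and minimal rules as Perl; the HTML-like input is an invented example. Possessive quantifiers are left out because Python's `re` only accepts them from version 3.11 onward.

```python
import re

html = "<b>one</b><b>two</b>"

# Greedy: '.*' takes as much as it can, then backtracks to the last '</b>'.
greedy = re.search(r"<b>.*</b>", html).group()   # '<b>one</b><b>two</b>'

# Minimal: '.*?' stops at the first '</b>' that lets the pattern succeed.
lazy = re.search(r"<b>.*?</b>", html).group()    # '<b>one</b>'

# Counted quantifiers: {n}, {n,} and {n,m}.
exact = re.fullmatch(r"\d{4}", "2024") is not None      # True: exactly four digits
too_few = re.fullmatch(r"a{2,}", "a") is None            # True: needs at least two
bounded = re.findall(r"\b\w{2,3}\b", "a to the moon")    # ['to', 'the']

print(greedy, lazy, exact, too_few, bounded)
```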

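Escape sequences
  • '\t' : tab (HT, TAB)
  • '\n' : newline (LF, NL)
  • '\r' : carriage return (CR)
  • '\f' : form feed (FF)
  • '\a' : alarm (BEL)
  • '\e' : escape (ESC)
  • "\0xx" : the character with the given octal value; for example, \033 is ESC
  • "\xhh" : the character with the given hexadecimal value; for example, \x1B is ESC
  • "\x{hhhh}" : the character with the given long hexadecimal value; for example, \x{263a} is the Unicode SMILEY
  • "\cK" : control character; K may be replaced by any letter, and "\cK" itself is control-K (VT)
  • "\N{name}" : named Unicode character
  • "\N{U+hhhh}" : Unicode character
  • '\l' : lowercase the next character
  • '\u' : uppercase the next character
  • '\L' : lowercase the following text up to '\E'
  • '\U' : uppercase the following text up to '\E'
  • '\E' : end case conversion
  • '\Q' : quote the following characters (disable their special meaning) up to '\E'

The case-conversion escapes '\l', '\u', '\L', and '\U' are applied by Perl when it interpolates the pattern string; PCRE does not support them.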
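A few of these escapes can be checked directly. The sketch below uses Python's `re` module, which accepts the octal, hexadecimal, and '\t' forms but has no '\e' or '\Q...\E'; `re.escape` is used as a rough stand-in for '\Q...\E', and that substitution is an assumption of this example rather than Perl syntax.

```python
import re

esc = "\x1b"  # the ESC character, built with a Python string escape

# Octal and hexadecimal escapes inside the pattern both name ESC.
octal_match = re.fullmatch(r"\033", esc) is not None   # True
hex_match = re.fullmatch(r"\x1B", esc) is not None     # True

# '\t' in the pattern matches a literal tab.
tab_fields = re.split(r"\t", "a\tb\tc")                # ['a', 'b', 'c']

# re.escape quotes every metacharacter, much as \Q...\E does in Perl.
quoted = re.escape("1+1=2?")
literal = re.search(quoted, "is 1+1=2? yes")

print(octal_match, hex_match, tab_fields, literal.group())
```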

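Character classes and other escapes
  • '\w' : any word character (the 26 English letters, the 10 digits, and the underscore '_')
  • '\W' : any non-word character
  • '\s' : any whitespace character (space ' ', tab '\t', and so on)
  • '\S' : any non-whitespace character
  • '\d' : any digit character [0-9]
  • '\D' : any non-digit character
  • "\pP" : matches the named property P
  • "\PP" : matches anything that does not have property P
  • '\X' : matches a Unicode extended grapheme cluster
  • '\C' : matches a single C char (one byte), even in Unicode mode
  • '\n' (n a number, such as '\1') : backreference to group n
  • "\gn" : backreference to group n
  • "\g{-n}" : relative backreference to the n-th group before the current position
  • "\g{name}" : backreference to the named group name
  • "\k{name}" : backreference to the named group name
  • '\K' : keeps everything to the left of \K out of $&
  • '\N' : any character except '\n'
  • '\v' : vertical whitespace
  • '\V' : anything but vertical whitespace
  • '\h' : horizontal whitespace
  • '\H' : anything but horizontal whitespace
  • '\R' : line break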
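The shorthand classes and numbered backreferences listed above can be sketched as follows. The example uses Python's `re` module, which shares '\w', '\d', '\s', and '\1' with Perl; it has no '\g{name}' or '\k{name}', so the named backreference is written in Python's own `(?P=name)` spelling. All input strings are invented for illustration.

```python
import re

# Shorthand character classes.
words = re.findall(r"\w+", "foo_bar, baz-42")   # ['foo_bar', 'baz', '42']
digits = re.findall(r"\d+", "tel: 555-0199")    # ['555', '0199']
fields = re.split(r"\s+", "a  b\tc\nd")         # ['a', 'b', 'c', 'd']

# '\1' refers back to whatever group 1 matched: here, a doubled word.
doubled = re.search(r"\b(\w+) \1\b", "it is is fine").group(1)   # 'is'

# Named backreference, in Python's (?P<name>...) / (?P=name) spelling.
named = re.fullmatch(r"(?P<word>\w+)-(?P=word)", "bye-bye")

print(words, digits, fields, doubled, named.group("word"))
```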

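POSIX character classes
A POSIX character class is written [:class:] and is valid only inside a bracket expression, so in a pattern it must appear as "[[:class:]]".
  • "[[:alpha:]]" : any (English) letter
  • "[[:alnum:]]" : any letter or digit
  • "[[:ascii:]]" : any character in the ASCII character set
  • "[[:blank:]]" : GNU extension; a space ' ' or a horizontal tab '\t'
  • "[[:cntrl:]]" : any control character
  • "[[:digit:]]" : any digit character; equivalent to '\d'
  • "[[:graph:]]" : any printable character except space
  • "[[:lower:]]" : any lowercase character
  • "[[:print:]]" : any printable character, including space
  • "[[:punct:]]" : any punctuation character (a graphic character that is neither a letter nor a digit)
  • "[[:space:]]" : any whitespace character; equivalent to '\s' plus the vertical tab "\cK"
  • "[[:upper:]]" : any uppercase character
  • "[[:word:]]" : Perl extension; equivalent to '\w'
  • "[[:xdigit:]]" : any hexadecimal digit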


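In Perl, "[[:^class:]]" denotes the complement of the named POSIX class. The same set can also be written by negating the enclosing bracket expression, as in "[^[:class:]]".

Assertions
  • '\b' : word boundary
  • '\B' : not a word boundary
  • '\A' : start of the string
  • '\Z' : end of the string, or just before a newline at the end
  • '\z' : end of the string
  • '\G' : the position where the previous match ended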
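The zero-width assertions can be illustrated with a short sketch in Python's `re` module. Two differences from Perl are assumptions to keep in mind: Python's '\Z' matches only at the very end of the string (like Perl's '\z'), and Python has no '\G'. The sample strings are made up.

```python
import re

text = "cat concat cat"

# '\b' requires a word boundary on both sides, so 'concat' is skipped.
whole_words = [m.start() for m in re.finditer(r"\bcat\b", text)]    # [0, 11]

# '\B' requires the absence of a boundary: only the 'cat' inside 'concat'.
inside_word = [m.start() for m in re.finditer(r"\Bcat\b", text)]    # [7]

# '\A' matches only at the start of the string, even in multiline mode,
# whereas '^' then matches at the start of every line.
only_first = re.findall(r"\Acat", "cat\ncat", flags=re.MULTILINE)   # ['cat']
every_line = re.findall(r"^cat", "cat\ncat", flags=re.MULTILINE)    # ['cat', 'cat']

# '$' tolerates one trailing newline; Python's '\Z' (Perl's '\z') does not.
dollar_ok = re.search(r"cat$", "cat\n") is not None    # True
strict_end = re.search(r"cat\Z", "cat\n") is None      # True

print(whole_words, inside_word, only_first, every_line, dollar_ok, strict_end)
```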

The PCRE library is a set of functions that implement regular expression pattern matching using the same syntax and semantics as Perl, with just a few differences. Certain features that appeared in Python and PCRE before they appeared in Perl are also available using the Python syntax. There is also some support for certain .NET and Oniguruma syntax items, and there is an option for requesting some minor changes that give better JavaScript compatibility.

The current implementation of PCRE (release 7.x) corresponds approximately with Perl 5.10, including support for UTF-8 encoded strings and Unicode general category properties. However, UTF-8 and Unicode support has to be explicitly enabled; it is not the default. The Unicode tables correspond to Unicode release 5.0.0.

In addition to the Perl-compatible matching function, PCRE contains an alternative matching function that matches the same compiled patterns in a different way. In certain circumstances, the alternative function has some advantages. For a discussion of the two matching algorithms, see the pcrematching page.

PCRE is written in C and released as a C library. A number of people have written wrappers and interfaces of various kinds. In particular, Google Inc. have provided a comprehensive C++ wrapper. This is now included as part of the PCRE distribution. The pcrecpp page has details of this interface. Other people's contributions can be found in the Contrib directory at the primary FTP site, which is: ftp://ftp.csx.cam.ac.uk/pub/software/programming/pcre

Details of exactly which Perl regular expression features are and are not supported by PCRE are given in separate documents. See the pcrepattern and pcrecompat pages. There is a syntax summary in the pcresyntax page.

Some features of PCRE can be included, excluded, or changed when the library is built. The pcre_config() function makes it possible for a client to discover which features are available. The features themselves are described in the pcrebuild page. Documentation about building PCRE for various operating systems can be found in the README file in the source distribution.

The library contains a number of undocumented internal functions and data tables that are used by more than one of the exported external functions, but which are not intended for use by external callers. Their names all begin with "_pcre_", which hopefully will not provoke any name clashes. In some environments, it is possible to control which external symbols are exported when a shared library is built, and in these cases the undocumented symbols are not exported.
