正则表达式中? 问号的作用

最新推荐文章于 2024-07-15 00:24:10 发布

successdd

最新推荐文章于 2024-07-15 00:24:10 发布

阅读量7.1k

点赞数

分类专栏： regex 文章标签：正则表达式

本文链接：https://blog.csdn.net/successdd/article/details/79085502

版权

regex 专栏收录该内容

2 篇文章 0 订阅

订阅专栏

参考 https://stackoverflow.com/questions/28646475/warning-preg-match-compilation-failed-unrecognized-character-after-or

参考 https://www.regular-expressions.info/atomic.html

上面第二个连接是文档厉害了

For a strange reason, the programmer has escaped the > (that is never needed).

Only these characters need to be escaped to obtain a literal character (outside a character class):

( ) ^ $ [ \ | . * + ?

{ # only in these cases: {n} {m,n} {m,}
# where m and n are integers
+ the pattern delimiter

Most of the time an escaped character that doesn’t need to be escaped (or that does not have a special meaning like \b \w \d …) is simply ignored by the regex engine. But it’s not the case here, because (?> is a fixed sequence to open an atomic group, and the sequence (? is not allowed except for these cases:

a non capturing group: (?:…) 一个非捕获组
an atomic group: (?>…)

参考 https://www.regular-expressions.info/atomic.html

An atomic group is a group that, when the regex engine exits from it, automatically throws away all backtracking positions remembered by any tokens inside the group 元组是一个组，当正则引擎退出的时候自动抛弃其他在可选项 demo

/a(?>(bc|b))c/ 
将会匹配 abcc 而不会匹配 abc 
如果不是元组 而是 /a(bc|b)c/ abc 和abcc 都会匹配  
元组为什么匹配不了abc呢 查看上面的 元组正则 a 匹配 a ;(?>(bc|b)) 匹配到bc 然后 abc 之后 没有任何东西匹配c 就会直接退出 
而不是像非元组 那样 会在(bc|b) 创建一个岔路 一旦bc 随后的 匹配失败
还是会返回到 这个岔路上 走第二条路

再看下面例子

\b(?>integer|insert|in)\b 和 \b(?>in|integer|insert)\b 
他们能不能 匹配 insert 

答案是 第一个 能匹配 第二个不能匹配

an inline modifier: (?i) (?-i) 内联修饰器参考链接就是忽略大小写等符号
a non capturing group with inline modifiers: (?i:…) (?-i:…) 非捕获内联修饰器
a lookaround: (?=…) (?!…) (?<=…) (?

# 下面是一个例子用来说明标识符的使用
$pattern = '/(?i)caseless(?-i)cased(?i)caseless/';
#等价
$pattern2 = '/(?i)caseless(?-icased)caseless/';


$str = "caselessCaseDcaselesS";
$c = preg_match_all('/(?i)caseless(?-i:Cased)caseless/', $str, $matches);
var_dump($c,$matches);

上面的仅仅做了解只用真正常见的还是下面的而一些参数

上面的内联方式可以使用/i /m /x 这样的方式代替

/(?i)caseless(?-i:Cased)caseless/

内联的方式

表4.常用分组语法

常用分组语法

分类	代码/语法	说明
捕获	(exp)	匹配exp,并捕获文本到自动命名的组里
(?exp)	匹配exp,并捕获文本到名称为name的组里，也可以写成(?’name’exp)
(?:exp)	匹配exp,不捕获匹配的文本，也不给此分组分配组号
零宽断言	(?=exp)	匹配exp前面的位置
(?<=exp)	匹配exp后面的位置
(?!exp)	匹配后面跟的不是exp的位置
(?	匹配前面不是exp的位置
注释(?#comment)	这种类型的分组不对正则表达式的处理产生任何影响，用于提供注释让人阅读

# 验证正则

# ?=HT 表示 后面是HT 的位置
$str = "demoHTdmzns";
preg_match_all('/.*m.*?(?=HT)/', $str, $matches);
var_dump($matches); #会抓取到 demo

# ?<=HT 表示 前面是HT的位置

$str = "demoHTdmzns";
preg_match_all('/(?<=HT).*n/', $str, $matches);
var_dump($matches); # 会匹配到 dmzn


#?!HT 表示后面不是HT的位置
$str = "demoHTdmzns";
preg_match_all('/.*?m(?!HT)/', $str, $matches);
var_dump($matches);
# 结果如下
array(1) {
  [0]=>
  array(2) {
    [0]=>
    string(3) "dem"
    [1]=>
    string(5) "oHTdm"
  }
}

如果是 

$str = "demHTdmzns";
preg_match_all('/.*?m(?!HT)/', $str, $matches);
var_dump($matches); # 会匹配 demoHTdm
#结果如下
array(1) {
  [0]=>
  array(1) {
    [0]=>
    string(7) "demHTdm"
  }
}

#?<!HT 前面不是HT 的位置

$str = "deHTdmzns";
preg_match_all('/(?<!HT)m.*/', $str, $matches);
var_dump($matches); # 会匹配 mzns

如果是
$str = "deHTdmzns";
preg_match_all('/(?<!HT)dm.*/', $str, $matches);
var_dump($matches); # 匹配为空