How regular expressions work at the core

  • The best regex tutorial: https://www.regular-expressions.info/tutorial.html Read it through a few times and it all makes sense, provided you also practice.
  • Online regex tester: https://regexr.com/
  • two kinds of regular expression engines:
    • text-directed engines
    • regex-directed engines
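    • Nearly all modern regex flavors are based on regex-directed engines, because features such as lazy quantifiers and backreferences can only be implemented in regex-directed engines.
  • How regex engines work
    • A regex-directed engine always returns the leftmost match: it reports the first match it finds, even if a "better" match could be found later in the text.
    • At each starting position in the text, the engine tries all permutations of the regex pattern; only when every one of them fails does it move on to the next character.
    • To repeat the character that was actually matched (rather than the token), you need a backreference: ([0-9])\1+ matches a run of identical digits, because \1 must repeat whatever digit the group captured.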
    • Except for JavaScript and VBScript, all regex flavors discussed in the tutorial have an option to make the dot match all characters, including line breaks. (JavaScript gained such an option, the s flag, in ES2018.)
    • anchors: ^, $, \b, \B
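    • backreferences: \1, \2, etc. To figure out the number of a particular backreference, scan the regular expression from left to right. Count the opening parentheses of all the numbered capturing groups. The first parenthesis starts backreference number one, the second number two, etc. Skip parentheses that are part of other syntax such as non-capturing groups. Most regex flavors support up to 99 capturing groups and double-digit backreferences.
    • Parentheses and backreferences cannot be used as such inside character classes: within [], ( and ) are literal characters and \1 is not a backreference.
    • (q?)b\1 and (q)?b\1 can give different results on the text b, depending on the regex flavor. In (q?)b\1 the group always participates and captures an empty string, so \1 matches nothing and the regex matches b. In (q)?b\1 the group does not participate at all; in most flavors a backreference to a non-participating group fails, so there is no match, while JavaScript treats it as a zero-length match and matches b.
    • JavaScript (the ECMAScript grammar, which is also the default grammar of C++ std::regex) does not support forward references, but does not treat them as an error. In JavaScript, forward references always find a zero-length match, just as backreferences to non-participating groups do in JavaScript.
    • Unicode regular expressions: only some regex flavors support them. The core question is whether the engine treats one Unicode character as a single character. See https://www.regular-expressions.info/unicode.html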
    • non-capturing group: (?:...) is used when you want to group an expression, but you do not want to save it as a matched/captured portion of the string. E.g. when matching a URL with (?:https?|ftp)://([^/\r\n]+)(/[^\r\n]*)? you do not need to keep the https/ftp part, so it goes inside (?:...).
    • Lookahead and lookbehind, collectively called “lookaround”, are zero-length assertions just like the start and end of line, and start and end of word anchors. The difference is that lookaround actually matches characters, but then gives up the match, returning only the result: match or no match. That is why they are called “assertions”. They do not consume characters in the string, but only assert whether a match is possible or not. See https://stackoverflow.com/questions/11621273/lookahead-vs-lookbehind
      • If the regex is x(?=insert_regex_here) that is a (positive) lookahead, which looks ahead, or forwards, at the text after x (the “bbbb” in a string like aaaaxbbbb). It means “find an x that is followed by insert_regex_here”.
      • If the regex is (?<=insert_regex_here)x that is a (positive) lookbehind, which looks behind, or backwards, at the text before x (the “aaaa” in a string like aaaaxbbbb). It means “find an x that is preceded by insert_regex_here”.
      • You can also have negative lookahead x(?!insert_regex_here) meaning “x not followed by insert_regex_here”, and negative lookbehind (?<!insert_regex_here)x, meaning “x not preceded by insert_regex_here”.
      • The lookahead itself is not a capturing group. It is not included in the count towards numbering the backreferences. If you want to store the match of the regex inside a lookahead, you have to put capturing parentheses around the regex inside the lookahead, like this: (?=(regex)). The other way around will not work, because the lookahead will already have discarded the regex match by the time the capturing group is to store its match.
      • You can use any regular expression inside a lookahead, but not inside a lookbehind.
      • Most regex flavors do not allow you to use just any regex inside a lookbehind, because they cannot apply a regular expression backwards.
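
The backreference and lookaround notes above can be tried out directly. Below is a minimal sketch using Python's re module; the sample strings and the helper names starts and lookbehind_compiles are made up for illustration, and the results are specific to Python's flavor (for example, Python rejects variable-width lookbehind and fails backreferences to non-participating groups, while other flavors may behave differently).

```python
import re


def starts(pattern, text):
    """Return the start index of every match of pattern in text (hypothetical helper)."""
    return [m.start() for m in re.finditer(pattern, text)]


# Backreference: \1 must repeat whatever digit group 1 captured,
# so only runs of the same digit match.
digit_runs = [m.group(0) for m in re.finditer(r"([0-9])\1+", "a1123334b55")]

# (q?) always participates and captures an empty string, so \1 matches nothing.
participating = re.fullmatch(r"(q?)b\1", "b")
# (q)? does not participate on "b"; in Python the backreference then fails.
non_participating = re.fullmatch(r"(q)?b\1", "b")

# Lookaround asserts without consuming characters: only x itself is matched.
text = "axb cxd"
followed_by_b = starts(r"x(?=b)", text)       # positive lookahead
not_followed_by_b = starts(r"x(?!b)", text)   # negative lookahead
preceded_by_a = starts(r"(?<=a)x", text)      # positive lookbehind
not_preceded_by_a = starts(r"(?<!a)x", text)  # negative lookbehind

# To keep what a lookahead saw, put the capturing group inside it.
captured = re.search(r"x(?=(b+))", "xbbb")


def lookbehind_compiles(pattern):
    """Report whether Python's re accepts the pattern (hypothetical helper)."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


if __name__ == "__main__":
    print(digit_runs)
    print(participating, non_participating)
    print(followed_by_b, not_followed_by_b, preceded_by_a, not_preceded_by_a)
    print(captured.group(0), captured.group(1))
    print(lookbehind_compiles(r"(?<=ab)x"), lookbehind_compiles(r"(?<=a+)x"))
```

Note that captured.group(0) is only the x: the lookahead contributes a capture but no consumed text.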

An example: this pattern matches a non-empty line only if ab appears nowhere in it.

^(?:(?!ab).)+$

Explanation:

^       # match start of line/string
(?:     # begin non-capturing group
  (?!   # begin negative lookahead
    ab  # literal text sequence ab
  )     # end negative lookahead
  .     # any single character
)       # end non-capturing group
+       # repeat previous match one or more times
$       # match end of line/string
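
To check the breakdown above, here is a small sketch in Python's re module. The function name lacks_ab and the sample words are assumptions for illustration. The second pattern is the same regex written in re.VERBOSE mode, which ignores whitespace and allows # comments, so it can mirror the commented layout directly.

```python
import re

# Compact form, exactly as in the example above.
NO_AB = re.compile(r"^(?:(?!ab).)+$")

# Same pattern in verbose mode; whitespace and # comments are ignored.
NO_AB_VERBOSE = re.compile(
    r"""
    ^       # match start of line/string
    (?:     # begin non-capturing group
      (?!   # begin negative lookahead
        ab  # literal text sequence ab
      )     # end negative lookahead
      .     # any single character
    )       # end non-capturing group
    +       # repeat the group one or more times
    $       # match end of line/string
    """,
    re.VERBOSE,
)


def lacks_ab(line):
    """Return True if line is non-empty and contains no "ab" (hypothetical helper)."""
    return NO_AB.match(line) is not None


if __name__ == "__main__":
    for word in ["carrot", "cabbage", "ba", ""]:
        print(repr(word), lacks_ab(word))
```

The lookahead is checked before every single character, so the match stops as soon as ab would begin at the current position. The empty string is rejected because + requires at least one repetition.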
References
  • https://www.regular-expressions.info/lookaround.html
  • https://stackoverflow.com/questions/977251/regular-expressions-and-negating-a-whole-character-group

