[Python] Examples of the (?...) extension notation for regular expression groups

Latest recommended article published on 2021-10-16 09:58:57

Scandinavians


Categories: Regular Expressions, Python

Article link: https://blog.csdn.net/flyapy/article/details/38148017


(?...)

This is an extension notation (a '?' following a '(' is not meaningful otherwise). The first character after the '?' determines the meaning and further syntax of the construct. Extensions usually do not create a new group; (?P<name>...) is the only exception to this rule. The currently supported extensions follow.

(?:...)

A non-capturing version of regular parentheses. Matches whatever regular expression is inside the parentheses, but the substring matched by the group cannot be retrieved after performing a match or referenced later in the pattern.

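import re

def groups(m):
    # Print the whole match and the tuple of captured groups.
    # A failed match (m is None) is reported without touching m.groups().
    if m is None:
        print("m.group() == None.")
        return
    print("m.group() == %s ,m.groups() == %s" % (m.group(), m.groups()))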
    

#(?...) 
#(?iLmsux)
#(?:...)
m = re.match("(?:[abcd])(color)","acolor")
groups(m)

>>>

m.group() == acolor ,m.groups() == ('color',)
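As a rough side-by-side sketch of the difference (the variable names below are illustrative and not from the original post), the same text can be matched once with a capturing group and once with a non-capturing group around [abcd]:

```python
import re

# Only the parentheses around [abcd] differ between the two patterns.
capturing = re.match(r"([abcd])(color)", "acolor")
non_capturing = re.match(r"(?:[abcd])(color)", "acolor")

print(capturing.groups())      # ('a', 'color')
print(non_capturing.groups())  # ('color',)
```

Both patterns match the same text; only the contents of groups() change.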

(?P<name>...)

Similar to regular parentheses, but the substring matched by the group is accessible via the symbolic group name name. Group names must be valid Python identifiers, and each group name must be defined only once within a regular expression. A symbolic group is also a numbered group, just as if the group were not named.

#(?P<name>...)
m = re.match(r"(?P<id>\d+)\.\d+","324.322")
groups(m)

>>>

m.group() == 324.322 ,m.groups() == ('324',)
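The helper above only prints groups(), so the symbolic name never shows up. A minimal sketch of reading the same group by name, by number, and through groupdict() (variable names are illustrative):

```python
import re

m = re.match(r"(?P<id>\d+)\.\d+", "324.322")

by_name = m.group("id")    # access by symbolic group name
by_number = m.group(1)     # the same group is also numbered
as_dict = m.groupdict()    # mapping of every named group to its text

print(by_name, by_number, as_dict)  # 324 324 {'id': '324'}
```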

(?P=name)

A backreference to a named group; it matches whatever text was matched by the earlier group named name.

m = re.match(r"(?P<id>\d+)\.(?P=id)","324.324")
groups(m)

m = re.match(r"(?P<id>\d+)\.\1","324.324")
groups(m)

>>>

m.group() == 324.324 ,m.groups() == ('324',)
m.group() == 324.324 ,m.groups() == ('324',)
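One common use of a named backreference is spotting an accidentally doubled word. The following is only a sketch under that assumption; the pattern and the names doubled, hit and miss are illustrative and not part of the original post:

```python
import re

# (?P<word>\w+) captures a word; (?P=word) requires the same text again.
doubled = re.compile(r"\b(?P<word>\w+)\s+(?P=word)\b")

hit = doubled.search("this is is a test")
miss = doubled.search("this is a test")

print(hit.group("word"))  # is
print(miss)               # None
```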

(?#...)

A comment; the contents of the parentheses are simply ignored.

#(?#...)
m = re.match(r"(?#I am invisible)\d+\.\d+","324.324")
groups(m)

>>>

m.group() == 324.324 ,m.groups() == ()

(?=...)

Matches if ... matches next, but doesn’t consume any of the string. This is called a lookahead assertion. For example, Isaac (?=Asimov) will match 'Isaac ' only if it’s followed by 'Asimov'.

#(?=...)
m = re.match(r"\d+\.(?=999)","324.999")
groups(m)

>>>

m.group() == 324. ,m.groups() == ()
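The Isaac Asimov example from the description can be checked directly. A small sketch (the second test string, "Isaac Newton", is an illustrative assumption):

```python
import re

pattern = re.compile(r"Isaac (?=Asimov)")

followed = pattern.match("Isaac Asimov")
not_followed = pattern.match("Isaac Newton")

# The lookahead checks for 'Asimov' but leaves it out of the match.
print(repr(followed.group()))  # 'Isaac '
print(not_followed)            # None
```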

(?!...)

Matches if ... doesn’t match next. This is a negative lookahead assertion. For example, Isaac (?!Asimov) will match 'Isaac ' only if it’s not followed by 'Asimov'.

#(?!...)
m = re.match(r"\d+\.(?!999)","324.324")
groups(m)

>>>

m.group() == 324. ,m.groups() == ()
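The negative form can be sketched the same way with the Isaac example from the description (again, "Isaac Newton" is an illustrative test string):

```python
import re

pattern = re.compile(r"Isaac (?!Asimov)")

other_surname = pattern.match("Isaac Newton")
asimov = pattern.match("Isaac Asimov")

print(repr(other_surname.group()))  # 'Isaac '
print(asimov)                       # None
```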

(?<=...)

Matches if the current position in the string is preceded by a match for ... that ends at the current position. This is called a positive lookbehind assertion. (?<=abc)def will find a match in 'abcdef', since the lookbehind will back up 3 characters and check if the contained pattern matches. The contained pattern must only match strings of some fixed length, meaning that abc or a|b are allowed, but a* and a{3,4} are not. Note that patterns which start with positive lookbehind assertions will not match at the beginning of the string being searched; you will most likely want to use the search() function rather than the match() function:

#(?<=...)
m = re.search(r"(?<=324)\.\d+","324.324")
groups(m)
m = re.search(r"(?<=324|234)\.\d+","234.324")
groups(m)

>>>

m.group() == .324 ,m.groups() == ()
m.group() == .324 ,m.groups() == ()

(?<!...)

Matches if the current position in the string is not preceded by a match for .... This is called a negative lookbehind assertion. Similar to positive lookbehind assertions, the contained pattern must only match strings of some fixed length. Patterns which start with negative lookbehind assertions may match at the beginning of the string being searched.
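The fixed-length requirement applies to both lookbehind forms and is enforced when the pattern is compiled. A minimal sketch, where compiles() is a hypothetical helper written for this illustration:

```python
import re

def compiles(pattern):
    """Return True if re.compile accepts the pattern (illustrative helper)."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

print(compiles(r"(?<=abc)def"))     # True: fixed width of 3
print(compiles(r"(?<=a|b)def"))     # True: every alternative has width 1
print(compiles(r"(?<=a*)def"))      # False: variable width
print(compiles(r"(?<!a{3,4})def"))  # False: variable width

# The fixed-width lookbehind finds 'def' only when 'abc' precedes it.
found = re.search(r"(?<=abc)def", "abcdef")
print(found.group())                # def
```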

#(?<!...)
m = re.search(r"(?<!324)\.\d+","32.324")
groups(m)

>>>

m.group() == .324 ,m.groups() == ()

(?(id/name)yes-pattern|no-pattern)

Will try to match with yes-pattern if the group with the given id or name exists, and with no-pattern if it doesn’t. no-pattern is optional and can be omitted. For example, (<)?(\w+@\w+(?:\.\w+)+)(?(1)>) is a poor email matching pattern, which will match with '<user@host.com>' as well as 'user@host.com', but not with '<user@host.com'.

#(?(id/name)yes-pattern|no-pattern)
m = re.search(r"(?P<id>\d)\w+(?(id)\d)","1abc1")
groups(m)
m = re.search(r"(?P<id>\d)\w+?(?(id)(\d+))","4abc632")
groups(m)
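The email pattern quoted in the description can be tried against its three example strings. This is a sketch only: the names email and results are illustrative, and match() is used so that each string is tested from its first character.

```python
import re

# Group 1 is the optional '<'; (?(1)>) demands '>' only if group 1 matched.
email = re.compile(r"(<)?(\w+@\w+(?:\.\w+)+)(?(1)>)")

results = {}
for text in ("<user@host.com>", "user@host.com", "<user@host.com"):
    m = email.match(text)
    results[text] = m.group() if m else None
    print(text, "->", results[text])
```

The unbalanced '<user@host.com' is rejected because group 1 matched '<' and the conditional then fails to find the closing '>'.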