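Regex can match like this too...
The geeksquiz site (https://www.geeksforgeeks.org/functions-python-gq/) offers coding quizzes that are handy for testing how well you know a language. While working through the Python questions today I made an interesting discovery: a regular expression can also be written like this: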
import re

sentence = 'cats are fast'
# Each (?P<name>...) group captures one word under the given name
regex = re.compile(r'(?P<animal>\w+) (?P<verb>\w+) (?P<adjective>\w+)')
matched = regex.search(sentence)
print(matched.groupdict())
output (key order may differ on Python 2): {'animal': 'cats', 'verb': 'are', 'adjective': 'fast'}
The Python documentation explains it as follows:
The syntax for a named group is one of the Python-specific extensions: (?P<name>...). name is, obviously, the name of the group. Named groups behave exactly like capturing groups, and additionally associate a name with a group. The match object methods that deal with capturing groups all accept either integers that refer to the group by number or strings that contain the desired group’s name. Named groups are still given numbers, so you can retrieve information about a group in two ways:
>>> p = re.compile(r'(?P<word>\b\w+\b)')
>>> m = p.search( '(((( Lots of punctuation )))' )
>>> m.group('word')
'Lots'
>>> m.group(1)
'Lots'
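To make the "two ways" point concrete, here is a minimal sketch that mixes one unnamed group with two named ones. The date pattern, the sample string, and the variable names are made up for illustration and are not from the documentation.

```python
import re

# Hypothetical pattern: one unnamed group followed by two named groups.
pattern = re.compile(r'(\d{4})-(?P<month>\d{2})-(?P<day>\d{2})')
m = pattern.search('released on 2024-03-15')

all_groups = m.groups()       # every capturing group, in numeric order
named_only = m.groupdict()    # only the groups that were given a name
by_name = m.group('month')    # look the group up by its name
by_number = m.group(2)        # 'month' is also group number 2

print(all_groups)
print(named_only)
print(by_name == by_number)
```

The unnamed year group shows up in groups() but not in groupdict(), while the named groups can be reached either way.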
The syntax for backreferences in an expression such as (...)\1 refers to the number of the group. There’s naturally a variant that uses the group name instead of the number. This is another Python extension: (?P=name) indicates that the contents of the group called name should again be matched at the current point. The regular expression for finding doubled words, (\b\w+)\s+\1, can also be written as (?P<word>\b\w+)\s+(?P=word):
>>> p = re.compile(r'(?P<word>\b\w+)\s+(?P=word)')
>>> p.search('Paris in the the spring').group()
'the the'
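The doubled-word pattern above can also be used to clean text up, not just to find the repeat. The following is a rough sketch: the trailing \b and the sample sentence are my own additions, and the replacement string uses \g<word>, which is how re.sub refers to a named group.

```python
import re

# Same named backreference as above, plus a trailing \b (added for this
# sketch) so that "the theme" is not mistaken for a doubled word.
doubled = re.compile(r'(?P<word>\b\w+)\s+(?P=word)\b')

text = 'Paris in the the spring, near the theme park'

# Collect each word that appears twice in a row.
found = [m.group('word') for m in doubled.finditer(text)]

# Replace every doubled pair with a single copy of the word.
fixed = doubled.sub(r'\g<word>', text)

print(found)
print(fixed)
```

Without the trailing \b, the original pattern would also match the "the the" inside "the theme", which is probably not what you want when editing real text.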
Notes from the regular expression documentation: https://docs.python.org/2/howto/regex.html
Commonly used
^ Matches the beginning of a line
$ Matches the end of the line
. Matches any character
\s Matches any whitespace character
\S Matches any non-whitespace character
* Repeats a character zero or more times
*? Repeats a character zero or more times (non-greedy)
+ Repeats a character one or more times
+? Repeats a character one or more times (non-greedy)
( Indicates where string extraction is to start
) Indicates where string extraction is to end
\d Matches any decimal digit
\D Matches any non-digit character
\w Matches any alphanumeric character or underscore
\W Matches any character that is not alphanumeric or underscore
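A few entries from the list are easier to remember once you see them run. This is a small sketch with made-up sample strings, comparing the greedy quantifiers (*, +) with their non-greedy forms (*?, +?) and showing parentheses used for extraction and ^ used per line.

```python
import re

html = '<b>bold</b> and <i>italic</i>'

# Greedy: .* grabs as much as it can, so the match runs to the last '>'.
greedy = re.findall(r'<.*>', html)
# Non-greedy: .*? stops at the first '>' it can, giving one match per tag.
lazy = re.findall(r'<.*?>', html)

# + versus +? on a run of digits.
most_digits = re.search(r'\d+', 'tel 12345').group()
fewest_digits = re.search(r'\d+?', 'tel 12345').group()

# Parentheses mark the part to extract; \S+ matches non-whitespace runs.
user = re.findall(r'(\S+)@\S+', 'From alice@example.com Sat')

# With re.M, ^ matches at the beginning of every line.
first_words = re.findall(r'^\w+', 'first line\nsecond line', re.M)

print(greedy, lazy, most_digits, fewest_digits, user, first_words)
```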