Python: Regular Expression Basics

Latest recommended article published on 2024-05-06 23:59:04

sinat_15355869

Category: Regex

Article link: https://blog.csdn.net/sinat_15355869/article/details/81950682

GitHub: https://github.com/yjfiejd/Regex (includes example code)

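[Repost]: Summary of Python regular expression usage

[Repost]: Python Regular Expression Guide (reposted from: AstralWind)

[Repost]: Python Development Course Series (14) - Mastering Regular Expressions

[Repost]: Introduction to the Python Regular Expression Module (re)

[Repost]: Python | The Most Complete Regular Expression Reference

[Wikipedia]: https://zh.wikipedia.org/wiki/正则表达式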

[Official reference links]

1) Regular expression operations [official]

2) Regular Expression HOWTO [official]

3) Introduction to the Python Regular Expression Module (re)

[Code]

1. Regular expressions - metacharacters

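.            Matches any single character except the newline "\n"
^            Matches the start of the string
$            Matches the end of the string (or just before a trailing newline)
*, +, ?      '*' matches the preceding element 0 or more times, '+' matches it 1 or more times, '?' matches it 0 or 1 times
*?, +?, ??   Adding '?' after a quantifier makes it non-greedy (match as little as possible); the quantifiers in the row above are greedy
\            Escapes a special character
|            Alternation ("or")
{n}          Matches the preceding element exactly n times
{n,}         Matches the preceding element at least n times
{n,m}        Matches the preceding element between n and m times, inclusive -> note: do not put a space before m
[...]        Matches any one character from the set; special characters lose their special meaning inside it, except: ^, -, ], \
[^...]       Matches any one character that is not in the set
[a-z]        Matches any one character in the range a to z, where "-" denotes a range
(...)        Treats the enclosed expression as a group; when the pattern contains groups, re.findall returns only the group contents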
(?#...)      A comment; the contents of the parentheses are simply ignored. 
(?=...)      Matches if ... matches next, but doesn’t consume any of the string. This is called a lookahead assertion.  For example, Isaac (?=Asimov) will match 'Isaac ' only if it’s followed by 'Asimov'.
(?!...)      Matches if ... doesn’t match next. This is a negative lookahead assertion. For example, Isaac (?!Asimov) will match 'Isaac ' only if it’s not followed by 'Asimov'.
(?<=...)     Matches if the current position in the string is preceded by a match for ... that ends at the current position. This is called a positive lookbehind assertion. (?<=abc)def will find a match in 'abcdef', since the lookbehind will back up 3 characters and check if the contained pattern matches. The contained pattern must only match strings of some fixed length, meaning that abc or a|b are allowed, but a* and a{3,4} are not.
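
The table above can be tried out with a short script. This is only a minimal sketch: the sample strings and variable names are made up for illustration and do not come from the linked repository.

```python
import re

# Quantifiers: 'b*' allows zero or more b's, 'b+' one or more, 'b?' zero or one.
star = re.findall(r"ab*", "a ab abbb")      # ['a', 'ab', 'abbb']
plus = re.findall(r"ab+", "a ab abbb")      # ['ab', 'abbb']
optional = re.findall(r"ab?", "a ab abbb")  # ['a', 'ab', 'ab']

# {n,m}: between 2 and 3 digits, taking as many as possible each time.
counted = re.findall(r"\d{2,3}", "1 22 333 4444")  # ['22', '333', '444']

# [^...]: any one character that is NOT a lowercase letter.
not_lower = re.findall(r"[^a-z]", "ab1-c2")  # ['1', '-', '2']

# Groups: when the pattern has groups, findall returns only the group contents.
pairs = re.findall(r"(\w+)@(\w+)\.com", "tom@example.com, amy@test.com")
# [('tom', 'example'), ('amy', 'test')]

# Lookarounds test the surrounding text without consuming it.
text = "Isaac Asimov, Isaac Newton"
followed = re.search(r"Isaac (?=Asimov)", text)      # matches at index 0
not_followed = re.search(r"Isaac (?!Asimov)", text)  # matches at index 14
preceded = re.search(r"(?<=abc)def", "abcdef")       # matches 'def' at index 3

print(star, plus, optional, counted, not_lower, pairs)
print(followed.start(), not_followed.start(), preceded.group())
```

Note that both lookahead searches return 'Isaac ' as the matched text; only the position differs, because the text inside the assertion is never part of the match.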

2. Regular expressions - special sequences

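\A            Matches only at the start of the string
\b            Matches the empty string at the start or end of a word
\B            Matches the empty string when it is not at the start or end of a word
\d            Matches any decimal digit; equivalent to [0-9]
\D            Matches any non-digit character; equivalent to [^0-9]
\s            Matches any whitespace character; equivalent to [ \t\n\r\f\v]
\S            Matches any non-whitespace character; equivalent to [^ \t\n\r\f\v]
\w            Matches any letter, digit, or underscore; equivalent to [a-zA-Z0-9_]
\W            Matches any character that is not a letter, digit, or underscore; equivalent to [^a-zA-Z0-9_]
\Z            Matches only at the end of the string (Python's re module spells this with an uppercase Z)
Note: the bracket equivalents for \d, \s and \w are exact for bytes patterns or with the re.ASCII flag; Python 3 str patterns also match the corresponding Unicode characters.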
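
A few of these sequences are easier to understand when run. The following is a small sketch with made-up sample strings; the re.M flag is used only to show how \A and \Z differ from ^ and $.

```python
import re

text = "cat concat cats 42"

# \b anchors at word boundaries, so only the standalone word matches.
whole_word = re.findall(r"\bcat\b", text)   # ['cat']

# \B requires a non-boundary, so this finds 'cat' only inside 'concat'.
inside_word = re.findall(r"\Bcat", text)    # ['cat']

# \d, \s and \w stand for digits, whitespace and word characters.
digits = re.findall(r"\d+", "tel: 010-1234")        # ['010', '1234']
fields = re.split(r"\s+", "a  b\tc\nd")             # ['a', 'b', 'c', 'd']
words = re.findall(r"\w+", "hello_world, 2024!")    # ['hello_world', '2024']

# In multiline mode ^ and $ match at every line, but \A and \Z still
# match only at the very start and very end of the whole string.
lines = "one\ntwo"
caret = re.findall(r"^\w+", lines, re.M)      # ['one', 'two']
start_only = re.findall(r"\A\w+", lines, re.M)  # ['one']
dollar = re.findall(r"\w+$", lines, re.M)     # ['one', 'two']
end_only = re.findall(r"\w+\Z", lines, re.M)  # ['two']

print(whole_word, inside_word, digits, fields, words)
print(caret, start_only, dollar, end_only)
```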

3. Regular expressions - the re module (partial)

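1) re.compile(pattern, flags=0)                       Compiles a regular expression pattern and returns a pattern object. Compiling frequently used patterns makes them easy to reuse and can improve efficiency.
2) re.match(pattern, string, flags=0)                 Always matches from the start of the string and returns a match object (<class 're.Match'> in recent Python 3 versions, <class '_sre.SRE_Match'> in older ones), or None if there is no match.
3) re.search(pattern, string, flags=0)                Scans the whole string and returns a match object for the first match, or None if there is none.
4) re.sub(pattern, repl, string, count=0, flags=0)    Replaces matches with repl; count=0 means replace all of them.
5) re.findall(pattern, string, flags=0)               Returns all non-overlapping matches as a list.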
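
The five functions listed above can be sketched together on one sample string. The date pattern and the text are illustrative assumptions, not taken from the original post.

```python
import re

# Compile once, then reuse the pattern object for several operations.
date = re.compile(r"(\d{4})-(\d{2})-(\d{2})")
text = "released 2018-08-22, updated 2024-05-06"

# match() only tries at position 0, and the text does not start with a date.
at_start = date.match(text)        # None

# search() scans forward and stops at the first match.
first = date.search(text)
first_text = first.group()         # '2018-08-22'
first_parts = first.groups()       # ('2018', '08', '22')

# sub() replaces matches; count=1 limits it to the first one.
masked = date.sub("DATE", text, count=1)
# 'released DATE, updated 2024-05-06'

# Backreferences \1, \2, \3 in repl refer to the captured groups.
reordered = date.sub(r"\3/\2/\1", text)
# 'released 22/08/2018, updated 06/05/2024'

# findall() returns tuples because the pattern contains groups.
all_dates = date.findall(text)
# [('2018', '08', '22'), ('2024', '05', '06')]

print(at_start, first_text, first_parts)
print(masked)
print(reordered)
print(all_dates)
```

Passing count as a keyword argument avoids confusing it with flags, which comes after it in the signature.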

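1) Difference between re.match() and re.search(): re.match only matches at the start of the string; if the start does not fit the pattern, the match fails and the function returns None. re.search scans the whole string until it finds a match.
2) Difference between greedy and non-greedy matching: a greedy quantifier matches as much as possible, and Python is greedy by default; adding a '?' after the quantifier makes it non-greedy.
3) r'\n' is two characters, '\' and 'n'. Python uses the prefix 'r' to mark a raw string, in which a backslash is not treated as an escape character.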
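
The three notes above can be checked directly. This is a minimal sketch with made-up sample strings.

```python
import re

# 1) match() vs search(): 'world' is not at the start of the string.
matched = re.match(r"world", "hello world")           # None
searched = re.search(r"world", "hello world").group()  # 'world'

# 2) Greedy vs non-greedy: '.*' runs to the last '>', '.*?' stops at the first.
html = "<b>bold</b>"
greedy = re.match(r"<.*>", html).group()   # '<b>bold</b>'
lazy = re.match(r"<.*?>", html).group()    # '<b>'

# 3) Raw strings: r'\n' is a backslash plus 'n', while '\n' is one newline.
raw_len = len(r"\n")     # 2
plain_len = len("\n")    # 1

# A raw pattern and its doubly escaped form are the same string.
same_pattern = r"\d+" == "\\d+"   # True

print(matched, searched)
print(greedy, lazy)
print(raw_len, plain_len, same_pattern)
```

Writing patterns as raw strings keeps sequences such as \d and \b intact without doubling every backslash.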

[One borrowed figure]