Python Regular Expression Cheat Sheet

Latest recommended article published on 2022-03-30 19:58:16

趁手的就是最好的

Category: Python. Tags: python, regular expressions

Article link: https://blog.csdn.net/idiot_xue/article/details/72626215

Pattern syntax
Common functions
Groups
FAQ (continuously updated)
- Problem 1

Pattern syntax

## Regular expressions (RE)
## The re module in Python
import re
rule = r'abc'  # the r prefix makes a raw string; this is the pattern to look for in a given string
print(re.findall(rule, "aaaaabcaaaaaabcaa"))  # returns ['abc', 'abc']

slist = "tip tep twp top"
# [] defines a character set: [abc] matches any one of the characters a, b or c
rule = r"t[io]p"
print(re.findall(rule, slist))  # returns ['tip', 'top']

# ^ at the start of a character set negates it: [^io] matches any character except i and o
rule = r"t[^io]p"
print(re.findall(rule, "tip tep twp top"))  # returns ['tep', 'twp']

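# Outside a character set, ^ anchors the match to the start of the string
# (and to the start of every line when re.MULTILINE is set); it does not match elsewhere
rule = r"^hello"
print(re.findall(rule, "hello tep twp hello"))  # returns ['hello']
print(re.findall(rule, "tep twp hello"))  # returns []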

# $ anchors the match to the end of the string (or just before a trailing newline)
rule = r"hello$"
re.findall(rule, "hello tep twp hello")  # returns ['hello']
re.findall(rule, "hello tep twp")  # returns []


# - inside a character set denotes a range
rule = r"x[0123456789]x"  # the same as
rule = r"x[0-9]x"
print(re.findall(rule, "x1x x4x xxx"))  # returns ['x1x', 'x4x']
rule = r"x[a-zA-Z]x"
print(re.findall(rule, "x1x x4x xxx"))  # returns ['xxx']

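# \ is the escape character
rule = r"\^hello"
re.findall(rule, "hello twp ^hello")  # returns ['^hello']
# .  matches any character except the newline "\n" (unless re.DOTALL is set)
# The bracket equivalents below hold for ASCII text or with the re.ASCII flag;
# for Python 3 str patterns, \d, \s and \w also match Unicode digits, whitespace and word characters.
# \d matches one digit. Equivalent to [0-9].
# \D matches one non-digit character. Equivalent to [^0-9].
# \n matches a newline. Equivalent to \x0a.
# \r matches a carriage return. Equivalent to \x0d.
# \s matches any whitespace character, including space, tab and form feed. Equivalent to [ \f\n\r\t\v].
# \S matches any non-whitespace character. Equivalent to [^ \f\n\r\t\v].
# \t matches a tab. Equivalent to \x09.
# \w matches any word character, including the underscore. Equivalent to [A-Za-z0-9_].
# \W matches any non-word character. Equivalent to [^A-Za-z0-9_].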

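# {} specifies a repetition count
# Example: check whether a string is a Guangzhou phone number, i.e. 020- followed by eight digits
# The three patterns below are equivalent
rule = r"^020-\d\d\d\d\d\d\d\d$"
rule = r"^020-\d{8}$"  # {8} repeats the preceding item 8 times
rule = r"^020-[0-9]{8}$"
print(re.findall(rule, "020-23546813"))  # returns ['020-23546813']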
# * repeats the preceding character zero or more times
rule = r"ab*"
re.findall(rule, "a")  # returns ['a']
re.findall(rule, "ab")  # returns ['ab']

# + repeats the preceding character one or more times
rule = r"ab+"
re.findall(rule, "a")  # returns []
re.findall(rule, "ab")  # returns ['ab']
re.findall(rule, "abb")  # returns ['abb']

# ? makes the preceding character optional (zero or one occurrence)
rule = r"^020-?\d{8}$"
re.findall(rule, "02023546813")  # returns ['02023546813']
re.findall(rule, "020-23546813")  # returns ['020-23546813']
re.findall(rule, "020--23546813")  # returns []

# ? placed after a quantifier makes it non-greedy (it matches as little as possible)
rule = r"ab+?"
re.findall(rule, "abbbbbbb")  # returns ['ab']

# {m,n} specifies a range of repetition counts
rule = r"a{1,3}"
re.findall(rule, "a")  # returns ['a']
re.findall(rule, "aa")  # returns ['aa']
re.findall(rule, "aaa")  # returns ['aaa']
re.findall(rule, "aaaa")  # returns ['aaa', 'a']

## Compiling a pattern
rule = r"\d{3,4}-?\d{8}"
re.findall(rule, "020-23546813")
# re.compile returns a reusable pattern object;
# this can be faster when the same rule is applied many times
p_tel = re.compile(rule)
print(p_tel.findall("020-23546813"))  # returns ['020-23546813']

# The re.I (re.IGNORECASE) flag makes matching case-insensitive
name = re.compile(r'scut', re.I)
print(name.findall('Scut'))  # returns ['Scut']
print(name.findall('sCut'))  # returns ['sCut']
print(name.findall('scUT'))  # returns ['scUT']
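Besides re.I, a few other flags change how the metacharacters above behave. The following is a minimal sketch for Python 3; the sample strings and variable names are made up for illustration and are not from the original notes.

```python
import re

text = "hello tip\nhello top"  # illustrative two-line sample

# Without re.MULTILINE, ^ only matches at the very start of the string.
start_only = re.findall(r"^hello", text)
print(start_only)  # ['hello']

# With re.MULTILINE, ^ also matches right after each newline.
each_line = re.findall(r"^hello", text, re.MULTILINE)
print(each_line)  # ['hello', 'hello']

# By default "." does not match "\n"; re.DOTALL lifts that restriction.
dot_default = re.findall(r"tip.hello", text)
dot_all = re.findall(r"tip.hello", text, re.DOTALL)
print(dot_default, dot_all)  # [] ['tip\nhello']

# For str patterns in Python 3, \d matches any Unicode decimal digit;
# re.ASCII restricts it to [0-9].
fullwidth = "\uff11\uff12\uff13"  # full-width digits 1, 2, 3
unicode_digits = re.findall(r"\d", fullwidth)
ascii_digits = re.findall(r"\d", fullwidth, re.ASCII)
print(len(unicode_digits), ascii_digits)  # 3 []
```

Flags can be combined with |, for example re.I | re.MULTILINE.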

Common functions

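match: tries to match only at the beginning of the string; returns a match object on success, otherwise None
search: scans the whole string and returns a match object for the first match at any position, otherwise None
findall: returns a list of all non-overlapping matches
sub: finds matches and replaces them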

See here for detailed usage.
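The four functions above can be compared side by side on one input. This is a small sketch for Python 3; the sample text and the names phone, first, numbers and masked are invented for the example.

```python
import re

text = "tel: 020-23546813, fax: 020-87654321"  # made-up sample data
phone = re.compile(r"\d{3}-\d{8}")

# match only tries at position 0, where the text starts with "tel", so it fails.
at_start = phone.match(text)
print(at_start)  # None

# search scans forward and stops at the first match.
first = phone.search(text)
print(first.group(), first.span())  # 020-23546813 (5, 17)

# findall collects every non-overlapping match as a list of strings.
numbers = phone.findall(text)
print(numbers)  # ['020-23546813', '020-87654321']

# sub returns a new string with every match replaced.
masked = phone.sub("***", text)
print(masked)  # tel: ***, fax: ***
```

Because match and search return None when nothing matches, it is usually worth checking the result before calling group() on it.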

Groups

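import re
# Parentheses () define a group. A group has two uses: the rules inside it apply
# only within the group, and the text it matches is captured.
# When a pattern contains groups, findall returns the captured groups instead of the whole match.
email = r"\w{3}@\w+(\.com|\.cn|\.org)"
print(re.findall(email, "abc@scut.com"))  # ['.com']
print(re.findall(email, "abc@scut.cn"))  # ['.cn']
obj = re.match(email, "abc@scut.cn")  # a match object; obj.group(0) is 'abc@scut.cn'
re.match(email, "abc@scut.org")

# Wrapping the whole pattern in an outer group captures the full address as well
email = r"(\w{3}@\w+(\.com|\.cn))"
print(re.findall(email, "abc@scut.com"))  # [('abc@scut.com', '.com')]
print(re.findall(email, "abc@scut.cn"))  # [('abc@scut.cn', '.cn')]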
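When the group is needed only for alternation and findall should return the whole match, a non-capturing group (?:...) is one option. The sketch below (Python 3) also shows how the captured groups can be read from a match object; the sample addresses and the variable names are illustrative assumptions.

```python
import re

# (?:...) groups the alternatives without capturing them,
# so findall returns the full matched text.
email = r"\w{3}@\w+(?:\.com|\.cn|\.org)"
whole = re.findall(email, "abc@scut.com, xyz@scut.cn")
print(whole)  # ['abc@scut.com', 'xyz@scut.cn']

# With capturing groups, each part is available on the match object.
m = re.match(r"(\w{3})@(\w+)(\.com|\.cn|\.org)", "abc@scut.org")
full = m.group(0)    # the entire match
user = m.group(1)    # first group
parts = m.groups()   # all groups as a tuple
print(full, user, parts)  # abc@scut.org abc ('abc', 'scut', '.org')
```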

FAQ (continuously updated)

Problem 1

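Inside [ ], the dash character "-" denotes a range (from ... to ...). To match a literal "-", place it at the start or the end of the set, or escape it as \-. For example: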

re.match(r'^\<.+\>([\w\s-,]+)\<.+\>$', 'Carrier-A')

This raises an error ("bad character range"): the dash followed by a comma is read as the range \s-, which is invalid, while the reverse order works.

Correct form:

re.match(r'^\<.+\>([\w\s,-]+)\<.+\>$', 'Carrier-A')
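The behaviour can be checked in isolation with just the character set. This is a minimal sketch for Python 3; the simplified patterns and the sample string are assumptions made for the example, and the exact error message may differ between Python versions.

```python
import re

# A dash between \s and "," is parsed as a range and rejected at compile time.
try:
    re.compile(r"[\w\s-,]+")
    error_message = None
except re.error as exc:
    error_message = str(exc)
print(error_message)

# Three ways to include a literal dash: last, first, or escaped.
working = [r"[\w\s,-]+", r"[-\w\s,]+", r"[\w\s\-,]+"]
sample = "Carrier-A, Carrier-B"  # illustrative input
results = [re.fullmatch(p, sample) is not None for p in working]
print(results)  # [True, True, True]
```

The error is raised when the pattern is compiled, so it appears regardless of the string being matched.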
