Python Regular Expressions

cufe_eric

Article link: https://blog.csdn.net/cufe_eric/article/details/78906602

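# Regular expressions
# . matches any single character except a newline; \ escapes a special character so that it is treated literally; + matches one or more repetitions of the element to its left
import re

key = r"http://www.nsfbuhwe.com and https://www.auhfisna.com"  # made-up URLs, ignore them
p1 = r"https*://"  # note the asterisk: it matches zero or more of the character to its left, so the "s" may be present or absent (it would also accept "httpss://"; use ? for "zero or one")
pattern1 = re.compile(p1)
print(pattern1.findall(key))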

['http://', 'https://']
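The difference between `*` (zero or more) and `?` (zero or one) is easy to miss with the URLs above, because neither contains a repeated "s". The following is a minimal sketch using a made-up sample string (the `example` hosts and the `httpsss` typo are assumptions added for illustration):

```python
import re

# Made-up sample text; "httpsss" is a deliberate typo to expose the difference.
text = "http://a.example httpsss://b.example https://c.example"

# * allows any number of "s" characters, including none.
star_matches = re.findall(r"https*://", text)

# ? allows at most one "s".
optional_matches = re.findall(r"https?://", text)

print(star_matches)      # ['http://', 'httpsss://', 'https://']
print(optional_matches)  # ['http://', 'https://']
```

For matching a URL scheme, `https?://` is usually the stricter choice.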

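key = r"<html><body><h1>hello world<h1></body></html>"  # the text to search
p1 = r"(?<=<h1>).+?(?=<h1>)"  # the regular expression: a lookbehind and a lookahead around a lazy .+?; no need to understand it fully yet
pattern1 = re.compile(p1)  # compile the regular expression

matcher1 = re.search(pattern1, key)  # search the source text for the first part that matches

# re.search(pattern, string) returns a match object, or None if nothing matches
print(matcher1.group(0))  # print the matched text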

hello world
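The pattern above relies on lookaround: `(?<=<h1>)` requires `<h1>` just before the match and `(?=<h1>)` requires it just after, but neither is included in the result. As a sketch of an alternative that gives the same inner text, a capturing group can be used instead (the variable names here are illustrative only):

```python
import re

html = "<html><body><h1>hello world<h1></body></html>"

# Lookaround: the tags are checked but are not part of the match.
lookaround_match = re.search(r"(?<=<h1>).+?(?=<h1>)", html)

# Capturing group: the tags are part of the overall match,
# and group(1) holds only the text between them.
group_match = re.search(r"<h1>(.+?)<h1>", html)

print(lookaround_match.group(0))  # hello world
print(group_match.group(0))       # <h1>hello world<h1>
print(group_match.group(1))       # hello world
```

Note that `re.search` returns `None` when nothing matches, so real code should check the result before calling `.group()`.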

key = r"<h1>hello world<h1>"  # source text
p1 = r"<h1>.+<h1>"  # our regular expression; the reasoning is explained below
pattern1 = re.compile(p1)
print(pattern1.findall(key))  # findall returns a list of all non-overlapping matches

['<h1>hello world<h1>']
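One detail of `findall` worth knowing: when the pattern contains a capturing group, it returns the group contents rather than the whole match. The sketch below also uses `re.finditer`, which yields match objects and therefore gives access to positions; the variable names are chosen for illustration.

```python
import re

text = "<h1>hello world<h1>"

# No group: findall returns the full matched text.
whole = re.findall(r"<h1>.+<h1>", text)

# One capturing group: findall returns only what the group captured.
inner = re.findall(r"<h1>(.+)<h1>", text)

# finditer yields match objects, so the start and end offsets are available.
spans = [m.span() for m in re.finditer(r"<h1>.+<h1>", text)]

print(whole)  # ['<h1>hello world<h1>']
print(inner)  # ['hello world']
print(spans)  # [(0, 19)]
```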

# [] matches any single one of the characters listed inside the brackets
key = r"lalala<hTml>hello</Html>heiheihei"
p1 = r"<[Hh][Tt][Mm][Ll]>.+?</[Hh][Tt][Mm][Ll]>"
pattern1 = re.compile(p1)
print(pattern1.findall(key))

['<hTml>hello</Html>']
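Spelling out both cases for every letter works, but it gets long quickly. A shorter route to the same result, shown here as a sketch, is the `re.IGNORECASE` flag of the standard `re` module:

```python
import re

key = "lalala<hTml>hello</Html>heiheihei"

# Explicit character classes, as in the example above.
explicit = re.compile(r"<[Hh][Tt][Mm][Ll]>.+?</[Hh][Tt][Mm][Ll]>")

# The same idea with the IGNORECASE flag (also available as re.I).
flagged = re.compile(r"<html>.+?</html>", re.IGNORECASE)

print(explicit.findall(key))  # ['<hTml>hello</Html>']
print(flagged.findall(key))   # ['<hTml>hello</Html>']
```

The flag applies to the whole pattern, so the bracket form is still useful when only some letters should be case-insensitive.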

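# [^...] is a negated character class: it matches any single character that is NOT listed inside the brackets
key = r"mat cat hat pat"
p1 = r"[^p]at"  # any one character other than p, followed by "at"
pattern1 = re.compile(p1)
print(pattern1.findall(key))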

['mat', 'cat', 'hat']
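A negated class still has to consume exactly one character, and that character can be anything not listed, including a space. The sketch below uses a slightly extended sample string (the trailing " at" is an assumption added for illustration) to show the surprise, and one possible way to restrict the match to whole words:

```python
import re

text = "mat cat hat pat at"

# [^p] happily matches the space before the standalone "at".
loose = re.findall(r"[^p]at", text)

# Excluding whitespace as well and anchoring on word boundaries (\b)
# keeps only three-letter words that do not start with p.
strict = re.findall(r"\b[^p\s]at\b", text)

print(loose)   # ['mat', 'cat', 'hat', ' at']
print(strict)  # ['mat', 'cat', 'hat']
```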


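# [0-9] matches any single digit
# [a-z] matches any single lowercase letter
# \d is equivalent to [0-9] for ASCII text (on str patterns it also matches other Unicode digits unless re.ASCII is set)
# \D matches any non-digit, equivalent to [^0-9]
# \w is equivalent to [a-zA-Z0-9_] for ASCII text: upper- and lowercase letters, digits and the underscore
# \W is the negation of \w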
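The shorthand classes listed above can be tried out on a single string. This is a minimal sketch; the sample text `user_42@example.com` is made up for illustration. The last two lines show the Unicode caveat: on `str` patterns `\d` also matches non-ASCII digits unless the `re.ASCII` flag is given.

```python
import re

text = "user_42@example.com"

digits = re.findall(r"\d+", text)      # runs of digits
non_digits = re.findall(r"\D+", text)  # runs of non-digits
words = re.findall(r"\w+", text)       # runs of letters, digits, underscore
non_words = re.findall(r"\W", text)    # single non-word characters

print(digits)      # ['42']
print(non_digits)  # ['user_', '@example.com']
print(words)       # ['user_42', 'example', 'com']
print(non_words)   # ['@', '.']

# "\u0663" is ARABIC-INDIC DIGIT THREE, a Unicode decimal digit.
unicode_digits = re.findall(r"\d", "3\u0663")
ascii_digits = re.findall(r"\d", "3\u0663", re.ASCII)

print(len(unicode_digits))  # 2
print(ascii_digits)         # ['3']
```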

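# In regular expressions the + quantifier is greedy: it matches as many characters as possible
key = r"chuxiuhong@hit.edu.cn"
p1 = r"@.+\."  # intended to match from @ up to the next ".", which here would be hit
pattern1 = re.compile(p1)
print(pattern1.findall(key))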
# Expected output: @hit.  The actual output contains more, because the greedy + runs on to the last "."

['@hit.edu.']

# Adding ? after the + turns the greedy + into a lazy one
key = r"chuxiuhong@hit.edu.cn"
p1 = r"@.+?\."  # match from @ up to the first ".", which here gives hit
pattern1 = re.compile(p1)
print(pattern1.findall(key))

['@hit.']
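The greedy and lazy results can be put side by side, together with two common alternatives: a negated character class, which cannot run past a "." in the first place, and a capturing group, which returns only the part between "@" and ".". This is a sketch; the variable names are illustrative.

```python
import re

email = "chuxiuhong@hit.edu.cn"

greedy = re.findall(r"@.+\.", email)      # + grabs as much as possible
lazy = re.findall(r"@.+?\.", email)       # +? stops at the first "."
negated = re.findall(r"@[^.]+\.", email)  # [^.] can never cross a "."

# A capturing group extracts just the text between "@" and the first ".".
domain = re.search(r"@(.+?)\.", email).group(1)

print(greedy)   # ['@hit.edu.']
print(lazy)     # ['@hit.']
print(negated)  # ['@hit.']
print(domain)   # hit
```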

# {a,b} means a <= number of repetitions <= b; sa{1,2}s therefore matches sas and saas, but not saaas
key = r"saas and sas and saaas"
p1 = r"sa{1,2}s"
pattern1 = re.compile(p1)
print(pattern1.findall(key))

['saas', 'sas']

key = r"saas and sas and saaas and saaaassa and saaaas and sasa"
p1 = r"sa{,2}s"   # the a may appear at most twice (zero to two times)
pattern1 = re.compile(p1)
print (pattern1.findall(key))

['saas', 'sas', 'ss', 'sas']

key = r"saas and ss and saaas and saaaassa and saaaas and sasasaas and sas and saaas"
p1 = r"sa{1,}s"    # the a must appear at least once
pattern1 = re.compile(p1)
print (pattern1.findall(key))

['saas', 'saaas', 'saaaas', 'saaaas', 'sas', 'saas', 'sas', 'saaas']
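The three bounded quantifiers above, plus the exact form `{n}`, can be compared on isolated words. The sketch below uses `re.fullmatch`, which only succeeds when the entire string matches, so substring hits such as the "ss" inside "saaaassa" do not get in the way; the candidate list is made up for illustration.

```python
import re

candidates = ["ss", "sas", "saas", "saaas"]
patterns = [r"sa{1,2}s", r"sa{,2}s", r"sa{1,}s", r"sa{2}s"]

# For each pattern, collect the candidates it accepts as a whole.
accepted = {
    pattern: [word for word in candidates if re.fullmatch(pattern, word)]
    for pattern in patterns
}

for pattern, words in accepted.items():
    print(pattern, words)
# sa{1,2}s ['sas', 'saas']
# sa{,2}s ['ss', 'sas', 'saas']
# sa{1,}s ['sas', 'saas', 'saaas']
# sa{2}s ['saas']
```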

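# \t    matches a tab character
# \c    matches a control character in some other regex flavors; Python's re module does not support it
# *?    lazy version of *: matches the preceding element as few times as possible
# ?     matches the preceding character or subexpression zero or one time
# {n}   matches the preceding character or subexpression exactly n times
# {n,}? lazy version of {n,}
# ^     matches the start of the string (and the start of each line with re.MULTILINE)
# \A    matches only the start of the string
# $     matches the end of the string or just before a trailing newline (and the end of each line with re.MULTILINE)
# [\b]  matches a backspace character
# |     alternation (logical OR)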
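Several entries in the list above can be checked with one-liners. The following is a minimal sketch using made-up sample strings; it contrasts `^` with `\A` under `re.MULTILINE` and shows `$`, `|`, `?`, `*?` and `[\b]` in action.

```python
import re

text = "cat\ndog"

# ^ matches only at the very start unless MULTILINE is set.
start_default = re.findall(r"^\w+", text)
start_multiline = re.findall(r"^\w+", text, re.MULTILINE)

# \A always means the start of the whole string, even with MULTILINE.
start_absolute = re.findall(r"\A\w+", text, re.MULTILINE)

# $ matches at the end of the string.
end_default = re.findall(r"\w+$", text)

# | chooses between alternatives.
either = re.findall(r"cat|dog", text)

# ? makes the preceding character optional.
optional_u = re.findall(r"colou?r", "color colour")

# * is greedy, *? is lazy.
greedy_tags = re.findall(r"<.*>", "<a><b>")
lazy_tags = re.findall(r"<.*?>", "<a><b>")

# Inside a character class, \b means the backspace character (\x08).
backspaces = re.findall(r"[\b]", "a\bb")

print(start_default)    # ['cat']
print(start_multiline)  # ['cat', 'dog']
print(start_absolute)   # ['cat']
print(end_default)      # ['dog']
print(either)           # ['cat', 'dog']
print(optional_u)       # ['color', 'colour']
print(greedy_tags)      # ['<a><b>']
print(lazy_tags)        # ['<a>', '<b>']
print(backspaces)       # ['\x08']
```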
