Notes on Regular Expressions

Latest recommended article published on 2024-01-05 12:33:01

cwtnice

Category: Other. Tags: regular expressions

Article link: https://blog.csdn.net/cwtnice/article/details/113006282

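Regular expressions:
Code that describes text-matching rules, written as a special sequence of characters.
A pattern is made up of ordinary characters and metacharacters, so learning regular expressions is mostly practice with the metacharacters.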

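Metacharacters:
. matches any character except the newline \n
\w matches a letter, digit, underscore, or Chinese character
\s matches any whitespace character
\d matches a digit
\b matches the start or end of a word
^ matches the start of the string
$ matches the end of the string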

import re
reg_string = 'hello9527python@cwtnice.hello@!:weitao蔚涛'
reg = r'\w'

s = re.findall(reg, reg_string)
print(s)	# a list of single-character strings: every letter, digit, and Chinese character
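To make the rest of the metacharacter list concrete, here is a small sketch that exercises ., \d, \s, \b, ^ and $. The sample strings are made up for illustration and are not from the original notes.

```python
import re

text = "hello world 42\nbye"

# . matches any character except the newline, so the match stops at the line break
first_line = re.search(r".+", text).group()

# \d matches digits, \s matches whitespace (space, tab, newline, ...)
digits = re.findall(r"\d", text)
spaces = re.findall(r"\s", text)

# \b is a zero-width word boundary: "bye" inside "goodbye" is not matched
whole_words = re.findall(r"\bbye\b", "bye goodbye bye")

# ^ and $ anchor the pattern to the start and end of the string
starts_with_hello = re.search(r"^hello", text) is not None
ends_with_bye = re.search(r"bye$", text) is not None

print(first_line)    # hello world 42
print(digits)        # ['4', '2']
print(spaces)        # [' ', ' ', '\n']
print(whole_words)   # ['bye', 'bye']
print(starts_with_hello, ends_with_bye)  # True True
```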

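r'''
Negated classes: the opposite of the corresponding lowercase class
\W  matches any character that \w does not match
\S  matches any non-whitespace character
[^...]  matches any character not listed inside the brackets

Quantifiers:
* repeat zero or more times
+ repeat one or more times
? repeat zero or one time
{n} repeat exactly n times
{n,} repeat n or more times
{n,m} repeat n to m times
'''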
reg_string = 'hello9527python@cwtnice.hello@!:weitao蔚涛'
reg = r'\d{4}'
result = re.findall(reg,reg_string)
print(result)	#['9527']
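The other quantifiers behave as follows. This is a minimal sketch on a made-up test string, with the pattern "a followed by some number of b" repeated under each quantifier.

```python
import re

text = "a ab abb abbb"

star = re.findall(r"ab*", text)         # b zero or more times
plus = re.findall(r"ab+", text)         # b one or more times
optional = re.findall(r"ab?", text)     # b zero or one time
exact = re.findall(r"ab{2}", text)      # b exactly two times
at_least = re.findall(r"ab{2,}", text)  # b two or more times
between = re.findall(r"ab{1,2}", text)  # b one to two times

print(star)      # ['a', 'ab', 'abb', 'abbb']
print(plus)      # ['ab', 'abb', 'abbb']
print(optional)  # ['a', 'ab', 'ab', 'ab']
print(exact)     # ['abb', 'abb']
print(at_least)  # ['abb', 'abbb']
print(between)   # ['ab', 'abb', 'abb']
```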

# Matching a character range
reg2 = '[0-9a-z]{4}'
result2 = re.findall(reg2,reg_string)
print(result2)	#['hell', 'o952', '7pyt', 'cwtn', 'hell', 'weit']
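The negated classes \W, \S and [^...] can be tried the same way. A short sketch, using a sample string invented for this note:

```python
import re

text = "id_42: ok!"

non_word = re.findall(r"\W", text)         # every character that \w does not match
non_space = re.findall(r"\S+", text)       # runs of non-whitespace characters
not_digits = re.findall(r"[^0-9]+", text)  # runs of characters outside the set 0-9

print(non_word)    # [':', ' ', '!']
print(non_space)   # ['id_42:', 'ok!']
print(not_digits)  # ['id_', ': ok!']
```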

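# Application: matching IP addresses
ip = 'this is ip:192.168.1.123 : 172.138.2.15'

# Escape each dot so it matches a literal '.', and allow 1 to 3 digits in every part
reg = r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}'
#reg = re.compile(r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}')
result = re.findall(reg, ip)
print(result)	#['192.168.1.123', '172.138.2.15']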
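One detail worth checking in an IP pattern is the dot: unescaped, . matches any character, so the separators should be written as \. to match a literal dot. A small sketch of the difference, on a made-up string:

```python
import re

text = "1a2 3.4"

# Unescaped dot: any character is accepted between the two numbers
loose = re.findall(r"\d+.\d+", text)

# Escaped dot: only a literal '.' is accepted
strict = re.findall(r"\d+\.\d+", text)

print(loose)   # ['1a2', '3.4']
print(strict)  # ['3.4']
```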

# search
ip = 'this is ip:192.168.1.123 : 172.138.2.15'
reg2 = r'(\d{1,3}\.){3}\d{1,3}'
result2 = re.search(reg2,ip)[0]
print(result2)	#192.168.1.123

# search vs. findall: search returns only the first match, findall returns all matches
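A sketch of the difference, plus one pitfall: when the pattern contains a capturing group, findall returns the text of the group rather than the whole match. Using a non-capturing group (?:...) avoids that. The variable names here are illustrative only.

```python
import re

ip = 'this is ip:192.168.1.123 : 172.138.2.15'

# Non-capturing group (?:...) so that findall returns whole matches
pattern = r'\d{1,3}(?:\.\d{1,3}){3}'

first = re.search(pattern, ip).group()  # only the first match
every = re.findall(pattern, ip)         # all matches, as a list of strings

# With a capturing group, findall returns what the group captured last
grouped = re.findall(r'(\d{1,3}\.){3}\d{1,3}', ip)

# search returns None when nothing matches
missing = re.search(pattern, 'no address here')

print(first)    # 192.168.1.123
print(every)    # ['192.168.1.123', '172.138.2.15']
print(grouped)  # ['1.', '2.']
print(missing)  # None
```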

# Group matching: group(0) is the whole match, group(1) is the first group

s = 'this is phone:15058528162 and this is my postcode:321000'
reg = r'this is phone:(\d{11}) and this is my postcode:(\d{6})'
result = re.search(reg,s).group(0)	#this is phone:15058528162 and this is my postcode:321000
print(result)
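The individual groups can be read from the same Match object. A short sketch reusing the string above:

```python
import re

s = 'this is phone:15058528162 and this is my postcode:321000'
reg = r'this is phone:(\d{11}) and this is my postcode:(\d{6})'

m = re.search(reg, s)
phone = m.group(1)     # first pair of parentheses
postcode = m.group(2)  # second pair of parentheses
both = m.groups()      # all groups as a tuple

print(phone)     # 15058528162
print(postcode)  # 321000
print(both)      # ('15058528162', '321000')
```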

# match only matches at the start of the string; re.I ignores case
reg_string = 'hellopythonhellostring'
reg = 'HeLlo'
result2 = re.match(reg,reg_string,re.I).group()
print(result2)	#hello
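Because match only looks at the start of the string, it returns None for text that appears later, and calling .group() on None raises AttributeError. A minimal sketch comparing match and search, with a None check:

```python
import re

reg_string = 'hellopythonhellostring'

at_start = re.match(r'python', reg_string)   # 'python' is not at position 0
anywhere = re.search(r'python', reg_string)  # search scans the whole string

# Check for None before calling .group()
matched_text = at_start.group() if at_start else None
found_at = anywhere.start()

ignore_case = re.match(r'HELLO', reg_string, re.I).group()

print(matched_text)  # None
print(found_at)      # 5
print(ignore_case)   # hello
```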

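Additional notes:
1. Greedy and non-greedy:
Greedy: match as much as possible. Python is greedy by default.

Non-greedy: match as little as possible. The non-greedy operator is ?, placed after *, + or ?, and it asks the engine to match as few characters as it can.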
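A sketch of the difference on a made-up snippet of markup:

```python
import re

html = '<b>one</b><b>two</b>'

# Greedy: .* runs to the last </b> in the string
greedy = re.findall(r'<b>.*</b>', html)

# Non-greedy: .*? stops at the first </b> it can
lazy = re.findall(r'<b>.*?</b>', html)

# The same idea with digits
most = re.search(r'\d+', '12345').group()
least = re.search(r'\d+?', '12345').group()

print(greedy)  # ['<b>one</b><b>two</b>']
print(lazy)    # ['<b>one</b>', '<b>two</b>']
print(most)    # 12345
print(least)   # 1
```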

2. To match a literal *: use \* or [*]
3. [] character set: [abc] is the same as [a-c]
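Both notes can be checked quickly. The sketch below also uses re.escape, a standard-library helper that is not mentioned in the original notes and that escapes special characters for you.

```python
import re

text = '2*3=6, a*b'

with_backslash = re.findall(r'\*', text)  # escaped star
with_brackets = re.findall(r'[*]', text)  # star inside a character set

# [abc] and [a-c] describe the same set of characters
listed = re.findall(r'[abc]', 'abcd')
ranged = re.findall(r'[a-c]', 'abcd')

# re.escape builds a pattern that matches the given text literally
literal = re.findall(re.escape('a*b'), text)

print(with_backslash)  # ['*', '*']
print(with_brackets)   # ['*', '*']
print(listed, ranged)  # ['a', 'b', 'c'] ['a', 'b', 'c']
print(literal)         # ['a*b']
```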

The re module

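Regular expressions use the backslash \ to escape special characters, and so do ordinary Python string literals. To write a pattern without doubling every backslash, use a raw string by adding the r prefix: r'hangzhou\t.\tpython'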
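A small sketch of what the r prefix changes. The \b case is the one that tends to cause silent bugs, because '\b' in an ordinary string literal is the backspace character rather than a word boundary.

```python
import re

# Without the r prefix the backslash has to be doubled
doubled = '\\d+'
raw = r'\d+'
same_pattern = doubled == raw

# '\t' is one tab character; r'\t' is a backslash followed by the letter t
tab_len = len('\t')
raw_len = len(r'\t')

# The regex engine itself understands \t, so the raw pattern still matches tabs
tab_match = re.search(r'hangzhou\t.\tpython', 'hangzhou\tX\tpython') is not None

# '\b' is a backspace character, r'\b' is a word boundary
without_raw = re.findall('\bcat\b', 'cat')
with_raw = re.findall(r'\bcat\b', 'cat')

print(same_pattern)      # True
print(tab_len, raw_len)  # 1 2
print(tab_match)         # True
print(without_raw)       # []
print(with_raw)          # ['cat']
```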

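Typical steps for using the re module:

1. Use the compile() function to compile the string form of the regular expression into a Pattern object.
2. Use the methods of the Pattern object to match or search the text. Methods such as search and match return a Match object (findall returns a list of strings instead).
3. Use the attributes and methods of the Match object to get the information you need, then do any further processing.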
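The three steps above can be sketched as follows. The sample sentence and the variable names are made up for illustration.

```python
import re

# Step 1: compile the pattern string into a Pattern object
number = re.compile(r'\d+')

# Step 2: call a Pattern method on the text to get a Match object
m = number.search('order 66 shipped in 3 days')

# Step 3: read the result from the Match object
value = m.group()  # the matched text
span = m.span()    # (start, end) positions of the match

# The compiled Pattern can be reused for other calls
all_numbers = number.findall('order 66 shipped in 3 days')

print(value)        # 66
print(span)         # (6, 8)
print(all_numbers)  # ['66', '3']
```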

The compile function:
compile compiles a regular expression and returns a Pattern object. Its usual form is:

import re
it = re.compile(r'\d+')

Example:

import re
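# Common North American phone number format (e.g. 2703877865)
#   First three digits (area code): the first is 2-9, the second is 0-8, the third can be any digit
#   Middle three digits (exchange code): the first is 2-9, the other two can be any digit
#   Last four digits: no restriction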

pattern2 = r'\(?[2-9][0-8]\d\)?[-\.\s]?[2-9]\d{2}[-\.\s]?\d{4}'
text = '1535(323)4567890(54521)'
patternObj = re.compile(pattern2)
result = patternObj.findall(text)[0]
print(result)	#(323)4567890
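The same pattern can also be used to validate a whole string rather than to search inside one. The sketch below uses fullmatch for that; the helper name is_phone and the sample numbers are hypothetical and not part of the original example.

```python
import re

pattern2 = r'\(?[2-9][0-8]\d\)?[-\.\s]?[2-9]\d{2}[-\.\s]?\d{4}'
patternObj = re.compile(pattern2)

def is_phone(candidate):
    """Return True if the entire string fits the pattern (hypothetical helper)."""
    return patternObj.fullmatch(candidate) is not None

plain = is_phone('2703877865')            # digits only
dashed = is_phone('270-387-7865')         # dash separators
bracketed = is_phone('(323) 456-7890')    # parentheses and a space
bad_first = is_phone('123-456-7890')      # area code starts with 1
bad_second = is_phone('290-387-7865')     # second digit of the area code is 9
unbalanced = is_phone('(2703877865')      # opening parenthesis only

print(plain, dashed, bracketed)  # True True True
print(bad_first, bad_second)     # False False
print(unbalanced)                # True
```

Note that the two parentheses are optional independently of each other, so the pattern as written also accepts an unbalanced one, as the last case shows.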