Regular Expressions

Latest recommended article published 2021-03-27 14:01:29

清风い

Tags: python

Article link: https://blog.csdn.net/weixin_52868486/article/details/110747447

Regular expressions with the re module

Uses:

	1) Filtering data
	2) Validating data

I. Regular expression metacharacters:

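1. Matching a single character

.	any single character except a newline
[art]	any one of a, r, or t
[a-z]	any one lowercase letter
[a-zA-Z]	any one letter
[0-9]	any one digit
[^0-9]	any one character that is not a digit

\d 	any single digit
	\D  any single non-digit character
	
\w 	any single word character (letter, digit, or underscore)
	\W  any single non-word character
	
\s 	any single whitespace character
	\S  any single non-whitespace character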
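
The character classes above can be tried out with a short sketch. The sample string and variable names are made up for illustration and are not from the original article.

```python
import re

# Hypothetical sample text, chosen only to exercise each class
sample = "user_42 logged in at 09:15"

# \d matches one digit; findall() collects every match
digits = re.findall(r"\d", sample)

# \w matches one letter, digit, or underscore
first_word_char = re.search(r"\w", sample).group()

# \s matches one whitespace character
spaces = re.findall(r"\s", sample)

# [^0-9] matches one character that is not a digit
first_non_digit = re.search(r"[^0-9]", "42abc").group()

print(digits)           # ['4', '2', '0', '9', '1', '5']
print(first_word_char)  # u
print(len(spaces))      # 4
print(first_non_digit)  # a
```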

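2. Matching how many times a character occurs

* 	any number of times (zero or more)
+	at least once
?	at most once (zero or one)
{3}	exactly 3 times
	{3,}	at least 3 times
	{3,5}	between 3 and 5 times
	
Meanings of ?:
	1) As a quantifier: ab? matches a followed by at most one b
	2) Placed after another quantifier (*?, +?, ??, {3,5}?), it switches that quantifier to non-greedy matching
		
		Greedy mode:
			matches as much as possible
		Non-greedy mode:
			matches as little as possible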
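
The two meanings of ? can be seen side by side in a minimal sketch. The input strings are illustrative assumptions.

```python
import re

# Hypothetical input with two tag pairs
html = "<b>bold</b> and <i>italic</i>"

# Greedy: .+ takes as many characters as it can, so the match runs to the last ">"
greedy = re.search(r"<.+>", html).group()

# Non-greedy: .+? takes as few characters as it can, so the match stops at the first ">"
lazy = re.search(r"<.+?>", html).group()

# ? as a quantifier: "b" may appear zero or one time after "a"
optional_b = re.findall(r"ab?", "a ab abb")

print(greedy)      # <b>bold</b> and <i>italic</i>
print(lazy)        # <b>
print(optional_b)  # ['a', 'ab', 'ab']
```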

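3. Matching where a string occurs

^string 	string at the start of the text
string$ 	string at the end of the text

\bstring\b 	string as a whole word (\b is a word boundary)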
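
A small sketch of the position anchors follows. The word "cat" and the sample line are assumptions picked so that the pattern also occurs inside a longer word.

```python
import re

# Hypothetical line: "cat" appears at the start, inside "concatenate", and at the end
line = "cat concatenate cat"

starts_with_cat = re.search(r"^cat", line) is not None
ends_with_cat = re.search(r"cat$", line) is not None

# Without \b the "cat" inside "concatenate" is also found
anywhere = re.findall(r"cat", line)

# With \b on both sides only the standalone words are found
whole_words = re.findall(r"\bcat\b", line)

print(starts_with_cat, ends_with_cat)  # True True
print(len(anywhere))                   # 3
print(len(whole_words))                # 2
```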

II. Matching data

1. search()

Scans the whole string for the first location where the regular expression matches.

Returns a Match object if a match is found, otherwise None.

import re

test = "python regex demo example"
test_re = r"d..o"

result = re.search(test_re, test)

print(result)
print(result.group())			# group() returns the matched text


test = "python regex demo like"
test_re = r"\d"

result = re.search(test_re, test)
if result:
	print("Matched data: %s" % result.group())
else:
	print("No matching data!")

2. match()

Matches only at the beginning of the string. Returns a Match object if the pattern matches there, otherwise None.

import re

test = "python regex demo example"

test_re = r"d..o"

result = re.match(test_re, test)
print(result)			# None, because "demo" is not at the start of the string
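
To make the difference between match() and search() concrete, here is a minimal sketch on the same test string. The pattern p....n is an added assumption, chosen because it matches at the start.

```python
import re

text = "python regex demo example"

# match() succeeds only when the pattern matches at position 0
at_start = re.match(r"p....n", text)

# "demo" is not at the start, so match() returns None
not_at_start = re.match(r"d..o", text)

# search() scans forward and finds "demo" later in the string
found_later = re.search(r"d..o", text)

print(at_start.group())     # python
print(not_at_start)         # None
print(found_later.group())  # demo
print(found_later.span())   # (13, 17)
```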

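III. How a Match object returns data

group()
	returns the matched text as a string
	
groups()
	returns the captured groups as a tuple
	
groupdict()
	returns the named groups as a dictionary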
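
As a quick sketch of the three methods on a single Match object, the snippet below also shows that group() accepts a group number or a group name. The pattern, the group names year and month, and the input text are illustrative assumptions.

```python
import re

# Hypothetical pattern with two named groups
m = re.search(r"(?P<year>\d{4})-(?P<month>\d{2})", "released 2019-12")

whole = m.group()            # the entire match
first = m.group(1)           # group selected by number
by_name = m.group("month")   # group selected by name
as_tuple = m.groups()        # all groups as a tuple
as_dict = m.groupdict()      # named groups as a dictionary

print(whole)     # 2019-12
print(first)     # 2019
print(by_name)   # 12
print(as_tuple)  # ('2019', '12')
print(as_dict)   # {'year': '2019', 'month': '12'}
```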

1. groups()

Returns a tuple holding the text captured by each group in the regular expression.

import re

test = '10.0.45.209 - - [09/Jan/2019:12:19:42 +0800]'

test_re = r"(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\s\S\s\S\s\[(.*?)\]"

result = re.search(test_re, test)

if result:
	data = result.groups()
	print(data)
	print("Client IP address: %s" % data[0])
	print("Client access time: %s" % data[1])

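2. groupdict()

Returns the matched data as a dictionary:

{ "group name": "text matched by the group" }

Requirements:
a) The regular expression contains groups
b) The groups are named with (?P<name>...)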

import re

test = '10.0.45.209 - - [09/Jan/2019:12:19:42 +0800]'

test_re = r"(?P<客户端地址>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\s\S\s\S\s\[(?P<访问时间>.*?)\]"

result = re.search(test_re, test)

if result:
	print(result.groupdict())

IV. Other methods

1. findall()

Returns all non-overlapping matches as a list.

import re

test = "python like object, tom love jerry"

test_re = r"l..e"

result = re.findall(test_re, test)

print(result)
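
One detail worth checking when using findall(): if the pattern contains groups, the list holds the group contents rather than the full matches. A minimal sketch on the same test string, where the grouped patterns are added for illustration:

```python
import re

text = "python like object, tom love jerry"

# No groups: each item is the full match
no_group = re.findall(r"l..e", text)

# One group: each item is the text captured by that group
one_group = re.findall(r"l(..)e", text)

# Several groups: each item is a tuple of the captured texts
two_groups = re.findall(r"(l)(..e)", text)

print(no_group)    # ['like', 'love']
print(one_group)   # ['ik', 'ov']
print(two_groups)  # [('l', 'ike'), ('l', 'ove')]
```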

2. split()

Splits the string, using the text matched by the regular expression as the separator.

import re

test = "root     pts/0        2019-12-06 08:54 (192.168.32.29)"

test_re = r"\s{2,}"

result = re.split(test_re, test)
print(result)

3. sub()

Replaces the text matched by the regular expression with a replacement string.

import re

test = """http://192.168.1.1/MySQL
http://1.1.1.1/Ansible
http://10.30.2.1/zabbix
"""

new_host = "test.linux.com"

ip_re = r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}"

new_url = re.sub(ip_re, new_host, test)
print(new_url)
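
Two optional features of sub() may be useful here: the count argument limits how many replacements are made, and the replacement string can refer back to a captured group with \1. The sketch below reuses the IP pattern from the example above. The masking pattern and the ".x" placeholder are assumptions for illustration.

```python
import re

urls = "http://192.168.1.1/MySQL\nhttp://1.1.1.1/Ansible\n"
ip_re = r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}"

# count=1 replaces only the first match
first_only = re.sub(ip_re, "test.linux.com", urls, count=1)

# \1 in the replacement inserts the text captured by the first group,
# here the first three octets, so only the last octet is masked
masked = re.sub(r"(\d{1,3}\.\d{1,3}\.\d{1,3})\.\d{1,3}", r"\1.x", "10.30.2.1")

print(first_only)
print(masked)  # 10.30.2.x
```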

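V. compile()

Compiles a regular expression string into a regular expression (pattern) object.

The pattern object provides search(), match(), sub(), split(), and findall() methods.

Advantage: the pattern is compiled once and reused, which saves work when the same expression is applied many times, for example once per line of a log file. The re module also caches recently used patterns, so the measured difference may be small.

Without compiling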

import re
import time

ip_number = 0

start_time = time.time()

log_file = r"E:\projectA\files\access_log"

ip_re = r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}"

with open(log_file, "r") as f_obj:
	for line in f_obj:
		result = re.search(ip_re, line)
		if result:
			ip_number += 1
print(ip_number)
stop_time = time.time()
print(stop_time - start_time)

Using compile() to build a pattern object

import re
import time

ip_number = 0

start_time = time.time()

log_file = r"E:\projectA\files\access_log"

ip_re = r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}"

# Compile the regular expression into a pattern object
ip_re_obj = re.compile(ip_re)

with open(log_file, "r") as f_obj:
	for line in f_obj:
		# Call the compiled pattern's search() method
		result = ip_re_obj.search(line)
		if result:
			ip_number += 1
print(ip_number)
stop_time = time.time()
print(stop_time - start_time)
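
The two scripts above depend on a local log file. A self-contained way to compare the two approaches is sketched below. It swaps the file for synthetic in-memory lines and uses timeit instead of time.time(); the line contents, the function names, and the repeat count are all assumptions. Timings depend on the machine and the Python version, so no expected numbers are shown.

```python
import re
import timeit

ip_re = r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}"
ip_re_obj = re.compile(ip_re)

# Synthetic stand-in for a log file: 1000 lines with an address, 1000 without
lines = ["10.0.%d.%d - - GET /index.html" % (i % 256, i % 200) for i in range(1000)]
lines += ["no address here"] * 1000


def count_with_module_function():
    # re.search() looks the pattern up on every call
    return sum(1 for line in lines if re.search(ip_re, line))


def count_with_compiled_pattern():
    # The pattern object is built once and reused
    return sum(1 for line in lines if ip_re_obj.search(line))


# Both approaches must count the same number of matching lines
assert count_with_module_function() == count_with_compiled_pattern()

print(timeit.timeit(count_with_module_function, number=20))
print(timeit.timeit(count_with_compiled_pattern, number=20))
```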