Python Regular Expressions

苦涩2020

Category: Python. Tags: Python, regular expressions, re module

Article link: https://blog.csdn.net/userpython/article/details/79871465

Introduction

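A regular expression is a pattern that describes a set of strings. It is used to check whether a given piece of text occurs in a source string, to replace matching text, or to extract the text that matches the pattern.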

Matching needs two things: a pattern string and the source string to match against. A simple match: test = re.match("hello", "hello python!")

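Here "hello" is the pattern and "hello python!" is the source, that is, the string you want to test. The match() function checks whether the source starts with the pattern.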

For more complex matching, or when the same pattern is used many times, you can compile the pattern first to speed up matching: temp = re.compile("hello")

You can then match with the compiled pattern directly: test = temp.match("hello python!")
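As a minimal sketch of why compiling helps, the example below compiles the pattern once and reuses it against several source strings. The variable names (greeting, sources, results) are illustrative and not from the original article.

```python
import re

# Compile once, then reuse the pattern object for several source strings.
# "greeting" is a hypothetical name chosen for this example.
greeting = re.compile("hello")

sources = ["hello python!", "say hello", "goodbye"]

# match() only succeeds when the pattern occurs at the start of the source.
results = [bool(greeting.match(s)) for s in sources]

print(results)  # [True, False, False]
```

The compiled object exposes the same methods as the module-level functions (match, search, findall, split, sub), minus the pattern argument.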

Patterns

The re module

1.match()

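match() checks whether the pattern matches the source string, but only at the beginning of the source. On success it returns a Match object; otherwise it returns None.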

import re  # import the regular expression module

test = "hello python"
if re.match("hello",test):
	print("ok")
else:
	print("failed")

2.search()

search() looks for the pattern at any position in the source string. On success it returns a Match object for the first match; otherwise it returns None.

import re

test = "hello, Python world"
if re.search("Python",test):
	print("ok")
else:
	print("failed")
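The difference between match() and search() is easiest to see on the same input. The sketch below also shows re.fullmatch(), which the article does not cover; it is included only for contrast, since it requires the pattern to cover the entire source.

```python
import re

text = "hello, Python world"

# match() anchors at position 0, so "Python" is not found.
at_start = re.match("Python", text)
# search() scans the whole string and returns the first match.
anywhere = re.search("Python", text)
# fullmatch() succeeds only if the pattern matches the entire string.
whole = re.fullmatch("hello", text)

print(at_start)         # None
print(anywhere.span())  # (7, 13)
print(whole)            # None
```

Because a failed match returns None, check the result before calling methods such as span() or group() on it.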

3.findall()

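findall() returns a list of every non-overlapping match of the pattern in the source string, so the length of the list is the number of occurrences.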

import re

test = "hello, Python world"

temp = re.findall("h", test)

print(temp)#>>>['h', 'h']
print(len(temp))#>>>2
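What findall() returns depends on whether the pattern contains groups. The following sketch, using a made-up input string, shows both cases, plus re.finditer() for when the match positions are needed as well.

```python
import re

text = "x=1, y=22, z=333"  # illustrative input, not from the article

# No groups: each item is the full matched text.
numbers = re.findall(r"\d+", text)

# Two groups: each item is a tuple of the captured groups.
pairs = re.findall(r"(\w)=(\d+)", text)

# finditer() yields Match objects, which also carry positions.
positions = [m.start() for m in re.finditer(r"\d+", text)]

print(numbers)    # ['1', '22', '333']
print(pairs)      # [('x', '1'), ('y', '22'), ('z', '333')]
print(positions)  # [2, 7, 13]
```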

4.split()

split() splits the source string at every match of the pattern and returns the pieces as a list.

import re

test = "hello, Python world"

temp = re.split("o", test)

print(temp)#>>>['hell', ', Pyth', 'n w', 'rld']
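Unlike str.split(), the separator here is a pattern, so it can describe several delimiters at once. A small sketch on the same test string, also showing the maxsplit argument and what happens when the pattern contains a capturing group:

```python
import re

text = "hello, Python world"

# Split on any run of commas and whitespace.
words = re.split(r"[,\s]+", text)

# Limit the number of splits with maxsplit.
first_cut = re.split("o", text, maxsplit=1)

# A capturing group in the pattern keeps the separators in the result.
with_sep = re.split("(o)", text)

print(words)      # ['hello', 'Python', 'world']
print(first_cut)  # ['hell', ', Python world']
print(with_sep)   # ['hell', 'o', ', Pyth', 'o', 'n w', 'o', 'rld']
```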

5.sub()

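sub() replaces every part of the source string that matches the pattern with a replacement string. It is similar to str.replace(), except that the target is a pattern rather than a fixed string.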

import re

test = "hello, Python world"

temp = re.sub("o", "O", test)

print(temp)#>>>hellO, PythOn wOrld
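Where sub() goes beyond replace() is in the replacement itself: it can refer back to captured groups or be computed by a function. A brief sketch, with variable names chosen only for illustration:

```python
import re

text = "hello, Python world"

# count limits how many matches are replaced.
once = re.sub("o", "O", text, count=1)

# \1 and \2 in the replacement refer to the captured groups.
swapped = re.sub(r"(\w+), (\w+)", r"\2, \1", text)

# The replacement can be a function that receives each Match object.
capitalized = re.sub(r"\b\w", lambda m: m.group().upper(), text)

print(once)         # hellO, Python world
print(swapped)      # Python, hello world
print(capitalized)  # Hello, Python World
```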

6.group()

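group() returns the text matched by the whole pattern or by a numbered group; groups() returns all captured groups as a tuple.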

import re


test = "hello, Python world"


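temp = re.match(r"(hello), (Python) (world)", test)  # with several groups, the pattern must also match the text between them; the r prefix makes a raw string, so backslashes need no extra escaping

print(temp)#>>><re.Match object; span=(0, 19), match='hello, Python world'>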
print(temp.groups())#>>>('hello', 'Python', 'world')
print(temp.group())#>>>hello, Python world
print(temp.group(0))#>>>hello, Python world
print(temp.group(1))#>>>hello
print(temp.group(2))#>>>Python
print(temp.group(3))#>>>world
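Numbered groups become hard to read as patterns grow. One option is named groups, written (?P<name>...). The sketch below is an illustration on the same test string; the group names greeting and language are made up for this example.

```python
import re

text = "hello, Python world"

# Named groups: (?P<name>...) lets you fetch a group by name.
m = re.match(r"(?P<greeting>\w+), (?P<language>\w+)", text)

# match() returns None on failure, so guard before using the result.
if m is not None:
    print(m.group("language"))  # Python
    print(m.groupdict())        # {'greeting': 'hello', 'language': 'Python'}
    print(m.span("language"))   # (7, 13)
```

Named groups can still be accessed by number, so m.group(2) returns the same text as m.group("language") here.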

Greedy matching

Quantifiers in regular expressions are greedy by default: they match as many characters as possible.

import re

test = "101000"

temp = re.match(r"^(\d+)(0*)$", test)  # raw string, so \d is not treated as a string escape
print(temp.groups())#>>>('101000', '')

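Because \d+ is greedy, it consumes all the trailing zeros as well, so the second group 0* can only match the empty string.

For the group 0* to capture the zeros, \d+ must match non-greedily. This is simple to do: add ? after the quantifier you want to make non-greedy.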

import re

test = "101000"

# Greedy matching
temp = re.match(r"^(\d+)(0*)$", test)
print(temp.groups())#>>>('101000', '')
# Non-greedy matching
temp = re.match(r"^(\d+?)(0*)$", test)
print(temp.groups())#>>>('101', '000')
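The same distinction matters with .* when extracting text between delimiters. As a small sketch on a made-up input string:

```python
import re

html = "<b>one</b> and <b>two</b>"  # illustrative input, not from the article

# Greedy: .* runs to the last </b>, swallowing everything in between.
greedy = re.findall(r"<b>(.*)</b>", html)

# Non-greedy: .*? stops at the first </b> it can.
lazy = re.findall(r"<b>(.*?)</b>", html)

print(greedy)  # ['one</b> and <b>two']
print(lazy)    # ['one', 'two']
```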