Python regular expressions

Latest recommended article published 2022-03-30 19:58:16

文辰光

Article link: https://blog.csdn.net/weixin_40127330/article/details/103819626

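A regular expression is a powerful tool for processing strings. It has its own distinctive syntax (the syntax is the core of it) and can be used for data validation, searching, replacing, and similar tasks.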

The re module

References: https://www.runoob.com/python/python-reg-expressions.html

https://www.cnblogs.com/test123/p/10608807.html

Use Python raw strings: put an r prefix before the string literal so that backslashes reach the regex engine unchanged.
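A minimal sketch of why the r prefix matters, using a made-up sample sentence: in an ordinary string literal '\b' is turned into the backspace character before re ever sees it, so a word-boundary pattern silently fails.

```python
import re

text = 'a cat sat'

# Ordinary literal: '\b' becomes the backspace character (\x08),
# so this pattern looks for backspaces and finds nothing.
plain = re.search('\bcat\b', text)

# Raw literal: the two characters '\' and 'b' reach the regex engine,
# where \b means "word boundary".
raw = re.search(r'\bcat\b', text)

print(plain)          # None
print(raw.group())    # cat

# A raw string and a doubled backslash produce the same pattern text.
print(r'\d+' == '\\d+')  # True
```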

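import re
key = r'Hello,My name is tiger,nice to meet you...'
# re.search scans the whole string; group(0) is the whole match, group(1) the first () group
k = re.search(r't(ige)r',key)
if k:
    print(k.group(0),k.group(1))   # tiger ige
    print(k.group())               # group() is the same as group(0)
else:
    print('not found!')

# re.match only tries to match at the beginning of the string
k = re.match(r'H(....)',key)
if k:
    print(k.group(0),k.group(1))   # Hello ello
    print(k.group())
else:
    print('not found!')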
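The two calls above behave differently because re.match only tries position 0 while re.search scans the whole string. A short sketch reusing the same sentence, plus re.fullmatch (available since Python 3.4), which is a common choice for the data validation mentioned at the start; the \d{3} check is an illustrative assumption, not from the original:

```python
import re

key = 'Hello,My name is tiger,nice to meet you...'

# re.match only tries at position 0, so a word in the middle is not found.
print(re.match(r'tiger', key))               # None
# re.search scans the whole string and returns the first match.
print(re.search(r'tiger', key).span())       # (17, 22)

# re.fullmatch requires the entire string to match, which suits validation.
print(bool(re.fullmatch(r'\d{3}', '123')))   # True
print(bool(re.fullmatch(r'\d{3}', '1234')))  # False
```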

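import re

key = r'<html><body><h1>hello world</h1></body></html>'
# (?<=<h1>) is a lookbehind and (?=</h1>) a lookahead: they require the tags
# around the match without including them in the matched text
pattern = r'(?<=<h1>).+?(?=</h1>)'
matcher = re.search(pattern,key)
print(matcher.group(0))

# General steps
p1 = r'.*<h1>(.*?)</h1>.*'
pattern = re.compile(p1)   # 1. compile the pattern string into a Pattern object
m = pattern.match(key)     # 2. use a Pattern method to match/search the text
print(m.group(1))          # 3. use the Match object's methods to get the result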

Output:
hello world
hello world
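A sketch of the same compile-then-match steps with a flag and a few more Match methods; the upper-case HTML string is a made-up input chosen to show re.IGNORECASE:

```python
import re

# Compile once and reuse; flags such as re.IGNORECASE are given at compile time.
pattern = re.compile(r'<h1>(.*?)</h1>', re.IGNORECASE)

html = '<HTML><BODY><H1>hello world</H1></BODY></HTML>'
m = pattern.search(html)
if m:
    print(m.group(1))   # hello world   (text captured by group 1)
    print(m.span(1))    # (16, 27)      start and end index of group 1
    print(m.groups())   # ('hello world',)
```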

Find all: findall & finditer

import re
key = r'<user01@mail.com> <usr02@mail.com> user04@mail.com'
k = re.findall(r'(\w+@m....[a-z]{3})',key)
print(k)

k = re.finditer(r'(\w+@m....[a-z]{3})',key)
for i in k:
    print(type(i))
    print(i.group())

Output (Python 3.6 and earlier print <class '_sre.SRE_Match'>; Python 3.7+ prints <class 're.Match'>):
['user01@mail.com', 'usr02@mail.com', 'user04@mail.com']
<class '_sre.SRE_Match'>
user01@mail.com
<class '_sre.SRE_Match'>
usr02@mail.com
<class '_sre.SRE_Match'>
user04@mail.com
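One behavior of findall worth knowing: the shape of its result depends on the number of capturing groups. A sketch with the same address string (the group layout is illustrative):

```python
import re

key = '<user01@mail.com> <usr02@mail.com> user04@mail.com'

# With two capturing groups, findall returns a list of tuples, one per match.
print(re.findall(r'(\w+)@(\w+)\.com', key))
# [('user01', 'mail'), ('usr02', 'mail'), ('user04', 'mail')]

# With no groups, findall returns the whole matched strings.
print(re.findall(r'\w+@\w+\.com', key))

# finditer yields Match objects lazily, so positions are available too.
for m in re.finditer(r'(\w+)@', key):
    print(m.group(1), m.start())   # user01 1, usr02 19, user04 35
```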

Replace: sub & subn

import re
key = 'ni hao ma a b c d c'

t = re.sub(r'\s','-',key)
print(t)

t = re.sub(r'\s','-',key,count=4)   # replace only the first 4 matches
print(t)
t = re.subn(r'\s','-',key,count=4)  # subn also returns the number of replacements
print(t)

Output:
ni-hao-ma-a-b-c-d-c
ni-hao-ma-a-b c d c
('ni-hao-ma-a-b c d c', 4)
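Besides plain text, the replacement argument of re.sub can contain backreferences or be a function. A sketch with made-up date and number strings:

```python
import re

dates = 'start 2019-12-31, end 2020-01-02'

# Backreferences \1, \2, \3 in the replacement refer to the capture groups.
print(re.sub(r'(\d{4})-(\d{2})-(\d{2})', r'\3/\2/\1', dates))
# start 31/12/2019, end 02/01/2020

# The replacement may also be a function that receives each Match object.
def double(m):
    return str(int(m.group()) * 2)

print(re.sub(r'\d+', double, 'a1 b20 c300'))  # a2 b40 c600
```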

A regular expression consists of ordinary characters and special characters (called "metacharacters").
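A small sketch of the difference between ordinary characters and metacharacters, using made-up strings; re.escape is the standard-library helper for matching a string literally:

```python
import re

price = 'Total: 3.50 (USD)'

# Ordinary characters match themselves literally.
print(re.search(r'Total', price) is not None)  # True

# '.' is a metacharacter: unescaped, it matches any character.
print(re.findall(r'3.5', '3.5 3x5'))   # ['3.5', '3x5']
# Escaped with a backslash, it only matches a literal dot.
print(re.findall(r'3\.5', '3.5 3x5'))  # ['3.5']

# re.escape backslash-escapes the metacharacters in a literal string.
literal = re.escape('(USD)')
print(re.search(literal, price).group())  # (USD)
```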

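* : the preceding character may repeat zero or more times. ca*t matches ct (0 a's), cat (1 a), and caaat (3 a's).

+ : the preceding character must appear one or more times. ca+t matches cat (1 a) and caaat (3 a's), but not ct.

? : marks the preceding item as optional (zero or one occurrence), so ca?t matches ct and cat.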

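[] : specifies a character class, the set of characters you want to match. [abc] matches the character a, b, or c; [a-c] does the same with a range.

^ : matches the start of the string and is usually followed by the type and number of characters expected there, e.g. ^\d{3} (three digits at the start). As the first character inside [], it negates the class instead: [^abc] matches any character except a, b, or c.

$ : the counterpart of ^; it matches the end of the string.

. : matches any character except a newline (unless the re.DOTALL flag is used).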

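{} : specifies how many times the preceding item repeats: {3} means exactly 3 times, {2,4} means 2 to 4 times.

() : groups part of the pattern and captures the substring it matches; there are as many captured substrings as there are capturing () groups in the pattern.

\d : matches a digit character.

\ : a backslash followed by a metacharacter removes its special meaning (\. matches a literal dot); a backslash followed by certain ordinary characters gives them a special meaning (\d, \s, \w).

\s : matches a whitespace character: a space, tab, newline, and so on.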

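(.*) : greedy matching; it extends the match as far as possible.

(.*?) : non-greedy (lazy) matching; it matches as little as possible. Adding the ? changes how long the match is, and this form is the one commonly used.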

import re
key = r'abcaxc'

t1 = re.findall(r'ab.*?c',key)   # non-greedy: stops at the first c
t2 = re.findall(r'ab.*c',key)    # greedy: extends to the last c

if t1:
    print(t1)
else:
    print('no search!')
    
if t2:
    print(t2)
else:
    print('no search!')

Output:
['abc']
['abcaxc']
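The metacharacters listed above can be checked against the ca*t examples with re.fullmatch; a quick sketch (the word list and the other inputs are illustrative):

```python
import re

words = ['ct', 'cat', 'caaat']

# *     : zero or more 'a'
print([w for w in words if re.fullmatch(r'ca*t', w)])      # ['ct', 'cat', 'caaat']
# +     : one or more 'a'
print([w for w in words if re.fullmatch(r'ca+t', w)])      # ['cat', 'caaat']
# ?     : zero or one 'a'
print([w for w in words if re.fullmatch(r'ca?t', w)])      # ['ct', 'cat']
# {m,n} : between m and n repetitions
print([w for w in words if re.fullmatch(r'ca{2,3}t', w)])  # ['caaat']

# ^ anchors to the start; [^...] negates a class; \s matches whitespace.
print(re.findall(r'^\d{3}', '123456'))   # ['123']
print(re.findall(r'[^abc]', 'abcxyz'))   # ['x', 'y', 'z']
print(re.findall(r'\s', 'a b\tc'))       # [' ', '\t']
```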

References: https://www.cnblogs.com/langren1992/p/9782191.html

https://blog.csdn.net/xiujing9624/article/details/76685695

http://c.runoob.com/front-end/854

https://tool.oschina.net/uploads/apidocs/jquery/regexp.html

Split: split

import re

test = r'hi, nice to meet you where are you from?'

print(re.split(r'\s+',test))
print(re.split(r'\s+',test,maxsplit=3))   # split at most 3 times

Output:
['hi,', 'nice', 'to', 'meet', 'you', 'where', 'are', 'you', 'from?']
['hi,', 'nice', 'to', 'meet you where are you from?']
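re.split also accepts character classes and capturing groups; a sketch with made-up inputs:

```python
import re

line = 'a, b;c  d'

# A character class lets one call split on several different separators.
print(re.split(r'[,;\s]+', line))    # ['a', 'b', 'c', 'd']

# If the pattern contains a capturing group, the separators are kept in the result.
print(re.split(r'([,;])', 'a,b;c'))  # ['a', ',', 'b', ';', 'c']
```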