Advanced Python | Regular Expressions (Part 1)

So.ne

Category: Python

Article link: https://blog.csdn.net/m0_45198298/article/details/103889520

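Regular Expressions

- Description

Formal definition: a regular expression is a single string that describes and matches a set of strings conforming to a syntactic rule.

In plain terms: a regular expression applies a matching rule to extract the data you want from a string.

To use regular expressions in Python, first import the built-in re module.

- re.findall

Syntax: re.findall(pattern, string, flags=0)

Finds every substring of string that matches the regular expression and returns them as a list. If nothing matches, it returns an empty list.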
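The signature above can be tried out with a minimal sketch. The sample text and variable names are made up for illustration; re.IGNORECASE is one of the standard flags that can be passed as flags.

```python
import re

# Hypothetical sample text, not taken from the article
text = 'Python is fun. I use python every day.'

# Default flags=0: matching is case-sensitive
case_sensitive = re.findall('python', text)

# flags=re.IGNORECASE: letter case is ignored while matching
case_insensitive = re.findall('python', text, flags=re.IGNORECASE)

# When nothing matches, the result is an empty list rather than None
no_match = re.findall('golang', text)

print(case_sensitive)    # ['python']
print(case_insensitive)  # ['Python', 'python']
print(no_match)          # []
```

Because the return value is always a list, it can be passed straight to len() or a for loop without checking for None first.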

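- Literal characters

import re

target = 'life is short,i learn python.'
result = re.findall('python', target)
# findall is one of the most important functions in the re module: the first argument is the pattern, the second is the string to search
# This line searches target for 'python'; matches are returned in a list, and an empty list is returned if there are none

result1 = re.findall('go', target)
print(result)
# Output: ['python']

print(result1)
# Output: []

Matching plain literal strings like this is of limited use and is rarely needed in practice.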

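- Metacharacters

Character sets

Written with []; any characters may be placed inside the square brackets.

Example 1

# Find the words whose middle character is d or e

import re

target = 'abc acc aec agc adc aic'
result = re.findall('a[de]c', target)
# [de] means the character at this position matches if it is either d or e
print(result)

Output

['aec', 'adc']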

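Example 2

# Find the words whose middle character is any character in the range b-z

import re

target = 'abc acc aec agc adc aic'
result = re.findall('a[b-z]c', target)
# [b-z] means the character at this position matches if it falls in the range b-z

print(result)

Output

['abc', 'acc', 'aec', 'agc', 'adc', 'aic']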

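Example 3

# Find the words whose middle character is not in the range c-z

import re

target = 'abc acc aec agc adc aic'
result = re.findall('a[^c-z]c', target)
# [^c-z] means the character at this position matches if it is NOT in the range c-z

print(result)

Output

['abc']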

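Pattern (example)	Meaning
[ab]	Matches if the character at this position is a or b
[a-z]	Matches if the character at this position is in the range a-z
[^a-z]	Matches if the character at this position is not in the range a-z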
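Several ranges can also be combined inside one set. The following is a small sketch of the three forms in the table; the sample string and variable names are invented for illustration.

```python
import re

# Hypothetical sample string for illustration
target = 'a1c a2c abc aBc a-c'

# Two ranges in one set: any ASCII letter in the middle
letters = re.findall('a[a-zA-Z]c', target)

# A digit range: any digit in the middle
digits = re.findall('a[0-9]c', target)

# A negated set: anything except an ASCII letter or digit in the middle
others = re.findall('a[^a-zA-Z0-9]c', target)

print(letters)  # ['abc', 'aBc']
print(digits)   # ['a1c', 'a2c']
print(others)   # ['a-c']
```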

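Shorthand character classes

Pattern	Meaning	Equivalent to (for ASCII text)
\d	Matches if the character at this position is a digit	[0-9]
\D	Matches if the character at this position is not a digit	[^0-9]
\w	Matches if the character at this position is a letter, a digit, or _	[A-Za-z0-9_]
\W	Matches if the character at this position is not a letter, a digit, or _	[^A-Za-z0-9_]
\s	Matches if the character at this position is whitespace (space, tab \t, vertical tab \v, carriage return \r, newline \n, form feed \f)	[ \f\n\t\r\v]
\S	Matches if the character at this position is not whitespace	[^ \f\n\t\r\v]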
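The equivalences in the table hold for ASCII text. As a hedged sketch (the sample string is made up), the snippet below shows that in Python 3 a str pattern lets \w match non-ASCII letters as well, and that passing the standard re.ASCII flag restricts it to [A-Za-z0-9_].

```python
import re

# Hypothetical sample; '\u00e9' is the letter e with an acute accent (non-ASCII)
sample = 'caf\u00e9_9 ok'

# Default behaviour for str patterns: Unicode word characters are included
word_chars = re.findall(r'\w', sample)

# With re.ASCII, \w is limited to ASCII letters, digits and underscore
ascii_word_chars = re.findall(r'\w', sample, flags=re.ASCII)

# The explicit set from the table gives the same result as the ASCII form
explicit_set = re.findall(r'[A-Za-z0-9_]', sample)

# \s picks up the space, the tab and the newline
whitespace = re.findall(r'\s', 'a b\tc\n')

print(len(word_chars))                   # 8
print(len(ascii_word_chars))             # 7
print(ascii_word_chars == explicit_set)  # True
print(whitespace)                        # [' ', '\t', '\n']
```

The patterns are written as raw strings (r'...') so that Python passes the backslash through to the re module unchanged.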

Example 1

# \d example

import re

target = 'Likes: 12'
result = re.findall(r'\d', target)
# \d matches whenever the character at this position is a digit; each \d stands for exactly one character

print(result)
# Output: ['1', '2']

Example 2

# \D example

import re

target = 'Likes: 12'
result = re.findall(r'\D', target)

print(result)
# Output: ['L', 'i', 'k', 'e', 's', ':', ' ']

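Example 3

# \w example

import re

target = 'i love python_'
result = re.findall(r'\w', target)
# \w matches whenever the character at this position is a letter, a digit, or an underscore; each \w stands for exactly one character

print(result)
# Output: ['i', 'l', 'o', 'v', 'e', 'p', 'y', 't', 'h', 'o', 'n', '_']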

Example 4

# \W example

import re

target = 'i love python_'
result = re.findall(r'\W', target)

print(result)
# Output: [' ', ' ']

Example 5

# \s example

import re

target = 'life is short \n i love python'
result = re.findall(r'\s', target)
# \s matches whenever the character at this position is whitespace; each \s stands for exactly one character

print(result)
# Output: [' ', ' ', ' ', '\n', ' ', ' ', ' ']

Example 6

# \S example

import re

target = 'life is short \n i lver python'
result = re.findall(r'\S', target)

print(result)

Output

['l', 'i', 'f', 'e', 'i', 's', 's', 'h', 'o', 'r', 't', 'i', 'l', 'v', 'e', 'r', 'p', 'y', 't', 'h', 'o', 'n']

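Quantifiers

Pattern (example)	Meaning
{3}	The character before {3} occurs exactly 3 times
{3,8}	The character before {3,8} occurs 3 to 8 times
?	The character before ? occurs 0 or 1 times
+	The character before + occurs 1 or more times
*	The character before * occurs 0 or more times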
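To compare the five quantifiers side by side, here is a minimal sketch that applies each one to the letter b in the same string. The sample string and variable names are assumptions chosen for illustration.

```python
import re

# Hypothetical sample: 'a', then zero to four 'b', then 'c'
target = 'ac abc abbc abbbc abbbbc'

exactly_three = re.findall('ab{3}c', target)  # b exactly 3 times
two_to_four = re.findall('ab{2,4}c', target)  # b 2 to 4 times
zero_or_one = re.findall('ab?c', target)      # b 0 or 1 times
one_or_more = re.findall('ab+c', target)      # b 1 or more times
zero_or_more = re.findall('ab*c', target)     # b 0 or more times

print(exactly_three)  # ['abbbc']
print(two_to_four)    # ['abbc', 'abbbc', 'abbbbc']
print(zero_or_one)    # ['ac', 'abc']
print(one_or_more)    # ['abc', 'abbc', 'abbbc', 'abbbbc']
print(zero_or_more)   # ['ac', 'abc', 'abbc', 'abbbc', 'abbbbc']
```

Note that each quantifier applies only to the single item directly before it, here the b, not to the whole pattern.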

Example 1

# Count the words

import re

content = 'To be or not to be,that is a question'
result = re.findall(r'\w{1,30}', content)
# {1,30} means \w occurs between 1 and 30 times, so any word 1 to 30 characters long is matched

print(result)

print(len(result))

Output

['To', 'be', 'or', 'not', 'to', 'be', 'that', 'is', 'a', 'question']
10

Example 2

# Improve the previous code so that words longer than 30 characters are also handled

import re

content = 'To be or not to be,that is a question'
result = re.findall(r'\w+', content)
# + means \w occurs 1 or more times, with no upper limit

print(result)

print(len(result))

Output

['To', 'be', 'or', 'not', 'to', 'be', 'that', 'is', 'a', 'question']
10

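Boundary matchers (anchors)

Pattern	Meaning
^	Matches only if the string starts with the pattern that follows ^
$	Matches only if the string ends with the pattern that precedes $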

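Example 1

import re

content = 'https://www.zhihu.com'
content1 = 'question/123456/answer/789/'
result = re.findall('^http.*', content)
# ^http matches only if content starts with http; the . that follows stands for any single character except the newline \n
# . and * are often used together as .*, meaning any run of such characters
result1 = re.findall('^http.*', content1)

print(result)
# Output: ['https://www.zhihu.com']

print(result1)
# Output: []
# because content1 does not start with http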
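The remark that . matches any character except the newline \n can be checked directly. This is a small sketch with a made-up two-line string; re.DOTALL is a standard flag, not used in the article, that lets . match \n as well.

```python
import re

# Hypothetical two-line string built from the two samples above
text = 'https://www.zhihu.com\nquestion/123456'

# By default .* stops at the newline, so only the first line is returned
first_line_only = re.findall('^http.*', text)

# With re.DOTALL the dot also matches '\n', so .* runs to the end of the string
with_dotall = re.findall('^http.*', text, flags=re.DOTALL)

print(first_line_only)  # ['https://www.zhihu.com']
print(with_dotall)      # ['https://www.zhihu.com\nquestion/123456']
```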

Example 2

import re

content = 'https://www.zhihu.com/shiyue.png'
content1 = 'https://www.zhihu.com'
result = re.findall('.*png$', content)
result1 = re.findall('.*png$', content1)

print(result)
# Output: ['https://www.zhihu.com/shiyue.png']

print(result1)
# Output: []
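Using ^ and $ in the same pattern constrains the whole string at once. The helper below is a hypothetical sketch (is_png_link is not part of the article or of the re module) that combines both anchors with the ? quantifier and .*; the dot before png is escaped as \. so that it matches only a literal dot.

```python
import re

def is_png_link(url):
    """Illustrative helper: True if url starts with http:// or https:// and ends with .png."""
    # ^https?  : the string starts with http, optionally followed by s
    # .*       : any run of characters except newline
    # \.png$   : the string ends with a literal '.png'
    return bool(re.findall(r'^https?://.*\.png$', url))

print(is_png_link('https://www.zhihu.com/shiyue.png'))  # True
print(is_png_link('https://www.zhihu.com'))             # False
print(is_png_link('ftp://example.com/a.png'))           # False
```

An empty list is falsy in Python, so wrapping the findall result in bool() turns "no match" into False.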
