Python Learning, Day 26: Regular Expressions

踏墟

Category: Starting Python Learning, Week 4. Tags: regular expressions, python, strings

Article link: https://blog.csdn.net/m0_52863098/article/details/119697802

A basic understanding of regular expressions

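Regular expression -> pattern -> a pattern for matching strings -> complex matching rules
Character set -> [] -> [a-zA-Z0-9_]{6,20} -> \w{6,20} -> ^\w{6,20}$
(In Python 3, \w applied to a str also matches non-ASCII letters and digits; it is equivalent to [a-zA-Z0-9_] only when the re.ASCII flag is set.)
There are two ways to use regular expressions in Python:
- Call module-level functions directly, without creating a regular expression object
  - match
  - fullmatch
- Create a regular expression object (Pattern) and call its methods to perform matching
  - compile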
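
The two styles listed above can be sketched side by side. This is only a minimal illustration: the `sample` string is a made-up input, and the `re.ASCII` flag is an addition here so that `\w` means exactly `[a-zA-Z0-9_]`.

```python
import re

sample = 'hello_2105'  # hypothetical sample input

# Style 1: call module-level functions, passing the pattern on every call.
m1 = re.match(r'\w{6,20}', sample, flags=re.ASCII)      # anchored at the start only
m2 = re.fullmatch(r'\w{6,20}', sample, flags=re.ASCII)  # must cover the whole string

# Style 2: compile once, then call methods on the Pattern object.
pattern = re.compile(r'\w{6,20}', flags=re.ASCII)
m3 = pattern.fullmatch(sample)

print(m1.group(), m2.group(), m3.group())

# match() accepts a valid prefix; fullmatch() rejects trailing characters.
prefix_ok = re.match(r'\w{6,20}', 'hello_2105!!', flags=re.ASCII) is not None
whole_ok = re.fullmatch(r'\w{6,20}', 'hello_2105!!', flags=re.ASCII) is not None
print(prefix_ok, whole_ok)  # True False
```

Compiling first tends to pay off when the same pattern is reused many times; for a one-off check, the module-level functions are usually enough.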

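How to apply regular expressions

match - matches from the start of the string -> Match object (or None) -> group()
search - searches for the first match at any position -> Match object (or None) -> group()
findall - finds everything in the string that matches the regular expression -> list[str] (a list of tuples when the pattern has more than one capture group)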
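
To make the difference between these three concrete, here is a small sketch on a made-up string (`text` is an assumption, not taken from the examples in this post):

```python
import re

text = 'order 42 shipped on day 7'  # hypothetical sample text

at_start = re.match(r'\d+', text)           # None: the string does not start with a digit
first = re.search(r'\d+', text).group()     # first match anywhere in the string
all_numbers = re.findall(r'\d+', text)      # every non-overlapping match, as strings

print(at_start)     # None
print(first)        # 42
print(all_numbers)  # ['42', '7']

# With two or more capture groups, findall returns tuples instead of strings.
pairs = re.findall(r'(\w+) (\d+)', text)
print(pairs)        # [('order', '42'), ('day', '7')]
```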

Examples:

Checking whether a username is valid

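# Website registration: a username must consist of letters, digits, and underscores,
# and be 6 to 20 characters long.
# Check whether the username is valid.
import re

username = input('Enter a username: ')
# Compile the regular expression into a Pattern object.
# re.ASCII limits \w to [a-zA-Z0-9_]; without it, \w also matches non-ASCII letters.
username_pattern = re.compile(r'^\w{6,20}$', flags=re.ASCII)
print(type(username_pattern))
# Call a method on the Pattern object to perform the check.
matcher = username_pattern.match(username)
print(type(matcher))
if matcher is None:
    print('Invalid username!')
else:
    print(matcher.group())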
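
The same check can also be wrapped in a reusable function. The sketch below is one possible variant, not the method used above: the helper name `is_valid_username` and the test inputs are hypothetical. It uses `fullmatch` instead of `^...$` because `$` also matches just before a trailing newline, and it sets `re.ASCII` so that `\w` covers only `[a-zA-Z0-9_]`.

```python
import re

# Hypothetical helper; re.ASCII restricts \w to [a-zA-Z0-9_].
USERNAME_PATTERN = re.compile(r'\w{6,20}', flags=re.ASCII)


def is_valid_username(username: str) -> bool:
    """Return True if username is 6-20 ASCII letters, digits, or underscores."""
    return USERNAME_PATTERN.fullmatch(username) is not None


# Made-up candidates: valid, too short, contains a space, non-ASCII, trailing newline.
for candidate in ['alice_2105', 'bob', 'user name', '用户名用户名', 'abcdef\n']:
    print(repr(candidate), is_valid_username(candidate))
```

Only the first candidate is accepted; the others fail on length, on a disallowed character, or on the trailing newline.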

Extracting the parts of a string that match a regular expression

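import re

content = """Emergency number: 110. Our class is Python-2105,
my QQ number is 1234567, and my phone number is 13811223344. Thanks!"""

pattern = re.compile(r'\d+')

# Approach 1: call search repeatedly, resuming from the end of the previous match.
matcher = pattern.search(content)
while matcher:
    print(matcher.group())
    print(matcher.start(), matcher.end())
    matcher = pattern.search(content, matcher.end())

# Approach 2: findall on the Pattern object returns every match as a list.
results = pattern.findall(content)
for result in results:
    print(result)

# Approach 3: the module-level findall function, without compiling first.
results = re.findall(r'\d+', content)
for result in results:
    print(result)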
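
When both the matched text and its position are needed, `finditer` is a possible alternative to the manual `search` loop. The following is a sketch on a made-up `content` string, not a replacement for the code above:

```python
import re

content = 'Emergency: 110, class Python-2105, phone 13811223344'  # hypothetical sample text
pattern = re.compile(r'\d+')

# finditer yields one Match object per non-overlapping match, left to right.
for matcher in pattern.finditer(content):
    print(matcher.group(), matcher.span())

numbers = [m.group() for m in pattern.finditer(content)]
print(numbers)  # ['110', '2105', '13811223344']
```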

Getting news titles and links from a web page

import re

import requests

resp = requests.get('https://www.sohu.com/')
content = resp.text

# Lazily match href="http..." attributes, then slice off the leading href=" and the closing quote.
pattern1 = re.compile(r'href="http.+?"')
matcher = pattern1.search(content)
while matcher:
    print(matcher.group()[6:-1])
    matcher = pattern1.search(content, matcher.end())

# Same idea for title="..." attributes: slice off title=" and the closing quote.
pattern2 = re.compile(r'title=".+?"')
titles_list = pattern2.findall(content)
for title in titles_list:
    print(title[7:-1])

Regular expression capture groups

import re

import requests

# Match the whole <a> tag, but capture only what is inside () ---> regex capture groups
pattern = re.compile(r'<a\s.*?href="(.+?)".*?title="(.+?)".*?>')
resp = requests.get('https://www.sohu.com/')
results = pattern.findall(resp.text)
for href, title in results:
    print(title)
    print(href)
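
Capture groups can be tried without any network access. The sketch below runs the same kind of pattern on a made-up HTML snippet (the `html` string and the example.com URLs are assumptions) and adds named groups via `(?P<name>...)`, which the code above does not use:

```python
import re

# Hypothetical HTML snippet standing in for a downloaded page.
html = (
    '<a class="item" href="https://example.com/news/1" title="First headline">one</a>\n'
    '<a href="https://example.com/news/2" target="_blank" title="Second headline">two</a>'
)

pattern = re.compile(r'<a\s.*?href="(?P<href>.+?)".*?title="(?P<title>.+?)".*?>')

# findall returns one (href, title) tuple per match because there are two groups.
pairs = pattern.findall(html)
print(pairs)

# finditer exposes the same groups by number or by name.
for m in pattern.finditer(html):
    print(m.group('title'), '->', m.group(1))
```

A pattern like this only works when `href` appears before `title` inside the tag; for anything beyond a quick extraction, an HTML parser is usually the more robust choice.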

Filtering inappropriate content

import re

content = '马化腾是一个沙雕煞笔，FUck you！'
pattern = re.compile(r'[傻沙鲨煞][吊逼笔雕鄙]|马化腾|fuck|shit', flags=re.IGNORECASE)
# Equivalent without a precompiled Pattern (re.I is short for re.IGNORECASE):
# modified_content = re.sub(r'[傻沙鲨煞][吊逼笔雕鄙]|马化腾|fuck|shit', '*', content, flags=re.I)
modified_content = pattern.sub('*', content)
print(modified_content)
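
`sub` also accepts a function as the replacement, which makes it possible to mask each match with as many asterisks as it has characters. This is a sketch of that variation; the word list `blocked_words`, the helper `mask`, and the sample text are all hypothetical.

```python
import re

# Hypothetical word list; re.escape guards against regex metacharacters in the words.
blocked_words = ['darn', 'heck']
pattern = re.compile('|'.join(re.escape(w) for w in blocked_words), flags=re.IGNORECASE)


def mask(match: re.Match) -> str:
    """Replace a match with asterisks of the same length."""
    return '*' * len(match.group())


text = 'Well, DARN it. What the Heck!'
masked = pattern.sub(mask, text)
print(masked)  # Well, **** it. What the ****!
```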

Splitting strings

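import re

poem = '床前明月光，疑是地上霜。举头望明月，低头思故乡。'
# Split on the full-width comma or the full-width period.
pattern = re.compile(r'[，。]')
sentences_list = pattern.split(poem)
# The poem ends with a delimiter, so the last item is an empty string.
print(sentences_list)
# Drop the empty strings.
sentences_list = [sentence for sentence in sentences_list if sentence]
print(sentences_list)
for sentence in sentences_list:
    print(sentence)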
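
`re.split` has a few more options that are worth a quick look. The sketch below uses a made-up `line` string to show whitespace-tolerant delimiters, the `maxsplit` argument, and how a capture group keeps the delimiters in the result:

```python
import re

line = 'alpha, beta;gamma ,delta'  # hypothetical sample input

# Split on a comma or semicolon together with any surrounding whitespace.
parts = re.split(r'\s*[,;]\s*', line)
print(parts)  # ['alpha', 'beta', 'gamma', 'delta']

# maxsplit limits the number of splits; the remainder stays in the last item.
head_and_rest = re.split(r'\s*[,;]\s*', line, maxsplit=1)
print(head_and_rest)  # ['alpha', 'beta;gamma ,delta']

# Wrapping the delimiter in a capture group keeps the delimiters in the result.
with_delimiters = re.split(r'([,;])', 'a,b;c')
print(with_delimiters)  # ['a', ',', 'b', ';', 'c']
```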
