正则表达式常用场景(以python为例)
-
匹配邮箱地址:
import re email_pattern = r'^[\w.-]+@[a-zA-Z\d.-]+.[a-zA-Z]{2,}$' emails = ['abc@email.com', 'xyz123@email.co.uk', 'invalid-email'] valid_emails = [email for email in emails if re.match(email_pattern, email)] print(valid_emails) # 输出:['abc@email.com', 'xyz123@email.co.uk']
-
查找文本中的数字:
import re text = "这是一个包含数字123和456的文本" numbers = re.findall(r'\d+', text) print(numbers) # 输出:['123', '456']
-
替换文本中的特定模式:
import re text = "Hello, world! This is a test." updated_text = re.sub(r'Hello', 'Hi', text) print(updated_text) # 输出:Hi, world! This is a test.
-
提取网页中的链接:
import re html_content = '<a href="https://www.example.com">Example</a> <a href="https://www.test.com">Test</a>' links = re.findall(r'href="([^"]+)"', html_content) print(links) # 输出:['https://www.example.com', 'https://www.test.com']
-
提取手机号码(中国大陆手机号):
import re
phone_pattern = r'1[3-9]\d{9}'
text = "我的手机号是:13912345678,朋友的是18887654321。"
phones = re.findall(phone_pattern, text)
print(phones) # 输出:['13912345678', '18887654321']
6.匹配IP地址:
import re
ip_pattern = r'\b(?:[0-9]{1,3}.){3}[0-9]{1,3}\b'
text = "本机IP地址是:192.168.1.1,服务器IP是:10.0.0.1。"
ips = re.findall(ip_pattern, text)
print(ips) # 输出:['192.168.1.1', '10.0.0.1']
7.提取HTML标签内容(例如 <p>...</p>
):
import re
html_pattern = r'<(\w+)[^>]*>(.*?)</\1>'
html_text = '<p>Hello</p> <div>World</div>'
tags_content = re.findall(html_pattern, html_text)
print(tags_content) # 输出:[('p', 'Hello'), ('div', 'World')]
8.检测特定单词的出现次数:
import re
text = "This is a test. This test is important."
word = "test"
occurrences = len(re.findall(r'\b' + re.escape(word) + r'\b', text))
print(f"单词 '{word}' 出现次数:{occurrences}") # 输出:单词 'test' 出现次数:2
9.验证密码强度(包含大小写字母、数字和特殊字符):
import re
passwords = ['Pass123!', 'password', 'StrongP@ssword1']
password_pattern = r'^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[@$!%*?&])[A-Za-z\d@$!%*?&]{8,}$'
strong_passwords = [pwd for pwd in passwords if re.match(password_pattern, pwd)]
print(strong_passwords) # 输出:['StrongP@ssword1']
结语
本文源码: Python爬虫之路 https://github.com/rosyrain/spider-course
lesson12中,有对应的python的demo文件。在lesson12当中还有正则表达式理论知识,常用场景、匹配内容。欢迎各位Follow/Star/Fork ( •̀ ω •́ )✧
有任何问题欢迎大家的评论和指正。再次声明,本专栏只做技术探讨,严谨商用,恶意攻击等。
这是我的 GitHub 主页:Rosyrain (github.com) https://github.com/rosyrain
,里面有一些我学习时候的笔记或者代码。本专栏的文档和源码存到spider-course的仓库下。
欢迎大家Follow/Star/Fork三连。