Python Basics: Regular Expressions
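This section covers the most basic uses of regular expressions in Python; regular expression syntax itself is not discussed in detail.
Python's built-in module for regular expressions is re. Its most basic uses are checking whether a string matches a pattern, extracting groups, and finding all substrings of a string that match a pattern.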
Checking whether a string matches an expression
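Both search() and match() can be used for this. The difference is that match() only tries to match at the very beginning of the string, whereas search() scans the string and returns the first match found at any position. For example: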
search:
import re

# Check whether a string contains a match for a regular expression
text = "www.google.com"
match = re.search(r'g.*e', text)
print(match)
print(match.span())
print(text[match.start():match.end()])
print()

# A pattern that does not occur in the string makes search() return None
match = re.search(r'ag.*e', text)
if match:
    print("Match")
else:
    print("Not match")
print()

# Search and group
match = re.search(r'(g.*e)\.(com)', text)
if match:
    print(match.group(1))
    print(match.group(2))
else:
    print("Not match.")
Output:
<re.Match object; span=(4, 10), match='google'>
(4, 10)
google

Not match

google
com
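As a small supplement to the grouping example, the sketch below shows group(0), groups(), and named groups on the same sample string. The group names name and tld are chosen here purely for illustration and are not part of the original example.

```python
import re

text = "www.google.com"

# Same pattern as the grouping example, but with named groups
# (the names "name" and "tld" are illustrative assumptions).
m = re.search(r"(?P<name>g.*e)\.(?P<tld>com)", text)

whole = m.group(0)         # the entire matched substring
parts = m.groups()         # all captured groups as a tuple
by_name = m.group("name")  # access a group by its name

print(whole)    # google.com
print(parts)    # ('google', 'com')
print(by_name)  # google
```

Note that group(0) is the whole match, while numbered groups start at 1.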
match:
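import re

text = "www.google.com"
# "www" is at the start of the string, so match() succeeds
match = re.match("www", text)
print(match)
# "google" is not at the start of the string, so match() returns None
match = re.match("google", text)
print(match)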
Output:
<re.Match object; span=(0, 3), match='www'>
None
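To make the difference between the two functions easier to see, here is a minimal side-by-side sketch that applies the same pattern with both. The variable names are illustrative.

```python
import re

text = "www.google.com"

# match() only succeeds if the pattern matches at index 0.
at_start = re.match(r"google", text)

# search() scans forward and returns the first match anywhere.
anywhere = re.search(r"google", text)

# Anchoring the pattern with ^ makes search() behave like match() here.
anchored = re.search(r"^google", text)

print(at_start)         # None
print(anywhere.span())  # (4, 10)
print(anchored)         # None
```

Because both functions return None when nothing matches, it is usually safest to test the result before calling methods such as span() or group() on it.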
Finding all substrings that match an expression
This is also a very useful operation, widely used in web crawlers. For example:
import re

text = "www.google.com, https://www.baidu.com/, www.qq.com, https://www.amazon.com"
# .*? is non-greedy, so each match stops at the nearest ".com"
result = re.findall(r'www\..*?\.com', text)
for r in result:
    print(r)
Output:
www.google.com
www.baidu.com
www.qq.com
www.amazon.com
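The pattern in the findall example relies on the non-greedy quantifier .*? to return one result per host. The following sketch, using a shortened version of the sample string, compares it with the greedy .* to show why that matters.

```python
import re

text = "www.google.com, https://www.baidu.com/, www.qq.com"

# Non-greedy: .*? stops at the first ".com" it can reach.
lazy = re.findall(r"www\..*?\.com", text)

# Greedy: .* runs to the last ".com" in the string, giving one long match.
greedy = re.findall(r"www\..*\.com", text)

print(lazy)    # ['www.google.com', 'www.baidu.com', 'www.qq.com']
print(greedy)  # ['www.google.com, https://www.baidu.com/, www.qq.com']
```

With the greedy form the whole string comes back as a single match, which is rarely what a crawler wants.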
The re module also provides other functions, such as split and sub; they are not covered in detail here.
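For readers who want a quick look at these two functions anyway, here is a minimal sketch. The sample string and the separator pattern are assumptions made for illustration.

```python
import re

# Illustrative input: hosts separated by a comma or a semicolon.
text = "www.google.com, www.qq.com;www.baidu.com"

# split(): break the string on a comma or semicolon plus optional whitespace.
hosts = re.split(r"[,;]\s*", text)

# sub(): replace every match of the pattern with a replacement string.
stripped = re.sub(r"www\.", "", text)

print(hosts)     # ['www.google.com', 'www.qq.com', 'www.baidu.com']
print(stripped)  # google.com, qq.com;baidu.com
```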