Python Cookbook, Chapter 2: Strings

Article link: https://blog.csdn.net/vitas_fly/article/details/80243204

(Python 3)

1. Splitting strings with split() from the re module

import re

str = 'hello world, my name is    leon!'
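# \s: matches any whitespace character (space, tab, form feed, and so on); for ASCII input this is equivalent to [ \f\n\r\t\v].
# *: matches the preceding subexpression zero or more times.
# []: a character set; matches any one of the characters it contains.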
ret = re.split(r'[\s,]\s*', str)
print(ret)

addr = 'www.hao123.com'
# .: matches any single character except the newline \n. To match a literal ., use \.
ret = re.split(r'\.', addr)
print(ret)

Output:

['hello', 'world', 'my', 'name', 'is', 'leon!']
['www', 'hao123', 'com']

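Capture groups: if the separator pattern contains a capture group, the text matched by the group is also included in the result list. To group without capturing, write the group as (?:...).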
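The difference between a capture group and a non-capturing group in re.split() is easiest to see side by side. The following is a minimal sketch; the sample string and variable names are made up for illustration.

```python
import re

line = 'a, b;c'  # hypothetical sample input

# Capture group: the matched separators are kept in the result list.
with_seps = re.split(r'(;|,)\s*', line)

# Non-capturing group: only the fields are returned.
fields = re.split(r'(?:;|,)\s*', line)

print(with_seps)  # ['a', ',', 'b', ';', 'c']
print(fields)     # ['a', 'b', 'c']
```

Keeping the separators can be handy when the original string has to be rebuilt later.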

2. Matching at the start or end of a string

>>> filename = 'test.py'
>>> filename.endswith('.py')
True
>>> filename.startswith('te')
True

# Matching the start or end of a string with regular expressions
>>> import re
>>> re.findall(r'^te', filename)
['te']
>>> re.match(r'^te', filename)
<_sre.SRE_Match object; span=(0, 2), match='te'>
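>>> re.match(r'.py$', filename)  # match() only matches at the start of the string, so this returns None
>>> re.match(r'.*?\.py$', filename)
<_sre.SRE_Match object; span=(0, 7), match='test.py'>
>>> re.search(r'py$', filename)
<_sre.SRE_Match object; span=(5, 7), match='py'>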
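The behaviour in the session above comes from re.match() being anchored at the start of the string, while re.search() scans the whole string. A small sketch of the contrast, using the same filename; the variable names m1, m2, and m3 are only for illustration.

```python
import re

filename = 'test.py'

# re.match() is anchored at the start of the string, so a pattern
# that describes only the suffix finds nothing.
m1 = re.match(r'\.py$', filename)

# re.search() scans the whole string, so the suffix is found.
m2 = re.search(r'\.py$', filename)

# To use match(), the pattern has to cover the string from the start.
m3 = re.match(r'.*\.py$', filename)

print(m1)          # None
print(m2.group())  # .py
print(m3.group())  # test.py
```

For a plain suffix check, filename.endswith('.py') is still the simpler choice.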

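3. Matching and searching for text patterns

Simple literal text can be matched with string methods such as find(), startswith(), and endswith(); more complex patterns call for regular expressions, for example with re.findall().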

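# Time matching (\b keeps the pattern from matching inside a longer run of digits such as 188)
>>> timepat = re.compile(r'\b\d{1,2}:\d{1,2}:\d{1,2}\b')
>>> contents = "Now time is 12:00:00, not 188:18:19"
>>> timepat.findall(contents)
['12:00:00']
>>> 
>>> # URL matching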
>>> address = "www.baidu.com www.edu.cn www.open.org ww.xx"
>>> urlpat = re.compile(r'w{3}\.\w+(?:\.cn|\.com)')
>>> urlpat.findall(address)
['www.baidu.com', 'www.edu.cn']
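Returning to the time example: a pattern built only from \d{1,2} groups can also match inside a longer run of digits, so word boundaries matter. The sketch below compares the two variants and also shows str.find() for the literal case; the names loose and strict are made up for illustration.

```python
import re

contents = "Now time is 12:00:00, not 188:18:19"

# Without boundaries, the pattern can start in the middle of "188".
loose = re.compile(r'\d{1,2}:\d{1,2}:\d{1,2}')
# With \b on both sides, only standalone times are accepted.
strict = re.compile(r'\b\d{1,2}:\d{1,2}:\d{1,2}\b')

loose_hits = loose.findall(contents)
strict_hits = strict.findall(contents)

print(loose_hits)    # ['12:00:00', '88:18:19']
print(strict_hits)   # ['12:00:00']

# For a fixed literal, str.find() returns the index of the first hit.
print(contents.find('12:00:00'))  # 12
```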

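4. Replacement: use replace() for simple literal strings and re.sub() or re.subn() for complex patterns; subn() also returns the number of substitutions made.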

>>> str = "Hello, leon, this is C world!"
>>> str.replace('C', 'Python')
'Hello, leon, this is Python world!'

>>> str = "I graduated in 2011-07-01."
>>> import re
>>> re.sub(r'(\d{1,4})-(\d{1,2})-(\d{1,2})', r'\3/\2/\1', str)
'I graduated in 01/07/2011.'
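re.subn() is mentioned above but not shown. A minimal sketch, reusing the same date pattern on a made-up sentence with two dates:

```python
import re

# Hypothetical sample text containing two dates.
text = "I graduated in 2011-07-01 and started work on 2011-08-15."
pattern = r'(\d{1,4})-(\d{1,2})-(\d{1,2})'

# subn() returns a tuple: (new string, number of substitutions).
new_text, count = re.subn(pattern, r'\3/\2/\1', text)

print(new_text)  # I graduated in 01/07/2011 and started work on 15/08/2011.
print(count)     # 2
```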

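5. Flags: pass re.I to ignore case, re.X when the regular expression itself is written across multiple lines, and re.M when the string being searched has multiple lines.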

>>> import re
>>> str = 'python, Python, PyThon'
>>> re.findall('python', str, re.I) # case-insensitive matching
['python', 'Python', 'PyThon']
>>> 
>>> 
>>> date = r"""
	\d+
	-
	\d+
	"""
>>> re.findall(date, "Today is 05-10")
[]
>>> re.findall(date, "Today is 05-10", re.X) # the regular expression spans multiple lines
['05-10']
>>>
>>>
>>> str ="""
Whatever is
worth doing
is worth
doing well
"""
>>> re.findall(r'worth', str)
['worth', 'worth']
>>> re.findall(r'^worth', str) # by default ^ and $ match only at the start and end of the whole string; with re.M they also match at the start and end of each line
[]
>>> str
'\nWhatever is\nworth doing\nis worth\ndoing well\n'
>>> re.findall(r'^worth', str, re.M)
['worth']
>>>
>>>
>>> str = """a
b
c"""
>>> re.findall(r'a.b.c', str)
[]
>>> re.findall(r'a.b.c', str, re.S) # with re.S, . also matches newlines; by default it does not
['a\nb\nc']
>>>
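Two further points about flags may be worth a sketch: in re.X mode a pattern can carry inline comments, and several flags can be combined with the | operator. The sample text and variable names below are illustrative only.

```python
import re

# re.X ignores whitespace in the pattern and allows # comments.
date = re.compile(r"""
    \d+   # first number
    -     # separator
    \d+   # second number
    """, re.X)
dates = date.findall("Today is 05-10")
print(dates)  # ['05-10']

# Flags are combined with |: ignore case and match ^ at each line start.
text = "Whatever is\nworth doing\nis worth\ndoing well"
hits = re.findall(r'^WORTH', text, re.I | re.M)
print(hits)  # ['worth']
```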

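6. Removing unwanted characters

Use strip() to remove characters from both ends, and lstrip() or rstrip() to remove them from the left or right end only. To remove every occurrence, including those in the middle, use replace() or re.sub().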


>>> str = '+++hello world+++'
>>> print(str.strip('+'))
hello world
>>> print(str.lstrip('+'))
hello world+++
>>> print(str.rstrip('+'))
+++hello world
>>> str = '+++hello ++ world+++'
>>> print(str.strip('+'))
hello ++ world
>>> print(str.lstrip('+'))
hello ++ world+++
>>> print(str.rstrip('+'))
+++hello ++ world
>>> print(str.replace('+', ''))
hello  world
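The session above uses replace() for characters in the middle of the string; re.sub() covers the same case and can also tidy up what is left behind. A minimal sketch, assuming the same sample string (the variable names are made up):

```python
import re

s = '+++hello ++ world+++'

# Remove every run of '+' characters, wherever it occurs.
no_plus = re.sub(r'\++', '', s)
print(repr(no_plus))  # 'hello  world' (two spaces remain)

# Collapse the leftover whitespace into a single space.
collapsed = re.sub(r'\s+', ' ', no_plus)
print(collapsed)      # hello world
```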