python - String Processing

爱吃水饺的小京

Article link: https://blog.csdn.net/ljsykf/article/details/123955844

Matching and searching for text patterns

str.find/str.endswith/str.startswith

If you only need to match simple literal text, the string methods find, endswith, and startswith are enough.

text='yeah,but no,but yeah,but no,but yeah'

print(text.endswith('yeah'))

print(text.startswith("no"))

#return the index of the first match
print(text.find("no"))

Output:

True
False
9
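A few related details of the same methods, shown as a small sketch: startswith and endswith also accept a tuple of candidates, find returns -1 when the substring is absent, and rfind searches from the right.

```python
text = 'yeah,but no,but yeah,but no,but yeah'

# startswith/endswith accept a tuple and return True if any candidate matches
starts = text.startswith(('no', 'yeah'))
# find returns -1 when the substring is not present
missing = text.find('maybe')
# rfind returns the index of the last occurrence
last_no = text.rfind('no')

print(starts, missing, last_no)  # True -1 25
```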

Matching and searching with re

For example, find dates in a format like 11/27/2012

Use findall to find all matching strings

import re

text='Today is 11/27/2021, Pycon starts 3/13/2013'

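#a regular expression that matches dates (raw string, so \d is not treated as a string escape)
pattern=r'\d+/\d+/\d+'

pa=re.compile(pattern)  #returns a Pattern object
match=pa.findall(text)  #returns a list of all matched strings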
print(match)

Output:

['11/27/2021', '3/13/2013']
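Compiling with re.compile pays off mainly when one pattern is reused many times. As a sketch, the module-level re.findall gives the same result for a one-off search, and when nothing matches findall returns an empty list rather than None.

```python
import re

text = 'Today is 11/27/2021, Pycon starts 3/13/2013'

# module-level function: compiles (and caches) the pattern internally
dates = re.findall(r'\d+/\d+/\d+', text)

# with no match, findall returns [] (not None)
no_dates = re.findall(r'\d+/\d+/\d+', 'no dates here')

print(dates)     # ['11/27/2021', '3/13/2013']
print(no_dates)  # []
```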

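Using match to match at the start of a string

pattern.match only tries to match at the beginning of the string: if it matches, it returns a Match object; otherwise it returns None.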

import re

text='11/27/2021, Pycon starts 3/13/2013'
text2='Today is 11/27/2021, Pycon starts 3/13/2013'

#a regular expression that matches dates
pattern=r'\d+/\d+/\d+'

pa=re.compile(pattern)  #returns a Pattern object
match=pa.match(text)  #returns a Match object
print(match.group(0))  #group(0) is the entire matched string
match2=pa.match(text2) #text2 does not start with a date, so match returns None
print(match2)

Output:

11/27/2021
None
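To contrast match with its siblings, here is a small sketch: search scans the whole string for the first match, and fullmatch succeeds only if the entire string matches the pattern.

```python
import re

pattern = re.compile(r'\d+/\d+/\d+')
text = 'Today is 11/27/2021'

m1 = pattern.match(text)                         # only at the start: None here
m2 = pattern.search(text)                        # first match anywhere in the string
m3 = pattern.fullmatch('11/27/2021')             # whole string matches
m4 = pattern.fullmatch('11/27/2021 is a date')   # extra text, so None

print(m1)           # None
print(m2.group())   # 11/27/2021
print(m3.group())   # 11/27/2021
print(m4)           # None
```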

Find and replace

replace

For simple text replacement, use the str.replace method

text='yeah,but no,but yeah,but no,but yeah'

print(text.replace("yeah","yep"))

Output:

yep,but no,but yep,but no,but yep
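str.replace also takes an optional count argument. A minimal sketch showing that it limits the number of replacements and that the original string is left unchanged:

```python
text = 'yeah,but no,but yeah,but no,but yeah'

# the third argument limits how many occurrences are replaced
first_only = text.replace('yeah', 'yep', 1)
print(first_only)  # yep,but no,but yeah,but no,but yeah

# str is immutable: replace returns a new string
print(text)        # yeah,but no,but yeah,but no,but yeah
```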

sub

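For more complex cases, use re.sub.
re.sub(pattern, repl, string): pattern is the pattern to match, and repl is the replacement, which can be either a string or a callback function. When repl is a function, it is called with each Match object and its return value is used as the replacement text.

import re

text='''today count 1,
yesterday count 2,
tomorrow count 3
'''

#define a callback function; its argument is a Match object
def subFunction(match):
    number=match.group(1)
    replacenumber=int(number)+1
    return "count "+str(replacenumber)

#\d+ rather than \d, so that multi-digit numbers are incremented correctly
newtext=re.sub(r"count (\d+)",subFunction,text)
print(newtext)

Output:

today count 2,
yesterday count 3,
tomorrow count 4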
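Two related re features, sketched briefly: re.subn works like re.sub but also returns the number of replacements, and the count keyword limits how many replacements are made (0, the default, means all).

```python
import re

text = 'a1 b2 c3'

# subn returns a tuple: (new_string, number_of_replacements)
new_text, n = re.subn(r'\d', '#', text)
print(new_text, n)  # a# b# c# 3

# count limits the number of replacements
limited = re.sub(r'\d', '#', text, count=2)
print(limited)      # a# b# c3
```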

Case-insensitive text replacement

Use re.IGNORECASE (or its short form re.I)

import re

text='UPPER PYTHON,lower python,Mixed Python'

new_text=re.sub("python","snake",text,flags=re.I)

print(new_text)

Output:

UPPER snake,lower snake,Mixed snake
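With re.I every variant is replaced by the same lowercase word. If the replacement should copy the case of each match, one option is to pass a callback as repl. The helper name matchcase below is hypothetical; this is a sketch, not a standard library function.

```python
import re

def matchcase(word):
    """Hypothetical helper: returns a repl callback that copies the case of each match onto word."""
    def replace(m):
        matched = m.group()
        if matched.isupper():
            return word.upper()
        elif matched.islower():
            return word.lower()
        elif matched[0].isupper():
            return word.capitalize()
        return word
    return replace

text = 'UPPER PYTHON,lower python,Mixed Python'
new_text = re.sub('python', matchcase('snake'), text, flags=re.I)
print(new_text)  # UPPER SNAKE,lower snake,Mixed Snake
```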

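Greedy and non-greedy matching

By default, * is greedy: it matches as much text as possible. Adding ? after it (*?) makes it non-greedy, so it matches as little text as possible.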

import re

text='Computer says "no." Phone says "yes."'

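#greedy: .* runs from the first quote all the way to the last one
new_text1=re.findall(r'\"(.*)\"',text)
print(new_text1)

#adding ? switches to non-greedy mode, so no. and yes. are extracted separately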
new_text2=re.findall(r'\"(.*?)\"',text)
print(new_text2)

Output:

['no." Phone says "yes.']
['no.', 'yes.']
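Non-greedy matching is often combined with re.DOTALL, because . does not match a newline by default. A sketch with a multi-line C-style comment:

```python
import re

code = '/* this is a\nmultiline comment */'

# without the flag, .*? cannot cross the newline, so nothing is found
without_flag = re.findall(r'/\*(.*?)\*/', code)
# re.DOTALL (also re.S) lets . match newlines too
with_flag = re.findall(r'/\*(.*?)\*/', code, flags=re.DOTALL)

print(without_flag)  # []
print(with_flag)     # [' this is a\nmultiline comment ']
```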

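Removing unwanted characters from a string

Removing characters from the start and end

strip removes characters from both ends of a string
lstrip removes characters from the left end
rstrip removes characters from the right end
By default they remove whitespace (spaces, tabs, newlines). You can also pass a string of characters to remove; it is treated as a set of characters, not as a substring or a regular expression.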

text1=' hello world \n'

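#remove whitespace (including the trailing \n) from both ends
print(text1.strip())

text2="----hello======"
#remove the - characters on the left
print(text2.lstrip("-"))
#remove the = characters on the right
print(text2.rstrip("="))
#remove any - or = characters on both ends
print(text2.strip("-="))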

Output:

hello world
hello======
----hello
hello
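Because the argument to strip is a character set, it can remove more than expected. To remove an exact prefix or suffix instead, Python 3.9+ provides removeprefix and removesuffix. A small sketch:

```python
# every leading/trailing character in the set 'cmowz.' is removed
s = 'www.example.com'
print(s.strip('cmowz.'))  # example

t = 'abcabchello'
# lstrip removes any run of a, b, c characters
print(t.lstrip('abc'))        # hello
# removeprefix removes the exact prefix 'abc' once (Python 3.9+)
print(t.removeprefix('abc'))  # abchello
```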

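Removing characters from the middle

strip only removes characters at the two ends. To remove characters from the middle of a string, use replace or re.sub.

#remove the spaces in the middle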
import re

text="hello   world"

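#replace removes only the space character
print(text.replace(' ',''))

#\s matches any whitespace character (space, tab, newline, and so on)
new_text=re.sub(r"\s",'',text)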
print(new_text)

Output:

helloworld
helloworld
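A common variant is to collapse runs of whitespace into a single space instead of deleting them. Two equivalent ways, sketched:

```python
import re

text = '  hello   world \t python  '

# replace every run of whitespace with one space, then trim the ends
collapsed = re.sub(r'\s+', ' ', text).strip()
# without re: split() with no argument splits on whitespace runs
joined = ' '.join(text.split())

print(collapsed)  # hello world python
print(joined)     # hello world python
```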

Splitting strings

split

For simple splitting, use str.split; note that it accepts only a single delimiter string

#split the string on commas
text="hello,world,python"

print(text.split(","))

Output:

['hello', 'world', 'python']
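str.split has a few more options worth knowing, sketched here: maxsplit limits the number of splits, rsplit works from the right, and calling split() with no argument splits on runs of whitespace and drops empty strings.

```python
text = 'hello,world,python'

print(text.split(',', 1))   # ['hello', 'world,python']
print(text.rsplit(',', 1))  # ['hello,world', 'python']

# no argument: split on any whitespace run, no empty strings
words = 'one  two\tthree '.split()
print(words)                # ['one', 'two', 'three']
```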

re.split

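re.split takes a regular expression as the delimiter, so it supports multiple delimiters

#split out one, two, three, four, five and drop the delimiters and surrounding spaces
import re

text="one, two; three,four;    five"

new_text=re.split(r"[,;\s]\s*",text)
print(new_text)

Output:

['one', 'two', 'three', 'four', 'five']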
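One thing to watch with re.split: a delimiter at the very start or end of the string produces an empty string in the result. A sketch of the effect and a simple filter:

```python
import re

text = ',one, two;three;'

parts = re.split(r'[,;\s]\s*', text)
print(parts)  # ['', 'one', 'two', 'three', '']

# drop the empty strings if they are not wanted
words = [p for p in parts if p]
print(words)  # ['one', 'two', 'three']
```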

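Capture groups

If a regular expression contains parentheses (), each pair forms a capture group.
A capture group lets you extract the text matched by the parenthesized part.
Group 0 is the entire match.
Group 1 is the first parenthesized group, and so on.

import re

text='''orange, oranges are orange;
banana, bananas are yellow;
watermelon, watermelons are green
'''

pattern=re.compile(r"(.*),")

#with one capture group, findall returns the text of that group for each match
new=pattern.findall(text)

print(new)

Output:

['orange', 'banana', 'watermelon']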

import re

text='''orange, oranges are orange;
apple, apples are red;
watermelon, watermelons are green
'''

pattern=re.compile(r'(.*),')

#search returns a Match object for the first match only
match=pattern.search(text)

print(type(match))

print(match.group())
print(match.group(1))

Output:

<class 're.Match'>
orange,
orange
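Groups can also be named with the (?P<name>...) syntax, which makes longer patterns easier to read. A sketch on a single line of sample text (the group names fruit and rest are just illustrative):

```python
import re

text = 'orange, oranges are orange;'

# (?P<name>...) defines a named capture group
pattern = re.compile(r'(?P<fruit>\w+), (?P<rest>.*);')
m = pattern.search(text)

print(m.group('fruit'))  # orange
print(m.group(1))        # orange (named groups are still numbered)
print(m.groupdict())     # {'fruit': 'orange', 'rest': 'oranges are orange'}
```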

If there are several capture groups, findall returns a list of tuples, one tuple per match

import re

text='Today is 11/27/2012, Pycon starts 3/13/2013.'

pattern=re.compile(r"(\d+)/(\d+)/(\d+)")

#findall returns the matches as a list; with several capture groups, each item is a tuple of the groups
find_all=pattern.findall(text)
print(find_all)

#search finds the first match and returns a Match object
search=pattern.search(text)
print(search.group())
print("group 1 is %s, group 2 is %s, group 3 is %s"%(search.group(1),search.group(2),search.group(3)))

#finditer returns an iterator over all Match objects
match_all=pattern.finditer(text)
print(type(match_all))
for match in match_all:
    print("group 0 is %s, group 1 is %s, group 2 is %s, group 3 is %s"%(match.group(),match.group(1),match.group(2),match.group(3)))

Output:

[('11', '27', '2012'), ('3', '13', '2013')]
11/27/2012
group 1 is 11, group 2 is 27, group 3 is 2012
<class 'callable_iterator'>
group 0 is 11/27/2012, group 1 is 11, group 2 is 27, group 3 is 2012
group 0 is 3/13/2013, group 1 is 3, group 2 is 13, group 3 is 2013
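Instead of calling group(1), group(2), group(3) one by one, Match.groups() returns all capture groups as a tuple, which can be unpacked directly. A sketch that turns each date into a (year, month, day) tuple of integers:

```python
import re

text = 'Today is 11/27/2012, Pycon starts 3/13/2013.'
pattern = re.compile(r'(\d+)/(\d+)/(\d+)')

first = pattern.search(text)
print(first.groups())  # ('11', '27', '2012')

dates = []
for m in pattern.finditer(text):
    month, day, year = m.groups()  # unpack the three groups
    dates.append((int(year), int(month), int(day)))
print(dates)  # [(2012, 11, 27), (2013, 3, 13)]
```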

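split with capture groups

If the regular expression passed to re.split contains capture groups, the matched delimiter text is also included in the result list

import re
text="one; two, three;  four"

#the text matched by the capture group also appears in the returned list
new_text=re.split(r"(,|;|\s)\s*",text)

print(new_text)

#use a non-capturing group (?:...) to keep the delimiters out of the result
new_text2=re.split(r"(?:,|;|\s)\s*",text)
print(new_text2)

Output:

['one', ';', 'two', ',', 'three', ';', 'four']
['one', 'two', 'three', 'four']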
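Keeping the delimiters is useful when the string has to be rebuilt after the pieces are processed. A sketch: the even-indexed items are the values, the odd-indexed items are the delimiters (here the group also captures the trailing spaces so the original spacing is preserved).

```python
import re

text = 'one; two, three;  four'

parts = re.split(r'(;\s*|,\s*)', text)
values = parts[::2]       # ['one', 'two', 'three', 'four']
delimiters = parts[1::2]  # ['; ', ', ', ';  ']

# transform the values, then put the original delimiters back
values = [v.upper() for v in values]
rebuilt = ''.join(v + d for v, d in zip(values, delimiters + ['']))
print(rebuilt)  # ONE; TWO, THREE;  FOUR
```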

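sub with capture groups

In the replacement string, \number (for example \1) refers to the text matched by the corresponding capture group

#convert dates like 11/27/2012 to 2012-11-27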
import re
text='Today is 11/27/2012.Pycon starts 3/13/2013'

new_text=re.sub(r"(\d+)/(\d+)/(\d+)",r"\3-\1-\2",text)
print(new_text)

Output:

Today is 2012-11-27.Pycon starts 2013-3-13
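With named groups, the replacement string can refer to them as \g<name>, which reads more clearly than bare numbers. A sketch producing the same result as the \3-\1-\2 version (the group names m, d, y are illustrative):

```python
import re

text = 'Today is 11/27/2012.Pycon starts 3/13/2013'

named = re.sub(r'(?P<m>\d+)/(?P<d>\d+)/(?P<y>\d+)', r'\g<y>-\g<m>-\g<d>', text)
numbered = re.sub(r'(\d+)/(\d+)/(\d+)', r'\3-\1-\2', text)

print(named)  # Today is 2012-11-27.Pycon starts 2013-3-13
```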

#convert dates like 11/27/2012 to the 27 Nov 2012 format
import re
from calendar import month_abbr
text='Today is 11/27/2012. Pycon starts 3/13/2012'

#the argument is a Match object
def change_m(m):
    month_name=month_abbr[int(m.group(1))]
    return '{} {} {}'.format(m.group(2),month_name,m.group(3))

pattern=re.compile(r'(\d+)/(\d+)/(\d+)')

new_text=pattern.sub(change_m,text)
print(new_text)

Output:

Today is 27 Nov 2012. Pycon starts 13 Mar 2012
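The same callback approach can also normalize the numbers, for example to zero-padded ISO 8601 dates (YYYY-MM-DD). The callback name to_iso is hypothetical; this is a sketch assuming the input is always month/day/year.

```python
import re

text = 'Today is 11/27/2012. Pycon starts 3/13/2013'

def to_iso(m):
    # hypothetical callback: month/day/year -> zero-padded YYYY-MM-DD
    month, day, year = (int(g) for g in m.groups())
    return '{:04d}-{:02d}-{:02d}'.format(year, month, day)

new_text = re.sub(r'(\d+)/(\d+)/(\d+)', to_iso, text)
print(new_text)  # Today is 2012-11-27. Pycon starts 2013-03-13
```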
