Some string and text operations in Python

xzccfzy

Category: python. Tags: python, strings

Article link: https://blog.csdn.net/xzccfzy/article/details/99717575

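Splitting a string on any of several delimiters
Python strings have a split() method, but it splits on one fixed separator (or on runs of whitespace). When a string mixes several different delimiters, use re.split(), which takes a regular expression as the separator.

>>> import re
>>> line = 'asdf fjdk; afed, fjek,asdf,    foo'
>>> # Split on a semicolon, comma, or whitespace character, plus any whitespace after it
>>> re.split(r'[;,\s]\s*', line)
['asdf', 'fjdk', 'afed', 'fjek', 'asdf', 'foo']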
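
Two variations on the call above are often handy. The following is a minimal sketch rather than the only way to do it: wrapping the delimiter in a capture group keeps the delimiters in the result, and the maxsplit argument limits how many splits are made. The variable names are only illustrative.

```python
import re

line = 'asdf fjdk; afed, fjek,asdf,    foo'

# Plain split: the delimiters are discarded
fields = re.split(r'[;,\s]\s*', line)
print(fields)      # ['asdf', 'fjdk', 'afed', 'fjek', 'asdf', 'foo']

# With a capture group, the captured delimiter is kept in the result list
parts = re.split(r'(;|,|\s)\s*', line)
values = parts[::2]       # every even index is a field
delimiters = parts[1::2]  # every odd index is a captured delimiter
print(values)      # ['asdf', 'fjdk', 'afed', 'fjek', 'asdf', 'foo']
print(delimiters)  # [' ', ';', ',', ',', ',']

# maxsplit stops after the given number of splits; the rest stays in one piece
head = re.split(r'[;,\s]\s*', line, maxsplit=2)
print(head)        # ['asdf', 'fjdk', 'afed, fjek,asdf,    foo']
```

Keeping the delimiters is useful when the string has to be rebuilt later with the same punctuation.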

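Matching text at the start or end of a string
The str.startswith() and str.endswith() methods test directly whether a string begins or ends with a given prefix or suffix. Both can also test several candidates in one call.

>>> filename = 'a.txt'
>>> filename.endswith('.txt')
True
>>> filename.endswith('.py')
False
>>> filename.startswith('a.')
True
>>> filename.startswith('b.')
False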

To test several candidates at once, pass them as a tuple. The argument has to be a tuple; a list raises TypeError.

>>> filenames = ['a.txt', 'a.c', 'a.py', 'a.html', 'a.png', 'a.jpg', 'b.txt', 'b.c', 'b.py', 'b.html', 'b.png', 'b.jpg']
>>> [name for name in filenames if name.endswith(('.txt', '.c'))]
['a.txt', 'a.c', 'b.txt', 'b.c']
>>> [name for name in filenames if name.startswith(('a.t', 'b.t'))]
['a.txt', 'b.txt']
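
In practice the candidates often arrive as a list, for example from a configuration file. The sketch below assumes a made-up list called image_exts and shows the conversion to a tuple, what happens without it, and how any() combines with these methods.

```python
filenames = ['a.txt', 'a.c', 'a.py', 'b.html', 'b.png', 'b.jpg']
image_exts = ['.png', '.jpg']  # hypothetical list, e.g. loaded from a config file

# endswith() accepts a str or a tuple of str, so convert the list first
images = [name for name in filenames if name.endswith(tuple(image_exts))]
print(images)  # ['b.png', 'b.jpg']

# Passing the list directly raises TypeError
try:
    'a.txt'.endswith(image_exts)
    raised = False
except TypeError:
    raised = True
print(raised)  # True

# any() answers "does at least one name match?" without building a list
has_python = any(name.endswith('.py') for name in filenames)
print(has_python)  # True
```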

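Matching strings with shell wildcards
Shell wildcards on Linux are very convenient. To use the same kind of pattern on strings in Python, the fnmatch module provides two functions, fnmatch() and fnmatchcase().

>>> from fnmatch import fnmatch, fnmatchcase
>>> filename = 'aaa.txt'
>>> fnmatch(filename, '*.txt')
True
>>> fnmatch(filename, 'a*')
True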

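The difference between the two is how they treat case. fnmatch() follows the case-sensitivity rules of the underlying operating system, so the same call can give different results on different platforms. fnmatchcase() always compares case exactly as written in the pattern.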
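
To make the case behavior concrete, here is a small sketch that uses only fnmatchcase(), so the results do not depend on the platform. The file names are invented for the example.

```python
from fnmatch import fnmatchcase

filenames = ['a.txt', 'B.TXT', 'notes.md', 'data1.csv', 'data2.csv']

# fnmatchcase() is case-sensitive on every platform
lower_txt = [name for name in filenames if fnmatchcase(name, '*.txt')]
print(lower_txt)  # ['a.txt']

# Lowercasing the name first gives a portable case-insensitive match
any_txt = [name for name in filenames if fnmatchcase(name.lower(), '*.txt')]
print(any_txt)    # ['a.txt', 'B.TXT']

# Besides *, the patterns support ? (one character) and [...] (a character set)
numbered = [name for name in filenames if fnmatchcase(name, 'data[0-9].csv')]
print(numbered)   # ['data1.csv', 'data2.csv']
```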

Matching and searching for text patterns
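For simple matching and searching, str.find(), str.startswith(), and similar methods are enough. More complex matching needs the re module, which matches text against regular expressions that you write yourself.

The usual workflow with the re module is to compile the pattern with re.compile() and then call methods such as match(), findall(), or finditer() on the compiled pattern.

>>> import re
>>> text = '19/08/2019'
>>> datepat = re.compile(r'\d+/\d+/\d+')
>>> print(datepat.match(text))
<re.Match object; span=(0, 10), match='19/08/2019'>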
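
The methods on a compiled pattern behave differently, which is easiest to see side by side. This sketch adds capture groups to the date pattern and also uses search(), which is not mentioned above; the sample sentence is made up.

```python
import re

# Capture groups pull out the day, month, and year separately
datepat = re.compile(r'(\d+)/(\d+)/(\d+)')
text = 'Start 19/08/2019, end 27/09/2022'

# match() only succeeds at the very beginning of the string
print(datepat.match(text))    # None

# search() finds the first match anywhere in the string
first = datepat.search(text)
print(first.group(0))         # 19/08/2019
print(first.groups())         # ('19', '08', '2019')

# findall() returns one tuple of groups per match
all_dates = datepat.findall(text)
print(all_dates)              # [('19', '08', '2019'), ('27', '09', '2022')]

# finditer() yields match objects one at a time
years = [int(m.group(3)) for m in datepat.finditer(text)]
print(years)                  # [2019, 2022]
```

Compiling once and reusing the pattern object keeps the pattern in one place when it is applied many times.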

Finding and replacing text
For simple replacements, str.replace() is enough.

>>> text = 'aaaaa,bbbbb,aaaaa'
>>> text.replace('bbbbb', 'aaaaa')
'aaaaa,aaaaa,aaaaa'

For more complex replacements, use the sub() function in the re module.

>>> import re
>>> text = '19/08/2019'
>>> # Replace every digit with an underscore
>>> re.sub(r'[0-9]', '_', text)
'__/__/____'
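
sub() can do more than swap in a fixed string. The sketch below shows three common variations: backreferences in the replacement, a callback function, and subn() to count the replacements. The sample sentence and the mask_year helper are made up for illustration.

```python
import re

text = 'Released on 19/08/2019.'
datepat = re.compile(r'(\d+)/(\d+)/(\d+)')

# \1, \2, \3 in the replacement refer to the captured groups
iso = datepat.sub(r'\3-\2-\1', text)
print(iso)              # Released on 2019-08-19.

# The replacement can also be a function that receives each match object
def mask_year(match):
    day, month, _year = match.groups()
    return f'{day}/{month}/****'

masked = datepat.sub(mask_year, text)
print(masked)           # Released on 19/08/****.

# subn() returns the new string together with the number of replacements
hashed, count = re.subn(r'\d', '#', text)
print(hashed, count)    # Released on ##/##/####. 8
```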

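Case-insensitive searching and replacing
Case-insensitive text operations need the re module, with the re.IGNORECASE flag passed to each call.

>>> import re
>>> text = 'HELLO, hello, Hello'
>>> # Pass flags by keyword: the fourth positional parameter of re.sub() is count
>>> re.sub('hello', 'world', text, flags=re.IGNORECASE)
'world, world, world'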
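
Note that the replacement above is always lowercase, whatever the case of the matched text. If the replacement should follow the case of each match, one possible approach is to pass a callback to sub(). The helper name matchcase below is hypothetical, and the sketch only handles all-uppercase, all-lowercase, and capitalized words.

```python
import re

def matchcase(word):
    """Return a callback that adapts `word` to the case of each match."""
    def replace(match):
        found = match.group()
        if found.isupper():
            return word.upper()
        if found.islower():
            return word.lower()
        if found[0].isupper():
            return word.capitalize()
        return word
    return replace

text = 'HELLO, hello, Hello'
result = re.sub('hello', matchcase('world'), text, flags=re.IGNORECASE)
print(result)   # WORLD, world, World

# findall() accepts the same flag
found = re.findall('hello', text, flags=re.IGNORECASE)
print(found)    # ['HELLO', 'hello', 'Hello']
```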

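Writing a regular expression for the shortest match
When matching text, a short match can sit inside a longer one. Sometimes the shortest match is wanted and sometimes the longest. For example:

>>> import re
>>> text = '"hello"'
>>> text_1 = '"hello", "world"'
>>> str_pat = re.compile(r'"(.*)"')
>>> str_pat.findall(text)
['hello']
>>> str_pat.findall(text_1)
['hello", "world']

For text_1 the shortest match is wanted, but the * in "(.*)" is greedy, so it matches the longest possible result. Changing the pattern to "(.*?)" makes the quantifier non-greedy, which turns the longest match into the shortest one.

>>> str_pat = re.compile(r'"(.*?)"')
>>> str_pat.findall(text)
['hello']
>>> str_pat.findall(text_1)
['hello', 'world']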
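
A non-greedy quantifier is not the only way to get the shortest quoted string. As a sketch of an alternative, a negated character class that refuses to cross a quote gives the same result here. The example also uses re.DOTALL, because . does not match a newline by default. The sample strings are made up.

```python
import re

text = 'say "yes" or "no"'

greedy = re.findall(r'"(.*)"', text)
print(greedy)    # ['yes" or "no']

lazy = re.findall(r'"(.*?)"', text)
print(lazy)      # ['yes', 'no']

# [^"]* can never run past a closing quote, so no ? is needed
negated = re.findall(r'"([^"]*)"', text)
print(negated)   # ['yes', 'no']

# For quoted text that spans lines, . needs re.DOTALL to match the newline
multiline = '"first\nsecond" "third"'
spanning = re.findall(r'"(.*?)"', multiline, flags=re.DOTALL)
print(spanning)  # ['first\nsecond', 'third']

# The negated class matches newlines without any flag
spanning_negated = re.findall(r'"([^"]*)"', multiline)
print(spanning_negated)  # ['first\nsecond', 'third']
```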

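Removing unwanted characters from a string
Strings read from text sometimes contain spaces, newlines, or similar characters at the start, at the end, or in the middle that need to be removed. strip() handles both ends, lstrip() handles only the start, and rstrip() handles only the end. For characters in the middle of the string, use replace().

>>> text = ' hello,   world   \n'
>>> text.strip()
'hello,   world'
>>> text.lstrip()
'hello,   world   \n'
>>> text.rstrip()
' hello,   world'
>>> text.replace(' ', '')
'hello,world\n'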
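
replace(' ', '') removes every space, which is often more than intended. A minimal sketch of two related techniques follows: collapsing runs of whitespace with re.sub(), and passing a set of characters to strip(). The sample strings are invented.

```python
import re

raw = '  hello,   world \t\n'

# Trim both ends, then collapse each inner run of whitespace to one space
collapsed = re.sub(r'\s+', ' ', raw.strip())
print(collapsed)   # hello, world

# strip() and its variants accept the set of characters to remove
banner = '-----hello====='
both = banner.strip('-=')
left = banner.lstrip('-')
print(both)        # hello
print(left)        # hello=====

# Cleaning every line of some input
lines = ['  alpha \n', '\tbeta\n', 'gamma  ']
cleaned = [line.strip() for line in lines]
print(cleaned)     # ['alpha', 'beta', 'gamma']
```

The argument to strip() is a set of characters, not a prefix or suffix, so the order of the characters in it does not matter.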
