Describing Data Structures with Python: Strings

夏小悠

Category: Data Structures | Tags: python, string

Article link: https://blog.csdn.net/qq_42730750/article/details/107931738

Preface

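Strings were mentioned in an earlier post as one of Python's basic data types. This post therefore does not describe strings themselves in detail. Instead, it summarizes the built-in methods Python provides for working with strings.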

1. count()

`str.count(sub[, start[, end]])`
Returns the number of non-overlapping occurrences of the substring `sub` in `str[start:end]`. The optional arguments `start` and `end` are interpreted as in slice notation.

    str_x = 'abcdbbefg'
    print(str_x.count('bb'))
>>>>1
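
`count()` counts only non-overlapping matches, and it treats `start`/`end` like a slice, so some results can look odd at first. Here is a small sketch. The sample strings are arbitrary.

```python
text = 'abcdbbefg'

# Matches do not overlap: 'aaaa' contains 'aa' twice, not three times
overlap_demo = 'aaaa'.count('aa')
print(overlap_demo)  # 2

# start works like a slice: text[2:] == 'cdbbefg'
from_two = text.count('b', 2)
print(from_two)      # 2

# start and end together: text[2:5] == 'cdb'
in_slice = text.count('b', 2, 5)
print(in_slice)      # 1
```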

2. encode()

`str.encode(encoding='utf-8', errors='strict')`
Returns the string encoded as a `bytes` object in the specified encoding. The default encoding is `'utf-8'`. The `errors` argument selects the error-handling scheme. Its default, `'strict'`, means that an encoding error raises a `UnicodeError` (specifically `UnicodeEncodeError`). Other possible values include `'ignore'` and `'replace'`.

    str_x = 'abcdbbefg'
    print(str_x.encode())
>>>>b'abcdbbefg'
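
The sketch below shows how `errors` changes the result when the target encoding cannot represent a character. The sample word `'café'` and the `'ascii'` codec are only illustrations.

```python
text = 'café'  # contains one non-ASCII character

utf8 = text.encode('utf-8')
print(utf8)  # b'caf\xc3\xa9'

# 'ascii' cannot represent 'é'; the errors argument decides what happens
ignored = text.encode('ascii', errors='ignore')
replaced = text.encode('ascii', errors='replace')
print(ignored)   # b'caf'
print(replaced)  # b'caf?'

strict_failed = False
try:
    text.encode('ascii')  # errors='strict' is the default
except UnicodeEncodeError as exc:
    strict_failed = True
    print('strict:', exc.reason)  # strict: ordinal not in range(128)
```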

3. decode()

`bytes.decode(encoding='utf-8', errors='strict')`
Decodes a `bytes` object into a string using the specified encoding. It is the inverse of `encode()`. In Python 3, `decode()` is a method of `bytes`, not of `str`.

    str_x = 'abcdbbefg'
    print(str_x.encode(encoding='utf-8'))
    print(str_x.encode(encoding='utf-8').decode(encoding='utf-8'))
>>>>b'abcdbbefg'
>>>>abcdbbefg
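
Scraped text often runs into two decoding problems: decoding with the wrong codec, and bytes that are not valid in the chosen codec. Here is a sketch of both. The sample data is made up.

```python
data = 'café'.encode('utf-8')  # b'caf\xc3\xa9'

print(data.decode('utf-8'))    # café

# The wrong codec does not raise here; it silently produces mojibake
wrong = data.decode('latin-1')
print(wrong)  # cafÃ©

# Cut the two-byte 'é' in half to get invalid UTF-8
broken = data[:-1]             # b'caf\xc3'
ignored = broken.decode('utf-8', errors='ignore')
replaced = broken.decode('utf-8', errors='replace')
print(ignored)   # caf
print(replaced)  # caf followed by U+FFFD, the replacement character
```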

4. format()

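`str.format(*args, **kwargs)`
Performs a string formatting operation.
A replacement field has the form `{<argument index>:<format spec>}`. The format spec controls how the argument is displayed. Its fields, in order, are:

| Field | Meaning |
| --- | --- |
| `:` | Leading symbol that introduces the format spec |
| `<fill>` | Single character used for padding; default is a space |
| `<align>` | `<` left-align, `>` right-align, `^` center |
| `<width>` | Output width |
| `<,>` | Thousands separator for numbers; applies to integers and floats |
| `<.precision>` | Digits after the decimal point for floats, or the maximum output length for strings |
| `<type>` | Integer types `b`, `c`, `d`, `o`, `x`, `X`; float types `e`, `E`, `f`, `%` |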

    print("{:*^15}".format('abcdefg'))
    print("{:,}".format(1000000))
    print("{:.2f}".format(3.1415926))
    print("{:.2%}".format(3.1415926))
    print("{:.2e}".format(1000000))
    print("{:b}".format(5))
>>>>****abcdefg****
>>>>1,000,000
>>>>3.14
>>>>314.16%
>>>>1.00e+06
>>>>101

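`*args` collects any number of positional (unnamed) arguments as a tuple. `**kwargs` collects keyword arguments as a dict. When both are used, `*args` must come before `**kwargs`. In a format string, positional arguments are referenced by index (`{0}`) and keyword arguments by name (`{name}`).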
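
Here are a few more format specs built from the fields in the table, plus positional and keyword arguments in one call. The sample values are arbitrary.

```python
# Positional argument by index, keyword argument by name
line = '{0} has {count} items, {0}!'.format('cart', count=3)
print(line)    # cart has 3 items, cart!

# fill '-', align '>', width 10
right = '{:->10}'.format('abc')
print(right)   # -------abc

# For strings, .precision is the maximum output length
short = '{:.3}'.format('abcdefg')
print(short)   # abc

# Type 'x' (hexadecimal); a leading 0 before the width pads with zeros
hexed = '{:04x}'.format(255)
print(hexed)   # 00ff
```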

5. find()

`str.find(sub[, start[, end]])`
Returns the lowest index at which the substring `sub` is found within the slice `str[start:end]`, or -1 if `sub` is not found. The optional arguments `start` and `end` are interpreted as in slice notation.

    str_x = 'abcdbbefg'
    print(str_x.find('b'))
>>>>1
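
`find()` returns only the first match. To collect every position, restart the search just after each hit. `find_all` below is a hypothetical helper for illustration, not a built-in method.

```python
def find_all(text, sub):
    """Return every index where sub starts (hypothetical helper, not a str method)."""
    positions = []
    i = text.find(sub)
    while i != -1:
        positions.append(i)
        # Search again after the last match; the returned index is still
        # relative to the whole string, not to the slice
        i = text.find(sub, i + 1)
    return positions


print(find_all('abcdbbefg', 'b'))  # [1, 4, 5]
print('abcdbbefg'.find('z'))       # -1
```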

6. index()

`str.index(sub[, start[, end]])`
Like `find()`, but raises `ValueError` when the substring is not found.
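
The practical difference from `find()` shows up in the failure case. Here is a sketch:

```python
text = 'abcdbbefg'
position = text.index('d')
print(position)  # 3

raised = False
try:
    text.index('z')  # find() would return -1 here
except ValueError as exc:
    raised = True
    print('ValueError:', exc)  # ValueError: substring not found
```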

7. endswith()

`str.endswith(suffix[, start[, end]])`
Returns `True` if the string ends with the specified `suffix`, otherwise `False`. `suffix` can also be a tuple of several suffixes to check. The optional arguments `start` and `end` are interpreted as in slice notation.

    str_x = 'abcdbbefg'
    print(str_x.endswith(('a', 'g')))
>>>>True
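
The sketch below uses the tuple form to filter file names and also shows the `start`/`end` arguments. The file names are made up.

```python
files = ['a.py', 'b.txt', 'c.md', 'd.PY']

# A tuple checks several suffixes at once; the comparison is case-sensitive
selected = [f for f in files if f.endswith(('.py', '.md'))]
print(selected)  # ['a.py', 'c.md']

# With start=0, end=4 only text[0:4] == 'abcd' is examined
text = 'abcdbbefg'
in_range = text.endswith('d', 0, 4)
print(in_range)  # True
```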

8. startswith()

`str.startswith(prefix[, start[, end]])`
Returns `True` if the string starts with the specified `prefix`, otherwise `False`. Usage is the same as for `endswith()`.
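
Like `endswith()`, `startswith()` accepts a tuple, and `start` shifts where the check begins. Here is a sketch using this post's own URL:

```python
url = 'https://blog.csdn.net/qq_42730750/article/details/107931738'

is_web = url.startswith(('http://', 'https://'))
print(is_web)   # True

# With start=8 the check runs on url[8:], which begins with 'blog'
from_eight = url.startswith('blog', 8)
print(from_eight)  # True
```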

9. join()

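`str.join(iterable)`
Returns a new string formed by concatenating the strings in `iterable`, with `str` inserted between them as the separator. Every element must be a string; otherwise a `TypeError` is raised.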

    str_list = ['abd', 'cca', 'befg']
    print(' '.join(str_list))
>>>>abd cca befg
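
Because of the string-only rule, non-string elements such as integers must be converted first. Otherwise `join()` raises `TypeError`. Here is a sketch:

```python
numbers = [1, 2, 3]

# Convert each element to str before joining
csv_line = ','.join(str(n) for n in numbers)
print(csv_line)  # 1,2,3

# An empty separator simply concatenates
word = ''.join(['a', 'b', 'c'])
print(word)      # abc

failed = False
try:
    ','.join(numbers)  # integers are not accepted
except TypeError:
    failed = True
    print('join() needs string elements')
```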

10. split()

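`str.split(sep=None, maxsplit=-1)`
Splits the string at `sep` and returns a list of the pieces. If `sep` is not given (or is `None`), the string is split on runs of whitespace, including spaces, newlines (`\n`), and tabs (`\t`). `maxsplit` is the maximum number of splits. If it is not specified (or is -1), all possible splits are made.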

    str_x = 'this is a pig'
    print(str_x.split())
    print(str_x.split(' '))
    print(str_x.split(' ', maxsplit=1))
>>>>['this', 'is', 'a', 'pig']
>>>>['this', 'is', 'a', 'pig']
>>>>['this', 'is a pig']
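
The default `sep=None` and an explicit `sep=' '` agree only when words are separated by exactly one space, as in the example above. Here is a sketch with irregular whitespace:

```python
text = '  this  is\ta pig\n'

# sep=None: split on runs of whitespace and ignore leading/trailing whitespace
by_whitespace = text.split()
print(by_whitespace)  # ['this', 'is', 'a', 'pig']

# sep=' ': every single space is a separator, so empty strings appear
by_space = text.split(' ')
print(by_space)       # ['', '', 'this', '', 'is\ta', 'pig\n']
```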

11. strip()

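`str.strip([chars])`
Returns a copy of the string with leading and trailing characters removed. The `chars` argument is a string specifying the set of characters to remove. If it is omitted or `None`, whitespace is removed. `chars` is not a single prefix or suffix: every combination of its characters is stripped from both ends.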

    str_x = ' this is a pig*  '
    print(str_x.strip())
    print(str_x.strip('*'))
    print(str_x.strip('* '))
>>>>this is a pig*
>>>> this is a pig*  
>>>>this is a pig
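
The "set of characters" behavior is a common trap when trying to strip a file extension. The sketch below shows it. `removesuffix()` is not covered in this post and is assumed to be available because it requires Python 3.9 or later.

```python
# chars is a set of characters, not a suffix
name = 'happy.py'
stripped = name.strip('.py')
print(stripped)     # ha  (trailing 'y', 'p', '.', 'y', 'p', 'p' are all in the set)

# To remove one exact suffix, removesuffix() (Python 3.9+) is the safer choice
without_ext = name.removesuffix('.py')
print(without_ext)  # happy

# Example of stripping a set of characters from both ends
url = 'www.example.com'
core = url.strip('w.moc')
print(core)         # example
```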

12. replace()

`str.replace(old, new[, count])`
Returns a copy of the string with every occurrence of the substring `old` replaced by `new`. If the optional argument `count` is given, only the first `count` occurrences are replaced.

    str_x = 'this is a pig'
    print(str_x.replace('pig', 'tiger'))
>>>>this is a tiger
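
Here is a sketch of the `count` argument. It also shows that strings are immutable, so `replace()` leaves the original untouched.

```python
text = 'a-b-c-d'

# count=2: only the first two '-' are replaced
partial = text.replace('-', '+', 2)
print(partial)  # a+b+c-d

# replace() returned a new string; text itself is unchanged
print(text)     # a-b-c-d
```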

13. zfill()

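`str.zfill(width)`
Returns a copy of the string left-padded with `'0'` characters to length `width`. A leading sign (`'+'` or `'-'`) is handled by inserting the padding after the sign rather than before it. If `width` is less than or equal to `len(str)`, the original string is returned.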

    str_x = '999'
    print(str_x.zfill(5))
>>>>00999
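
The sketch below covers sign handling, a width that is too small, and the sequential file naming mentioned in the conclusion. The `img_` prefix and `.jpg` suffix are hypothetical.

```python
# The zeros go after the sign
signed = '-42'.zfill(5)
print(signed)     # -0042

# width <= len(str): the string comes back unchanged
unchanged = '12345'.zfill(3)
print(unchanged)  # 12345

# Sequential file names (prefix and suffix are made up)
names = ['img_' + str(i).zfill(3) + '.jpg' for i in (1, 25, 300)]
print(names)      # ['img_001.jpg', 'img_025.jpg', 'img_300.jpg']
```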

14. lower()

`str.lower()`
Returns a copy of the string with all uppercase letters converted to lowercase.

    str_x = 'this Is A pig'
    print(str_x.lower())
>>>>this is a pig

15. upper()

`str.upper()`
Returns a copy of the string with all lowercase letters converted to uppercase.

    str_x = 'this Is A pig'
    print(str_x.upper())
>>>>THIS IS A PIG
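
A common use of `lower()` is comparing user input without regard to case. `normalize` is a hypothetical helper for illustration.

```python
def normalize(word):
    """Hypothetical helper: trim whitespace and lowercase for comparison."""
    return word.strip().lower()


answers = ['Yes', ' YES ', 'yes', 'No']
yes_count = sum(1 for a in answers if normalize(a) == 'yes')
print(yes_count)  # 3

# Characters without case are left as they are
shouted = 'abc-123'.upper()
print(shouted)    # ABC-123
```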

16. capitalize()

`str.capitalize()`
Returns a copy of the string with its first character uppercased and the rest lowercased.

    str_x = 'abcdefg'
    print(str_x.capitalize())
    print(str_x)

>>>>Abcdefg
>>>>abcdefg

17. isalnum()

`str.isalnum()`
Returns `True` if all characters in the string are letters or digits and there is at least one character, otherwise `False`.

    str_x = 'abc9befg'
    print(str_x.isalnum())
    str_x = 'abc9 befg'
    print(str_x.isalnum())
>>>>True
>>>>False

18. isalpha()

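`str.isalpha()`
Returns `True` if all characters in the string are alphabetic and there is at least one character, otherwise `False`. "Alphabetic" follows the Unicode definition, so Chinese characters count as letters too.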

    str_x = 'abc9befg'
    print(str_x.isalpha())
    str_x = 'abc李befg'
    print(str_x.isalpha())
>>>>False
>>>>True

19. isdigit()

`str.isdigit()`
Returns `True` if all characters in the string are digits and there is at least one character, otherwise `False`.

    str_x = 'abc9befg'
    print(str_x.isdigit())
>>>>False
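
`isdigit()` rejects signs, decimal points, and the empty string, so it is not a full integer validator. Here is a sketch. `to_int` is a hypothetical helper that lets `int()` do the parsing instead.

```python
samples = ['2021', '', '-5', '3.14']
checks = [s.isdigit() for s in samples]
print(checks)  # [True, False, False, False]


def to_int(text):
    """Hypothetical helper: int() accepts a sign and surrounding spaces."""
    try:
        return int(text)
    except ValueError:
        return None


print(to_int(' -5 '), to_int('3.14'))  # -5 None
```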

20. islower()

`str.islower()`
Returns `True` if all cased characters in the string are lowercase and there is at least one cased character, otherwise `False`.

    str_x = 'abc3456befg'
    print(str_x.islower())
>>>>True

21. isupper()

`str.isupper()`
Returns `True` if all cased characters in the string are uppercase and there is at least one cased character, otherwise `False`.

    str_x = 'Abd Dihgj'
    print(str_x.isupper())
>>>>False
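
Both `islower()` and `isupper()` return `False` when the string has no cased characters at all. A short sketch makes this clear:

```python
# Digits and spaces have no case, so neither check can be True
no_case = ('123'.islower(), '123'.isupper())
print(no_case)        # (False, False)

# Non-cased characters are ignored once a cased one is present
mixed_lower = 'abc123'.islower()
mixed_upper = 'ABC 123'.isupper()
print(mixed_lower, mixed_upper)  # True True

print(''.islower())   # False
```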

22. isspace()

`str.isspace()`
Returns `True` if the string contains only whitespace characters (spaces, tabs, newlines, and so on) and there is at least one character, otherwise `False`.

    str_x = 'abc34 56befg'
    print(str_x.isspace())
>>>>False
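
`isspace()` treats tabs and newlines as whitespace and returns `False` for the empty string. The sketch below uses it to drop blank lines from made-up input.

```python
print(' \t\n'.isspace())  # True
print(''.isspace())       # False: at least one character is required

lines = ['first', '   ', '', 'second\n', '\t\n']
# Keep lines that are non-empty and not made only of whitespace
non_blank = [line for line in lines if line and not line.isspace()]
print(non_blank)  # ['first', 'second\n']
```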

23. istitle()

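`str.istitle()`
Returns `True` if the string is titlecased, otherwise `False`. Titlecased means every word starts with an uppercase letter, the remaining letters are lowercase, and there is at least one cased character.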

    str_x = 'Abd Dihgj'
    print(str_x.istitle())
>>>>True

24. title()

`str.title()`
Returns a titlecased version of the string, in which each word starts with an uppercase letter and the remaining letters are lowercase.

    str_x = 'this is a pig'
    print(str_x.title())
>>>>This Is A Pig
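
The sketch below contrasts `capitalize()` with `title()` and shows one known quirk of `title()` with apostrophes:

```python
text = 'hELLO wORLD'

cap = text.capitalize()
print(cap)             # Hello world: only the first character is uppercased
tit = text.title()
print(tit)             # Hello World: every word starts with an uppercase letter
print(tit.istitle())   # True

# title() treats any non-letter as a word boundary, including apostrophes
quirk = "they're here".title()
print(quirk)           # They'Re Here
```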

25. swapcase()

`str.swapcase()`
Returns a copy of the string with uppercase letters converted to lowercase and lowercase letters converted to uppercase.

    str_x = 'this iS a Pig'
    print(str_x.swapcase())
>>>>THIS Is A pIG

26. splitlines()

`str.splitlines([keepends])`
Splits the string at line boundaries (such as `'\r'`, `'\r\n'`, and `'\n'`) and returns a list of the lines. If `keepends` is `False` (the default), the line breaks are not included. If it is `True`, they are kept.
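
Here is a sketch comparing `splitlines()` with `split('\n')` on text that mixes `\r\n` and `\n` line endings:

```python
text = 'line1\r\nline2\nline3\n'

plain = text.splitlines()
print(plain)    # ['line1', 'line2', 'line3']

kept = text.splitlines(keepends=True)
print(kept)     # ['line1\r\n', 'line2\n', 'line3\n']

# split('\n') leaves the '\r' behind and adds an empty string after the final '\n'
naive = text.split('\n')
print(naive)    # ['line1\r', 'line2', 'line3', '']
```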

27. maketrans()

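`str.maketrans(x[, y[, z]])`
This static method creates a translation table for character mapping. With one argument, `x` must be a dictionary. With two arguments, `x` and `y` must be strings of equal length, and each character of `x` is mapped to the character at the same position in `y`. With three arguments, the third argument `z` must be a string whose characters are mapped to `None`, which deletes them.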

28. translate()

`str.translate(table)`
Returns a copy of the string in which each character has been mapped through the translation table `table`, which is usually created with `maketrans()`.

    str_x = 'this iS a Pig'
    table = str_x.maketrans('abcdefg', '1234567')
    result = str_x.translate(table)
    print(result)
>>>>this iS 1 Pi7
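
The example above uses the two-argument form of `maketrans()`. The sketch below shows the other two forms: the three-argument form, which deletes characters, and the dict form, whose values can be multi-character strings. The sample text is arbitrary.

```python
# Three-argument form: map a->1 and b->2, and delete every space
table = str.maketrans('ab', '12', ' ')
squeezed = 'a b c'.translate(table)
print(squeezed)  # 12c

# One-argument form: keys are single characters (or code points);
# values may be strings of any length, or None to delete the character
table = str.maketrans({'<': '&lt;', '>': '&gt;', '\r': None})
escaped = '<b>\r'.translate(table)
print(escaped)   # &lt;b&gt;
```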

29. center()

`str.center(width[, fillchar])`
Returns a string of length `width` with the original string centered in it. The space on both sides is filled with `fillchar` (a space by default). If `width` is less than or equal to `len(str)`, the original string is returned.

30. ljust()

`str.ljust(width[, fillchar])`
Returns a string of length `width` with the original string left-aligned in it. The remaining space is filled with `fillchar` (a space by default).

31. rjust()

`str.rjust(width[, fillchar])`
Returns a string of length `width` with the original string right-aligned in it. The remaining space is filled with `fillchar` (a space by default).

    str_x = 'abcdefg'
    print(str_x.center(15, '*'))
    print(str_x.ljust(15, '*'))
    print(str_x.rjust(15, '*'))
>>>>****abcdefg****
>>>>abcdefg********
>>>>********abcdefg
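
A typical use of these three methods is printing a fixed-width text table. Here is a sketch. The fruit data is made up.

```python
rows = [('apple', 3), ('banana', 12), ('kiwi', 150)]  # made-up sample data

# Left-align names with '.' fill, right-align quantities in 4 columns
lines = [name.ljust(8, '.') + str(qty).rjust(4) for name, qty in rows]
for line in lines:
    print(line)
# apple...   3
# banana..  12
# kiwi.... 150

# If width <= len(str), the string is returned unchanged
too_narrow = 'abcdefg'.center(3, '*')
print(too_narrow)  # abcdefg
```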

32. lstrip()

`str.lstrip([chars])`
Returns a copy of the string with leading whitespace (or the specified characters) removed. The argument works as in `strip()`.

33. rstrip()

`str.rstrip([chars])`
Returns a copy of the string with trailing whitespace (or the specified characters) removed. The argument works as in `strip()`.
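
The sketch below trims only one side of a string, such as a line read from a file (the sample line is made up). `repr()` makes the remaining whitespace visible.

```python
line = '  data,1,2\n'  # e.g. one line read from a text file

right_trimmed = line.rstrip('\n')
print(repr(right_trimmed))  # '  data,1,2'   only the trailing newline is gone

left_trimmed = line.lstrip()
print(repr(left_trimmed))   # 'data,1,2\n'   only the leading spaces are gone

print(repr('xxabcxx'.rstrip('x')))  # 'xxabc'
```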

34. rfind()

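`str.rfind(sub[, start[, end]])`
Returns the highest index at which `sub` is found, searching from the right. The arguments are the same as for `find()`, and -1 is returned if `sub` is not found.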

35. rindex()

`str.rindex(sub[, start[, end]])`
Like `rfind()`, but raises `ValueError` when `sub` is not found. The arguments are the same as for `index()`.
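
The index returned by `rfind()` is still counted from the left; only the search direction changes. Here is a sketch that extracts a file extension. The file name is hypothetical.

```python
path = 'archive.tar.gz'  # hypothetical file name

first_dot = path.find('.')
last_dot = path.rfind('.')
print(first_dot, last_dot)  # 7 11
ext = path[last_dot:]
print(ext)                  # .gz

no_dir = False
try:
    path.rindex('/')        # rfind() would return -1 here
except ValueError:
    no_dir = True
    print('no directory part')
```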

36. rsplit()

`str.rsplit(sep=None, maxsplit=-1)`
Splits the string starting from the right. The arguments are the same as for `split()`. The result differs from `split()` only when `maxsplit` is given.
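
The difference shows up only when `maxsplit` limits the number of splits. Here is a sketch:

```python
name = 'archive.tar.gz'

left = name.split('.', 1)
right = name.rsplit('.', 1)
print(left)   # ['archive', 'tar.gz']
print(right)  # ['archive.tar', 'gz']

# Without maxsplit, both methods give the same list
print(name.rsplit('.') == name.split('.'))  # True
```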

For more methods, see the official Python documentation.

Conclusion

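Here is a short summary based on everyday use. In web scraping, different texts need different encoding and decoding, so `encode()` and `decode()` are used often. Encryption and decryption require other libraries, which I will cover later in my Python web scraping column. For formatted output, splitting, and replacing characters, the usual choices are `format()`, `split()`, and `replace()`. To turn a list into a string, use `join()`. To preprocess text by removing newlines, use `strip()`. To name files uniformly by sequence number, use `zfill()`. For other string processing, such as pattern matching, the usual tool is regular expressions, i.e. the `re` module.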