[Python Standard Library: Text Processing Services] The string module

tester_yangzai

Original link: http://study.yali.edu.cn/pythonhelp/library/string.html


1. String constants

Constant	Description
string.ascii_letters	All letters: the concatenation of the ascii_lowercase and ascii_uppercase constants
string.ascii_lowercase	The lowercase letters 'abcdefghijklmnopqrstuvwxyz'
string.ascii_uppercase	The uppercase letters 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
string.digits	The string '0123456789'
string.hexdigits	The string '0123456789abcdefABCDEF'
string.octdigits	The string '01234567'
string.punctuation	String of ASCII characters considered punctuation: !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
string.printable	String of ASCII characters considered printable: the combination of digits, ascii_letters, punctuation, and whitespace
string.whitespace	String of ASCII characters considered whitespace: space, tab, linefeed, return, formfeed, and vertical tab

Example

import string
print("string.ascii_letters: ", string.ascii_letters)
print("string.ascii_lowercase: ", string.ascii_lowercase)
print("string.ascii_uppercase: ", string.ascii_uppercase)
print("string.digits: ", string.digits)
print("string.hexdigits: ", string.hexdigits)
print("string.octdigits: ", string.octdigits)
print("string.punctuation: ", string.punctuation)
print("string.printable: ", string.printable)
print("string.whitespace: ", string.whitespace)  # the terminal renders whitespace literally; the output below shows it escaped, the raw value is ' \t\n\r\v\f'
>>> 
string.ascii_letters:  abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ
string.ascii_lowercase:  abcdefghijklmnopqrstuvwxyz
string.ascii_uppercase:  ABCDEFGHIJKLMNOPQRSTUVWXYZ
string.digits:  0123456789
string.hexdigits:  0123456789abcdefABCDEF
string.octdigits:  01234567
string.punctuation:  !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
string.printable:  0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~ \t\n\r\v\f
string.whitespace:  \t\n\r\v\f

Use cases
Generating random strings (for test data, temporary names, and so on).
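
As a rough sketch of that use case, the constants can serve as the alphabet for a random string. The helper name random_string and its default alphabet are assumptions made for illustration; they are not part of the string module.

```python
import secrets
import string


def random_string(length, alphabet=string.ascii_letters + string.digits):
    """Return `length` characters drawn at random from `alphabet` (hypothetical helper)."""
    # secrets.choice picks one element using a cryptographically strong source.
    return "".join(secrets.choice(alphabet) for _ in range(length))


token = random_string(12)
print(token)  # different on every run
```

For throwaway test data, random.choice would do just as well; secrets is used here only so that the same helper is also reasonable for tokens.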

2. Custom string formatting

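The Formatter class

The Formatter class lets you create and customize your own string formatting behavior, using the same implementation as the built-in str.format() method.

Public methods of the Formatter class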

format(format_string, /, *args, **kwargs)
The primary API method. It takes a format string and an arbitrary set of positional and keyword arguments. It is just a wrapper that calls vformat().
vformat(format_string, args, kwargs)
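This function does the actual work of formatting. It is exposed as a separate function for cases where you want to pass in a predefined dictionary of arguments, rather than unpacking and repacking the dictionary as individual arguments using the *args and **kwargs syntax. vformat() does the work of breaking up the format string into character data and replacement fields. It calls the various methods described below.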
parse(format_string)
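Loops over format_string and returns an iterable of tuples (literal_text, field_name, format_spec, conversion). vformat() uses it to break the string into literal text and replacement fields.

The values in each tuple conceptually represent a span of literal text followed by a single replacement field. If there is no literal text (which happens when two replacement fields occur consecutively), literal_text is a zero-length string. If there is no replacement field, the values of field_name, format_spec, and conversion are None.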
get_field(field_name, args, kwargs)
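Given field_name as returned by parse() (see above), converts it to an object to be formatted. Returns a tuple (obj, used_key). The default version takes strings of the form defined in PEP 3101, such as "0[name]" or "label.title". args and kwargs are as passed in to vformat(). The return value used_key has the same meaning as the key parameter to get_value().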
get_value(key, args, kwargs)
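Retrieves a given field value. The key argument is either an integer or a string. If it is an integer, it represents the index of the positional argument in args; if it is a string, it represents a named argument in kwargs.

The args parameter is set to the list of positional arguments to vformat(), and the kwargs parameter is set to the dictionary of keyword arguments.

For compound field names, this method is only called for the first component of the field name; subsequent components are handled through normal attribute and indexing operations.

So, for example, the field expression '0.name' causes get_value() to be called with a key argument of 0. The name attribute is looked up after get_value() returns, by calling the built-in getattr() function.

If the index or keyword refers to an item that does not exist, an IndexError or KeyError should be raised.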
check_unused_args(used_args, args, kwargs)
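Implements checking for unused arguments if desired. The arguments to this function are the set of all argument keys that were actually referred to in the format string (integers for positional arguments, strings for named arguments), and a reference to the args and kwargs that were passed to vformat. The set of unused arguments can be calculated from these parameters. check_unused_args() is expected to raise an exception if the check fails.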
format_field(value, format_spec)
format_field() simply calls the global format() built-in. The method is provided so that subclasses can override it.
convert_field(value, conversion)
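Converts the value (returned by get_field()) given a conversion type (as in the tuple returned by the parse() method). The default version understands the 's' (str), 'r' (repr), and 'a' (ascii) conversion types.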
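
To make the methods above more concrete, here is a minimal sketch of a Formatter subclass. The class name DefaultingFormatter, its default parameter, and the custom "u" conversion are all invented for this illustration; only Formatter and the overridden method names come from the standard library.

```python
from string import Formatter


class DefaultingFormatter(Formatter):
    """Hypothetical subclass: missing keyword fields fall back to a default."""

    def __init__(self, default="<missing>"):
        super().__init__()
        self.default = default

    def get_value(self, key, args, kwargs):
        # String keys name keyword arguments; integer keys index into args.
        if isinstance(key, str):
            return kwargs.get(key, self.default)
        return super().get_value(key, args, kwargs)

    def convert_field(self, value, conversion):
        # Assumed extra conversion "u" (upper-case), on top of the default s/r/a.
        if conversion == "u":
            return str(value).upper()
        return super().convert_field(value, conversion)


fmt = DefaultingFormatter()
greeting = fmt.format("Hello, {name!u}! You are {age}.", name="alice")
print(greeting)  # Hello, ALICE! You are <missing>.

# parse() yields (literal_text, field_name, format_spec, conversion) tuples.
parsed = list(Formatter().parse("{0!r:>8} and {name}"))
print(parsed)  # [('', '0', '>8', 'r'), (' and ', 'name', '', None)]

# vformat() takes the argument tuple and dict directly, without unpacking.
from_dict = Formatter().vformat("{x}-{y}", (), {"x": 1, "y": 2})
print(from_dict)  # 1-2
```

Note that the custom conversion only works through the subclass; the built-in str.format() rejects unknown conversion characters.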

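Format string syntax

Format strings contain "replacement fields" surrounded by curly braces {}. Anything not contained in braces is considered literal text and is copied unchanged to the output. If you need to include a brace character in the literal text, escape it by doubling: {{ and }}.

Grammar for a replacement field: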
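
A small sketch of the escaping rule; the variable names are only illustrative.

```python
# "{{" and "}}" produce literal braces; "{key}" in between is a real field.
template = "{{{key}}} maps to {value}"
result = template.format(key="id", value=42)
print(result)  # {id} maps to 42

# A string made only of escaped braces needs no arguments at all.
empty = "{{}}".format()
print(empty)  # {}
```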

replacement_field ::=  "{" [field_name] ["!" conversion] [":" format_spec] "}"
field_name        ::=  arg_name ("." attribute_name | "[" element_index "]")*
arg_name          ::=  [identifier | digit+]
attribute_name    ::=  identifier
element_index     ::=  digit+ | index_string
index_string      ::=  <any source character except "]"> +
conversion        ::=  "r" | "s" | "a"
format_spec       ::=  <described in the next section>

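The field_name specifies the object whose value is to be formatted and inserted into the output in place of the replacement field. The field_name is optionally followed by a conversion field, which is preceded by an exclamation point '!', and a format_spec, which is preceded by a colon ':'. These specify a non-default format for the replacement value.

Examples: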
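
The three parts of a replacement field can be seen side by side in the short sketch below (the variable names are made up for the example). The conversion runs first, and the format_spec is then applied to the converted value.

```python
name = "Ann"

# field_name only, with a format_spec: right-align in 7 columns.
plain = "{0:>7}".format(name)
print(repr(plain))   # '    Ann'

# field_name + conversion + format_spec: repr() is applied first,
# so the quotes count toward the width of 7.
quoted = "{0!r:>7}".format(name)
print(repr(quoted))  # "  'Ann'"

# A compound field_name: look up key 'x' in the keyword argument 'point'.
nested = "{point[x]:03d}".format(point={"x": 7})
print(nested)        # 007
```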

"First, thou shalt count to {0}"  # References first positional argument
"Bring me a {}"                   # Implicitly references the first positional argument
"From {} to {}"                   # Same as "From {0} to {1}"
"My quest is {name}"              # References keyword argument 'name'
"Weight in tons {0.weight}"       # 'weight' attribute of first positional arg
"Units destroyed: {players[0]}"   # First element of keyword argument 'players'.

Comparing str.format() syntax with % formatting

Accessing arguments by position:

>>> '{0}, {1}, {2}'.format('a', 'b', 'c')
'a, b, c'
>>> '{}, {}, {}'.format('a', 'b', 'c')  # 3.1+ only
'a, b, c'
>>> '{2}, {1}, {0}'.format('a', 'b', 'c')
'c, b, a'
>>> '{2}, {1}, {0}'.format(*'abc')      # unpacking argument sequence
'c, b, a'
>>> '{0}{1}{0}'.format('abra', 'cad')   # arguments' indices can be repeated
'abracadabra'

Accessing arguments by name:

>>> 'Coordinates: {latitude}, {longitude}'.format(latitude='37.24N', longitude='-115.81W')
'Coordinates: 37.24N, -115.81W'
>>> coord = {'latitude': '37.24N', 'longitude': '-115.81W'}
>>> 'Coordinates: {latitude}, {longitude}'.format(**coord)
'Coordinates: 37.24N, -115.81W'

Accessing arguments' attributes:

>>> c = 3-5j
>>> ('The complex number {0} is formed from the real part {0.real} '
...  'and the imaginary part {0.imag}.').format(c)
'The complex number (3-5j) is formed from the real part 3.0 and the imaginary part -5.0.'
>>> class Point:
...     def __init__(self, x, y):
...         self.x, self.y = x, y
...     def __str__(self):
...         return 'Point({self.x}, {self.y})'.format(self=self)
...
>>> str(Point(4, 2))
'Point(4, 2)'

Accessing arguments' items:

>>> coord = (3, 5)
>>> 'X: {0[0]};  Y: {0[1]}'.format(coord)
'X: 3;  Y: 5'

Replacing %s and %r:

>>> "repr() shows quotes: {!r}; str() doesn't: {!s}".format('test1', 'test2')
"repr() shows quotes: 'test1'; str() doesn't: test2"

Aligning the text and specifying a width:

>>> '{:<30}'.format('left aligned')
'left aligned                  '
>>> '{:>30}'.format('right aligned')
'                 right aligned'
>>> '{:^30}'.format('centered')
'           centered           '
>>> '{:*^30}'.format('centered')  # use '*' as a fill char
'***********centered***********'

Replacing %+f, %-f, and % f and specifying a sign:

>>> '{:+f}; {:+f}'.format(3.14, -3.14)  # show it always
'+3.140000; -3.140000'
>>> '{: f}; {: f}'.format(3.14, -3.14)  # show a space for positive numbers
' 3.140000; -3.140000'
>>> '{:-f}; {:-f}'.format(3.14, -3.14)  # show only the minus -- same as '{:f}; {:f}'
'3.140000; -3.140000'

Replacing %x and %o and converting the value to different bases:

>>> # format also supports binary numbers
>>> "int: {0:d};  hex: {0:x};  oct: {0:o};  bin: {0:b}".format(42)
'int: 42;  hex: 2a;  oct: 52;  bin: 101010'
>>> # with 0x, 0o, or 0b as prefix:
>>> "int: {0:d};  hex: {0:#x};  oct: {0:#o};  bin: {0:#b}".format(42)
'int: 42;  hex: 0x2a;  oct: 0o52;  bin: 0b101010'

Using the comma as a thousands separator:

>>> '{:,}'.format(1234567890)
'1,234,567,890'

Expressing a percentage:

>>> points = 19
>>> total = 22
>>> 'Correct answers: {:.2%}'.format(points/total)
'Correct answers: 86.36%'

Using type-specific formatting:

>>> import datetime
>>> d = datetime.datetime(2010, 7, 4, 12, 15, 58)
>>> '{:%Y-%m-%d %H:%M:%S}'.format(d)
'2010-07-04 12:15:58'

Nesting arguments and more complex examples:

>>> for align, text in zip('<^>', ['left', 'center', 'right']):
...     '{0:{fill}{align}16}'.format(text, fill=align, align=align)
...
'left<<<<<<<<<<<<'
'^^^^^center^^^^^'
'>>>>>>>>>>>right'
>>>
>>> octets = [192, 168, 0, 1]
>>> '{:02X}{:02X}{:02X}{:02X}'.format(*octets)
'C0A80001'
>>> int(_, 16)
3232235521
>>>
>>> width = 5
>>> for num in range(5,12): 
...     for base in 'dXob':
...         print('{0:{width}{base}}'.format(num, base=base, width=width), end=' ')
...     print()
...
    5     5     5   101
    6     6     6   110
    7     7     7   111
    8     8    10  1000
    9     9    11  1001
   10     A    12  1010
   11     B    13  1011

3. Template strings

Template strings support $-based substitutions, using the following rules:

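$$ is an escape; it is replaced with a single $.
$identifier names a substitution placeholder matching a mapping key of "identifier". By default, "identifier" is restricted to any case-insensitive ASCII alphanumeric string (including underscores) that starts with an underscore or ASCII letter. The first non-identifier character after the $ character terminates the placeholder.
${identifier} is equivalent to $identifier. It is required when valid identifier characters follow the placeholder but are not part of it, such as "${noun}ification".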
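
The three rules can be exercised in one template, as in the sketch below. The placeholder names amount, noun, and who are arbitrary choices for the example.

```python
from string import Template

# "$$" is an escaped dollar sign, "${noun}" needs braces because letters
# follow it directly, and "$who" ends at the first non-identifier character.
t = Template("Cost: $$${amount}; this is a ${noun}ification by $who.")
rendered = t.substitute(amount=5, noun="class", who="Ann")
print(rendered)  # Cost: $5; this is a classification by Ann.
```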

The string.Template class

class string.Template(template)
The constructor takes a single argument, which is the template string.
substitute(mapping={}, /, **kwds)
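Performs the template substitution, returning a new string. mapping is any dictionary-like object with keys that match the placeholders in the template. Alternatively, you can provide keyword arguments, where the keywords are the placeholders. When both mapping and kwds are given and there are duplicates, the placeholders from kwds take precedence.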
safe_substitute(mapping={}, /, **kwds)
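Like substitute(), except that if placeholders are missing from mapping and kwds, the original placeholder appears in the resulting string intact instead of a KeyError being raised. Also, unlike with substitute(), any other appearance of $ simply returns $ instead of raising ValueError.

While other exceptions may still occur, this method is called "safe" because it always tries to return a usable string instead of raising an exception. In another sense, safe_substitute() may be anything but safe, since it silently ignores malformed templates containing dangling delimiters, unmatched braces, or placeholders that are not valid Python identifiers.
template (public data attribute)
This is the object passed to the constructor's template argument. In general, you shouldn't change it, but read-only access is not enforced.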

Example:


>>> from string import Template
>>> s = Template('$who likes $what')
>>> s.substitute(who='tim', what='kung pao')
'tim likes kung pao'
>>> d = dict(who='tim')
>>> Template('Give $who $100').substitute(d)
Traceback (most recent call last):
...
ValueError: Invalid placeholder in string: line 1, col 11
>>> Template('$who likes $what').substitute(d)
Traceback (most recent call last):
...
KeyError: 'what'
>>> Template('$who likes $what').safe_substitute(d)
'tim likes $what'
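
Two details from the method descriptions are not covered by the session above: keyword arguments overriding the mapping, and safe_substitute() leaving a stray $ alone. A short sketch, using made-up names and values:

```python
from string import Template

defaults = {"who": "tim", "whom": "the bank"}

# kwds take precedence over the mapping when both define the same key.
t = Template("$who owes $$100 to $whom")
overridden = t.substitute(defaults, who="ann")
print(overridden)  # ann owes $100 to the bank

# "$100" is not a valid placeholder: substitute() raises ValueError,
# while safe_substitute() leaves the "$" in place.
lenient = Template("Give $who $100").safe_substitute(defaults)
print(lenient)     # Give tim $100
```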