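Python 3.7 Official Tutorial Translation: Brief Tour of the Standard Library, Part II

This second part covers more advanced modules that support professional programming needs. These modules rarely occur in small scripts.

Output Formatting

The reprlib module provides a version of repr() customized for abbreviated displays of large or deeply nested containers: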

import reprlib
reprlib.repr(set('supercalifragilisticexpialidocious'))

"{'a', 'c', 'd', 'e', 'f', 'g', ...}"

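The pprint module offers more sophisticated control over printing both built-in and user defined objects in a way that is readable by the interpreter. When the result is longer than one line, the "pretty printer" adds line breaks and indentation to reveal the data structure more clearly: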

import pprint
t = [[[['black', 'cyan'], 'white', ['green', 'red']], [['magenta',
    'yellow'], 'blue']]]

pprint.pprint(t, width=30)

[[[['black', 'cyan'],
   'white',
   ['green', 'red']],
  [['magenta', 'yellow'],
   'blue']]]

The textwrap module formats paragraphs of text to fit a given screen width:

import textwrap
doc = """The wrap() method is just like fill() except that it returns
a list of strings instead of one big string with newlines to separate
the wrapped lines."""

print(textwrap.fill(doc, width=40))

The wrap() method is just like fill()
except that it returns a list of strings
instead of one big string with newlines
to separate the wrapped lines.
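When the text lives in an indented triple-quoted string, the indentation becomes part of the text and shows up as runs of spaces in the wrapped output. A minimal sketch of one way to avoid that, assuming the common leading whitespace should simply be dropped, is to pass the text through textwrap.dedent() first:

```python
import textwrap

doc = """\
    The wrap() method is just like fill() except that it returns
    a list of strings instead of one big string with newlines to separate
    the wrapped lines."""

# dedent() strips the whitespace common to every line, so the source-code
# indentation does not leak into the wrapped text.
clean = textwrap.dedent(doc)

# wrap() returns the lines as a list; fill() joins the same lines with "\n".
lines = textwrap.wrap(clean, width=40)
for line in lines:
    print(line)
```

Joining the list with newlines gives the same string that textwrap.fill(clean, width=40) returns.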

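The locale module accesses a database of culture specific data formats. The grouping attribute of locale's format function provides a direct way of formatting numbers with group separators: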

import locale
locale.setlocale(locale.LC_ALL, 'English_United States.1252')

'English_United States.1252'

conv = locale.localeconv()             # get a mapping of conventions
x = 1234567.8
locale.format_string('%d', x, grouping=True)   # locale.format() is deprecated since 3.7

'1,234,567'

locale.format_string("%s%.*f", (conv['currency_symbol'],
                     conv['frac_digits'], x), grouping=True)

'$1,234,567.80'

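Templating

The string module includes a versatile Template class with a simplified syntax suitable for editing by end-users. This allows users to customize their applications without having to alter the application.

The format uses placeholder names formed by $ followed by a valid Python identifier (alphanumeric characters and underscores). Surrounding the placeholder with braces allows it to be followed by more alphanumeric letters with no intervening spaces. Writing $$ creates a single escaped $: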

from string import Template
t = Template('${village}folk send $$10 to $cause.')
t.substitute(village='Nottingham', cause='the ditch fund')

'Nottinghamfolk send $10 to the ditch fund.'

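The substitute() method raises a KeyError when a placeholder is not supplied in a dictionary or a keyword argument. For mail-merge style applications, user supplied data may be incomplete and the safe_substitute() method may be more appropriate, because it leaves placeholders unchanged if data is missing: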

t = Template('Return the $item to $owner.')
d = dict(item='unladen swallow')
t.substitute(d)                 # raises KeyError: 'owner'

t.safe_substitute(d)

'Return the unladen swallow to $owner.'

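Template subclasses can specify a custom delimiter. For example, a batch renaming utility for a photo browser may elect to use percent signs for placeholders such as the current date, image sequence number, or file format: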

import time, os.path
from string import Template

photofiles = ['img_1074.jpg', 'img_1076.jpg', 'img_1077.jpg']
class BatchRename(Template):
    delimiter = '%'

fmt = input('Enter rename style (%d-date %n-seqnum %f-format):  ')

Enter rename style (%d-date %n-seqnum %f-format): Ashley_%n%f

t = BatchRename(fmt)
date = time.strftime('%d%b%y')
for i, filename in enumerate(photofiles):
    base, ext = os.path.splitext(filename)
    newname = t.substitute(d=date, n=i, f=ext)
    print('{0} --> {1}'.format(filename, newname))

img_1074.jpg --> Ashley_0.jpg
img_1076.jpg --> Ashley_1.jpg
img_1077.jpg --> Ashley_2.jpg

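Another application for templating is separating program logic from the details of multiple output formats. This makes it possible to substitute custom templates for XML files, plain text reports, and HTML web reports.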
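As a rough illustration of that separation, the sketch below keeps one Template per output format and renders the same record through each of them. The record fields, the format names, and the render() helper are all made up for this example.

```python
from string import Template

# Hypothetical report data; the field names exist only for this sketch.
record = {'name': 'disk usage', 'value': '87%'}

# One template per output format. The program logic below never changes
# when a format is added or edited.
formats = {
    'text': Template('$name: $value'),
    'html': Template('<tr><td>$name</td><td>$value</td></tr>'),
    'xml': Template('<metric name="$name">$value</metric>'),
}

def render(fmt, data):
    """Render data with the template registered under fmt."""
    return formats[fmt].substitute(data)

for fmt in formats:
    print(render(fmt, record))
```

Because the templates are plain strings, they could just as well be loaded from files that end users edit.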

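Working with Binary Data Record Layouts

The struct module provides pack() and unpack() functions for working with variable length binary record formats. The following example shows how to loop through header information in a ZIP file without using the zipfile module. Pack codes "H" and "I" represent two and four byte unsigned numbers respectively. The "<" indicates that they are standard size and in little-endian byte order: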

import struct

with open('myfile.zip', 'rb') as f:
    data = f.read()

start = 0
for i in range(3):                      # show the first 3 file headers
    start += 14
    fields = struct.unpack('<IIIHH', data[start:start+16])
    crc32, comp_size, uncomp_size, filenamesize, extra_size = fields

    start += 16
    filename = data[start:start+filenamesize]
    start += filenamesize
    extra = data[start:start+extra_size]
    print(filename, hex(crc32), comp_size, uncomp_size)

    start += extra_size + comp_size     # skip to the next header
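The ZIP example needs a real archive on disk. To see the pack codes in isolation, here is a small round trip on an invented three-field record (the layout and the values are assumptions for this sketch, not part of the ZIP format):

```python
import struct

# '<' = little-endian with standard sizes,
# I = 4-byte unsigned int, H = 2-byte unsigned int.
layout = '<IHH'

packed = struct.pack(layout, 0xDEADBEEF, 8, 0)
print(struct.calcsize(layout))   # total record size in bytes: 4 + 2 + 2
print(packed.hex())              # least significant byte comes first

# unpack() always returns a tuple, one item per pack code.
crc32, name_size, extra_size = struct.unpack(layout, packed)
print(hex(crc32), name_size, extra_size)
```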

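Multi-threading

Threading is a technique for decoupling tasks which are not sequentially dependent. Threads can be used to improve the responsiveness of applications that accept user input while other tasks run in the background. A related use case is running I/O in parallel with computations in another thread.

The following code shows how the high level threading module can run tasks in the background while the main program continues to run: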

import threading, zipfile

class AsyncZip(threading.Thread):
    def __init__(self, infile, outfile):
        threading.Thread.__init__(self)
        self.infile = infile
        self.outfile = outfile

    def run(self):
        f = zipfile.ZipFile(self.outfile, 'w', zipfile.ZIP_DEFLATED)
        f.write(self.infile)
        f.close()
        print('Finished background zip of:', self.infile)

background = AsyncZip('mydata.txt', 'myarchive.zip')
background.start()
print('The main program continues to run in foreground')

background.join()       # Wait for the background task to finish
print('Main program waited until background was done')

The main program continues to run in foreground
Finished background zip of: mydata.txt
Main program waited until background was done

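The principal challenge of multi-threaded applications is coordinating threads that share data or other resources. To that end, the threading module provides a number of synchronization primitives including locks, events, condition variables, and semaphores.

While those tools are powerful, minor design errors can result in problems that are difficult to reproduce. So, the preferred approach to task coordination is to concentrate all access to a resource in a single thread and then use the queue module to feed that thread with requests from other threads. Applications using Queue objects for inter-thread communication and coordination are easier to design, more readable, and more reliable.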
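A minimal sketch of that pattern follows. One worker thread owns the shared list, and two producer threads only ever touch the queue. The names jobs, results, worker, and produce, and the use of None as a stop signal, are choices made for this illustration.

```python
import queue
import threading

jobs = queue.Queue()
results = []            # modified only by the worker thread

def worker():
    """Own the shared list; handle requests until a None sentinel arrives."""
    while True:
        item = jobs.get()
        if item is None:
            jobs.task_done()
            break
        results.append(item * item)
        jobs.task_done()

def produce(start):
    """Submit three requests without touching the shared list directly."""
    for n in range(start, start + 3):
        jobs.put(n)

owner = threading.Thread(target=worker)
owner.start()

producers = [threading.Thread(target=produce, args=(s,)) for s in (0, 10)]
for p in producers:
    p.start()
for p in producers:
    p.join()

jobs.put(None)          # ask the worker to stop
jobs.join()             # block until every queued item was processed
owner.join()
print(sorted(results))
```

No lock appears anywhere in the sketch: the queue does the synchronization, and the order in which the producers interleave does not matter to the final sorted result.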

Logging

The logging module offers a full featured and flexible logging system. At its simplest, log messages are sent to a file or to sys.stderr:

import logging
logging.debug('Debugging information')
logging.info('Information message')
logging.warning('Warning:config file %s not found', 'server.conf')
logging.error('Error occurred')
logging.critical('Critical error -- shutting down')

WARNING:root:Warning:config file server.conf not found
ERROR:root:Error occurred
CRITICAL:root:Critical error -- shutting down

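By default, informational and debugging messages are suppressed and the output is sent to standard error. Other output options include routing messages through email, datagrams, sockets, or to an HTTP server. New filters can select different routing based on message priority: DEBUG, INFO, WARNING, ERROR, and CRITICAL.

The logging system can be configured directly from Python or can be loaded from a user editable configuration file for customized logging without altering the application.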
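To make the "configured directly from Python" option concrete, the following sketch lowers the threshold so that DEBUG messages get through and routes them to an in-memory stream. The logger name demo and the format string are arbitrary choices for this example; a real program would more likely call logging.basicConfig() or attach a file handler.

```python
import io
import logging

# Send records to an in-memory stream so the result is easy to inspect.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter('%(levelname)s:%(name)s:%(message)s'))

log = logging.getLogger('demo')
log.setLevel(logging.DEBUG)      # let DEBUG and INFO through as well
log.addHandler(handler)
log.propagate = False            # keep these records away from the root logger

log.debug('Debugging information')
log.warning('config file %s not found', 'server.conf')

print(stream.getvalue(), end='')
```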

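Weak References

Python does automatic memory management (reference counting for most objects and garbage collection to eliminate cycles). The memory is freed shortly after the last reference to it has been eliminated.

This approach works fine for most applications, but occasionally there is a need to track objects only as long as they are being used by something else. Unfortunately, just tracking them creates a reference that makes them permanent. The weakref module provides tools for tracking objects without creating a reference. When the object is no longer needed, it is automatically removed from a weakref table and a callback is triggered for weakref objects. Typical applications include caching objects that are expensive to create: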

import weakref, gc
class A:
    def __init__(self, value):
        self.value = value
    def __repr__(self):
        return str(self.value)

a = A(10)                   # create a reference
d = weakref.WeakValueDictionary()
d['primary'] = a            # does not create a reference
d['primary']                # fetch the object if it is still alive

10

del a                       # remove the one reference
gc.collect()                # run garbage collection right away

0

d['primary']                # entry was automatically removed

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
    d['primary']                # entry was automatically removed
  File "C:/python36/lib/weakref.py", line 46, in __getitem__
    o = self.data[key]()
KeyError: 'primary'
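The caching use case and the callback mentioned above can be sketched together. The Expensive class, the load() helper, and the events list are hypothetical names for this illustration; weakref.finalize() is used here as one way to register a callback.

```python
import gc
import weakref

class Expensive:
    """Stand-in for an object that is costly to build (hypothetical)."""
    def __init__(self, key):
        self.key = key

cache = weakref.WeakValueDictionary()
events = []

def load(key):
    """Return the cached object for key, building it only on a miss."""
    obj = cache.get(key)
    if obj is None:
        obj = Expensive(key)
        cache[key] = obj
        # The callback runs once the object has been reclaimed.
        weakref.finalize(obj, events.append, key)
    return obj

first = load('report')
same = load('report')
hit = first is same          # served from the cache while a reference exists

del first, same              # drop the only strong references
gc.collect()                 # force a collection in case refcounting is not used
print(hit, 'report' in cache, events)
```

The cache never keeps an object alive on its own: as soon as the caller lets go, the entry disappears and the next load() rebuilds it.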

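Tools for Working with Lists

Many data structure needs can be met with the built-in list type. However, sometimes there is a need for alternative implementations with different performance trade-offs.

The array module provides an array() object that is like a list that stores only homogeneous data and stores it more compactly. The following example shows an array of numbers stored as two byte unsigned binary numbers (typecode "H") rather than the usual 16 bytes per entry for regular lists of Python int objects: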

from array import array
a = array('H', [4000, 10, 700, 22222])
sum(a)

26932

a[1:3]

array('H', [10, 700])
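The storage claim can be checked directly. This sketch prints the per-element size of the array next to the size of a single int object (the exact int size depends on the platform and Python build) and shows that a value outside the typecode's range is rejected.

```python
from array import array
import sys

values = [4000, 10, 700, 22222]
compact = array('H', values)

print(compact.itemsize)            # bytes used per stored element
print(len(compact.tobytes()))      # itemsize * number of elements
print(sys.getsizeof(values[0]))    # size of one int object, platform dependent

# Only values that fit the typecode are accepted.
try:
    compact.append(70000)          # too large for an unsigned 16-bit slot
except OverflowError as exc:
    print('rejected:', exc)
```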

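The collections module provides a deque() object that is like a list with faster appends and pops from the left side but slower lookups in the middle. These objects are well suited for implementing queues and breadth first tree searches: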

from collections import deque
d = deque(['task1', 'task2', 'task3'])
d.append('task4')
print('Handling', d.popleft())

Handling task1

unsearched = deque([starting_node])
def breadth_first_search(unsearched):
    node = unsearched.popleft()
    for m in gen_moves(node):
        if is_goal(m):
            return m
        unsearched.append(m)
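The search snippet above relies on starting_node, gen_moves(), and is_goal() being supplied elsewhere. A self-contained variant on a small made-up graph might look like this; the graph, the path tracking, and the seen set are additions for this sketch.

```python
from collections import deque

# Hypothetical graph used only for this sketch.
graph = {
    'A': ['B', 'C'],
    'B': ['D'],
    'C': ['D', 'E'],
    'D': ['F'],
    'E': ['F'],
    'F': [],
}

def breadth_first_search(start, goal):
    """Return a shortest path from start to goal as a list, or None."""
    unsearched = deque([[start]])      # queue of partial paths
    seen = {start}
    while unsearched:
        path = unsearched.popleft()    # constant-time pop from the left end
        node = path[-1]
        if node == goal:
            return path
        for neighbour in graph[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                unsearched.append(path + [neighbour])
    return None

print(breadth_first_search('A', 'F'))
```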

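In addition to alternative list implementations, the library also offers other tools such as the bisect module with functions for manipulating sorted lists: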

import bisect
scores = [(100, 'perl'), (200, 'tcl'), (400, 'lua'), (500, 'python')]
bisect.insort(scores, (300, 'ruby'))
scores

[(100, 'perl'), (200, 'tcl'), (300, 'ruby'), (400, 'lua'), (500, 'python')]
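Besides inserting, bisect can also look things up in a sorted list. A short sketch, assuming a table of score breakpoints chosen for illustration, maps a numeric score to a letter grade:

```python
import bisect

# Hypothetical grading scale: below 60 is F, 60-69 is D, and so on.
breakpoints = [60, 70, 80, 90]
grades = 'FDCBA'

def grade(score):
    """Return the letter for score by locating it among the breakpoints."""
    # bisect() returns the index at which score would be inserted to keep
    # the list sorted, placing it after any equal entries.
    return grades[bisect.bisect(breakpoints, score)]

print([grade(s) for s in [33, 99, 77, 70, 89, 90, 100]])
```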

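The heapq module provides functions for implementing heaps based on regular lists. The lowest valued entry is always kept at position zero. This is useful for applications which repeatedly access the smallest element but do not want to run a full list sort: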

from heapq import heapify, heappop, heappush
data = [1, 3, 5, 7, 9, 2, 4, 6, 8, 0]
heapify(data)                      # rearrange the list into heap order
heappush(data, -5)                 # add a new entry
[heappop(data) for i in range(3)]  # fetch the three smallest entries

[-5, 0, 1]
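A common use of this property is a simple priority queue. In the sketch below the tasks and their priority numbers are invented; tuples are compared element by element, so the lowest priority number is popped first.

```python
import heapq

tasks = []
heapq.heappush(tasks, (2, 'write report'))
heapq.heappush(tasks, (1, 'fix outage'))
heapq.heappush(tasks, (3, 'tidy desk'))

# The smallest tuple always sits at index 0, so no sort is needed.
order = [heapq.heappop(tasks)[1] for _ in range(3)]
print(order)

# nsmallest() answers "the k smallest" without building a heap by hand.
smallest = heapq.nsmallest(2, [1, 3, 5, 7, 9, 2, 4, 6, 8, 0])
print(smallest)
```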

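Decimal Floating Point Arithmetic

The decimal module offers a Decimal datatype for decimal floating point arithmetic. Compared to the built-in float implementation of binary floating point, the class is especially helpful for
1. financial applications and other uses which require exact decimal representation,
2. control over precision,
3. control over rounding to meet legal or regulatory requirements,
4. tracking of significant decimal places, or
5. applications where the user expects the results to match calculations done by hand.

For example, calculating a 5% tax on a 70 cent phone charge gives different results in decimal floating point and binary floating point. The difference becomes significant if the results are rounded to the nearest cent: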

from decimal import *
round(Decimal('0.70') * Decimal('1.05'), 2)

Decimal('0.74')

round(.70 * 1.05, 2)

0.73

Before rounding, the Decimal product is Decimal('0.7350'): it keeps a trailing zero, automatically inferring four place significance from multiplicands with two place significance. Decimal reproduces mathematics as done by hand and avoids issues that can arise when binary floating point cannot exactly represent decimal quantities. The float product, by contrast, is slightly below 0.735, which is why it rounds to 0.73 rather than 0.74.
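Control over rounding, listed among the use cases above, is usually exercised through quantize() with an explicit rounding mode. The sketch below is illustrative only; the second amount, 0.725, is an invented value picked because it sits exactly halfway between two cents.

```python
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_EVEN, ROUND_HALF_UP

cent = Decimal('0.01')

tax = Decimal('0.70') * Decimal('1.05')      # exact product, four places
tax_half_up = tax.quantize(cent, rounding=ROUND_HALF_UP)
tax_truncated = tax.quantize(cent, rounding=ROUND_DOWN)

# A value exactly halfway between two cents shows how the modes differ.
borderline = Decimal('0.725')
half_up = borderline.quantize(cent, rounding=ROUND_HALF_UP)
half_even = borderline.quantize(cent, rounding=ROUND_HALF_EVEN)

print(tax, tax_half_up, tax_truncated)
print(half_up, half_even)
```

Which mode is correct depends on the rule being implemented; the point is only that the choice is explicit rather than an accident of binary representation.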

Exact representation enables the Decimal class to perform modulo calculations and equality tests that are unsuitable for binary floating point:

Decimal('1.00') % Decimal('.10')

Decimal('0.00')

1.00 % 0.10

0.09999999999999995

sum([Decimal('0.1')]*10) == Decimal('1.0')

True

sum([0.1]*10) == 1.0

False

The decimal module provides arithmetic with as much precision as needed:

getcontext().prec = 36
Decimal(1) / Decimal(7)

Decimal('0.142857142857142857142857142857142857')

1/7

0.14285714285714285
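Setting getcontext().prec changes the precision for the rest of the thread. When a higher precision is needed for only one calculation, a local context is one option. A minimal sketch, with 50 digits chosen arbitrarily:

```python
from decimal import Decimal, getcontext, localcontext

outside = getcontext().prec          # 28 unless it was changed earlier

with localcontext() as ctx:
    ctx.prec = 50                    # applies only inside this block
    precise = Decimal(1) / Decimal(7)

usual = Decimal(1) / Decimal(7)      # computed at the outer precision again

print(len(precise.as_tuple().digits))
print(len(usual.as_tuple().digits) == outside)
```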