Python Study Notes, Book 2, Chapter 10: Standard Library Modules - CSDN Blog

Article link: https://blog.csdn.net/weixin_38135241/article/details/139619440

-----Learning the way a professor would teach it

10.1 Modules


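10.1.1 Modules Are Programs

A module is used to define things such as variables, functions, and classes. Suppose you have written the following program and saved it in a file named hello.py. The name of the file, without the .py extension, becomes the name of the module.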

# hello.py 
print("Hello, world!")

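10.1.2 Modules Are Used to Define Things

1. Defining a function in a module

Suppose you have written a module like the one below and stored it in a file named hello2.py.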

# hello2.py 
def hello(): 
   print("Hello, world!")

You can now import it like this:

>>> import hello2

You can then call the function through the module name:

>>> hello2.hello()

Hello, world!

2. Adding test code to a module

To check that the function hello works, you might rewrite the module hello2 as the module hello3.

# hello3.py 
def hello(): 
    print("Hello, world!") 
# A test:
hello()

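This looks reasonable: if you run the module as an ordinary program, it works fine. However, if you import it as a module in another program in order to use the function hello, the test code is executed as well, just as with the first hello module in this chapter.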

>>> import hello3

Hello, world!

>>> hello3.hello()

Hello, world!

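To avoid this, the module has to know whether it is being run as a program or imported into another one. For that you use the variable __name__.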

>>> __name__

'__main__'

>>> hello3.__name__

'hello3'

To make the test code in the module behave more sensibly, put it inside an if statement:

# hello4.py 
def hello(): 
    print("Hello, world!") 
def test(): 
    hello() 
if __name__ == '__main__': test()

If you import it, it behaves like an ordinary module:

>>> import hello4

>>> hello4.hello()

Hello, world!

>>> hello4.test()

Hello, world!

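10.1.3 Making Your Modules Available

1. Putting the module in the right place

Putting a module in the right place is easy: find out where the Python interpreter looks for modules, and put the file there.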

>>> import sys, pprint

>>> pprint.pprint(sys.path)

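The point here is that each string represents a location, and you can put your module in any of them for the interpreter to find it. Although any of the locations shown will work, the site-packages directory is the best choice because it is meant for exactly this purpose. As long as a module sits somewhere like site-packages, every program can import it.

2. Telling the interpreter where to look

If the module is kept somewhere else, you must tell the interpreter where to look. The standard way is to include the module's directory in the environment variable PYTHONPATH.

Environment variables

Environment variables are not part of the Python interpreter; they belong to the operating system. Roughly speaking, they are like Python variables, but they are set outside the Python interpreter. If you use the bash shell (available on most UNIX-like systems, macOS, and recent versions of Windows), the following command appends ~/python to the end of the environment variable PYTHONPATH: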

export PYTHONPATH=$PYTHONPATH:~/python

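To run this command in every shell you start, add it to the .bashrc file in your home directory. For other ways of editing environment variables, see your operating system's documentation.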
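Besides PYTHONPATH, sys.path is an ordinary list that can be changed while a program runs. The following is only a minimal sketch of that idea, not a recommended way to deploy code: it writes a throwaway module to a temporary directory, appends the directory to sys.path, and imports the module. The module name hello_demo is a hypothetical name chosen for this illustration.

```python
import importlib
import os
import sys
import tempfile

# Create a temporary directory holding a tiny module file
# (hello_demo is a made-up name for this sketch).
module_dir = tempfile.mkdtemp()
with open(os.path.join(module_dir, "hello_demo.py"), "w") as f:
    f.write("def hello():\n    return 'Hello, world!'\n")

# Tell the interpreter where to look, then import as usual.
sys.path.append(module_dir)
importlib.invalidate_caches()  # make sure the new file is noticed
import hello_demo

greeting = hello_demo.hello()
print(greeting)
```

Changes made to sys.path this way last only for the current interpreter session, which is why PYTHONPATH or site-packages is the usual choice for anything permanent.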

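10.1.4 Packages

To organize modules, you can group them into packages.

To add a module to a package, simply put the module file in the package directory. You can also nest packages inside other packages. For example, to create a package named drawing that contains the modules shapes and colors, you need the following files and directories (UNIX path names).

File/Directory	Description
~/python/	Directory in PYTHONPATH
~/python/drawing/	Package directory (the drawing package)
~/python/drawing/__init__.py	Package code (the drawing module)
~/python/drawing/colors.py	The colors module
~/python/drawing/shapes.py	The shapes module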
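The layout in the table can be tried out without touching your home directory. The sketch below recreates it inside a temporary directory and then imports from the package. The contents of the modules (the constant RED and the function square_area) are hypothetical placeholders; the source does not say what colors and shapes contain.

```python
import importlib
import os
import sys
import tempfile

# Recreate the directory layout from the table in a temporary location.
base_dir = tempfile.mkdtemp()
package_dir = os.path.join(base_dir, "drawing")
os.mkdir(package_dir)

# Placeholder module contents, invented for this illustration.
files = {
    "__init__.py": "",
    "colors.py": "RED = (255, 0, 0)\n",
    "shapes.py": "def square_area(side):\n    return side * side\n",
}
for name, source in files.items():
    with open(os.path.join(package_dir, name), "w") as f:
        f.write(source)

# Make the parent directory importable, as PYTHONPATH would.
sys.path.insert(0, base_dir)
importlib.invalidate_caches()

import drawing.colors          # imports the package and the colors module
from drawing import shapes     # imports the shapes module

red = drawing.colors.RED
area = shapes.square_area(3)
print(red, area)
```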

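10.2 Exploring Modules

Before looking at some standard library modules, let's see how to explore a module.

10.2.1 What's in a Module

To find out, you first need to import the module. Suppose you have heard of a standard module named copy.

Step 1: import the module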

>>> import copy

Step 2: use dir

>>> [n for n in dir(copy) if not n.startswith('_')]

['Error', 'PyStringMap', 'copy', 'deepcopy', 'dispatch_table', 'error', 'name', 't', 'weakref']

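The result contains the names returned by dir(copy) that do not start with an underscore, which is easier to read than the full listing.

Step 3: the variable __all__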

>>> copy.__all__

['Error', 'copy', 'deepcopy']

This list is the module's public interface: it names what is imported when you write from copy import *.
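The effect of __all__ on a star import can be checked directly. The sketch below runs from copy import * in a scratch namespace (using exec, which is an assumption made purely to keep the demonstration self-contained) and compares the imported names with copy.__all__. The exact names depend on your Python version.

```python
import copy

# Names visible in the module that do not start with an underscore.
public_names = [n for n in dir(copy) if not n.startswith('_')]

# Names visible in the module but left out of its public interface.
not_exported = [n for n in public_names if n not in copy.__all__]

# Perform a star import into an empty namespace and see what arrives.
namespace = {}
exec("from copy import *", namespace)
star_imported = sorted(n for n in namespace if not n.startswith('__'))

print(star_imported)
print(not_exported)
```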

10.2.2 使用 help 获取帮助

>>> help(copy.copy)

Help on function copy in module copy:

copy(x)

Shallow copy operation on arbitrary Python objects.

See the module's __doc__ string for more info.

This displays information about the function copy.

The help text above is taken from the docstring of the function copy:

>>> print(copy.copy.__doc__)

Shallow copy operation on arbitrary Python objects.

See the module's __doc__ string for more info.

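10.2.3 Using the Source

One way to find a module's source code is to search sys.path as the interpreter does, but a quicker way is to look at the module's __file__ attribute.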

>>> print(copy.__file__)

C:\Python35\lib\copy.py

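10.3 The Standard Library: A Few Favorites

10.3.1 sys

The sys module gives you access to variables and functions that are closely tied to the Python interpreter.

Function/Variable	Description
argv	The command-line arguments, including the script name
exit([arg])	Exits the current program, optionally with a given return value or error message
modules	A dictionary mapping module names to loaded modules
path	A list of the names of directories in which to look for modules
platform	A platform identifier, such as sunos5 or win32
stdin	The standard input stream, a file-like object
stdout	The standard output stream, a file-like object
stderr	The standard error stream, a file-like object

Example: reversing and printing the command-line arguments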

# reverseargs.py 
import sys 
args = sys.argv[1:] 
args.reverse() 
print(' '.join(args))

Here is another solution:

print(' '.join(reversed(sys.argv[1:])))

Output:

$ python reverseargs.py this is a test

test a is this

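10.3.2 os

The os module gives you access to several operating system services. It contains a great deal; Table 10-3 describes only a few of its most useful functions and variables.

Function/Variable	Description
environ	A mapping containing the environment variables
system(command)	Executes an operating system command in a subshell
sep	The separator used within paths
pathsep	The separator used between different paths
linesep	The line separator ('\n', '\r', or '\r\n')
urandom(n)	Returns n bytes of cryptographically strong random data

As an example, consider the problem of starting a web browser. The system function can execute any external program, which is useful in environments such as UNIX, where you can run programs (or commands) from the command line to list the contents of a directory, send email, and so on. It can also start programs with a graphical user interface, such as a web browser. In UNIX, you can do the following (assuming there is a browser at /usr/bin/firefox):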

os.system('/usr/bin/firefox')

In Windows, you can do this (again, specify the path where your browser is installed):

os.system(r'C:\"Program Files (x86)"\"Mozilla Firefox"\firefox.exe')
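The other entries in the table above can be explored without launching any external program. The following is a small sketch of environ, pathsep, sep, and urandom; the path components used in it are arbitrary illustrative values.

```python
import os

# os.environ behaves like a dictionary of environment variables.
python_path = os.environ.get("PYTHONPATH", "")

# os.pathsep separates the entries of such a variable
# (':' on UNIX, ';' on Windows).
search_dirs = [d for d in python_path.split(os.pathsep) if d]

# os.sep separates the components of a single path.
example_path = os.sep.join(["python", "drawing", "shapes.py"])

# os.urandom returns cryptographically strong random bytes.
token = os.urandom(8)

print(search_dirs, example_path, len(token))
```

In everyday code os.path.join is usually preferred over joining with os.sep by hand; the manual join is used here only to show what the separator is.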

10.3.3 fileinput

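Function	Description
input([files[, inplace[, backup]]])	Helps iterate over the lines of multiple input streams
filename()	Returns the name of the current file
lineno()	Returns the current (cumulative) line number
filelineno()	Returns the line number within the current file
isfirstline()	Checks whether the current line is the first line of its file
isstdin()	Checks whether the last line read came from sys.stdin
nextfile()	Closes the current file and moves to the next one
close()	Closes the sequence

Example: adding line numbers to a Python script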

# numberlines.py 
import fileinput 
for line in fileinput.input(inplace=True): 
    line = line.rstrip() 
    num = fileinput.lineno() 
    print('{:<50} # {:2d}'.format(line, num))

Result: the line-numbering program after numbering its own lines

# numberlines.py                                   #  1
                                                   #  2
import fileinput                                   #  3
                                                   #  4
for line in fileinput.input(inplace=True):         #  5
    line = line.rstrip()                           #  6
    num = fileinput.lineno()                       #  7
    print('{:<50} # {:2d}'.format(line, num))      #  8
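The difference between lineno and filelineno shows up only when more than one file is read. As a minimal sketch, the code below iterates over two small temporary files (their names and contents are invented for this illustration) and records what filename, lineno, and filelineno report for each line.

```python
import fileinput
import os
import tempfile

# Two small temporary files to iterate over (hypothetical contents).
tmp_dir = tempfile.mkdtemp()
paths = []
for name, text in [("a.txt", "one\ntwo\n"), ("b.txt", "three\n")]:
    path = os.path.join(tmp_dir, name)
    with open(path, "w") as f:
        f.write(text)
    paths.append(path)

records = []
with fileinput.input(files=paths) as lines:
    for line in lines:
        records.append((
            os.path.basename(fileinput.filename()),
            fileinput.lineno(),      # cumulative line number
            fileinput.filelineno(),  # line number within the current file
            line.rstrip(),
        ))

for record in records:
    print(record)
```

The cumulative counter keeps increasing across files, while the per-file counter starts again at 1 when b.txt begins.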

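10.3.4 Sets, Heaps, and Deques

1. Sets

In recent versions of Python, sets are implemented by the built-in class set, which means you can create sets directly without importing the sets module.

You can also perform the standard set operations (which you may know from math class), such as union and intersection, using the same operators that perform bitwise operations on integers.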
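The operators in question are |, &, -, and ^. A short sketch with two arbitrary example sets:

```python
a = {1, 2, 3}
b = {2, 3, 4}

union = a | b          # elements in either set
intersection = a & b   # elements in both sets
difference = a - b     # elements in a but not in b
symmetric = a ^ b      # elements in exactly one of the sets

print(union, intersection, difference, symmetric)
```

Each operator also has a method form, such as a.union(b) and a.intersection(b), which accepts any iterable rather than only sets.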

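2. Heaps

Another well-known data structure is the heap, a kind of priority queue. Python actually has no separate heap type, only a module with some heap-manipulating functions. The module is called heapq (the q stands for queue). Table 10-5 lists six of its functions, the first four of which are directly related to heap manipulation. You must use a list as the heap object itself.

Function	Description
heappush(heap, x)	Pushes x onto the heap
heappop(heap)	Pops the smallest element off the heap
heapify(heap)	Gives a list the heap property
heapreplace(heap, x)	Pops the smallest element and pushes x
nlargest(n, iter)	Returns the n largest elements of iter
nsmallest(n, iter)	Returns the n smallest elements of iter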
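The functions in the table can be tried on an ordinary list. The sketch below uses an arbitrary list of numbers; the values in the comments follow from the heap always giving up its smallest element first.

```python
from heapq import heapify, heappop, heappush, heapreplace, nlargest, nsmallest

heap = [5, 8, 0, 3, 6, 7, 9, 1, 4, 2]
heapify(heap)                      # rearrange the list in place into a heap
heappush(heap, 0.5)                # add a new element
smallest = heappop(heap)           # removes and returns 0
replaced = heapreplace(heap, 10)   # removes and returns 0.5, then adds 10

top_three = nlargest(3, heap)      # [10, 9, 8]
bottom_three = nsmallest(3, heap)  # [1, 2, 3]
print(smallest, replaced, top_three, bottom_three)
```

Note that the list itself is not fully sorted; only heap[0] is guaranteed to be the smallest element.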

10.3.5 time

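Fields of a Python date tuple

Index	Field	Value
0	Year	For example, 2000 or 2001
1	Month	In the range 1 to 12
2	Day	In the range 1 to 31
3	Hour	In the range 0 to 23
4	Minute	In the range 0 to 59
5	Second	In the range 0 to 61
6	Weekday	In the range 0 to 6, where 0 is Monday
7	Julian day	In the range 1 to 366
8	Daylight saving time	0, 1, or -1

Some important functions in the time module

Function	Description
asctime([tuple])	Converts a time tuple to a string
localtime([secs])	Converts seconds to a date tuple in local time
mktime(tuple)	Converts a local-time tuple to seconds since the epoch
sleep(secs)	Sleeps (does nothing) for secs seconds
strptime(string[, format])	Parses a string into a time tuple
time()	The current time (seconds since the epoch, UTC)

Called without arguments, the function time.asctime formats the current time as a string: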

>>> time.asctime()

'Mon Jul 18 14:06:07 2016'
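The functions in the table convert between three representations: a date tuple, a number of seconds since the epoch, and a string. As a minimal sketch, the code below starts from a fixed date tuple (the same moment as in the output above, chosen only for illustration) and converts it through all three.

```python
import time

# A date tuple with the nine fields from the table above; -1 in the last
# three positions lets the library work out the weekday, the day of the
# year, and whether daylight saving time applies.
date_tuple = (2016, 7, 18, 14, 6, 7, -1, -1, -1)

seconds = time.mktime(date_tuple)      # local time tuple -> seconds since the epoch
round_trip = time.localtime(seconds)   # seconds -> date tuple in local time
text = time.asctime(round_trip)        # date tuple -> readable string

print(text)
```

The value of seconds depends on the time zone of the machine, but converting it back with localtime recovers the original date and time.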

10.3.6 random

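Function	Description
random()	Returns a random real number n such that 0 <= n < 1
getrandbits(n)	Returns n random bits as an integer
uniform(a, b)	Returns a random real number n such that a <= n <= b
randrange([start], stop, [step])	Returns a random number from range(start, stop, step)
choice(seq)	Returns a random element from the sequence seq
shuffle(seq)	Shuffles the sequence seq in place
sample(seq, n)	Chooses n random elements from distinct positions in the sequence seq

Example 1: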

from random import * 
from time import * 
date1 = (2016, 1, 1, 0, 0, 0, -1, -1, -1) 
time1 = mktime(date1) 
date2 = (2017, 1, 1, 0, 0, 0, -1, -1, -1) 
time2 = mktime(date2)

>>> random_time = uniform(time1, time2)

>>> print(asctime(localtime(random_time)))

Tue Aug 16 10:11:04 2016

Example 2:

from random import randrange 
num = int(input('How many dice? ')) 
sides = int(input('How many sides per die? ')) 
sum = 0 
for i in range(num): sum += randrange(sides) + 1 
print('The result is', sum)

How many dice? 3

How many sides per die? 6

The result is 10
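The remaining functions from the table can be sketched in the same way. The code below uses its own random.Random instance with a fixed seed (the seed value 42 and the list of numbers are arbitrary choices for this illustration) so that repeated runs give the same results.

```python
import random

rng = random.Random(42)   # a seeded generator makes the run repeatable

deck = list(range(1, 11))
rng.shuffle(deck)              # reorders the list in place
hand = rng.sample(deck, 3)     # three elements from distinct positions
pick = rng.choice(deck)        # one element of the list
roll = rng.randrange(1, 7)     # like a six-sided die: 1 to 6
fraction = rng.random()        # 0.0 <= fraction < 1.0

print(deck, hand, pick, roll, fraction)
```

These generators are meant for simulations and games; for anything security-related, os.urandom or the secrets module is the safer choice.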

10.3.7 shelve and json

If all you need is a simple storage solution, the shelve module can do most of the work for you; you only have to supply a file name.

1. A potential trap

>>> import shelve

>>> s = shelve.open('test.dat')

>>> s['x'] = ['a', 'b', 'c']

>>> s['x'].append('d')

>>> s['x']

['a', 'b', 'c']

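The 'd' is missing. Each time you read s['x'], shelve returns a fresh copy of the stored list; append modified that copy, which was never written back.

To correctly modify an object stored with shelve, bind the retrieved copy to a temporary variable and store it again after you have modified it: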

>>> temp = s['x']

>>> temp.append('d')

>>> s['x'] = temp

>>> s['x']

['a', 'b', 'c', 'd']
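An alternative to the temporary variable is the writeback parameter of shelve.open. With writeback=True, the shelf keeps every entry you access in memory and writes those entries back when the shelf is synced or closed. The sketch below stores its file in a temporary directory; the file name test_shelf is an arbitrary choice for this illustration.

```python
import os
import shelve
import tempfile

path = os.path.join(tempfile.mkdtemp(), "test_shelf")

# writeback=True caches accessed entries and writes them back
# when the shelf is closed at the end of the with block.
with shelve.open(path, writeback=True) as s:
    s['x'] = ['a', 'b', 'c']
    s['x'].append('d')     # modifies the cached list, not a throwaway copy

# Reopen the shelf normally to confirm the change was stored.
with shelve.open(path) as s:
    stored = s['x']

print(stored)
```

The trade-off is memory and time: every accessed entry stays cached and is rewritten on close, so the temporary-variable approach remains the better fit for large shelves.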

2. A simple database example

# database.py 
import sys, shelve 
def store_person(db): 
    """ 
    Let the user enter data and store it in the shelf object
    """ 
    pid = input('Enter unique ID number: ') 
    person = {} 
    person['name'] = input('Enter name: ') 
    person['age'] = input('Enter age: ') 
    person['phone'] = input('Enter phone number: ') 
    db[pid] = person 
def lookup_person(db): 
    """ 
    Let the user enter an ID and the desired field, and fetch the corresponding data from the shelf object
    """ 
    pid = input('Enter ID number: ') 
    field = input('What would you like to know? (name, age, phone) ') 
    field = field.strip().lower() 
    print(field.capitalize() + ':', db[pid][field]) 
def print_help(): 
    print('The available commands are:') 
    print('store : Stores information about a person') 
    print('lookup : Looks up a person from ID number') 
    print('quit : Save changes and exit') 
    print('? : Prints this message') 
def enter_command(): 
    cmd = input('Enter command (? for help): ') 
    cmd = cmd.strip().lower() 
    return cmd 
def main(): 
    database = shelve.open('C:\\database.dat') # You may want to change this name
    try: 
        while True: 
            cmd = enter_command() 
            if cmd == 'store': 
                store_person(database) 
            elif cmd == 'lookup': 
                lookup_person(database) 
            elif cmd == '?': 
                print_help() 
            elif cmd == 'quit': 
                return
    finally: 
        database.close()
if __name__ == '__main__': main()

Here is a sample interaction:

Enter command (? for help): ?

The available commands are:

store : Stores information about a person

lookup : Looks up a person from ID number

quit : Save changes and exit

? : Prints this message

Enter command (? for help): store

Enter unique ID number: 001

Enter name: Mr. Gumby

Enter age: 42

Enter phone number: 555-1234

Enter command (? for help): lookup

Enter ID number: 001

What would you like to know? (name, age, phone) phone

Phone: 555-1234

Enter command (? for help): quit

After quitting the program, let's see what happens when it is run again (perhaps the next day).

Enter command (? for help): lookup

Enter ID number: 001

What would you like to know? (name, age, phone) name

Name: Mr. Gumby

Enter command (? for help): quit

As you can see, the program reads the file that was created the previous time it ran.

10.3.8 re

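1. What is a regular expression?

A regular expression is a pattern that can match pieces of text. The simplest regular expression is a plain string, which matches itself. In other words, the regular expression 'python' matches the string 'python'.

·The wildcard

A regular expression can match more than one string, and you create such a pattern by using special characters. For example, the period (.) matches any single character except a newline, so '.ython' matches both 'python' and 'jython'. Because it matches anything, the period is called the wildcard.

·Escaping special characters

Ordinary characters match only themselves, but special characters are a different matter. To make a special character behave like an ordinary one, you escape it: put a backslash in front of it, just as you did with quotes inside strings in Chapter 1.

·Character sets

Matching any character is useful, but sometimes you need finer control. You can create a so-called character set by enclosing a substring in square brackets; for example, '[pj]ython' matches both 'python' and 'jython'.

·Alternatives and subpatterns

Character sets are fine when each character is treated separately, but what if you want to match only the strings 'python' and 'perl'? You cannot specify such a pattern with character sets or the wildcard; you have to use the special character for alternatives, the pipe character (|). The pattern you need is 'python|perl'.

When the choice applies to only part of a pattern, enclose that part in parentheses to form a subpattern, as in 'p(ython|erl)'. Note that a single character can also be called a subpattern.

·Optional and repeated subpatterns

By adding a question mark after a subpattern, you make it optional: it may appear in the matched string, but it does not have to. For example, the following pattern matches python.org with or without the http:// and www. prefixes: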

r'(http://)?(www\.)?python\.org'
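To see which strings the pattern above accepts, it can be tested against a few candidates. This sketch uses re.fullmatch, which succeeds only when the entire string matches; the list of candidate strings is made up for the illustration.

```python
import re

pattern = r'(http://)?(www\.)?python\.org'

candidates = [
    'http://www.python.org',
    'http://python.org',
    'www.python.org',
    'python.org',
    'http://www.perl.org',
]

# Keep only the candidates that match the pattern from start to end.
matches = [url for url in candidates if re.fullmatch(pattern, url)]
print(matches)
```

The first four candidates match because each optional subpattern may be present or absent; the last one fails because perl.org is not python.org.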

·The beginning and end of a string
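The notes give only the heading for this topic. As a minimal sketch, assuming the heading refers to the standard anchors, a caret (^) ties a pattern to the beginning of the string and a dollar sign ($) ties it to the end. The example strings are invented for this illustration.

```python
import re

# '^' anchors the pattern at the beginning of the string.
starts_with_http = bool(re.search('^ht+p', 'http://python.org'))   # True
not_at_start = bool(re.search('^ht+p', 'www.http.org'))            # False

# '$' anchors the pattern at the end of the string.
ends_with_org = bool(re.search(r'\.org$', 'http://python.org'))    # True

print(starts_with_http, not_at_start, ends_with_org)
```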

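2. Contents of the re module

Function	Description
compile(pattern[, flags])	Creates a pattern object from a string containing a regular expression
search(pattern, string[, flags])	Searches for the pattern in the string
match(pattern, string[, flags])	Matches the pattern at the beginning of the string
split(pattern, string[, maxsplit=0])	Splits the string by occurrences of the pattern
findall(pattern, string)	Returns a list of all substrings of the string that match the pattern
sub(pat, repl, string[, count=0])	Replaces the substrings that match pat with repl
escape(string)	Escapes all special regular expression characters in the string

The function re.compile converts a regular expression written as a string into a pattern object, which makes matching more efficient. If you pass a string to functions such as search and match, it has to be converted to a pattern object internally anyway. The function re.search looks for the pattern anywhere in the string and returns a match object if it finds one, or None otherwise, so it can be used directly in a condition: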

if re.search(pat, string): 
    print('Found it!')
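A compiled pattern object has the same functions available as methods. The sketch below compiles a pattern once and reuses it, and also shows the difference between search (anywhere in the string) and match (only at the beginning). The sample sentences are invented for this illustration.

```python
import re

# Compile once, then reuse the pattern object.
word = re.compile(r'p(ython|erl)')

texts = ['I like python', 'perl is terse', 'ruby is neat']

# search finds the pattern anywhere in the string.
found_anywhere = [t for t in texts if word.search(t)]

# match succeeds only if the pattern matches at the beginning.
found_at_start = [t for t in texts if word.match(t)]

print(found_anywhere)
print(found_at_start)
```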

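The function re.split splits a string by the substrings that match a pattern. It is similar to the string method split, except that the separator is given as a regular expression rather than a fixed string. For example, with the string method split you can split a string on the separator ', ', but with re.split you can split on any run of spaces and commas.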

>>> some_text = 'alpha, beta,,,,gamma delta'

>>> re.split('[, ]+', some_text)

['alpha', 'beta', 'gamma', 'delta']

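3. Match objects and groups

The re functions that look for a match return a MatchObject when they find one. Such an object contains information about the substring that matched the pattern, as well as about which parts of the pattern matched which parts of the substring. These parts are called groups.

In the following pattern:

'There (was a (wee) (cooper)) who (lived in Fyfe)'

the groups are as follows: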

0 There was a wee cooper who lived in Fyfe

1 was a wee cooper

2 wee

3 cooper

4 lived in Fyfe

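Important methods of re match objects

Method	Description
group([group1, ...])	Retrieves the substrings that matched the given subpatterns (groups)
start([group])	Returns the starting position of the substring matched by the given group
end([group])	Returns the ending position of the substring matched by the given group (exclusive, as in slices)
span([group])	Returns the starting and ending positions of the substring matched by the given group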
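The methods in the table can be tried on the pattern shown earlier. The sketch below matches it against the sentence it describes and reads back the groups and their positions.

```python
import re

pattern = 'There (was a (wee) (cooper)) who (lived in Fyfe)'
m = re.match(pattern, 'There was a wee cooper who lived in Fyfe')

whole = m.group(0)     # group 0 is the entire match
groups = m.groups()    # groups 1 to 4 as a tuple
wee = m.group(2)       # 'wee'
start = m.start(2)     # index where 'wee' begins
end = m.end(2)         # index just past the end of 'wee'
span = m.span(2)       # (start, end) as a tuple

print(groups)
print(wee, start, end, span)
```

Groups are numbered by the position of their opening parenthesis, counting from the left, which is why the nested group 'wee' is number 2.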