python3 cookbook Notes (11)

soul booster

Article link: https://blog.csdn.net/qq_41967784/article/details/106468610

Chapter 5: Files and I/O

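Reading and Writing Text Data
Printing to a File
Printing with a Different Separator or Line Ending
Reading and Writing Binary Data
Writing to a File Only If It Does Not Already Exist
Reading and Writing Compressed Files
Iterating Over Fixed-Size Records
Getting a List of Files in a Directory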

Reading and Writing Text Data

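First, note the two modes, text mode (t) and binary mode (b), and the four common operations: read (r), write (w), create a new file and write to it (x), and append (a). They combine into mode strings such as rt and wb. Text mode is the default, so the t can usually be omitted.
Second, it is best to manage files with a with statement. It reads more cleanly, the file does not have to be closed by hand, and it is more pythonic: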

with open('a', 'w') as a, open('b', 'w') as b:
    do_something()
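
To make the mode letters concrete, here is a small sketch of w, a and r used in sequence. The temporary directory and the file name notes.txt are made up for illustration, and encoding='utf-8' is passed explicitly so the result does not depend on the platform.

```python
import os
import tempfile

# Hypothetical scratch file inside a fresh temporary directory
path = os.path.join(tempfile.mkdtemp(), 'notes.txt')

# 'w' creates the file, or truncates it if it already exists
with open(path, 'w', encoding='utf-8') as f:
    f.write('first\n')

# 'a' keeps the existing content and writes at the end
with open(path, 'a', encoding='utf-8') as f:
    f.write('second\n')

# 'r' is the same as 'rt': read in text mode
with open(path, 'r', encoding='utf-8') as f:
    content = f.read()

print(repr(content))
```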

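The third point concerns encoding, an optional keyword argument of open(). Its default is the system encoding (platform dependent). The book says this value can be obtained by calling sys.getdefaultencoding(), but that is not correct. The following session was run on Windows: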

>>> import sys
>>> sys.getdefaultencoding()
'utf-8'
>>> with open('format.txt', 'r') as f:
...     print(f.read())
...
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
UnicodeDecodeError: 'gbk' codec can't decode byte 0xb3 in position 15: illegal multibyte sequence
>>>

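As shown, sys.getdefaultencoding() returns utf-8, yet reading a text file (itself encoded in UTF-8) with the default settings raises an error. That means open() was decoding with gbk, not utf-8, so this utf-8 is not the platform default encoding. From an article on Zhihu I learned that the value returned by sys.getdefaultencoding() can be understood as the default encoding of the Python 3 interpreter itself (in Python 2 it is ascii), and that the platform default is obtained with locale.getdefaultlocale() from the locale module:
Update (6.11): that article is not right either. After reading the documentation, the function to use is locale.getpreferredencoding(False). Painful.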

>>> import locale
>>> locale.getdefaultlocale()
('zh_CN', 'cp936')
>>>

Here cp936 is GBK, an extension of the national standard GB2312, so the default encoding is GBK.
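
Following the update above, this is a minimal sketch of checking the encoding with locale.getpreferredencoding(False) and of avoiding the platform default altogether by passing encoding explicitly. The temporary file and the sample text are my own assumptions, not from the book, and the printed value will differ from machine to machine.

```python
import locale
import os
import tempfile

# The encoding text-mode open() falls back to when no encoding argument is
# given (for example 'cp936' on a Chinese-locale Windows machine).
platform_default = locale.getpreferredencoding(False)
print(platform_default)

# Hypothetical sample file, written and read with an explicit encoding so the
# round trip does not depend on the platform default.
path = os.path.join(tempfile.mkdtemp(), 'format.txt')
sample = '编码测试 encoding test\n'

with open(path, 'w', encoding='utf-8') as f:
    f.write(sample)

with open(path, 'r', encoding='utf-8') as f:
    roundtrip = f.read()
```
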
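The last point is about newline recognition, which differs between Linux and Windows (\n and \r\n respectively). By default, Python handles newlines in universal newline mode. In this mode, when reading text, Python recognizes all the common line endings and converts them to a single \n character. Likewise, on output the newline \n is converted to the system default line ending. If you do not want this default handling, pass newline='' to open(), like this: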

>>> # Newline translation enabled (the default)
>>> f = open('hello.txt', 'rt')
>>> f.read()
'hello world!\n'

>>> # Newline translation disabled
>>> g = open('hello.txt', 'rt', newline='')
>>> g.read()
'hello world!\r\n'
>>>
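
The transcript above only shows \r\n surviving if hello.txt really contains Windows line endings. Here is a sketch that reproduces it on any platform by writing the raw bytes first. The temporary path is an assumption for the demo.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'hello.txt')

# Write raw bytes so the file really contains a Windows-style line ending
with open(path, 'wb') as f:
    f.write(b'hello world!\r\n')

# Default (newline=None): '\r\n' is translated to '\n' on read
with open(path, 'rt') as f:
    translated = f.read()

# newline='': line endings are returned untranslated
with open(path, 'rt', newline='') as f:
    untranslated = f.read()

print(repr(translated))    # 'hello world!\n'
print(repr(untranslated))  # 'hello world!\r\n'
```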

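I also found an answer on SO suggesting that this setting can apparently be changed. I am noting it here for reference, but I would still rather write encoding='utf-8' explicitly. We will see whether I come back later and eat my words: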

import locale
locale.setlocale(locale.LC_ALL, 'en_US.utf-8')

Printing to a File

with open('d:/work/test.txt', 'wt') as f:
    print('Hello World!', file=f)

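That is all there is to redirecting output to a file. One thing to watch out for is that the file must be opened in text mode; printing to a file opened in binary mode fails. (You can also point sys.stdout at the file object and then call print directly.)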
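
As a sketch of the sys.stdout variant, contextlib.redirect_stdout from the standard library swaps sys.stdout temporarily and restores it afterwards. An io.StringIO stands in for a text-mode file and an io.BytesIO for a binary-mode one; both are my substitutions to keep the example self-contained.

```python
import contextlib
import io

buffer = io.StringIO()  # stands in for a file opened in text mode

# Inside the block, plain print() calls write to buffer instead of the console
with contextlib.redirect_stdout(buffer):
    print('Hello World!')

captured = buffer.getvalue()

# A binary stream rejects the str that print() tries to write
try:
    print('Hello World!', file=io.BytesIO())
except TypeError as exc:
    binary_error = exc

print(repr(captured))
print(type(binary_error).__name__)
```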

Printing with a Different Separator or Line Ending

Use the sep and end keyword arguments of print() to shape the output the way you want. For example:

>>> print('ACME', 50, 91.5)
ACME 50 91.5
>>> print('ACME', 50, 91.5, sep=',')
ACME,50,91.5
>>> print('ACME', 50, 91.5, sep=',', end='!!\n')
ACME,50,91.5!!
>>>

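When printing data with a comma as the separator, people often reach for ','.join(). The catch is that it only works on strings, so you usually have to do some extra conversion before it works. For example: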

>>> row = ('ACME', 50, 91.5)
>>> print(','.join(row))
Traceback (most recent call last):
    File "<stdin>", line 1, in <module>
TypeError: sequence item 1: expected str instance, int found
>>> print(','.join(str(x) for x in row))
ACME,50,91.5
>>>

Of course, there is no need to go to that much trouble. Just write it like this:

>>> print(*row, sep=',')
ACME,50,91.5
>>>
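
The sep and end arguments combine with file= from the previous section. A small sketch, with made-up sample rows and an io.StringIO in place of a real file:

```python
import io

# Made-up sample rows mixing str, int and float
rows = [('ACME', 50, 91.5), ('IBM', 10, 70.25)]

out = io.StringIO()  # stands in for a file opened in text mode
for row in rows:
    # *row unpacks the tuple; print() converts each item with str()
    print(*row, sep=',', end='\n', file=out)

result = out.getvalue()
print(result, end='')
```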

Reading and Writing Binary Data

with open('somefile.bin', 'rb') as f:
    data = f.read(16)
    text = data.decode('utf-8')

with open('somefile.bin', 'wb') as f:
    text = 'Hello World'
    f.write(text.encode('utf-8'))
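
One difference worth remembering when working with data read in binary mode, sketched below: indexing a bytes object yields an integer, while indexing a str yields a one-character string. This detail is my own addition rather than something from the notes above.

```python
text = 'Hello World'
data = text.encode('utf-8')  # str -> bytes, as required by a 'wb' file

first_char = text[0]   # indexing str gives a one-character str
first_byte = data[0]   # indexing bytes gives an int (the byte value)

restored = data.decode('utf-8')  # bytes -> str, as done after an 'rb' read

print(first_char, first_byte, restored)
```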

Writing to a File Only If It Does Not Already Exist

>>> with open('somefile', 'wt') as f:
...     f.write('Hello\n')
...
>>> with open('somefile', 'xt') as f:
...     f.write('Hello\n')
...
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
FileExistsError: [Errno 17] File exists: 'somefile'
>>>

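Note that x mode is a Python 3 specific extension to open(). It does not exist in older versions of Python, nor in the underlying C libraries used by the Python implementation.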
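
In a script, the FileExistsError is usually caught rather than left to propagate. A minimal sketch follows; write_if_missing is a hypothetical helper name of my own, and the temporary path is only for the demo.

```python
import os
import tempfile


def write_if_missing(path, text):
    """Hypothetical helper: write text only if path does not exist yet.

    Returns True if the file was created, False if it already existed.
    """
    try:
        # 'x' fails with FileExistsError instead of overwriting
        with open(path, 'x', encoding='utf-8') as f:
            f.write(text)
    except FileExistsError:
        return False
    return True


path = os.path.join(tempfile.mkdtemp(), 'somefile')
first = write_if_missing(path, 'Hello\n')   # file is created
second = write_if_missing(path, 'Bye\n')    # file exists, left untouched
print(first, second)
```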

Reading and Writing Compressed Files

# gzip compression
import gzip
with gzip.open('somefile.gz', 'rt') as f:
    text = f.read()

# bz2 compression
import bz2
with bz2.open('somefile.bz2', 'rt') as f:
    text = f.read()
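
The snippets above only read. Here is a sketch of the write side with gzip, plus a reminder that gzip.open() defaults to binary mode, unlike the built-in open(). The file path and sample string are assumptions for the demo.

```python
import gzip
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'somefile.gz')
sample = 'compressed text'

# 'wt' compresses text; encoding works the same way as with open()
with gzip.open(path, 'wt', encoding='utf-8') as f:
    f.write(sample)

# 'rt' decompresses and decodes back to str
with gzip.open(path, 'rt', encoding='utf-8') as f:
    text = f.read()

# Without the t, gzip.open() works in binary mode and returns bytes
with gzip.open(path, 'rb') as f:
    raw = f.read()

print(repr(text), repr(raw))
```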

Iterating Over Fixed-Size Records

from functools import partial

RECORD_SIZE = 32

with open('somefile.data', 'rb') as f:
    records = iter(partial(f.read, RECORD_SIZE), b'')
    for r in records:
        ...
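
To see what the iter(callable, sentinel) form actually yields, here is the same pattern run against an in-memory io.BytesIO with a tiny record size. Both are stand-ins I chose for the demo. Note that the final record can be shorter than RECORD_SIZE when the data length is not an exact multiple.

```python
import io
from functools import partial

RECORD_SIZE = 4  # deliberately small for the demo

# 10 bytes of made-up data: two full records and one short record
f = io.BytesIO(b'aaaabbbbcc')

# iter() keeps calling f.read(RECORD_SIZE) until it returns the sentinel b''
records = list(iter(partial(f.read, RECORD_SIZE), b''))

print(records)  # [b'aaaa', b'bbbb', b'cc']
```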

Getting a List of Files in a Directory

import os

# Get all regular files
names = [name for name in os.listdir('.')
        if os.path.isfile(os.path.join('.', name))]

# Get all dirs
dirnames = [name for name in os.listdir('.')
        if os.path.isdir(os.path.join('.', name))]

The string methods startswith() and endswith() are also handy for filtering the contents of a directory. For example:

pyfiles = [name for name in os.listdir('somedir')
            if name.endswith('.py')]

For filename matching, you might consider the glob or fnmatch modules. For example:

import glob
pyfiles = glob.glob('somedir/*.py')

from fnmatch import fnmatch
pyfiles = [name for name in os.listdir('somedir')
            if fnmatch(name, '*.py')]
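
A self-contained sketch that runs the approaches above against a throwaway directory, so the difference in return values is visible: os.listdir() gives bare names, which must be joined with the directory before testing them, while glob.glob() gives paths that include the directory. The directory contents here are made up.

```python
import glob
import os
import tempfile
from fnmatch import fnmatch

# Throwaway directory with two files and one subdirectory
somedir = tempfile.mkdtemp()
for name in ('a.py', 'b.txt'):
    open(os.path.join(somedir, name), 'w').close()
os.mkdir(os.path.join(somedir, 'sub'))

# os.listdir() returns bare names, so join them with somedir before testing
files = sorted(n for n in os.listdir(somedir)
               if os.path.isfile(os.path.join(somedir, n)))
dirs = sorted(n for n in os.listdir(somedir)
              if os.path.isdir(os.path.join(somedir, n)))

# glob.glob() returns paths that include the directory part
by_glob = sorted(os.path.basename(p)
                 for p in glob.glob(os.path.join(somedir, '*.py')))
by_fnmatch = sorted(n for n in os.listdir(somedir) if fnmatch(n, '*.py'))

print(files, dirs, by_glob, by_fnmatch)
```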