AUTOMATE THE BORING STUFF WITH PYTHON Reading Notes - Chapter 9: READING AND WRITING FILES

Article link: https://blog.csdn.net/stevensxiao/article/details/104117823

This chapter uses the following modules:

from pathlib import Path
import os

Files and File Paths

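A file is identified by its file name and its path. Linux uses / (forward slash) as both the path separator and the root directory. Windows uses \ (backslash) as the path separator, and each drive has its own root, such as C:\.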

>>> Path('usr', 'include', 'stdio.h')
PosixPath('usr/include/stdio.h')
>>> str(Path('usr', 'include', 'stdio.h'))
'usr/include/stdio.h'
>>> str(Path('/usr/include', 'stdio.h'))
'/usr/include/stdio.h'

Just as strings are concatenated with +, Path objects can be joined with /:

>>> Path('/usr/include') / Path('stdio.h')
PosixPath('/usr/include/stdio.h')
>>> Path('/usr/include') / 'stdio.h'
PosixPath('/usr/include/stdio.h')

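The advantage of Path over string concatenation is that it handles the operating system's path separator (\ or /) for you, which makes the code more portable and less error-prone.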
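To illustrate the separator handling, here is a minimal sketch that builds the same path in both flavours. It uses PurePosixPath and PureWindowsPath, which are not in the original notes; they never touch the file system, so both can be created on any operating system. The C:/Users/Al/a.txt path is a made-up example.

```python
from pathlib import PurePosixPath, PureWindowsPath

# Pure paths do no file-system access, so both flavours can be
# built on any operating system.
parts = ('usr', 'include', 'stdio.h')

posix_path = PurePosixPath(*parts)
windows_path = PureWindowsPath(*parts)

print(posix_path)    # usr/include/stdio.h
print(windows_path)  # usr\include\stdio.h

# The / operator works the same way for both flavours.
posix_joined = PurePosixPath('/usr/include') / 'stdio.h'
windows_joined = PureWindowsPath('C:/Users') / 'Al' / 'a.txt'
print(posix_joined)    # /usr/include/stdio.h
print(windows_joined)  # C:\Users\Al\a.txt
```

Plain Path picks one of these two flavours automatically, depending on the system the script runs on.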

Use Path.cwd() to get the current working directory, os.chdir() to change it, and Path.home() to get the user's home directory:

>>> os.chdir('/usr/include')
>>> Path.cwd()
PosixPath('/usr/include')
>>> Path.home()
PosixPath('/home/user01')

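There are two ways to create a directory. The first is os.makedirs(), which also creates any missing parent directories, like the command mkdir -p. The second is Path.mkdir(), which by default creates a single directory, like the command mkdir.
A path is either absolute (it starts from the root) or relative (it starts from the current working directory).
The is_absolute() method of a Path object tests whether the path is absolute, and os.path.isabs() does the same for a string or a Path. The os.path.abspath() function (it is not a Path method) returns the absolute form of a path.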

>>> os.chdir(Path.home())
>>> os.path.abspath('..')
'/home'
>>> Path.cwd().is_absolute()
True
>>> os.path.isabs(Path.cwd())
True
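The two ways of creating directories described above can be sketched as follows. This is only an illustration: it works inside a temporary directory, and the directory names (a, b, c, x, y, z) are made up. The parents and exist_ok arguments of Path.mkdir() are not mentioned in the notes, but they are what gives it mkdir -p behaviour.

```python
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)

    # os.makedirs() creates missing intermediate directories, like mkdir -p.
    os.makedirs(base / 'a' / 'b' / 'c')

    # Path.mkdir() creates one level by default, like plain mkdir ...
    (base / 'x').mkdir()

    # ... and fails when a parent directory is missing.
    try:
        (base / 'y' / 'z').mkdir()
    except FileNotFoundError:
        parent_missing = True
    else:
        parent_missing = False

    # parents=True and exist_ok=True give it mkdir -p behaviour.
    (base / 'y' / 'z').mkdir(parents=True, exist_ok=True)

    created = sorted(p.relative_to(base).as_posix() for p in base.rglob('*'))

print(parent_missing)  # True
print(created)         # ['a', 'a/b', 'a/b/c', 'x', 'y', 'y/z']
```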

os.path.relpath() computes a relative path. The first argument is the destination and the second is the starting point:

>>> os.path.relpath('/usr/include', '/tmp')
'../usr/include'
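For comparison, the sketch below sets relpath() next to the relative_to() method of path objects, which is not covered in the notes. It uses posixpath and PurePosixPath directly, on the assumption that POSIX-style results are wanted on every platform.

```python
import posixpath
from pathlib import PurePosixPath

# posixpath is the POSIX implementation behind os.path on Linux/macOS;
# using it directly keeps the result the same on every platform.
up_and_over = posixpath.relpath('/usr/include', start='/tmp')
print(up_and_over)   # ../usr/include

# relative_to() only strips a leading prefix ...
inside = PurePosixPath('/usr/include/stdio.h').relative_to('/usr')
print(inside)        # include/stdio.h

# ... and raises ValueError when the path is not under the start.
try:
    PurePosixPath('/usr/include').relative_to('/tmp')
except ValueError:
    not_a_subpath = True
else:
    not_a_subpath = False
print(not_a_subpath)  # True
```

So relpath() can climb upwards with .., while relative_to() with its default arguments only answers "where is this path inside that directory?".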

Extracting the parts of a file path:

>>> p = Path('/usr/include/stdio.h')
>>> p.anchor
'/'
>>> p.name
'stdio.h'
>>> p.stem
'stdio'
>>> p.suffix
'.h'
>>> p.drive
''
>>> p.parent
PosixPath('/usr/include')
>>> p.parent.name
'include'
# Returns the sequence of ancestor directories, up to the root directory
>>> p.parents
<PosixPath.parents>
>>> p.parents[0]
PosixPath('/usr/include')
>>> p.parents[1]
PosixPath('/usr')
>>> p.parents[2]
PosixPath('/')
>>> p.parents[3]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python3.6/pathlib.py", line 594, in __getitem__
    raise IndexError(idx)
IndexError: 3
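The IndexError above is avoided by iterating over parents instead of indexing by hand. A minimal sketch, using PurePosixPath so that the output is the same on any system:

```python
from pathlib import PurePosixPath

p = PurePosixPath('/usr/include/stdio.h')

# parents is a sequence: it supports len(), iteration and indexing.
ancestors = [str(parent) for parent in p.parents]
print(len(p.parents))  # 3
print(ancestors)       # ['/usr/include', '/usr', '/']

# parts splits the same path into its individual components.
print(p.parts)         # ('/', 'usr', 'include', 'stdio.h')
```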

An unexpected discovery: the Python interactive shell also supports auto-completion. Just press the <Tab> key.

>>> p = Path('/usr/include/stdio.h')
>>> p.
p.absolute(         p.exists(           p.is_char_device(   p.iterdir(          p.open(             p.relative_to(      p.samefile(         p.unlink(
p.anchor            p.expanduser(       p.is_dir(           p.joinpath(         p.owner(            p.rename(           p.stat(             p.with_name(
p.as_posix(         p.glob(             p.is_fifo(          p.lchmod(           p.parent            p.replace(          p.stem              p.with_suffix(
p.as_uri(           p.group(            p.is_file(          p.lstat(            p.parents           p.resolve(          p.suffix            p.write_bytes(
p.chmod(            p.home(             p.is_reserved(      p.match(            p.parts             p.rglob(            p.suffixes          p.write_text(
p.cwd(              p.is_absolute(      p.is_socket(        p.mkdir(            p.read_bytes(       p.rmdir(            p.symlink_to(       
p.drive             p.is_block_device(  p.is_symlink(       p.name              p.read_text(        p.root              p.touch(

Getting the directory name and the file name of a path:

>>> os.path.basename('/usr/include/stdio.h')
'stdio.h'
>>> os.path.dirname('/usr/include/stdio.h')
'/usr/include'
>>> os.path.dirname(Path('/usr/include/stdio.h'))
'/usr/include'
## Get both at once
>>> os.path.split('/usr/include/stdio.h')
('/usr/include', 'stdio.h')

File sizes and directory contents:

>>> os.path.getsize('/usr/include/stdio.h')
31641
>>> os.listdir('.')
['.mozilla', '.bash_logout', '.bashrc', '.cache', '.dbus', '.config', '.ICEauthority', '.local', '.esd_auth', 'Desktop', 'Downloads', 'Templates', 'Public', 'Documents', 'Music', 'Pictures', 'Videos', '.gnupg', '.ssh', '.bash_history', '.pki', 'python_works', '.python_history', '.bash_profile', '.lesshst', '.vimrc', '.vboxclient-clipboard.pid', '.vboxclient-display.pid', '.vboxclient-seamless.pid', '.vboxclient-draganddrop.pid', '.viminfo', 'pcc_2e', '.gitconfig', '.netrc', 'Automate_the_Boring_Stuff_2e_onlinematerials.zip', 'automate_online-materials', 'customValidate.py', '.idlerc']
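Combining the two calls gives the total size of the files in a directory. The sketch below is one way to do it; total_size() is a hypothetical helper name, and the files are created in a temporary directory purely for the demonstration.

```python
import os
import tempfile
from pathlib import Path

def total_size(directory):
    """Sum the sizes of the regular files directly inside directory."""
    total = 0
    for name in os.listdir(directory):
        # listdir() returns bare names, so join them with the directory.
        full_path = os.path.join(directory, name)
        if os.path.isfile(full_path):
            total += os.path.getsize(full_path)
    return total

with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, 'a.txt').write_bytes(b'12345')
    Path(tmp, 'b.txt').write_bytes(b'1234567890')
    Path(tmp, 'sub').mkdir()
    names = sorted(os.listdir(tmp))
    size = total_size(tmp)

print(names)  # ['a.txt', 'b.txt', 'sub']
print(size)   # 15
```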

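glob is short for globbing, that is, file name expansion or substitution. It is common in the shell, for example ls *.txt.
The pattern list below follows the Tcl glob man page (man n glob). Python's Path.glob() supports the first three forms but not brace expansion:

* matches any sequence of characters
? matches any single character
[chars] matches any single character in chars
{a,b,...} matches any of the sub-patterns (Tcl and some shells only; not supported by Python's glob)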

>>> p = Path('./automate_online-materials')
>>> for f in p.glob('*.txt'):
...     print(f)
... 
automate_online-materials/dictionary.txt
automate_online-materials/guests.txt
automate_online-materials/automate-linux-requirements.txt
automate_online-materials/automate-mac-requirements.txt
automate_online-materials/automate-win-requirements.txt
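The patterns can be tried out safely in a scratch directory. In this sketch the file names are invented, names() is a hypothetical helper, and the last step shows one possible workaround for the brace form that Python's glob lacks.

```python
import tempfile
from pathlib import Path

def names(paths):
    """Return the sorted file names of an iterable of paths."""
    return sorted(p.name for p in paths)

with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    for filename in ('a1.txt', 'a2.txt', 'b1.txt', 'notes.md', 'data.csv'):
        (folder / filename).touch()

    all_txt = names(folder.glob('*.txt'))        # * matches any run of characters
    one_char = names(folder.glob('a?.txt'))      # ? matches exactly one character
    char_set = names(folder.glob('[ab]1.txt'))   # [ab] matches one of a or b

    # No {md,csv} brace syntax here, so combine two patterns instead.
    md_or_csv = names(list(folder.glob('*.md')) + list(folder.glob('*.csv')))

print(all_txt)    # ['a1.txt', 'a2.txt', 'b1.txt']
print(one_char)   # ['a1.txt', 'a2.txt']
print(char_set)   # ['a1.txt', 'b1.txt']
print(md_or_csv)  # ['data.csv', 'notes.md']
```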

Assuming p is a Path object, its validity can be checked with p.exists(), p.is_file() and p.is_dir().
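A small sketch of the three checks, applied to a directory, a file and a path that does not exist. The file names are made up and everything lives in a temporary directory.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    text_file = folder / 'notes.txt'
    text_file.write_text('hello\n')
    missing = folder / 'no_such_file.txt'

    # Each tuple is (exists, is_file, is_dir).
    checks = {
        'folder': (folder.exists(), folder.is_file(), folder.is_dir()),
        'text_file': (text_file.exists(), text_file.is_file(), text_file.is_dir()),
        'missing': (missing.exists(), missing.is_file(), missing.is_dir()),
    }

for label, result in checks.items():
    print(label, result)
# folder (True, False, True)
# text_file (True, True, False)
# missing (False, False, False)
```

Note that is_file() and is_dir() return False for a missing path rather than raising an error.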

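The File Reading and Writing Process

Files are either plaintext files or binary files.
Plaintext files can be read and written with p.read_text() and p.write_text().
A more general approach is to get a File object f from open() (or p.open()) and then call f.read() and f.write().
The read() method returns the entire file as one string. readlines() returns the entire file as a list with one element per line. Both read from the current position, so a second call on the same File object returns an empty result until the file is reopened or rewound.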

$ cat test.txt
row 1
row 2
row 3
row 4

>>> from pathlib import Path
>>> f = open('test.txt')
>>> f.read()
'row 1\nrow 2\nrow 3\nrow 4\n'
>>> f.readlines()
[]
>>> f.close()
>>> f = open('test.txt')
>>> f.readlines()
['row 1\n', 'row 2\n', 'row 3\n', 'row 4\n']
>>> f.close()
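The same steps can be written with a with statement, so that close() is never forgotten. This is a sketch under a few assumptions: the file is written to a temporary directory, and encoding='utf-8' is passed explicitly, which the original session does not do.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    p = Path(tmp) / 'test.txt'

    # write_text() opens, writes and closes the file in one call.
    p.write_text('row 1\nrow 2\n', encoding='utf-8')
    whole = p.read_text(encoding='utf-8')

    # The with statement closes the file even if an error occurs.
    with open(p, encoding='utf-8') as f:
        first_pass = f.readlines()
        second_pass = f.readlines()   # already at end of file

    # Iterating over the file object reads one line at a time.
    with p.open(encoding='utf-8') as f:
        stripped = [line.rstrip('\n') for line in f]

print(repr(whole))   # 'row 1\nrow 2\n'
print(first_pass)    # ['row 1\n', 'row 2\n']
print(second_pass)   # []
print(stripped)      # ['row 1', 'row 2']
```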

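A file can be opened in different modes, such as read-only (r), write (w) and append (a); adding + opens it for both reading and writing. Either text mode (t, the default) or binary mode (b) can be used.
The f.seek() method sets the position of the next read or write.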
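A short sketch of the modes and of seek(). The file name and contents are invented. Binary mode is used for the seek() step on purpose, because in text mode only offsets of 0 or values returned by tell() are well defined.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    p = Path(tmp) / 'modes.txt'

    with open(p, 'w') as f:      # 'w' creates the file or truncates it
        f.write('AAAA')
    with open(p, 'a') as f:      # 'a' always writes at the end
        f.write('BBBB')
    after_append = p.read_text()

    with open(p, 'r+b') as f:    # 'r+' reads and writes; 'b' means bytes
        f.seek(2)                # move to byte offset 2 from the start
        f.write(b'xx')           # overwrites in place, does not insert
        f.seek(0)                # rewind before reading everything back
        after_seek = f.read()

print(after_append)  # AAAABBBB
print(after_seek)    # b'AAxxBBBB'
```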

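Saving Variables with the SHELVE Module

The shelve module stores variables in a binary file. In effect, the file is used as a simple database.
A shelf file behaves like a dictionary: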

>>> import shelve
>>> shelfFile = shelve.open('mydata')
>>> cats = ['Zophie', 'Pooka', 'Simon']
>>> shelfFile['cats'] = cats
>>> shelfFile.close()
>>> 
>>> shelfFile = shelve.open('mydata')
>>> type(shelfFile)
<class 'shelve.DbfilenameShelf'>
>>> shelfFile['cats']
['Zophie', 'Pooka', 'Simon']
>>> shelfFile.close()

Now an example of my own:

>>> import shelve
>>> shelfFile = shelve.open('mydata')
>>> shelfFile['name'] = 'steven'
>>> shelfFile['age'] = 48
>>> shelfFile['friends'] = ['bob', 'grace', 'flora']
>>> shelfFile.close()
>>> 
>>> shelfFile = shelve.open('mydata')
>>> shelfFile['friends']
['bob', 'grace', 'flora']
>>> shelfFile['name']
'steven'
>>> shelfFile['age']
48
>>> shelfFile.close()
>>> shelfFile = shelve.open('mydata')
>>> shelfFile.keys()
KeysView(<shelve.DbfilenameShelf object at 0x7f544baf06a0>)
>>> list(shelfFile.keys())
['age', 'friends', 'name']
>>> list(shelfFile.values())
[48, ['bob', 'grace', 'flora'], 'steven']
>>> list(shelfFile.items())
[('age', 48), ('friends', ['bob', 'grace', 'flora']), ('name', 'steven')]
>>> shelfFile.close()

# The following demonstrates updating the file
>>> shelfFile = shelve.open('mydata')
>>> shelfFile['friends']
['bob', 'grace', 'flora']
>>> shelfFile['friends'] = shelfFile.get('friends') + ['john']
>>> shelfFile.close()
>>> shelfFile = shelve.open('mydata')
>>> shelfFile['friends']
['bob', 'grace', 'flora', 'john']
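The update above reassigns shelfFile['friends'] rather than appending to it, and that matters. The sketch below shows why: with a default shelf, changing the returned list in place is not saved. The writeback=True option and the with statement are not in the original notes, and the shelf is stored in a temporary directory here.

```python
import os
import shelve
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    filename = os.path.join(tmp, 'mydata')

    # A shelf is a context manager, so close() is called automatically.
    with shelve.open(filename) as shelf:
        shelf['friends'] = ['bob', 'grace', 'flora']

    # Mutating the returned list changes only a temporary copy.
    with shelve.open(filename) as shelf:
        shelf['friends'].append('john')
    with shelve.open(filename) as shelf:
        without_writeback = shelf['friends']

    # writeback=True caches accessed entries and saves them on close.
    with shelve.open(filename, writeback=True) as shelf:
        shelf['friends'].append('john')
    with shelve.open(filename) as shelf:
        with_writeback = shelf['friends']

print(without_writeback)  # ['bob', 'grace', 'flora']
print(with_writeback)     # ['bob', 'grace', 'flora', 'john']
```

Reassigning the key, as in the session above, works without writeback and avoids the extra memory the cache can use.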

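Saving Variables with the PPRINT.PFORMAT() Function

pprint.pprint() was introduced earlier; pprint.pformat() returns the formatted string instead of printing it.
pprint.pformat() can be used to generate a file that can be imported, that is, a file in a format that Python itself understands, as the following example shows: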

>>> import pprint
>>> cats = [{'name': 'Zophie', 'desc': 'chubby'}, {'name': 'Pooka', 'desc': 'fluffy'}]
>>> print(cats)
[{'name': 'Zophie', 'desc': 'chubby'}, {'name': 'Pooka', 'desc': 'fluffy'}]
>>> pprint.pprint(cats)
[{'desc': 'chubby', 'name': 'Zophie'}, {'desc': 'fluffy', 'name': 'Pooka'}]
>>> fileObj = open('myCats.py', 'w')
>>> fileObj.write('cats = ' + pprint.pformat(cats) + '\n')
83
>>> fileObj.close()
>>> 
>>> import myCats
>>> myCats.cats
[{'desc': 'chubby', 'name': 'Zophie'}, {'desc': 'fluffy', 'name': 'Pooka'}]

$ cat myCats.py
cats = [{'desc': 'chubby', 'name': 'Zophie'}, {'desc': 'fluffy', 'name': 'Pooka'}]

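Even so, shelve is the preferred choice, because only basic data types (such as integers, floats, strings, lists and dictionaries) can be written to a file as simple text, whereas shelve can handle other data types.
The fact that the file can be imported gave me an idea: it is somewhat like #include in C, so variable initialization could be kept in such a text file.
? However, when I tried it, a File object could apparently also be written out and imported afterwards. I have not resolved this; it remains an open question!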
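One way to probe the difference between basic and other data types is to check whether the generated text is valid Python at all. The sketch below is only an illustration and does not reproduce the experiment mentioned above: it uses an in-memory io.StringIO as a stand-in for a file object, and is_valid_python() is a hypothetical helper.

```python
import ast
import io
import pprint

cats = [{'name': 'Zophie', 'desc': 'chubby'}, {'name': 'Pooka', 'desc': 'fluffy'}]

# Literals round-trip: the formatted text parses back into an equal object.
source = 'cats = ' + pprint.pformat(cats) + '\n'
restored = ast.literal_eval(pprint.pformat(cats))
print(restored == cats)   # True

def is_valid_python(text):
    """Return True if text compiles as Python source code."""
    try:
        compile(text, '<generated>', 'exec')
    except SyntaxError:
        return False
    return True

# The repr of a stream object looks like <_io.StringIO object at 0x...>,
# which is not Python syntax, so the generated line cannot be imported.
stream_source = 'stream = ' + pprint.pformat(io.StringIO()) + '\n'
print(is_valid_python(source))          # True
print(is_valid_python(stream_source))   # False
```

Writing such a line to a .py file would still succeed, since write() accepts any string; it is the later import that would fail.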

Project: Generating Random Quiz Files

The key part is shown in the following example:

>>> import random
>>> a=[1,2,3,4]
>>> random.shuffle(a)
>>> a
[4, 1, 3, 2]
>>> random.sample(a, 2)
[4, 1]
>>> random.sample(a, 2)
[1, 3]
>>> random.sample(a, 2)
[4, 3]

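This is a very practical example. For a class of 35 students, a different quiz is generated for each student. The questions are the same, but the order of the questions and the answer options differ.
The question order is randomized with random.shuffle(). Each question has one correct answer, and the other three options are drawn from the remaining answers with random.sample(list, 3).
These four options are then shuffled with random.shuffle().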
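The steps above can be sketched as follows. This is not the book's program: the five-entry capitals dictionary is sample data, make_quiz() is a hypothetical helper, and the quiz is printed instead of being written to 35 files. The random.Random(0) seed is there only to make the sketch repeatable.

```python
import random

# Hypothetical sample data; the real project would use a longer list.
capitals = {
    'Alabama': 'Montgomery',
    'Alaska': 'Juneau',
    'Arizona': 'Phoenix',
    'Arkansas': 'Little Rock',
    'California': 'Sacramento',
}

def make_quiz(rng):
    """Return a list of (state, options, correct_answer) in random order."""
    states = list(capitals)
    rng.shuffle(states)                       # question order differs per quiz
    quiz = []
    for state in states:
        correct = capitals[state]
        wrong = [c for c in capitals.values() if c != correct]
        options = rng.sample(wrong, 3) + [correct]
        rng.shuffle(options)                  # hide the position of the answer
        quiz.append((state, options, correct))
    return quiz

rng = random.Random(0)   # seeded only to make the sketch repeatable
quiz = make_quiz(rng)
for number, (state, options, correct) in enumerate(quiz, start=1):
    print(f'{number}. What is the capital of {state}?')
    for letter, option in zip('ABCD', options):
        print(f'   {letter}. {option}')
```

Calling make_quiz() once per student, and writing each result to its own file together with an answer key, would give the 35 different quizzes.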