[Automate the Boring Stuff with Python] Chapter 9: Reading and Writing Files (2)

今岁成蹊

Article link: https://blog.csdn.net/HPP_CSDN/article/details/108337439

Automate the Boring Stuff with Python: Practical Programming for Total Beginners (2nd Edition)
Copyright © 2020 by Al Sweigart.

9.2 The File Reading/Writing Process

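Plaintext files contain only basic text characters and carry no font, size, or color information.
They can be opened in Notepad on Windows or TextEdit on macOS. A program can easily read the contents of a plaintext file and treat them as an ordinary string value.
Binary files are all other file types, such as word-processing documents, PDFs, images, spreadsheets, and executable programs. If you open a binary file in Notepad or TextEdit, it looks like scrambled nonsense.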

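The read_text() method of a pathlib Path object returns a string holding the full contents of a text file. Its write_text() method creates a new text file (or overwrites an existing one) with the string passed to it, and returns the number of characters written.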

>>> from pathlib import Path
>>> p = Path('spam.txt')
>>> p.write_text('Hello, world!')
13
>>> p.read_text()
'Hello, world!'

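Keep in mind that these Path object methods provide only basic interaction with files.
The more common way of writing to a file involves the open() function and file objects.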

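In Python, reading or writing a file takes three steps:
① Call the open() function, which returns a File object.
② Call the read() or write() method on the File object.
③ Close the file by calling the close() method on the File object.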
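The three steps can be sketched as follows. This is a minimal illustration rather than code from the book: the scratch file lives in a temporary folder (an assumption made so nothing real is overwritten), and the second half uses a with statement, which performs step ③ automatically.

```python
import tempfile
from pathlib import Path

# Hypothetical scratch file in a temporary folder, so no real file is touched.
demo_path = Path(tempfile.mkdtemp()) / 'hello.txt'

# Steps 1 to 3 when writing: open(), write(), close().
hello_file = open(demo_path, 'w')
hello_file.write('Hello world!')
hello_file.close()

# The same steps when reading. The with statement calls close() for us,
# even if an exception is raised inside the block.
with open(demo_path) as hello_file:
    content = hello_file.read()

print(content)
```

The with form is usually preferred in longer programs because a forgotten close() can no longer happen.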

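Opening Files with the open() Function

To open a file, pass open() a string path indicating the file you want to open; it can be either an absolute or a relative path. The open() function returns a File object.
For example, create a text file named hello.txt in your user home folder with Hello world! as its content. On Windows, enter: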

>>> helloFile = open('C:\\Users\\your_home_folder\\hello.txt')

On macOS, enter:

>>> helloFile = open('/Users/your_home_folder/hello.txt')

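Both commands open the file in "reading plaintext" mode, or read mode for short. When a file is opened in read mode, Python only lets you read data from it; you cannot write to it or modify it. Read mode is the default mode for files you open in Python.
To specify read mode explicitly, pass the string value 'r' as the second argument to open().
So open('/Users/asweigart/hello.txt', 'r') is equivalent to open('/Users/asweigart/hello.txt').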
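As a small check of the read-mode behavior described above, the sketch below opens a scratch file without a mode argument and then tries to write to it. The temporary path and variable names are assumptions for illustration only.

```python
import io
import tempfile
from pathlib import Path

# Hypothetical scratch file so the example does not depend on your home folder.
demo_path = Path(tempfile.mkdtemp()) / 'hello.txt'
demo_path.write_text('Hello world!')

with open(demo_path) as hello_file:      # same as open(demo_path, 'r')
    default_mode = hello_file.mode       # the mode Python actually used
    try:
        hello_file.write('more text')    # not allowed in read mode
        write_failed = False
    except io.UnsupportedOperation:
        write_failed = True

print(default_mode, write_failed)
```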

Reading the Contents of Files

The File object's read() method returns the entire contents of the file as a single string.

>>> helloContent = helloFile.read()
>>> helloContent
'Hello world!'

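Alternatively, the readlines() method returns a list of strings, one for each line of text in the file. Each string keeps its trailing \n newline character, except for a last line that has none.
Create a file named sonnet29.txt with the following content: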

When, in disgrace with fortune and men's eyes,
I all alone beweep my outcast state,
And trouble deaf heaven with my bootless cries,
And look upon myself and curse my fate,

>>> sonnetFile = open('sonnet29.txt')
>>> sonnetFile.readlines()
["When, in disgrace with fortune and men's eyes,\n", 'I all alone beweep my outcast state,\n', 'And trouble deaf heaven with my bootless cries,\n', 'And look upon myself and curse my fate,']
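When the trailing newlines are not wanted, one common approach is to loop over the file object directly and strip them. The sketch below assumes a two-line scratch copy of the sonnet in a temporary folder; it is an illustration, not part of the book's example.

```python
import tempfile
from pathlib import Path

# Hypothetical scratch copy of the first two lines of the sonnet.
sonnet_path = Path(tempfile.mkdtemp()) / 'sonnet29.txt'
sonnet_path.write_text(
    "When, in disgrace with fortune and men's eyes,\n"
    'I all alone beweep my outcast state,\n'
)

# Iterating over a file object yields one line at a time, like readlines(),
# but without loading the whole file into a list first.
with open(sonnet_path) as sonnet_file:
    lines = [line.rstrip('\n') for line in sonnet_file]

print(lines)
```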

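Writing to Files

To write to a file, open it in "write plaintext" mode (write mode for short) or "append plaintext" mode (append mode for short).
Write mode overwrites the existing file and starts from scratch, just as assigning a new value overwrites a variable's value. Pass 'w' as the second argument to open() to open the file in write mode.
Append mode adds text to the end of the existing file, much like appending to a list in a variable rather than overwriting the variable altogether. Pass 'a' as the second argument to open() to open the file in append mode.

If the filename passed to open() does not exist, both write mode and append mode create a new, blank file.
After reading or writing a file, call its close() method before opening the file again.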

>>> baconFile = open('bacon.txt', 'w')
>>> baconFile.write('Hello world!\n')
13
>>> baconFile.close()
>>> baconFile = open('bacon.txt', 'a')
>>> baconFile.write('Bacon is not a vegetable.')
25
>>> baconFile.close()
>>> baconFile = open('bacon.txt')
>>> content = baconFile.read()
>>> baconFile.close()
>>> print(content)
Hello world!
Bacon is not a vegetable.

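Note that write() does not automatically add a newline character to the end of the string, as print() does; you have to add it yourself.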
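The difference between write() and print() can be seen in a short sketch. The scratch file and the strings are assumptions chosen for illustration.

```python
import tempfile
from pathlib import Path

# Hypothetical scratch file in a temporary folder.
bacon_path = Path(tempfile.mkdtemp()) / 'bacon.txt'

with open(bacon_path, 'w') as bacon_file:
    bacon_file.write('first')        # no newline is added
    bacon_file.write('second\n')     # newline written explicitly
    print('third', file=bacon_file)  # print() appends a newline by itself

text = bacon_path.read_text()
print(repr(text))
```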

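9.3 Saving Variables with the shelve Module

You can save variables from a Python program to binary shelf files using the shelve module. This lets the program restore the data to variables from the hard drive. The shelve module lets you add Save and Open features to a program. For example, if you run a program and enter some configuration settings, you can save those settings to a shelf file and have the program load them the next time it runs.

>>> import shelve
>>> shelfFile = shelve.open('mydata')
>>> cats = ['Zophie', 'Pooka', 'Simon']
>>> shelfFile['cats'] = cats  # Change the shelf value, just as with a dictionary
>>> shelfFile.close()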

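After running the previous code on Windows, three new files appear in the current working directory: mydata.bak, mydata.dat, and mydata.dir. On macOS, only a single mydata.db file is created.
These binary files contain the data stored in the shelf. The module frees you from worrying about how to store your program's data in a file.

Programs can use the shelve module to reopen these shelf files later and retrieve the data. Shelf values don't have to be opened in read or write mode; once opened, they can do both.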

>>> shelfFile = shelve.open('mydata')
>>> type(shelfFile) 
<class 'shelve.DbfilenameShelf'>
>>> shelfFile['cats']
['Zophie', 'Pooka', 'Simon']
>>> shelfFile.close()

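(When I ran the code above on Windows myself, only a single file named mydata was created, and type(shelfFile) returned <type 'instance'>. That output form is what Python 2 prints, so the differences most likely come from the interpreter version and the dbm backend it uses.)

Like dictionaries, shelf values have keys() and values() methods that return list-like values of the keys and values in the shelf. Because these return values are list-like rather than true lists, pass them to the list() function to get them in list form.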

>>> shelfFile = shelve.open('mydata')
>>> list(shelfFile.keys())
['cats']
>>> list(shelfFile.values())
[['Zophie', 'Pooka', 'Simon']]
>>> shelfFile.close()

Plaintext is useful for creating files that you'll read in a text editor such as Notepad or TextEdit. But if you want to save data from your Python programs, use the shelve module.
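A save-then-load round trip with shelve can be sketched as below. It assumes Python 3.4 or later, where the object returned by shelve.open() can be used in a with statement, and it stores the shelf in a temporary folder (an assumption, so the current working directory stays clean).

```python
import shelve
import tempfile
from pathlib import Path

# Hypothetical shelf location; shelve adds its own file extension(s).
shelf_path = str(Path(tempfile.mkdtemp()) / 'mydata')

# Save: the with statement closes the shelf when the block ends.
with shelve.open(shelf_path) as shelf_file:
    shelf_file['cats'] = ['Zophie', 'Pooka', 'Simon']

# Load: reopen the same shelf and read the data back.
with shelve.open(shelf_path) as shelf_file:
    keys = list(shelf_file.keys())
    restored_cats = shelf_file['cats']

print(keys, restored_cats)
```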

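9.4 Saving Variables with the pprint.pformat() Function

Suppose you have a dictionary stored in a variable and want to save the variable and its contents for future use. pprint.pformat() gives you a string that you can write to a .py file. The string contains the same text that pprint.pprint() would print, formatted to be not only easy to read but also syntactically correct Python code. The .py file then becomes a module of its own that you can import whenever you need the variable stored in it.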

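# saveCats.py (hypothetical name) - Generates myCats.py.
# Keep this in a separate file: it overwrites myCats.py every time it runs.
import pprint
cats = [{'name': 'Zophie', 'desc': 'chubby'}, {'name': 'Pooka', 'desc': 'fluffy'}]
# pprint.pformat(cats) returns the list as a string of valid Python code.
fileObj = open('myCats.py', 'w')
fileObj.write('cats = ' + pprint.pformat(cats) + '\n')
fileObj.close()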

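Because Python scripts are themselves just text files with the .py file extension, Python programs can even generate other Python programs. You can then import these files into scripts.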

>>> import myCats
>>> myCats.cats
[{'name': 'Zophie', 'desc': 'chubby'}, {'name': 'Pooka', 'desc': 'fluffy'}]
>>> myCats.cats[0]
{'name': 'Zophie', 'desc': 'chubby'}
>>> myCats.cats[0]['name']
'Zophie'

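The benefit of creating a .py file (as opposed to saving variables with the shelve module) is that, because it is a text file, anyone can read and modify its contents with a simple text editor. For most applications, however, saving data with the shelve module is the preferred way to save variables to a file. Only basic data types such as integers, floats, strings, lists, and dictionaries can be written to a file as simple text. File objects, for example, cannot be encoded as text.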
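One way to see which values survive as text is to check whether the pformat() string can be parsed back. The sketch below uses ast.literal_eval() for that check, which is my own choice of tool and not something the book uses, and an io.StringIO object as a stand-in for a file object.

```python
import ast
import io
import pprint

cats = [{'name': 'Zophie', 'desc': 'chubby'}, {'name': 'Pooka', 'desc': 'fluffy'}]

# Basic data types: the formatted text is valid Python and parses back.
cats_text = pprint.pformat(cats)
round_trip = ast.literal_eval(cats_text)

# A file-like object: its text form looks like <_io.StringIO object at 0x...>,
# which is not valid Python code.
file_like = io.StringIO()
try:
    ast.literal_eval(pprint.pformat(file_like))
    file_is_valid_code = True
except (SyntaxError, ValueError):
    file_is_valid_code = False

print(round_trip == cats, file_is_valid_code)
```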

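9.5 Project: Generating Random Quiz Files

Say you're a geography teacher with 35 students in your class and you want to give a pop quiz on US state capitals. You can't be sure nobody will cheat, so you'd like to randomize the order of the questions so that each quiz is unique and no one can copy answers from anyone else.

Here is what the program does:
① Creates 35 different quizzes.
② Creates 50 multiple-choice questions for each quiz, in random order.
③ Provides the correct answer and three random wrong answers for each question, in random order.
④ Writes the quizzes to 35 text files.
⑤ Writes the answer keys to 35 text files.

This means the code needs to do the following:
① Store the states and their capitals in a dictionary.
② Call open(), write(), and close() for the quiz and answer key text files.
③ Use random.shuffle() to randomize the order of the questions and multiple-choice options.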
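Before reading the full program, the core of step ③ can be sketched in isolation. The helper name make_question and the five-state dictionary are assumptions for illustration; the real program below inlines the same logic and uses all 50 states.

```python
import random

# A shortened, hypothetical version of the capitals dictionary.
capitals = {'Alabama': 'Montgomery', 'Alaska': 'Juneau', 'Arizona': 'Phoenix',
            'Arkansas': 'Little Rock', 'California': 'Sacramento'}

def make_question(state, capitals):
    """Return four shuffled answer options and the letter of the right one."""
    correct = capitals[state]
    wrong = [capital for capital in capitals.values() if capital != correct]
    options = random.sample(wrong, 3) + [correct]  # 3 wrong answers + 1 right
    random.shuffle(options)                        # shuffles the list in place
    return options, 'ABCD'[options.index(correct)]

states = list(capitals)
random.shuffle(states)                             # random question order
options, answer_letter = make_question(states[0], capitals)
print(states[0], options, answer_letter)
```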

#! python3
# randomQuizGenerator.py - Creates quizzes with questions and answers in
# random order, along with the answer key.

import random

# The quiz data. Keys are states and values are their capitals.
capitals = {'Alabama': 'Montgomery', 'Alaska': 'Juneau', 'Arizona': 'Phoenix', 'Arkansas': 'Little Rock', 
'California': 'Sacramento', 'Colorado': 'Denver', 'Connecticut': 'Hartford', 
'Delaware': 'Dover', 
'Florida': 'Tallahassee', 
'Georgia': 'Atlanta', 
'Hawaii': 'Honolulu', 
'Idaho': 'Boise', 'Illinois': 'Springfield', 'Indiana': 'Indianapolis', 'Iowa': 'Des Moines', 
'Kansas': 'Topeka', 'Kentucky': 'Frankfort', 
'Louisiana': 'Baton Rouge', 
'Maine': 'Augusta', 'Maryland': 'Annapolis', 'Massachusetts': 'Boston', 'Michigan': 'Lansing', 'Minnesota': 'Saint Paul', 'Mississippi': 'Jackson', 'Missouri': 'Jefferson City', 'Montana': 'Helena', 
'Nebraska': 'Lincoln', 'Nevada': 'Carson City', 'New Hampshire': 'Concord',  'New Jersey': 'Trenton', 'New Mexico': 'Santa Fe', 'New York': 'Albany', 'North Carolina': 'Raleigh', 'North Dakota': 'Bismarck', 
'Ohio': 'Columbus', 'Oklahoma': 'Oklahoma City', 'Oregon': 'Salem', 
'Pennsylvania': 'Harrisburg', 
'Rhode Island': 'Providence', 
'South Carolina': 'Columbia', 'South Dakota': 'Pierre', 
'Tennessee': 'Nashville', 'Texas': 'Austin', 
'Utah': 'Salt Lake City', 
'Vermont': 'Montpelier', 'Virginia': 'Richmond', 
'Washington': 'Olympia', 'West Virginia': 'Charleston', 'Wisconsin': 'Madison', 'Wyoming': 'Cheyenne'
}

# Generate 35 quiz files.
for quizNum in range(35):
	# Create the quiz and answer key files.
	quizFile = open('capitalsquiz%s.txt' % (quizNum + 1), 'w')
	answerKeyFile = open('capitalsquiz_answers%s.txt' % (quizNum + 1), 'w')

	# Write out the header for the quiz.
	quizFile.write('Name:\n\nDate:\n\nPeriod:\n\n')
	quizFile.write((' ' * 20) + 'State Capitals Quiz (Form %s)' % (quizNum + 1))
	quizFile.write('\n\n')

	# Shuffle the order of the states.
	states = list(capitals.keys())
	random.shuffle(states)  # Shuffle the states list in place
	
	# Loop through all 50 states, making a question for each.
	for questionNum in range(50):
		# Get right and wrong answers.
		correctAnswer = capitals[states[questionNum]]
		wrongAnswers = list(capitals.values())
		del wrongAnswers[wrongAnswers.index(correctAnswer)]
		wrongAnswers = random.sample(wrongAnswers, 3)  # Pick 3 random values from wrongAnswers
		answerOptions = wrongAnswers + [correctAnswer]
		random.shuffle(answerOptions)  # Shuffle the answerOptions list in place
		
		# Write the question and the answer options to the quiz file.
		quizFile.write('%s. What is the capital of %s?\n' % (questionNum + 1, states[questionNum]))
		for i in range(4):
			quizFile.write('    %s. %s\n' % ('ABCD'[i], answerOptions[i]))
		quizFile.write('\n')

		# Write the answer key to a file.
		answerKeyFile.write('%s. %s\n' % (questionNum + 1, 'ABCD'[answerOptions.index(correctAnswer)]))
		
	quizFile.close()
	answerKeyFile.close()

After you run the program, the generated capitalsquiz1.txt file looks something like this:

Name:

Date:

Period:

                    State Capitals Quiz (Form 1)

1. What is the capital of West Virginia?
    A. Hartford
    B. Santa Fe
    C. Harrisburg
    D. Charleston

2. What is the capital of Colorado?
    A. Raleigh
    B. Harrisburg
    C. Denver
    D. Lincoln

--snip--

The corresponding capitalsquiz_answers1.txt file looks like this:

1. D
2. C
3. A
--snip--

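9.6 Project: Multiclipboard

The task: filling out many forms in a web page or application with several text fields. The clipboard holds only one thing at a time. If you have several different pieces of text to copy and paste, you have to keep highlighting and copying the same few things over and over.
You can write a Python program to keep track of multiple pieces of text. This "multiclipboard" is named mcb.pyw. The .pyw extension means that Python won't show a terminal window when it runs the program.
The program saves each piece of clipboard text under a keyword. For example, running py mcb.pyw save spam saves the current clipboard contents under the keyword spam. Running py mcb.pyw spam loads that text back onto the clipboard. If the user forgets the keywords, running py mcb.pyw list copies a list of all keywords to the clipboard.

Here is what the program does:
① Takes the keyword to check as a command line argument.
② If the argument is save, saves the clipboard contents to the keyword.
③ If the argument is list, copies all the keywords to the clipboard.
④ Otherwise, copies the text for the keyword to the clipboard.

This means the code needs to do the following:
① Read the command line arguments from sys.argv.
② Read from and write to the clipboard.
③ Save to and load from a shelf file.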
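The command dispatch can be tried out without touching the real clipboard. In the sketch below, run_mcb is a hypothetical helper, a plain dict stands in for the shelf, and the clipboard text is passed in and returned as a string instead of going through pyperclip.

```python
def run_mcb(argv, shelf, clipboard_text):
    """Handle one command and return the resulting clipboard text."""
    if len(argv) == 3 and argv[1].lower() == 'save':
        shelf[argv[2]] = clipboard_text        # save clipboard under keyword
    elif len(argv) == 2:
        if argv[1].lower() == 'list':
            return str(list(shelf.keys()))     # copy all keywords
        if argv[1] in shelf:
            return shelf[argv[1]]              # copy the saved text
    return clipboard_text                      # clipboard left unchanged

shelf = {}                                     # stand-in for shelve.open('mcb')
clipboard = run_mcb(['mcb.pyw', 'save', 'spam'], shelf, 'Hello')
listing = run_mcb(['mcb.pyw', 'list'], shelf, clipboard)
loaded = run_mcb(['mcb.pyw', 'spam'], shelf, 'something else')
print(listing, loaded)
```

Passing argv in as a parameter, rather than reading sys.argv inside the function, is what makes the logic easy to exercise from the interactive shell.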

On Windows, you can create a batch file named mcb.bat so that the script is easy to run from the Run… window. The batch file contains the following:

@pyw.exe C:\Python34\mcb.pyw %*

#! python3
# mcb.pyw - Saves and loads pieces of text to the clipboard.
# Usage: py.exe mcb.pyw save <keyword> - Saves clipboard to keyword.
#        py.exe mcb.pyw <keyword> - Loads keyword to clipboard.
#        py.exe mcb.pyw list - Loads all keywords to clipboard.

import shelve, pyperclip, sys

mcbShelf = shelve.open('mcb')

# Save clipboard content.
if len(sys.argv) == 3 and sys.argv[1].lower() == 'save':
	mcbShelf[sys.argv[2]] = pyperclip.paste()
elif len(sys.argv) == 2:
	# List keywords and load content.
	if sys.argv[1].lower() == 'list':
		pyperclip.copy(str(list(mcbShelf.keys())))
	elif sys.argv[1] in mcbShelf:
		pyperclip.copy(mcbShelf[sys.argv[1]])

mcbShelf.close()

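9.7 Practice Questions

7. What happens if an existing file is opened in write mode?
Answer: As soon as f = open('write.txt', 'w') runs, all of the existing contents of write.txt are erased and the file position is at the start of the file.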
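This answer can be verified with a short experiment. The scratch file in a temporary folder is an assumption so that no real write.txt is erased.

```python
import tempfile
from pathlib import Path

# Hypothetical scratch file with some existing contents.
write_path = Path(tempfile.mkdtemp()) / 'write.txt'
write_path.write_text('old contents')

f = open(write_path, 'w')                    # opening in write mode...
size_after_open = write_path.stat().st_size  # ...empties the file immediately
position = f.tell()                          # current position in the file
f.close()

print(size_after_open, position)
```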

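9.8 Practice Projects

Extending the Multiclipboard

Extend the multiclipboard program in this chapter so that it has a delete <keyword> command line argument that deletes a keyword from the shelf. Then add a delete command line argument that deletes all keywords.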

#! python3
# mcb.pyw - Saves and loads pieces of text to the clipboard.
# Usage: py.exe mcb.pyw save <keyword> - Saves clipboard to keyword.
#        py.exe mcb.pyw <keyword> - Loads keyword to clipboard.
#        py.exe mcb.pyw list - Loads all keywords to clipboard.
#        py.exe mcb.pyw delete <keyword> - Delete keyword in shelf file.
#        py.exe mcb.pyw delete - Delete all keywords in shelf file.

import shelve, pyperclip, sys

mcbShelf = shelve.open('mcb')

if len(sys.argv) == 3:
	# Save clipboard content.
	if sys.argv[1].lower() == 'save':
		mcbShelf[sys.argv[2]] = pyperclip.paste()
		
	# Delete keyword in shelf file; unknown keywords are ignored instead of raising KeyError.
	elif sys.argv[1].lower() == 'delete' and sys.argv[2] in mcbShelf:
		del mcbShelf[sys.argv[2]]
		
elif len(sys.argv) == 2:
	# List keywords and load content.
	if sys.argv[1].lower() == 'list':
		pyperclip.copy(str(list(mcbShelf.keys())))
	elif sys.argv[1] in mcbShelf:
		pyperclip.copy(mcbShelf[sys.argv[1]])
		
	# Delete all keywords in shelf file.
	elif sys.argv[1].lower() == 'delete':
		mcbShelf.clear()

mcbShelf.close()

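Mad Libs

Create a Mad Libs program that reads in text files and lets the user add their own text anywhere the word ADJECTIVE, NOUN, ADVERB, or VERB appears in the text file. For example, a text file may look like this: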

The ADJECTIVE panda walked to the NOUN and then VERB. A nearby NOUN was unaffected by these events.

The program would find these occurrences and prompt the user to replace them.

Enter an adjective:
silly
Enter a noun:
chandelier
Enter a verb:
screamed
Enter a noun:
pickup truck

The following text file would then be created:

The silly panda walked to the chandelier and then screamed. A nearby pickup truck was unaffected by these events.

The results should be printed to the screen and saved to a new text file.

#! python3
# madLibs.py - Reads in text files and lets the user add their own text anywhere 
# 			   the word ADJECTIVE, NOUN, ADVERB, or VERB appears in the text file.

fileRead = open('read.txt', 'r')
strRead = fileRead.read()
fileRead.close()
print(strRead)

listRead = strRead.split(' ')
for i in range(len(listRead)):
	if 'ADJECTIVE' in listRead[i]:
		print('Enter an adjective:')
		listRead[i] = listRead[i].replace('ADJECTIVE', input())
	elif 'NOUN' in listRead[i]:
		print('Enter a noun:')
		listRead[i] = listRead[i].replace('NOUN', input())
	elif 'ADVERB' in listRead[i]:
		print('Enter an adverb:')
		listRead[i] = listRead[i].replace('ADVERB', input())
	elif 'VERB' in listRead[i]:
		print('Enter a verb:')
		listRead[i] = listRead[i].replace('VERB', input())

strWrite = ' '.join(listRead)
print(strWrite)

fileReplace = open('replaced.txt', 'w')
fileReplace.write(strWrite)
fileReplace.close()
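An alternative to splitting on spaces is to let a regular expression find the placeholders, which also copes with placeholders next to punctuation or line breaks. The sketch below is one possible variant, not the book's solution; fill_mad_lib and its ask parameter are hypothetical names, and the answers are supplied from a list so the example runs without keyboard input.

```python
import re

# ADVERB is tried at its own starting position, so it is never mistaken for VERB.
PLACEHOLDER = re.compile(r'ADJECTIVE|NOUN|ADVERB|VERB')

def fill_mad_lib(text, ask=input):
    """Replace each placeholder with whatever ask() returns for it."""
    def replace(match):
        word = match.group()
        article = 'an' if word[0] in 'AEIOU' else 'a'
        return ask('Enter %s %s:\n' % (article, word.lower()))
    return PLACEHOLDER.sub(replace, text)

template = ('The ADJECTIVE panda walked to the NOUN and then VERB. '
            'A nearby NOUN was unaffected by these events.')

# Canned answers instead of real keyboard input.
answers = iter(['silly', 'chandelier', 'screamed', 'pickup truck'])
result = fill_mad_lib(template, ask=lambda prompt: next(answers))
print(result)
```

Calling fill_mad_lib(template) with the default ask=input prompts the user interactively instead.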

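Regex Search

Write a program that opens all .txt files in a folder and searches for any line that matches a user-supplied regular expression. The results should be printed to the screen.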

#! python3
# regexSearch.py - Opens all .txt files in a folder and searches for any line 
#                  that matches a user-supplied regular expression.

import re, os

print('Enter a regex:')
regexObj = re.compile(input())  # input() already returns a string

# Search every .txt file in the current working directory.
for item in os.listdir(os.getcwd()):
	if item.endswith('.txt'):
		with open(item, 'r') as f:
			for line in f:
				if regexObj.search(line) is not None:
					# Strip the line's own newline so print() doesn't double it.
					print(line.rstrip('\n'))
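The same search can be wrapped in a function that returns its matches, which makes it easier to reuse and to check. This is a sketch under my own assumptions: search_txt_files is a hypothetical name, it uses pathlib's glob() rather than os.listdir(), and the demo folder with its two files is created only for illustration.

```python
import re
import tempfile
from pathlib import Path

def search_txt_files(folder, pattern):
    """Return (filename, line number, line) for every matching line."""
    regex = re.compile(pattern)
    matches = []
    for path in sorted(Path(folder).glob('*.txt')):
        with open(path) as text_file:
            for line_number, line in enumerate(text_file, start=1):
                if regex.search(line):
                    matches.append((path.name, line_number, line.rstrip('\n')))
    return matches

# Hypothetical demo folder with one .txt file and one file that is skipped.
demo_folder = Path(tempfile.mkdtemp())
(demo_folder / 'notes.txt').write_text('call 555-1234\nno number here\n')
(demo_folder / 'skip.md').write_text('call 555-9999\n')

found = search_txt_files(demo_folder, r'\d{3}-\d{4}')
print(found)
```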