Python Study Notes (11), "Beginning Python": Files and Stuff

Article link: https://blog.csdn.net/heli200482128/article/details/78875656

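11. Files and Stuff

11.1. Opening Files

　　The open function opens a file. Its syntax is:
　　open(name[, mode[, buffering]])
　　It returns a file object; the mode and buffering parameters are described in the sections that follow.
　　The example below shows how it is used: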

>>> f = open(r'somefile.txt') # open somefile.txt in the current directory
Traceback (most recent call last): # if the file does not exist, an exception traceback is shown
  File "<stdin>", line 1, in <module>
IOError: [Errno 2] No such file or directory: 'somefile.txt'
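
　　A minimal sketch of handling that exception instead of letting it end the program. It is written for Python 3 (an assumption; the transcript above is Python 2), where IOError is an alias of OSError and a missing file raises its subclass FileNotFoundError. The helper name read_or_default is hypothetical.

```python
import os
import tempfile


def read_or_default(path, default=''):
    """Return the contents of path, or default if the file cannot be opened."""
    try:
        f = open(path)
    except IOError:  # alias of OSError in Python 3; covers FileNotFoundError
        return default
    try:
        return f.read()
    finally:
        f.close()


# A path inside a fresh temporary directory does not exist yet.
with tempfile.TemporaryDirectory() as tmpdir:
    missing = os.path.join(tmpdir, 'somefile.txt')
    result = read_or_default(missing, default='<no file>')

print(result)
```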

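11.1.1. File Modes

　　Common values for the mode argument of open are listed below:

Value	Description	Notes
'r'	Read mode (the default)
'w'	Write mode
'a'	Append mode
'b'	Binary mode (added to another mode)	For binary files such as audio or images; for example, 'rb' reads a binary file.
'+'	Read/write mode (added to another mode)	For example, 'r+' opens a file for both reading and writing.
'U'	Universal newline support	All line endings (\r\n, \r, or \n) are converted to \n on reading, regardless of platform. For how newlines are handled on different platforms, see "Why Use Binary Mode".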

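Why Use Binary Mode

　　When Python handles text files, it normalizes line endings. On UNIX systems and inside Python, the newline character (\n) ends a line and starts the next; on Windows the end-of-line marker is \r\n. To hide this difference across platforms, Python converts automatically: on Windows, \r\n is converted to \n when text is read, and \n is converted to \r\n when text is written.
　　Applying this conversion to a binary file on Windows corrupts the binary data. That is why a binary access mode is needed: in binary mode no conversion takes place.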
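
　　As an illustration, the sketch below writes and reads a few bytes in binary mode and checks that they come back unchanged. It assumes Python 3; the file name sample.bin is hypothetical, and the data is the 8-byte PNG signature, chosen because it contains both \r\n and \n.

```python
import os
import tempfile

data = b'\x89PNG\r\n\x1a\n'  # PNG signature; contains \r\n and a bare \n

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'sample.bin')  # hypothetical file name

    with open(path, 'wb') as f:  # 'b': bytes are written exactly as given
        f.write(data)

    with open(path, 'rb') as f:  # 'b': bytes are read back untranslated
        round_trip = f.read()

print(round_trip == data)
```

　　Opening the same file in text mode on Windows would apply the newline conversion to these bytes, which is exactly the corruption described above.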

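11.1.2. Buffering

　　The buffering argument of open takes the following values:
　　♦ 0 (False): I/O is unbuffered (every read and write goes directly to disk);
　　♦ 1 (True): I/O is buffered (memory is used in place of the disk to speed up reads and writes, and the data on disk is updated only on flush or close); strictly speaking, 1 selects line buffering;
　　♦ >1: the buffer size in bytes;
　　♦ -1 (any negative number): use the default buffer size.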
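
　　A small sketch of a buffered write followed by flush, assuming Python 3, where the rules differ slightly from the list above: buffering=0 is allowed only in binary mode, and 1 selects line buffering in text mode. The file name buffered.txt and the 4096-byte buffer size are arbitrary choices for illustration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'buffered.txt')  # hypothetical file name

    writer = open(path, 'w', buffering=4096)  # >1: buffer size in bytes
    writer.write('Hello, World!')
    writer.flush()  # hand the buffered data to the operating system now

    # A second handle sees the data even though writer is still open.
    with open(path) as reader:
        seen_after_flush = reader.read()

    writer.close()

print(seen_after_flush)
```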

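11.2. Basic File Methods

　　A file-like object is an object that supports some of the methods of a file, most importantly read or write (or both). The object returned by urllib.urlopen is one example.
　　The three standard streams in the sys module, sys.stdin, sys.stdout, and sys.stderr, are in fact files (file-like objects).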
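
　　To make the idea of a file-like object concrete, here is a sketch in Python 3 (an assumption) using io.StringIO, an in-memory object that offers read and write without any file on disk. The function name shout is hypothetical.

```python
import io


def shout(source, target):
    """Copy text from source to target in upper case.

    Only source.read() and target.write() are used, so any file-like
    object works, not just objects returned by open().
    """
    target.write(source.read().upper())


source = io.StringIO('Hello, World!')  # in-memory, readable file-like object
target = io.StringIO()                 # in-memory, writable file-like object
shout(source, target)
output = target.getvalue()
print(output)
```

　　The same function would accept sys.stdin as source and sys.stdout as target, since both provide the methods it relies on.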

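11.2.1. Reading and Writing

　　The most important capability of a file (or stream) is supplying or receiving data. Given a file-like object named f, you write data with f.write and read data with f.read.

Writing Data

　　The following example demonstrates f.write.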

>>> f = open('somefile.txt', 'w')
>>> f.write('Hello, ')
>>> f.write('World!') # each call to write(string) appends string after what has already been written
>>> f.close()

Reading Data

　　Continuing the example above, this demonstrates f.read.

>>> f = open('somefile.txt') # when mode is omitted, it defaults to 'r'
>>> f.read(4) # read the given number of characters
'Hell'
>>> f.read() # with no count given, read the rest of the file
'o, World!'

11.2.2. Piping Output

　　In a UNIX shell, the pipe symbol (|) connects the standard output of one command to the standard input of the next. For example:

# cat somefile.txt | python somescript.py | sort

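　　The pipeline in this example consists of three commands:
　　♦ cat somefile.txt: writes the contents of somefile.txt to standard output;
　　♦ python somescript.py: runs the Python script somescript.py, which reads from standard input and writes its result to standard output;
　　♦ sort: reads all text from standard input, sorts the lines alphabetically, and writes the result to standard output.

　　Suppose somescript.py is a script that counts the words in its input. Its code is: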

# somescript.py
import sys
text = sys.stdin.read()
words = text.split()
wordcount = len(words)
print 'Wordcount:', wordcount

　　The following somefile.txt serves as the test input.

Your mother was a hamster and your
father smelled of elderberries.

　　Running it gives:

# cat somefile.txt | python somescript.py 
Wordcount: 11
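
　　The same word count can be sketched as a function that accepts any file-like object, which makes it easy to try without a shell pipeline. This version assumes Python 3 (note the print function), and the name count_words is hypothetical; io.StringIO stands in for standard input.

```python
import io


def count_words(stream):
    """Count whitespace-separated words read from a file-like object."""
    return len(stream.read().split())


# In-memory stand-in for the piped contents of somefile.txt.
sample = io.StringIO(
    'Your mother was a hamster and your\n'
    'father smelled of elderberries.\n'
)
wordcount = count_words(sample)
print('Wordcount:', wordcount)
```

　　Passing sys.stdin instead of sample would give the behavior of the script in the pipeline.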

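Random Access

　　Everything described so far treats a file as a stream that is read sequentially. To read from arbitrary positions in a file, use the seek and tell methods of the file-like object.

seek

　　seek(offset[, whence]) moves the current read/write position to the position given by offset, which is a byte count. whence defaults to 0, meaning offset is measured from the start of the file; with whence set to 1, offset is relative to the current position; with whence set to 2, it is relative to the end of the file.
　　See the following example: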

>>> f = open('somefile.txt', 'w')
>>> f.write('01234567890123456789')
>>> f.seek(5)
>>> f.write('Hello, World!')
>>> f.close()
>>> f = open('somefile.txt')
>>> f.read()
'01234Hello, World!89'

tell

　　The tell method returns the current read/write position in the file. For example:

>>> f = open('somefile.txt')
>>> f.read(3)
'012'
>>> f.read(4)
'34He'
>>> f.tell()
7L
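
　　The three whence values can be tried together with tell in a short sketch. It assumes Python 3 and opens the file in binary mode, because Python 3 text files only accept nonzero offsets relative to the start of the file. os.SEEK_CUR and os.SEEK_END are the named constants for 1 and 2.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'somefile.txt')
    with open(path, 'wb') as f:
        f.write(b'01234567890123456789')

    with open(path, 'rb') as f:
        f.seek(5)                 # whence=0: 5 bytes from the start
        from_start = f.read(3)
        f.seek(2, os.SEEK_CUR)    # whence=1: skip 2 bytes from here
        from_current = f.read(3)
        position = f.tell()       # current offset from the start
        f.seek(-4, os.SEEK_END)   # whence=2: 4 bytes before the end
        from_end = f.read()

print(from_start, from_current, position, from_end)
```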

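11.2.3. Reading and Writing Lines

　　The methods for reading and writing lines are listed below:

Method	Description
readline	Reads one line. With no argument it reads the whole line, newline included; with a non-negative integer argument it reads at most that many characters. For example, if someFile.readline() would return 'Hello, World!\n', then someFile.readline(5) returns 'Hello'.
readlines	Reads all lines of the file and returns them as a list.
writelines	Takes a list of strings and writes them all to the file. It does not add newlines; there is no writeline method, since write can be used instead.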
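
　　A short sketch of the three methods, assuming Python 3 and using io.StringIO as an in-memory stand-in for a text file. Note that the strings passed to writelines carry their own newlines.

```python
import io

f = io.StringIO()  # in-memory stand-in for a text file
f.writelines(['Hello, World!\n', 'second line\n', 'third line\n'])
f.seek(0)  # back to the start before reading

first_five = f.readline(5)    # at most 5 characters of the current line
rest_of_line = f.readline()   # the remainder of that line, newline included
remaining = f.readlines()     # every remaining line, as a list
at_eof = f.readline()         # an empty string signals the end of the file

print(repr(first_five), repr(rest_of_line), remaining, repr(at_eof))
```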

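11.2.4. Closing Files

　　Use the close method to close a file. Closing matters for several reasons:
　　♦ it avoids leaving the file needlessly locked against modification on some operating systems and configurations;
　　♦ it avoids exhausting the system's quota of open files;
　　♦ for a file that was written to, it makes sure data buffered by Python actually reaches the file, so nothing is lost if the program exits abnormally.

　　There are two ways to make sure a file gets closed:
　　Method 1: use a try/finally statement and call close in the finally clause.
　　Method 2: use the with statement, introduced in Python 2.5. It opens the file and binds it to a variable; you work with the variable in the body, and the file is closed automatically when the statement ends. For example: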

with open("somefile.txt") as somefile:
  do_something(somefile)
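
　　The try/finally approach can be sketched as follows, assuming Python 3. The helper name first_line is hypothetical, and the temporary file exists only to make the example self-contained.

```python
import os
import tempfile


def first_line(path):
    """Return the first line of path, closing the file in every case."""
    f = open(path)
    try:
        return f.readline()
    finally:
        f.close()  # runs on normal return and when an exception is raised


with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'somefile.txt')
    with open(path, 'w') as out:
        out.write('Welcome to this file\nThere is nothing here except\n')
    line = first_line(path)

print(repr(line))
```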

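　　★ In Python 2.5, the with statement is available only after the following import (from Python 2.6 on it is always available):

from __future__ import with_statement

　　★ To write data buffered in memory to the file before it is closed, use the flush method.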

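Context Managers

　　The with statement works with any context manager, an object that supports the __enter__ and __exit__ methods.
　　__enter__ takes no arguments. It is called on entry to the with block, and its return value is bound to the variable after the as keyword.
　　__exit__ takes three arguments: the exception type, the exception object, and the exception traceback. It is called when the block is left, and any exception raised in the block is passed in through these arguments. If __exit__ returns a false value, the exception is not suppressed and continues to propagate.
　　When a file is used as a context manager, its __enter__ method returns the file object itself and its __exit__ method closes the file.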
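
　　The protocol is easier to see in a hand-written context manager. The class below is a hypothetical illustration, assuming Python 3: it suppresses one chosen exception type by returning a true value from __exit__, and lets everything else propagate by returning a false value.

```python
class Suppress:
    """Hypothetical context manager that swallows one exception type."""

    def __init__(self, exc_type):
        self.exc_type = exc_type
        self.caught = None

    def __enter__(self):
        return self  # this value is bound to the name after "as"

    def __exit__(self, exc_type, exc_value, traceback):
        if exc_type is not None and issubclass(exc_type, self.exc_type):
            self.caught = exc_value
            return True   # true value: the exception is suppressed
        return False      # false value: any exception propagates


with Suppress(ZeroDivisionError) as guard:
    1 / 0  # raised here, then suppressed by __exit__

propagated = False
try:
    with Suppress(ZeroDivisionError):
        int('not a number')  # ValueError is not the suppressed type
except ValueError:
    propagated = True

print(type(guard.caught).__name__, propagated)
```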

11.2.5. Using the Basic File Methods

　　The examples below demonstrate the file methods. The text file somefile.txt initially contains:

Welcome to this file
There is nothing here except
This stupid haiku

　　The basic file methods in action:

# read(n)
>>> f = open('somefile.txt')
>>> f.read(8)
'Welcome '
>>> f.read(3)
'to '
>>> f.close()
# read()
>>> f = open('somefile.txt')
>>> f.read()
'Welcome to this file\nThere is nothing here except\nThis stupid haiku\n'
>>> f.close()
# readline()
>>> f = open('somefile.txt')
>>> for i in range(3):
...   print str(i) + ':' + f.readline() 
... 
0:Welcome to this file

1:There is nothing here except

2:This stupid haiku

>>> f.close()
# readlines()
>>> import pprint
>>> pprint.pprint(open('somefile.txt').readlines()) # the file object is never bound to a name, so closing it is left to Python
['Welcome to this file\n',
 'There is nothing here except\n',
 'This stupid haiku\n']
# write(string)
>>> f = open('somefile.txt', 'w')
>>> f.write('this\nis no\nhaiku')
>>> f.close()

　　After the code above runs, the file contains:

this
is no
haiku

　　Next, writelines(list):

>>> f = open('somefile.txt')
>>> lines = f.readlines()
>>> f.close()
>>> lines[1] = "isn't a\n"
>>> f = open('somefile.txt', 'w')
>>> f.writelines(lines)
>>> f.close()

　　After the change, somefile.txt contains:

this
isn't a
haiku

11.3. Iterating over File Contents

11.3.1. Byte by Byte

　　A basic way to iterate over the contents of a file is to call read inside a while loop. For example:

f = open(filename)
char = f.read(1)
while char:  # a true value means the end of the file has not been reached
  process(char)
  char = f.read(1) # at the end of the file, read returns an empty string
f.close()

　　To avoid writing f.read(1) twice, use the following form:

f = open(filename)
while True:
  char = f.read(1)
  if not char: break
  process(char)
f.close()
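
　　Another way to avoid the repeated call, not taken from the notes above, is the two-argument form of the built-in iter, which keeps calling a function until it returns a sentinel value. The sketch assumes Python 3; io.StringIO stands in for open(filename), and appending to a list stands in for process(char).

```python
import io
from functools import partial

f = io.StringIO('abc')  # in-memory stand-in for open(filename)
chars = []
# iter(callable, sentinel) calls f.read(1) until it returns '' (end of file).
for char in iter(partial(f.read, 1), ''):
    chars.append(char)  # stands in for process(char)
f.close()

print(chars)
```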

11.3.2. Line by Line

　　Processing lines works like processing characters, except that readline is used:

f = open(filename)
while True:
  line = f.readline()
  if not line: break
  process(line)
f.close()

11.3.3. Reading Everything

　　If the file is not too large, you can read it all at once, using read without arguments or readlines:

　　Using read

f = open(filename)
for char in f.read():
  process(char)
f.close()

　　Using readlines

f = open(filename)
for line in f.readlines():
  process(line)
f.close()

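11.3.4. Lazy Line Iteration with fileinput

　　With large files, readlines uses too much memory. A for loop over fileinput.input gives lazy line iteration instead: only the parts of the file that are actually needed get read.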

import fileinput
for line in fileinput.input(filename): # fileinput opens the file itself; only the file name is passed
  process(line)
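
　　fileinput can also chain several files into one stream of lines. The sketch below assumes Python 3.2 or later, where the object returned by fileinput.input can be used in a with statement; the file names a.txt and b.txt are hypothetical and are created in a temporary directory.

```python
import fileinput
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    paths = []
    for name, text in [('a.txt', 'one\ntwo\n'), ('b.txt', 'three\n')]:
        path = os.path.join(tmpdir, name)
        with open(path, 'w') as out:
            out.write(text)
        paths.append(path)

    seen = []
    with fileinput.input(files=paths) as lines:
        for line in lines:
            # lineno() counts across all files; filelineno() restarts per file.
            seen.append((os.path.basename(lines.filename()),
                         lines.lineno(),
                         lines.filelineno(),
                         line.rstrip('\n')))

print(seen)
```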

　　NOTE: Older code may use xreadlines for lazy line iteration; it returns an xreadlines object. In new code, fileinput or file iterators are preferred.

11.3.5. File Iterators

　　Since Python 2.2, file objects are iterable, so they can be used directly in a for loop. For example:

f = open(filename)
for line in f:
  process(line)
f.close()

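　　If nothing was written to the file, leaving it unclosed does little harm. The code can then be shortened, leaving it to Python to close the file. For example:

for line in open(filename): # the file object is not assigned to a variable, so it cannot be closed explicitly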
  process(line)

　　sys.stdin is an iterable file object too and can be used the same way. For example:

import sys
for line in sys.stdin:
  process(line)

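　　File iterators support the same operations as any other iterator, such as conversion to a list of strings. Take a text file somefile.txt with the following contents: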

First line
Second line
Third line

　　File iterators in action:

>>> lines = list(open('somefile.txt'))
>>> lines
['First line\n', 'Second line\n', 'Third line\n']
>>> first, second, third = open('somefile.txt')
>>> first
'First line\n'
>>> second
'Second line\n'
>>> third
'Third line\n'
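
　　Because a file is an ordinary iterable, it combines with built-ins such as enumerate and max. The sketch below assumes Python 3 and pairs file iteration with the with statement so that the file is closed explicitly; the temporary file reproduces the three-line somefile.txt used above.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'somefile.txt')
    with open(path, 'w') as out:
        out.write('First line\nSecond line\nThird line\n')

    with open(path) as f:
        # enumerate() pulls one line at a time from the file iterator.
        numbered = [(number, line.rstrip('\n'))
                    for number, line in enumerate(f, start=1)]

    with open(path) as f:
        longest = max(f, key=len)  # any function that takes an iterable works

print(numbered, repr(longest))
```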