Python Course Notes - Module 2 - 01 - File Handling

zhangpfly

Article link: https://blog.csdn.net/zhangpfly/article/details/80033648

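1. The file-handling workflow

1) Open the file and assign the returned file handle (file object) to a variable.
2) Operate on the file through the handle.
3) Close the file.

Example: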

In [6]: f1 = open('data.txt', 'r', encoding='utf8')

In [7]: print(f1)
<_io.TextIOWrapper name='data.txt' mode='r' encoding='utf8'>

In [8]: f1.read()  # show the text that was read
Out[8]: 'Rain雨\n\nRain is falling all around, 雨儿在到处降落，\nIt falls on field and tree, 它落在田野和树梢， \nIt rains on the umbrella here, 它落在这边的雨伞上，\nAnd on the ships at sea. 又落在航行海上的船只。\n'

In [9]: ! cat data.txt  # the original file contents
Rain雨

Rain is falling all around, 雨儿在到处降落，
It falls on field and tree, 它落在田野和树梢， 
It rains on the umbrella here, 它落在这边的雨伞上，
And on the ships at sea. 又落在航行海上的船只。

In [10]: f1.close()
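
The three steps can also be wrapped in try/finally so that the file is closed even when the read fails. This is only a minimal sketch: the temporary path and the one-line sample text are assumptions made so that it runs anywhere.

```python
import os
import tempfile

# Hypothetical scratch file so the sketch does not depend on data.txt.
path = os.path.join(tempfile.mkdtemp(), 'data.txt')

# Create a small sample file first.
f = open(path, 'w', encoding='utf8')
try:
    f.write('Rain雨\n')
finally:
    f.close()

# 1) open the file, 2) operate on it, 3) close it.
f1 = open(path, 'r', encoding='utf8')
try:
    text = f1.read()
finally:
    f1.close()  # runs whether or not read() raised

print(repr(text))
print(f1.closed)
```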

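The open function opens a file. Its most commonly used parameters are:

open(file, mode='r', buffering=-1, encoding=None)
open(file name, open mode, buffering policy, text encoding)

The open mode, buffering policy, and encoding are all optional.
mode: the mode in which the file is opened (see section 2).
buffering: -1 (the default) lets Python choose the buffering policy; 0 turns buffering off (binary mode only); 1 selects line buffering (text mode only); a value greater than 1 sets the buffer size in bytes.
encoding: the encoding used to decode the file when reading and to encode it when writing, typically utf8 or gbk. It applies to text mode only; if omitted, a platform-dependent default is used.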
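
To see why the encoding argument matters, the sketch below writes the same text with two encodings and then reads one of the files back with the right and the wrong encoding. The scratch directory and file names are assumptions for illustration.

```python
import os
import tempfile

folder = tempfile.mkdtemp()  # hypothetical scratch directory
text = 'Rain雨'

# Write the same text once as UTF-8 and once as GBK.
for name, enc in (('utf8.txt', 'utf8'), ('gbk.txt', 'gbk')):
    f = open(os.path.join(folder, name), 'w', encoding=enc)
    f.write(text)
    f.close()

# The byte sizes differ because the two encodings store '雨' differently.
utf8_size = os.path.getsize(os.path.join(folder, 'utf8.txt'))
gbk_size = os.path.getsize(os.path.join(folder, 'gbk.txt'))

# Reading with the matching encoding round-trips the text.
f = open(os.path.join(folder, 'gbk.txt'), 'r', encoding='gbk')
right = f.read()
f.close()

# Reading with a mismatched encoding raises an error or yields wrong text.
f = open(os.path.join(folder, 'gbk.txt'), 'r', encoding='utf8')
try:
    wrong = f.read()
except UnicodeDecodeError:
    wrong = None
finally:
    f.close()

print(utf8_size, gbk_size, right == text, wrong == text)
```

Passing encoding explicitly avoids depending on the platform default, which differs between systems.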

If the file does not exist, open raises FileNotFoundError:

In [16]: f1 = open('data-test.txt', 'r')
---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
<ipython-input-16-918e552dcaa9> in <module>()
----> 1 f1 = open('data-test.txt', 'r')

FileNotFoundError: [Errno 2] No such file or directory: 'data-test.txt'
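
One way to cope with a missing file is to catch the exception instead of letting it end the program. A minimal sketch, assuming a path in a fresh temporary directory that is never created:

```python
import os
import tempfile

# Hypothetical path that is guaranteed not to exist.
missing = os.path.join(tempfile.mkdtemp(), 'data-test.txt')

try:
    f1 = open(missing, 'r')
except FileNotFoundError as exc:
    # exc.filename holds the path that could not be opened.
    content = None
    print('cannot open:', exc.filename)
else:
    # Only runs when open() succeeded.
    content = f1.read()
    f1.close()
```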

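2. File open modes

r: read-only [the default; the file must exist, otherwise an exception is raised]
w: write-only [not readable; creates the file if it does not exist; truncates it if it does]
x: exclusive creation, write-only [not readable; creates the file if it does not exist; raises an error if it does]
a: append [not readable; creates the file if it does not exist; otherwise only appends], with every write going to the end of the file.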
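
The difference between x, a, and w can be checked side by side. The following is a small sketch that uses a temporary file; the path and the strings written are assumptions for illustration.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'modes.txt')  # hypothetical file

# 'x' creates the file because it does not exist yet.
f = open(path, 'x')
f.write('first\n')
f.close()

# A second 'x' open fails because the file now exists.
try:
    open(path, 'x')
    x_failed = False
except FileExistsError:
    x_failed = True

# 'a' keeps the existing contents and adds to the end.
f = open(path, 'a')
f.write('second\n')
f.close()
f = open(path, 'r')
after_append = f.read()
f.close()

# 'w' discards the existing contents before writing.
f = open(path, 'w')
f.write('third\n')
f.close()
f = open(path, 'r')
after_write = f.read()
f.close()

print(x_failed, repr(after_append), repr(after_write))
```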

# Read the file
>>> f1 = open('data.txt', 'r')
>>> f1.read()
'Rain雨\n\nRain is falling all around, 雨儿在到处降落，\nIt falls on field and tree, 它落在田野和树梢， \nIt rains on the umbrella here, 它落在这边的雨伞上，\nAnd on the ships at sea. 又落在航行海上的船只。\n'
>>> f1.close()

# Write to the file
>>> f1 = open('data.txt', 'w')
>>> f1.write('hello world\n')
12
>>> f1.close()
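Exit the interpreter and look at the file:
# cat data.txt
hello world
When writing in 'w' mode, a new file is created if it does not exist; if it does exist, its contents are cleared before anything is written.
As shown, the original text is gone: 'w' mode discards the existing contents and writes from scratch. To add to a file instead, use 'a' mode.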


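# Append to the file (the output below shows that data.txt had been restored to the original poem first)
>>> f1 = open('data.txt', 'a')  # append mode
>>> f1.write('hello world\n')   # write new data
12
>>> f1.read()   # append mode is not readable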
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
io.UnsupportedOperation: not readable
>>> f1.close()

>>> f1 = open('data.txt', 'r')  # read the updated text
>>> f1.read()
'Rain雨\n\nRain is falling all around, 雨儿在到处降落，\nIt falls on field and tree, 它落在田野和树梢， \nIt rains on the umbrella here, 它落在这边的雨伞上，\nAnd on the ships at sea. 又落在航行海上的船只。\nhello world\n'   # 'hello world' has been appended at the end
>>> f1.read().strip('\n').split('\n') # a second read() returns '' because the file pointer is already at the end; assign the first result to a variable instead
['']
>>> f1.close()
>>> 
>>> f1 = open('data.txt', 'r')  # read again
>>> f1.read().strip('\n').split('\n') # process the text
['Rain雨', '', 'Rain is falling all around, 雨儿在到处降落，', 'It falls on field and tree, 它落在田野和树梢， ', 'It rains on the umbrella here, 它落在这边的雨伞上，', 'And on the ships at sea. 又落在航行海上的船只。', 'hello world']
>>> 
>>> f1.close()
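
Instead of reopening the file, the result of the first read() can be stored in a variable, or the file pointer can be moved back with seek(0). A minimal sketch, assuming a shortened sample file in a temporary directory:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'data.txt')  # hypothetical file
f = open(path, 'w', encoding='utf8')
f.write('Rain雨\n\nhello world\n')
f.close()

f1 = open(path, 'r', encoding='utf8')
text = f1.read()     # keep the result so it can be reused
second = f1.read()   # the pointer is at the end, so this is ''
f1.seek(0)           # move the pointer back to the start
again = f1.read()    # the full text is available again
f1.close()

# Further processing works on the stored string, not on the file.
lines = text.strip('\n').split('\n')
print(lines)
```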

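"+" opens the file for both reading and writing:
r+: read/write [readable and writable]; the file must exist and is not truncated.
w+: write/read [readable and writable]; truncates the file, then opens it for reading and writing.
x+: write/read [readable and writable]; fails if the file already exists.
a+: append/read [readable and writable]; opens the file for reading and writing with the file pointer at the end.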
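
The position of the file pointer is what distinguishes r+ from a+ in practice. The sketch below, which assumes a throwaway file with ASCII content, shows r+ overwriting from the start and a+ starting at the end.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'plus.txt')  # hypothetical file
f = open(path, 'w')
f.write('hello world\n')
f.close()

# r+ starts at the beginning, so writing overwrites existing characters.
f = open(path, 'r+')
f.write('HELLO')
f.seek(0)
r_plus = f.read()
f.close()

# a+ starts at the end: reading returns '' until the pointer is moved.
f = open(path, 'a+')
at_end = f.read()
f.write('bye\n')     # appended at the end of the file
f.seek(0)
a_plus = f.read()
f.close()

print(repr(r_plus), repr(at_end), repr(a_plus))
```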

"b" opens the file in binary mode, so it is handled as bytes rather than text.
rb or r+b
wb or w+b
xb or x+b
ab or a+b

>>> f1 = open('data.txt', 'rb')
>>> f1.read()
b'Rain\xe9\x9b\xa8\n\nRain is falling all around, \xe9\x9b\xa8\xe5\x84\xbf\xe5\x9c\xa8\xe5\x88\xb0\xe5\xa4\x84\xe9\x99\x8d\xe8\x90\xbd\xef\xbc\x8c\nIt falls on field and tree, \xe5\xae\x83\xe8\x90\xbd\xe5\x9c\xa8\xe7\x94\xb0\xe9\x87\x8e\xe5\x92\x8c\xe6\xa0\x91\xe6\xa2\xa2\xef\xbc\x8c \nIt rains on the umbrella here, \xe5\xae\x83\xe8\x90\xbd\xe5\x9c\xa8\xe8\xbf\x99\xe8\xbe\xb9\xe7\x9a\x84\xe9\x9b\xa8\xe4\xbc\x9e\xe4\xb8\x8a\xef\xbc\x8c\nAnd on the ships at sea. \xe5\x8f\x88\xe8\x90\xbd\xe5\x9c\xa8\xe8\x88\xaa\xe8\xa1\x8c\xe6\xb5\xb7\xe4\xb8\x8a\xe7\x9a\x84\xe8\x88\xb9\xe5\x8f\xaa\xe3\x80\x82\nhello world\n'
>>> f1.close()
As shown, the data comes back as a bytes object.

Note: in b mode, reads return bytes and writes require bytes; the encoding argument cannot be specified.
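
In binary mode the conversion between str and bytes has to be done by hand with encode() and decode(). A short sketch, assuming a temporary file and UTF-8 data:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'data.bin')  # hypothetical file

# Writing a str to a binary file is rejected.
f = open(path, 'wb')
try:
    f.write('hello')
    str_rejected = False
except TypeError:
    str_rejected = True
finally:
    f.close()

# Encode the text to bytes before writing.
data = 'Rain雨\n'.encode('utf8')
f = open(path, 'wb')
written = f.write(data)   # number of bytes written
f.close()

# Reading returns bytes; decode them to get text back.
f = open(path, 'rb')
raw = f.read()
f.close()
text = raw.decode('utf8')

# Combining binary mode with encoding= is an error.
try:
    open(path, 'rb', encoding='utf8')
    encoding_rejected = False
except ValueError:
    encoding_rejected = True

print(written, raw, repr(text))
```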

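3. Differences between read, readline, and readlines

read() # reads the entire file at once and returns it as one string.
readline() # reads one line per call.
readlines() # reads all remaining lines and returns them as a list of strings.

The following example illustrates the difference: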

>>> f1 = open('data.txt', 'r')
>>> f2 = open('data.txt', 'r')
>>> f3 = open('data.txt', 'r')
>>> 
>>> f1.read()  # read() loads everything at once and returns the text unprocessed
'Rain雨\n\nRain is falling all around, 雨儿在到处降落，\nIt falls on field and tree, 它落在田野和树梢， \nIt rains on the umbrella here, 它落在这边的雨伞上，\nAnd on the ships at sea. 又落在航行海上的船只。\nhello world\n'
>>> f2.readline()  # readline() reads one line per call, much like an iterator, so large files do not take up much memory
'Rain雨\n'
>>> f2.readline()
'\n'
>>> f2.readline()
'Rain is falling all around, 雨儿在到处降落，\n'
>>> f2.readline()
'It falls on field and tree, 它落在田野和树梢， \n'
>>> 
>>> f2.readlines()  # readlines() returns the rest of f2 as a list
['It rains on the umbrella here, 它落在这边的雨伞上，\n', 'And on the ships at sea. 又落在航行海上的船只。\n', 'hello world\n']
>>> f3.readlines()  # it can also read the whole file into a list in one go
['Rain雨\n', '\n', 'Rain is falling all around, 雨儿在到处降落，\n', 'It falls on field and tree, 它落在田野和树梢， \n', 'It rains on the umbrella here, 它落在这边的雨伞上，\n', 'And on the ships at sea. 又落在航行海上的船只。\n', 'hello world\n']
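
A file object is itself iterable, so a for loop reads one line at a time without calling readline() explicitly. The sketch below assumes a shortened sample file and simply counts lines and tracks the longest one.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'data.txt')  # hypothetical file
f = open(path, 'w', encoding='utf8')
f.write('Rain雨\n\nhello world\n')
f.close()

line_count = 0
longest = ''
f2 = open(path, 'r', encoding='utf8')
for line in f2:                    # one line per iteration
    line_count += 1
    stripped = line.rstrip('\n')   # drop the trailing newline
    if len(stripped) > len(longest):
        longest = stripped
f2.close()

print(line_count, repr(longest))
```

Only the current line is held in memory, which is why this form suits large files.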

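4. Modifying a file

There are two ways to modify a file:

4.1 Disk-based

Read the file line by line, replace the matching text, and write each line to a new file; the write buffer is flushed to disk as it fills up (or explicitly with flush()). Then rename the new file to the old file name (os.rename or os.replace). This approach temporarily needs extra disk space for the second copy.

Example: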

import os

file = 'novel'
new_file = 'new_novel'
keys = '雨伞'  # text to search for

f1 = open(file, 'r', encoding='utf-8')
f2 = open(new_file, 'w', encoding='utf-8')

# Iterate over the file object so only one line is held in memory at a time.
for line in f1:
    if keys in line:
        print("find 雨伞")
        f2.write(line.replace(keys, '花折伞'))
    else:
        f2.write(line)

f1.close()
f2.close()

# os.rename fails on Windows when the target already exists, while on macOS
# and Linux it overwrites the target without an error. os.replace overwrites
# the target on all platforms, so no separate os.remove is needed.
os.replace(new_file, file)

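4.2 Memory-based

Read the whole file into memory, modify it there, and write it back to the original file, overwriting the old contents.
This is fast but memory-hungry, so it is unsuitable for very large files.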

file = 'novel'
fr = open(file, 'r+', encoding='utf-8')
data = fr.readlines()
fr.close()

# Reopen the file in write mode
fw = open(file, 'w', encoding='utf-8')
for line in data:
    keys = '雨伞'
    if keys in line:
        print("find 雨伞")
        fw.write(line.replace(keys, '花折伞'))
    else:
        fw.write(line)
fw.close()

5. with…open…as

File handling means obtaining a file handle, reading data from the file, and then closing the handle.

Without with, the code looks like this:

file = open("/tmp/foo.txt")
data = file.read()
file.close()

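There are two problems here. First, you may forget to close the file handle. Second, if reading raises an exception, close() is never reached and the exception is not handled.

with takes care of the cleanup even when the block raises an exception. Here is the with version:

with open("/tmp/foo.txt") as file: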
    data = file.read()

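The basic idea is that the object produced by the expression after with must have an __enter__() method and an __exit__() method. After the expression is evaluated, the resulting object's __enter__() method is called, and its return value is bound to the variable after as. When the block under with finishes, whether normally or through an exception, the object's __exit__() method is called.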
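
The call order can be observed with a small hand-written context manager. The Tracked class below is hypothetical and exists only for illustration; it records each step so the sequence becomes visible, including the case where the block raises.

```python
import os
import tempfile


class Tracked:
    """Hypothetical context manager that records when it is entered and exited."""

    def __init__(self):
        self.events = []

    def __enter__(self):
        self.events.append('enter')
        return self  # this value is bound to the name after "as"

    def __exit__(self, exc_type, exc_value, traceback):
        self.events.append('exit')
        return False  # do not suppress the exception


tracker = Tracked()
try:
    with tracker as t:
        t.events.append('body')
        raise ValueError('boom')
except ValueError:
    tracker.events.append('handled')

print(tracker.events)

# File objects follow the same protocol: __exit__ closes the file.
path = os.path.join(tempfile.mkdtemp(), 'foo.txt')  # hypothetical file
with open(path, 'w') as file:
    file.write('hi')
closed_after = file.closed
```

Because __exit__ returns False, the exception still propagates after the cleanup has run, which is also how file objects behave.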

6. Other file methods

Read a given size (in text mode, size counts characters, not bytes):

In [14]: f1 = open('novel', 'r')

In [15]: data = f1.read(10)

In [16]: data
Out[16]: 'Rain雨\n\nRai'

In [17]: f1.read()
Out[17]: 'n is falling all around, 雨儿在到处降落，\nIt falls on field and tree, 它落在田野和树梢， \nIt rains on the umbrella here, 它落在这边的花折伞上，\nAnd on the ships at sea. 又落在航行海上的船只。\nhello world\n'

In [18]: f1.close()

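fp.flush() # write the buffered data out to the file (it is handed to the operating system)
fp.fileno() # return the integer file descriptor of the underlying file
fp.isatty() # whether the file is connected to a terminal (tty) device
fp.tell() # return the current position of the file pointer, measured from the start of the file
next(fp) # return the next line and advance the file pointer past it (Python 2 spelled this fp.next()). A for … in file loop iterates by calling it repeatedly.
fp.seek(offset[, whence]) # move the file pointer to offset. By default (whence=0) the offset is counted from the start of the file and is non-negative; whence=1 counts from the current position and whence=2 from the end of the file. In text mode, only seeks relative to the start of the file are allowed (plus seek(0, 2) to jump to the end); arbitrary relative seeks require binary mode. Note that if the file was opened in a or a+ mode, every write goes to the end of the file regardless of the current position.
fp.truncate([size]) # resize the file to size bytes; the default is the current position of the file pointer. If size is larger than the current file size, the file is extended and the contents of the new area depend on the platform (on most systems they are zero bytes). The file must be opened in a writable mode such as 'r+'; opening it with 'w' or 'w+' already empties it.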
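
The interplay of seek, tell, and truncate is easiest to follow in binary mode, where offsets are plain byte counts. A minimal sketch, assuming a ten-byte temporary file:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'digits.bin')  # hypothetical file
f = open(path, 'wb')
f.write(b'0123456789')
f.close()

f = open(path, 'r+b')        # read/write without truncating
f.seek(3)                    # absolute position from the start
pos = f.tell()
chunk = f.read(2)            # reads the bytes at offsets 3 and 4

f.seek(-2, os.SEEK_END)      # two bytes before the end of the file
tail = f.read()

f.seek(5)
f.truncate()                 # cut the file at the current position
f.seek(0)
remaining = f.read()
f.close()

size = os.path.getsize(path)
print(pos, chunk, tail, remaining, size)
```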