十、文件操作

我是小青菜

已于 2022-04-10 16:40:28 修改

阅读量1.1k

点赞数 7

分类专栏： python 文章标签： python

于 2022-04-10 16:39:43 首次发布

本文链接：https://blog.csdn.net/liu_sir1009/article/details/124080942

版权

python 专栏收录该内容

19 篇文章 2 订阅

订阅专栏

文件操作
文件夹和路径
csv格式文件
ini格式文件
xml格式文件
excel文件
压缩文件

1.文件操作

字符串类型（str），在程序中用于表示文字信息，本质上是unicode编码中的二进制。
```
name = "王昭君"
```
字节类型（bytes）
- 可表示文字信息，本质上是utf-8/gbk等编码的二进制（对unicode进行压缩，方便文件存储和网络传输。）
- Python open() 方法用于打开一个文件，并返回文件对象，在对文件进行处理过程都需要使用到这个函数，如果该文件无法被打开，会抛出 OSError。
  
  **注意：**使用 open() 方法一定要保证关闭文件对象，即调用 close() 方法。
  
  open() 函数常用形式是接收两个参数：文件名(file)和模式(mode)。

1.1 读取文件语法

```python
# file: 必需，文件路径（相对或者绝对路径）。
# mode: 可选，文件打开模式
# buffering: 设置缓冲
# encoding: 一般使用utf8
# errors: 报错级别
# newline: 区分换行符
# closefd: 传入的file参数类型
# opener: 设置自定义开启器，开启器的返回值必须是一个打开的文件描述符。

open(file, mode='r', buffering=-1, encoding=None, errors=None, newline=None, closefd=True, opener=None)
```

1.1.1mode参数

模式	描述
t	文本模式 (默认)。
x	写模式，新建一个文件，如果该文件已存在则会报错。
b	二进制模式。
+	打开一个文件进行更新(可读可写)
r	以只读方式打开文件。文件的指针将会放在文件的开头。这是默认模式。
rb	以二进制格式打开一个文件用于只读。文件指针将会放在文件的开头。这是默认模式。一般用于非文本文件如图片等。
r+	打开一个文件用于读写。文件指针将会放在文件的开头。
rb+	以二进制格式打开一个文件用于读写。文件指针将会放在文件的开头。一般用于非文本文件如图片等。
w	打开一个文件只用于写入。如果该文件已存在则打开文件，并从开头开始编辑，即原有内容会被删除。如果该文件不存在，创建新文件。
wb	以二进制格式打开一个文件只用于写入。如果该文件已存在则打开文件，并从开头开始编辑，即原有内容会被删除。如果该文件不存在，创建新文件。一般用于非文本文件如图片等。
w+	打开一个文件用于读写。如果该文件已存在则打开文件，并从开头开始编辑，即原有内容会被删除。如果该文件不存在，创建新文件。
wb+	以二进制格式打开一个文件用于读写。如果该文件已存在则打开文件，并从开头开始编辑，即原有内容会被删除。如果该文件不存在，创建新文件。一般用于非文本文件如图片等。
a	打开一个文件用于追加。如果该文件已存在，文件指针将会放在文件的结尾。也就是说，新的内容将会被写入到已有内容之后。如果该文件不存在，创建新文件进行写入。
ab	以二进制格式打开一个文件用于追加。如果该文件已存在，文件指针将会放在文件的结尾。也就是说，新的内容将会被写入到已有内容之后。如果该文件不存在，创建新文件进行写入。
a+	打开一个文件用于读写。如果该文件已存在，文件指针将会放在文件的结尾。文件打开时会是追加模式。如果该文件不存在，创建新文件用于读写。
ab+	以二进制格式打开一个文件用于追加。如果该文件已存在，文件指针将会放在文件的结尾。如果该文件不存在，创建新文件用于读写。

默认为文本模式，如果要以二进制模式打开，加上 b 。

关于文件的打开模式常见应用有：

只读：r、rt、rb （用）
- 存在，读
- 不存在，报错
- r+、rt+、rb+，默认光标位置：起始位置
只写：w、wt、wb（用）
- 存在，清空再写
- 不存在，创建再写
- w+、wt+、wb+，默认光标位置：起始位置（清空文件）
只写：x、xt、xb
- 存在，报错
- 不存在，创建再写。
- x+、xt+、xb+，默认光标位置：起始位置（新文件）
只写：a、at、ab【尾部追加】（用）
- 存在，尾部追加。
- 不存在，创建再写。
- a+、at+、ab+，默认光标位置：末尾
读写
- r+、rt+、rb+，默认光标位置：起始位置

1.1.12 file对象

序号	方法及描述
1	file.close()关闭文件。关闭后文件不能再进行读写操作。
2	file.flush()刷新文件内部缓冲，直接把内部缓冲区的数据立刻写入文件, 而不是被动的等待输出缓冲区写入。
3	file.fileno()返回一个整型的文件描述符(file descriptor FD 整型), 可以用在如os模块的read方法等一些底层操作上。
4	file.isatty()如果文件连接到一个终端设备返回 True，否则返回 False。
6	file.read([size])]从文件读取指定的字节数，如果未给定或为负则读取所有。
7	file.readline([size])]读取整行，包括 “\n” 字符。
8	file.readlines([sizeint])读取所有行并返回列表，若给定sizeint>0，返回总和大约为sizeint字节的行, 实际读取值可能比 sizeint 较大, 因为需要填充缓冲区。
9	file.seek(offset[, whence])移动文件读取指针到指定位置
10	file.tell()返回文件当前位置。
11	file.truncate([size])从文件的首行首字符开始截断，截断文件为 size 个字符，无 size 表示从当前位置截断；截断之后后面的所有字符被删除，其中 windows 系统下的换行代表2个字符大小。
12	file.write(str)将字符串写入文件，返回的是写入的字符长度。
13	file.writelines(sequence)向文件写入一个序列字符串列表，如果需要换行则要自己加入每行的换行符。

虽然操作对象有那么多, 但是我们常用也就两个, 一个是close 和flush

1.1.3示例一:读取文件

# 1.打开文件
file_object = open('info.txt', mode='rb')
# 2.读取文件内容，并赋值给data
data = file_object.read()
# 3.关闭文件
file_object.close()

print(data) # b'alex-123\n\xe6\xad\xa6\xe6\xb2\x9b\xe9\xbd\x90-123'
text = data.decode("utf-8")
print(text)

1.1.4示例二:读图片等非文本内容文件

file_object = open('a1.png', mode='rb')
data = file_object.read()
file_object.close()

print(data) # \x91\xf6\xf2\x83\x8aQFfv\x8b7\xcc\xed\xc3}\x7fT\x9d{.3.\xf1{\xe8\...xxxxxxxxxx 读图片等非文本内容文件file_object = open('a1.png', mode='rb')data = file_object.read()file_object.close()print(data) # \x91\xf6\xf2\x83\x8aQFfv\x8b7\xcc\xed\xc3}\x7fT\x9d{.3.\xf1{\xe8\...

1.1.5 示例三:判断文件是否存在

file_path = "/Users/wupeiqi/Pycharm/python/day10/info.txt"
exists = os.path.exists(file_path)
if exists:
    # 1.打开文件
    file_object = open('infower.txt', mode='rt', encoding='utf-8')
    # 2.读取文件内容，并赋值给data
    data = file_object.read()
    # 3.关闭文件
    file_object.close()
    print(data)
else:
    print("文件不存在")

1.1.6 案例四:写文本文件

# 1.打开文件
# 路径：t1.txt
# 模式：wb（要求写入的内容需要是字节类型）
file_object = open("t1.txt", mode='wb')
# 2.写入内容
file_object.write(    "武沛齐".encode("utf-8")    )
# 3.文件关闭
file_object.close()

1.1.7 案例五: 写入图片

f1 = open('a1.png',mode='rb')
content = f1.read()
f1.close()

f2 = open('a2.png',mode='wb')
f2.write(content)
f2.close()

1.1.8案例六: 案例用户注册

# 案例1：用户注册
user = input("请输入用户名：")
pwd = input("请输入密码：")
data = "{}-{}".format(user, pwd)
file_object = open("files/info.txt", mode='wt', encoding='utf-8')
file_object.write(data)
file_object.close()


# 案例2：多用户注册
# w写入文件，先清空文件；再在文件中写入内容。
file_object = open("files/info.txt", mode='wt', encoding='utf-8')
while True:
    user = input("请输入用户名：")
    if user.upper() == "Q":
        break
    pwd = input("请输入密码：")
    data = "{}-{}\n".format(user, pwd)

    file_object.write(data)
file_object.close()

1.1.9 案例七: 网络图片下载

# 案例1：去网上下载一点文本，文本信息写入文件。
import requests

res = requests.get(
    url="图片地址",
    headers={
        "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36"
    }
)

# 网络传输的原始二进制信息（bytes）
# res.content

file_object = open('files/log1.txt', mode='wb')
file_object.write(res.content)
file_object.close()



# 案例2：去网上下载一张图片，图片写入本地文件。
import requests

res = requests.get(
   url="图片地址",
    headers={
        "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36"
    }
)

# 网络传输的原始二进制信息（bytes）
# res.content

file_object = open('files/美女.png', mode='wb')
file_object.write(res.content)
file_object.close()

1.1.10案例八: r+、rt+、rb+，默认光标位置：起始位置

file_object = open('info.txt', mode='rt+')

# 读取内容
data = file_object.read()
print(data)

# 写入内容
file_object.write("你好呀")

file_object.close()

1.1.11 案例九: w+、wt+、wb+，默认光标位置：起始位置（清空文件）

file_object = open('info.txt', mode='wt+')

# 读取内容
data = file_object.read()
print(data)

# 写入内容
file_object.write("你好呀")

# 将光标位置重置起始
file_object.seek(0)

# 读取内容
data = file_object.read()
print(data)

file_object.close()

1.1.12案例十: a+、at+、ab+，默认光标位置：末尾

file_object = open('info.txt', mode='at+')

# 写入内容
file_object.write("王昭君")

# 将光标位置重置起始
file_object.seek(0)

# a模式下移动无效果
# 读取内容
data = file_object.read()
print(data)

file_object.close()

1.1.13 常见功能

在上述对文件的操作中，我们只使用了write和read来对文件进行读写，其实在文件操作中还有很多其他的功能来辅助实现更好的读写文件的内容。

read，读

读所有【常用】

f = open('info.txt', mode='r',encoding='utf-8')
data = f.read()
f.close()

读n个字符（字节）【会用到】

f = open('info.txt', mode='r', encoding='utf-8')
# 读1个字符
data = f.read(1)
f.close()

print(data) # 武

f = open('info.txt', mode='r',encoding='utf-8')

# 读1个字符
chunk1 = f.read(1)
chunk2 = f.read(2)
print(chunk1,chunk2)

f.close()

f = open('info.txt', mode='rb')

# 读1个字节
data = f.read(3)
f.close()

print(data, type(data))  # b'\xe6\xad\xa6' <class 'bytes'>

f = open('info.txt', mode='rb')

# 读1个字节
chunk1 = f.read(3)
chunk2 = f.read(3)
chunk3 = f.read(1)
print(chunk1,chunk2,chunk3)

f.close()

readline，读一行

f = open('info.txt', mode='r', encoding='utf-8')

v1 = f.readline()
print(v1)

v2 = f.readline()
print(v2)

f.close()

f = open('info.txt', mode='r', encoding='utf-8')
v1 = f.readline()
print(v1)
f.close()

f = open('info.txt', mode='r', encoding='utf-8')
v2 = f.readline()
print(v2)
f.close()

readlines，读所有行，每行作为列表的一个元素

f = open('info.txt', mode='rb')

data_list = f.readlines()

f.close()

print(data_list)

循环，读大文件（readline加强版）【常见】

f = open('info.txt', mode='r', encoding='utf-8')
for line in f:
    print(line.strip())
f.close()

write，写

f = open('info.txt', mode='a',encoding='utf-8')
f.write("王昭君")
f.close()

f = open('info.txt', mode='ab')
f.write( "王昭君".encode("utf-8") )
f.close()

flush，刷到硬盘

f = open('info.txt', mode='a',encoding='utf-8')

while True:
    # 不是写到了硬盘，而是写在缓冲区，系统会将缓冲区的内容刷到硬盘。
	f.write("王昭君")
    f.flush()

f.close()

file_object = open('files/account.txt', mode='a')

while True:
    user = input("用户名：")
    if user.upper() == "Q":
        break
    pwd = input("密码：")
    data = "{}-{}\n".format(user, pwd)
    file_object.write(data)
    file_object.flush()

file_object.close()

移动光标位置（字节）
```
f = open('info.txt', mode='r+', encoding='utf-8')

# 移动到指定字节的位置
f.seek(3)
f.write("王昭君")

f.close()
```
注意：在a模式下，调用write在文件中写入内容时，永远只能将内容写入到尾部，不会写到光标的位置。

获取当前光标位置

f = open('info.txt', mode='r', encoding='utf-8')

p1 = f.tell()
print(p1)  # 0

f.read(3)  # 读3个字符 3*3=9字节

p2 = f.tell()
print(p2)  # 9

f.close()

f = open('info.txt', mode='rb')

p1 = f.tell()
print(p1)  # 0

f.read(3)  # 读3个字节

p2 = f.tell()
print(p2)  # 3

f.close()

1.1.14上下文管理

之前对文件进行操作时，每次都要打开和关闭文件，比较繁琐且容易忘记关闭文件。

以后再进行文件操作时，推荐大家使用with上下文管理，它可以自动实现关闭文件。

with open("xxxx.txt", mode='rb') as file_object:
    data = file_object.read()
    print(data)

1.1.14练习题

日志分析，计算某用户223.73.89.192访问次数。日志文件如下：access.log

49.89.167.91 - - [17/Dec/2020:03:43:50 +0800] "GET /wiki/detail/3/40 HTTP/1.1" 301 0 "-" "Mozilla/5.0(Linux;Android 5.1.1;OPPO A33 Build/LMY47V;wv) AppleWebKit/537.36(KHTML,link Gecko) Version/4.0 Chrome/43.0.2357.121 Mobile Safari/537.36 LieBaoFast/4.51.3" "-"
49.89.167.91 - - [17/Dec/2020:03:44:11 +0800] "GET /wiki/detail/3/40/ HTTP/1.1" 200 8033 "-" "Mozilla/5.0(Linux;Android 5.1.1;OPPO A33 Build/LMY47V;wv) AppleWebKit/537.36(KHTML,link Gecko) Version/4.0 Chrome/43.0.2357.121 Mobile Safari/537.36 LieBaoFast/4.51.3" "-"
203.208.60.66 - - [17/Dec/2020:03:47:58 +0800] "GET /media/uploads/2019/11/17/pic/s1.png HTTP/1.1" 200 710728 "-" "Googlebot-Image/1.0" "-"
223.73.89.192 - - [17/Dec/2020:03:48:26 +0800] "GET /wiki/detail/3/40/ HTTP/1.1" 200 8033 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36 Edg/87.0.664.60" "-"

ip = "223.73.89.192"

count_1 = 0
with open('files/access.log', mode='r', encoding='utf-8') as f:
    for item in f:
        if item.startswith(ip):
            count_1 += 1
print(count_1)
或
total_count = 0
ip = "223.73.89.192"
with open('files/access.log',mode='r', encoding='utf-8') as file_object:
    for line in file_object:
        if line.startswith(ip):
            total_count += 1
print(total_count)

日志分析升级，计算所有用户的访问次数。

user_dict = {}
with open('files/access.log', mode='r', encoding='utf-8') as file_object:
    for line in file_object:
        user_ip = line.split(" ")[0]
        if user_ip in user_dict:
            user_dict[user_ip] += 1
        else:
            user_dict[user_ip] = 1
print(user_dict)

筛选出股票当前价大于 20 的所有股票数据。

# 文件中内容格式
股票代码,股票名称,当前价,涨跌额,涨跌幅,年初至今,成交量,成交额,换手率,市盈率(TTM),股息率,市值
SH601778,N晶科,6.29,+1.92,+43.94%,+43.94%,259.66万,1625.52万,0.44%,22.32,-,173.95亿
.....

stock_dict = {}
with open('files/stock.txt', mode='r', encoding='utf-8') as file_object:
    file_object.readline()
    for line in file_object:
        name, text = line.split(',')[1], line.split(',')[2]
        price = float(text)
        if price > 20:
            stock_dict[name] = text

print(stock_dict)

根据要求修改文件的内容，原文件内容如下：ha.conf

global       
        log 127.0.0.1 local2
        daemon
        maxconn 256
        log 127.0.0.1 local2 info
defaults
        log global
        mode http
        timeout connect 5000ms
        timeout client 50000ms
        timeout server 50000ms
        option  dontlognull
请将文件中的 `defaults`修改为 `python` 。

with open('files/ha.conf', mode='r', encoding='utf-8') as read_file_object, open(
        'files/new.conf', mode='w', encoding='utf-8') as writh_file_object:
    for line in read_file_object:
        new_line = line.replace('defaults', 'python')
        writh_file_object.write(new_line)

2.csv格式文件

逗号分隔值（Comma-Separated Values，CSV，有时也称为字符分隔值，因为分隔字符也可以不是逗号），其文件以纯文本形式存储表格数据（数字和文本）。

对于这种格式的数据，我们需要利用open函数来读取文件并根据逗号分隔的特点来进行处理。

股票代码,股票名称,当前价,涨跌额,涨跌幅,年初至今
SH601778,N晶科,6.29,+1.92,-43.94%,+43.94%
SH688566,吉贝尔,52.66,+6.96,+15.23%,+122.29%
...

练习题案例：下载文档中的所有图片且以用户名为图片名称存储

ID,用户名,头像
26044585,Hush,https://hbimg.huabanimg.com/51d46dc32abe7ac7f83b94c67bb88cacc46869954f478-aP4Q3V
19318369,柒十一,https://hbimg.huabanimg.com/703fdb063bdc37b11033ef794f9b3a7adfa01fd21a6d1-wTFbnO
15529690,Law344,https://hbimg.huabanimg.com/b438d8c61ed2abf50ca94e00f257ca7a223e3b364b471-xrzoQd
18311394,Jennah·,https://hbimg.huabanimg.com/4edba1ed6a71797f52355aa1de5af961b85bf824cb71-px1nZz
18009711,可洛爱画画,https://hbimg.huabanimg.com/03331ef39b5c7687f5cc47dbcbafd974403c962ae88ce-Co8AUI

import os
import requests

with open('files/mv.csv', mode='r', encoding='utf-8') as file_object:
    file_object.readline()
    for line in file_object:
        user_id, username, url = line.strip().split(',')
        print(username, url)
        # 1.根据URL下载图片
        res = requests.get(
            url=url,
            headers={
                "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36"
            }
        )
        # 检查images目录是否存在？不存在，则创建images目录
        if not os.path.exists("images"):
            # 创建images目录
            os.makedirs("images")

        # 2.将图片的内容写入到文件
        with open("images/{}.png".format(username), mode='wb') as img_object:
            img_object.write(res.content)

3.ini格式文件

ini文件是Initialization File的缩写，平时用于存储软件的的配置文件。例如：MySQL数据库的配置文件。

[mysqld]
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
log-bin=py-mysql-bin
character-set-server=utf8
collation-server=utf8_general_ci
log-error=/var/log/mysqld.log
# Disabling symbolic-links is recommended to prevent assorted security risks
symbolic-links=0

[mysqld_safe]
log-error=/var/log/mariadb/mariadb.log
pid-file=/var/run/mariadb/mariadb.pid

[client]
default-character-set=utf8

这种格式是可以直接使用open来出来，考虑到自己处理比较麻烦，所以Python为我们提供了更为方便的方式。

3.1读取所有节点

import configparser

config = configparser.ConfigParser()
config.read('/Users/wupeiqi/PycharmProjects/luffyCourse/day09/files/my.conf', encoding='utf-8')
# config.read('my.conf', encoding='utf-8')
ret = config.sections()
print(ret) 

>>输出
['mysqld', 'mysqld_safe', 'client']

3.2读取节点下的键值

import configparser

config = configparser.ConfigParser()
config.read('/Users/wupeiqi/PycharmProjects/luffyCourse/day09/files/my.conf', encoding='utf-8')
# config.read('my.conf', encoding='utf-8')
item_list = config.items("mysqld_safe")
print(item_list) 

>>输出
[('log-error', '/var/log/mariadb/mariadb.log'), ('pid-file', '/var/run/mariadb/mariadb.pid')]

3.3读取节点下值（根据节点+键）

import configparser

config = configparser.ConfigParser()
config.read('/Users/wupeiqi/PycharmProjects/luffyCourse/day09/files/my.conf', encoding='utf-8')

value = config.get('mysqld', 'log-bin')
print(value)

>>输出
py-mysql-bin

3.4 检查、删除、添加节点

import configparser

config = configparser.ConfigParser()
config.read('/Users/wupeiqi/PycharmProjects/luffyCourse/day09/files/my.conf', encoding='utf-8')
# config.read('my.conf', encoding='utf-8')


# 检查
has_sec = config.has_section('mysqld')
print(has_sec)

# 添加节点
config.add_section("SEC_1")
# 节点中设置键值
config.set('SEC_1', 'k10', "123")
config.set('SEC_1', 'name', "哈哈哈哈哈")

config.add_section("SEC_2")
config.set('SEC_2', 'k10', "123")
# 内容写入新文件
config.write(open('/Users/wupeiqi/PycharmProjects/luffyCourse/day09/files/xxoo.conf', 'w'))


# 删除节点
config.remove_section("SEC_2")
# 删除节点中的键值
config.remove_option('SEC_1', 'k10')
config.write(open('/Users/wupeiqi/PycharmProjects/luffyCourse/day09/files/new.conf', 'w'))

4.xml格式文件

可扩展标记语言，是一种简单的数据存储语言，XML 被设计用来传输和存储数据。

存储，可用来存放配置文件，例如：java的配置文件。
传输，网络传输时以这种格式存在，例如：早期ajax传输的数据、soap协议等。

<data>
    <country name="Liechtenstein">
        <rank updated="yes">2</rank>
        <year>2023</year>
        <gdppc>141100</gdppc>
        <neighbor direction="E" name="Austria" />
        <neighbor direction="W" name="Switzerland" />
    </country>
    <country name="Singapore">
        <rank updated="yes">5</rank>
        <year>2026</year>
        <gdppc>59900</gdppc>
        <neighbor direction="N" name="Malaysia" />
    </country>
    <country name="Panama">
        <rank updated="yes">69</rank>
        <year>2026</year>
        <gdppc>13600</gdppc>
        <neighbor direction="W" name="Costa Rica" />
        <neighbor direction="E" name="Colombia" />
    </country>
</data>

注意：在Python开发中用的相对来比较少，大家作为了解即可（后期课程在讲解微信支付、微信公众号消息处理时会用到基于xml传输数据）。

例如：https://developers.weixin.qq.com/doc/offiaccount/Message_Management/Receiving_standard_messages.html

4.1 读取文件和内容

from xml.etree import ElementTree as ET

# ET去打开xml文件
tree = ET.parse("files/xo.xml")

# 获取根标签
root = tree.getroot()

print(root) # <Element 'data' at 0x7f94e02763b0>

4.2 读取节点数据

from xml.etree import ElementTree as ET

content = """
<data>
    <country name="Liechtenstein" id="999" >
        <rank>2</rank>
        <year>2023</year>
        <gdppc>141100</gdppc>
        <neighbor direction="E" name="Austria" />
        <neighbor direction="W" name="Switzerland" />
    </country>
     <country name="Panama">
        <rank>69</rank>
        <year>2026</year>
        <gdppc>13600</gdppc>
        <neighbor direction="W" name="Costa Rica" />
        <neighbor direction="E" name="Colombia" />
    </country>
</data>
"""

# 获取根标签 data
root = ET.XML(content)
country_object = root.find('country')
print(country_object.tag, country_object.attrib)
>>>country {'name': 'Liechtenstein', 'id': '999'}
gdppc_object = country_object.find("gdppc")
print(gdppc_object.tag,gdppc_object.attrib,gdppc_object.text)
>>> gdppc {} 141100

from xml.etree import ElementTree as ET

content = """
<data>
    <country name="Liechtenstein">
        <rank>2</rank>
        <year>2023</year>
        <gdppc>141100</gdppc>
        <neighbor direction="E" name="Austria" />
        <neighbor direction="W" name="Switzerland" />
    </country>
     <country name="Panama">
        <rank>69</rank>
        <year>2026</year>
        <gdppc>13600</gdppc>
        <neighbor direction="W" name="Costa Rica" />
        <neighbor direction="E" name="Colombia" />
    </country>
</data>
"""

# 获取根标签 data
root = ET.XML(content)

# 获取data标签的孩子标签
for child in root:
    # child.tag = conntry
    # child.attrib = {"name":"Liechtenstein"}
    print(child.tag, child.attrib)
    for node in child:
        print(node.tag, node.attrib, node.text)

from xml.etree import ElementTree as ET

content = """
<data>
    <country name="Liechtenstein">
        <rank>2</rank>
        <year>2023</year>
        <gdppc>141100</gdppc>
        <neighbor direction="E" name="Austria" />
        <neighbor direction="W" name="Switzerland" />
    </country>
     <country name="Panama">
        <rank>69</rank>
        <year>2026</year>
        <gdppc>13600</gdppc>
        <neighbor direction="W" name="Costa Rica" />
        <neighbor direction="E" name="Colombia" />
    </country>
</data>
"""

root = ET.XML(content)

for child in root.iter('year'):
    print(child.tag, child.text)

4.3 修改和删除节点

from xml.etree import ElementTree as ET

content = """
<data>
    <country name="Liechtenstein">
        <rank>2</rank>
        <year>2023</year>
        <gdppc>141100</gdppc>
        <neighbor direction="E" name="Austria" />
        <neighbor direction="W" name="Switzerland" />
    </country>
     <country name="Panama">
        <rank>69</rank>
        <year>2026</year>
        <gdppc>13600</gdppc>
        <neighbor direction="W" name="Costa Rica" />
        <neighbor direction="E" name="Colombia" />
    </country>
</data>
"""

root = ET.XML(content)

# 修改节点内容和属性
rank = root.find('country').find('rank')
print(rank.text)
rank.text = "999"
rank.set('update', '2020-11-11')
print(rank.text, rank.attrib)
############ 保存文件 ############
tree = ET.ElementTree(root)
tree.write("new.xml", encoding='utf-8')


# 删除节点
root.remove( root.find('country') )
print(root.findall('country'))

############ 保存文件 ############
tree = ET.ElementTree(root)
tree.write("newnew.xml", encoding='utf-8')

4.4 构建文档

<home>
    <son name="儿1">
        <grandson name="儿11"></grandson>
        <grandson name="儿12"></grandson>
    </son>
    <son name="儿2"></son>
</home>

from xml.etree import ElementTree as ET

# 创建根标签
root = ET.Element("home")

# 创建节点大儿子
son1 = ET.Element('son', {'name': '儿1'})
# 创建小儿子
son2 = ET.Element('son', {"name": '儿2'})

# 在大儿子中创建两个孙子
grandson1 = ET.Element('grandson', {'name': '儿11'})
grandson2 = ET.Element('grandson', {'name': '儿12'})
son1.append(grandson1)
son1.append(grandson2)

# 把儿子添加到根节点中
root.append(son1)
root.append(son2)

tree = ET.ElementTree(root)
tree.write('oooo.xml', encoding='utf-8', short_empty_elements=False)

<famliy>
	<son name="儿1">
    	<age name="儿11">孙子</age>
    </son>
	<son name="儿2"></son>
</famliy>

from xml.etree import ElementTree as ET

# 创建根节点
root = ET.Element("famliy")

# 创建节点大儿子
son1 = ET.SubElement(root, "son", attrib={'name': '儿1'})
# 创建小儿子
son2 = ET.SubElement(root, "son", attrib={"name": "儿2"})
# 在大儿子中创建一个孙子
grandson1 = ET.SubElement(son1, "age", attrib={'name': '儿11'})
grandson1.text = '孙子'

et = ET.ElementTree(root)  #生成文档对象
et.write("test.xml", encoding="utf-8")

<user><![CDATA[你好呀]]</user>

from xml.etree import ElementTree as ET

# 创建根节点
root = ET.Element("user")
root.text = "<![CDATA[你好呀]]"

et = ET.ElementTree(root)  # 生成文档对象
et.write("test.xml", encoding="utf-8")

案例：

content = """<xml>
    <ToUserName><![CDATA[gh_7f083739789a]]></ToUserName>
    <FromUserName><![CDATA[oia2TjuEGTNoeX76QEjQNrcURxG8]]></FromUserName>
    <CreateTime>1395658920</CreateTime>
    <MsgType><![CDATA[event]]></MsgType>
    <Event><![CDATA[TEMPLATESENDJOBFINISH]]></Event>
    <MsgID>200163836</MsgID>
    <Status><![CDATA[success]]></Status>
</xml>"""

from xml.etree import ElementTree as ET

info = {}
root = ET.XML(content)
for node in root:
    # print(node.tag,node.text)
    info[node.tag] = node.text
print(info)

5.Excel格式文件

Python内部未提供处理Excel文件的功能，想要在Python中操作Excel需要按照第三方的模块。

官方文档地址: https://openpyxl.readthedocs.io/en/stable/

pip install openpyxl

5.1 读sheet

from openpyxl import load_workbook

wb = load_workbook("files/p1.xlsx")

# sheet相关操作

# 1.获取excel文件中的所有sheet名称
"""
print(wb.sheetnames) # ['数据导出', '用户列表', 'Sheet1', 'Sheet2']
"""

# 2.选择sheet，基于sheet名称
"""
sheet = wb["数据导出"]
cell = sheet.cell(1, 2)
print(cell.value)
"""

# 3.选择sheet，基于索引位置
"""
sheet = wb.worksheets[0]
cell = sheet.cell(1,2)
print(cell.value)
"""

# 4.循环所有的sheet
"""
for name in wb.sheetnames:
    sheet = wb[name]
    cell = sheet.cell(1, 1)
    print(cell.value)
"""
"""
for sheet in wb.worksheets:
    cell = sheet.cell(1, 1)
    print(cell.value)
"""
"""
for sheet in wb:
    cell = sheet.cell(1, 1)
    print(cell.value)
"""

5.2 读sheet中单元格的数据

from openpyxl import load_workbook

wb = load_workbook("files/p1.xlsx")
sheet = wb.worksheets[0]

# 1.获取第N行第N列的单元格(位置是从1开始）
"""
cell = sheet.cell(1, 1)

print(cell.value)
print(cell.style)
print(cell.font)
print(cell.alignment)
"""

# 2.获取某个单元格
"""
c1 = sheet["A2"]
print(c1.value)

c2 = sheet['D4']
print(c2.value)
"""

# 3.第N行所有的单元格
"""
for cell in sheet[1]:
    print(cell.value)
"""

# 4.所有行的数据（获取某一列数据）
"""
for row in sheet.rows:
    print(row[0].value, row[1].value)
"""

# 5.获取所有列的数据
"""
for col in sheet.columns:
    print(col[1].value)
""

5.3 读合并的单元格

from openpyxl import load_workbook

wb = load_workbook("files/p1.xlsx")
sheet = wb.worksheets[2]

# 获取第N行第N列的单元格(位置是从1开始）
c1 = sheet.cell(1, 1)
print(c1)  # <Cell 'Sheet1'.A1>
print(c1.value) # 用户信息

c2 = sheet.cell(1, 2)
print(c2)  # <MergedCell 'Sheet1'.B1>
print(c2.value) # None

from openpyxl import load_workbook

wb = load_workbook('files/p1.xlsx')
sheet = wb.worksheets[2]
for row in sheet.rows:
    print(row)

>>> 输出结果
(<Cell 'Sheet1'.A1>, <MergedCell 'Sheet1'.B1>, <Cell 'Sheet1'.C1>)
(<Cell 'Sheet1'.A2>, <Cell 'Sheet1'.B2>, <Cell 'Sheet1'.C2>)
(<Cell 'Sheet1'.A3>, <Cell 'Sheet1'.B3>, <Cell 'Sheet1'.C3>)
(<MergedCell 'Sheet1'.A4>, <Cell 'Sheet1'.B4>, <Cell 'Sheet1'.C4>)
(<Cell 'Sheet1'.A5>, <Cell 'Sheet1'.B5>, <Cell 'Sheet1'.C5>)

5.4 写Excel

原Excel文件基础上写内容。

from openpyxl import load_workbook

wb = load_workbook('files/p1.xlsx')
sheet = wb.worksheets[0]

# 找到单元格，并修改单元格的内容
cell = sheet.cell(1, 1)
cell.value = "新的开始"

# 将excel文件保存到p2.xlsx文件中
wb.save("files/p2.xlsx")

新创建Excel文件写内容。

from openpyxl import workbook

# 创建excel且默认会创建一个sheet（名称为Sheet）
wb = workbook.Workbook()

sheet = wb.worksheets[0] # 或 sheet = wb["Sheet"]

# 找到单元格，并修改单元格的内容
cell = sheet.cell(1, 1)
cell.value = "新的开始"

# 将excel文件保存到p2.xlsx文件中
wb.save("files/p2.xlsx")

5.5在了解了如何读取Excel和创建Excel之后，后续对于Excel中的sheet和cell操作基本上都相同。

from openpyxl import workbook

wb = workbook.Workbook() # Sheet

# 1. 修改sheet名称
"""
sheet = wb.worksheets[0]
sheet.title = "数据集"
wb.save("p2.xlsx")
"""

# 2. 创建sheet并设置sheet颜色
"""
sheet = wb.create_sheet("工作计划", 0)
sheet.sheet_properties.tabColor = "1072BA"
wb.save("p2.xlsx")
"""

# 3. 默认打开的sheet
"""
wb.active = 0
wb.save("p2.xlsx")
"""

# 4. 拷贝sheet
"""
sheet = wb.create_sheet("工作计划")
sheet.sheet_properties.tabColor = "1072BA"

new_sheet = wb.copy_worksheet(wb["Sheet"])
new_sheet.title = "新的计划"
wb.save("p2.xlsx")
"""

# 5.删除sheet
"""
del wb["用户列表"]
wb.save('files/p2.xlsx')
"""

from openpyxl import load_workbook
from openpyxl.styles import Alignment, Border, Side, Font, PatternFill, GradientFill


wb = load_workbook('files/p1.xlsx')

sheet = wb.worksheets[1]

# 1. 获取某个单元格，修改值
"""
cell = sheet.cell(1, 1)
cell.value = "开始"
wb.save("p2.xlsx")
"""

# 2.  获取某个单元格，修改值
"""
sheet["B3"] = "Alex"
wb.save("p2.xlsx")
"""

# 3. 获取某些单元格，修改值
"""
cell_list = sheet["B2":"C3"]
for row in cell_list:
    for cell in row:
        cell.value = "新的值"
wb.save("p2.xlsx")
"""

# 4. 对齐方式
"""
cell = sheet.cell(1, 1)

# horizontal，水平方向对齐方式："general", "left", "center", "right", "fill", "justify", "centerContinuous", "distributed"
# vertical，垂直方向对齐方式："top", "center", "bottom", "justify", "distributed"
# text_rotation，旋转角度。
# wrap_text，是否自动换行。
cell.alignment = Alignment(horizontal='center', vertical='distributed', text_rotation=45, wrap_text=True)
wb.save("p2.xlsx")
"""

# 5. 边框
# side的style有如下：dashDot','dashDotDot', 'dashed','dotted','double','hair', 'medium', 'mediumDashDot', 'mediumDashDotDot','mediumDashed', 'slantDashDot', 'thick', 'thin'
"""
cell = sheet.cell(9, 2)
cell.border = Border(
    top=Side(style="thin", color="FFB6C1"), 
    bottom=Side(style="dashed", color="FFB6C1"),
    left=Side(style="dashed", color="FFB6C1"),
    right=Side(style="dashed", color="9932CC"),
    diagonal=Side(style="thin", color="483D8B"),  # 对角线
    diagonalUp=True,  # 左下 ~ 右上
    diagonalDown=True  # 左上 ~ 右下
)
wb.save("p2.xlsx")
"""

# 6.字体
"""
cell = sheet.cell(5, 1)
cell.font = Font(name="微软雅黑", size=45, color="ff0000", underline="single")
wb.save("p2.xlsx")
"""

# 7.背景色
"""
cell = sheet.cell(5, 3)
cell.fill = PatternFill("solid", fgColor="99ccff")
wb.save("p2.xlsx")
"""

# 8.渐变背景色
"""
cell = sheet.cell(5, 5)
cell.fill = GradientFill("linear", stop=("FFFFFF", "99ccff", "000000"))
wb.save("p2.xlsx")
"""

# 9.宽高（索引从1开始）
"""
sheet.row_dimensions[1].height = 50
sheet.column_dimensions["E"].width = 100
wb.save("p2.xlsx")
"""

# 10.合并单元格
"""
sheet.merge_cells("B2:D8")
sheet.merge_cells(start_row=15, start_column=3, end_row=18, end_column=8)
wb.save("p2.xlsx")
"""
"""
sheet.unmerge_cells("B2:D8")
wb.save("p2.xlsx")
"""

# 11.写入公式
"""
sheet = wb.worksheets[3]
sheet["D1"] = "合计"
sheet["D2"] = "=B2*C2"
wb.save("p2.xlsx")
"""
"""
sheet = wb.worksheets[3]
sheet["D3"] = "=SUM(B3,C3)"
wb.save("p2.xlsx")
"""

# 12.删除
"""
# idx，要删除的索引位置
# amount，从索引位置开始要删除的个数（默认为1）
sheet.delete_rows(idx=1, amount=20)
sheet.delete_cols(idx=1, amount=3)
wb.save("p2.xlsx")
"""

# 13.插入
"""
sheet.insert_rows(idx=5, amount=10)
sheet.insert_cols(idx=3, amount=2)
wb.save("p2.xlsx")
"""

# 14.循环写内容
"""
sheet = wb["Sheet"]
cell_range = sheet['A1:C2']
for row in cell_range:
    for cell in row:
        cell.value = "xx"

for row in sheet.iter_rows(min_row=5, min_col=1, max_col=7, max_row=10):
    for cell in row:
        cell.value = "oo"
wb.save("p2.xlsx")
"""

# 15.移动
"""
# 将H2:J10范围的数据，向右移动15个位置、向上移动1个位置
sheet.move_range("H2:J10",rows=1, cols=15)
wb.save("p2.xlsx")
"""
"""
sheet = wb.worksheets[3]
sheet["D1"] = "合计"
sheet["D2"] = "=B2*C2"
sheet["D3"] = "=SUM(B3,C3)"
sheet.move_range("B1:D3",cols=10, translate=True) # 自动翻译公式
wb.save("p2.xlsx")
"""

# 16.打印区域
"""
sheet.print_area = "A1:D200"
wb.save("p2.xlsx")
"""

# 17.打印时，每个页面的固定表头
"""
sheet.print_title_cols = "A:D"
sheet.print_title_rows = "1:3"
wb.save("p2.xlsx")
"""

6.压缩文件

基于Python内置的shutil模块可以实现对压缩文件的操作。

import shutil

# 1. 压缩文件
"""
# base_name，压缩后的压缩包文件
# format，压缩的格式，例如："zip", "tar", "gztar", "bztar", or "xztar".
# root_dir，要压缩的文件夹路径
"""
# shutil.make_archive(base_name=r'datafile',format='zip',root_dir=r'files')


# 2. 解压文件
"""
# filename，要解压的压缩包文件
# extract_dir，解压的路径
# format，压缩文件格式
"""
# shutil.unpack_archive(filename=r'datafile.zip', extract_dir=r'xxxxxx/xo', format='zip')

7.路径相关

7.1 转义

windows路径使用的是\，linux路径使用的是/。

特别的，在windows系统中如果有这样的一个路径 D:\nxxx\txxx\x1，程序会报错。因为在路径中存在特殊符 \n（换行符）和\t（制表符），Python解释器无法自动区分。

所以，在windows中编写路径时，一般有两种方式：

加转义符，例如："D:\\nxxx\\txxx\\x1"
路径前加r，例如：r"D:\\nxxx\\txxx\\x1"

7.2 程序当前路径

项目中如果使用了相对路径，那么一定要注意当前所在的位置。

例如：在/Users/wupeiqi/PycharmProjects/CodeRepository/路径下编写 demo.py文件

with open("a1.txt", mode='w', encoding='utf-8') as f:
    f.write("你好呀")

用以下两种方式去运行：

方式1，文件会创建在 /Users/wupeiqi/PycharmProjects/CodeRepository/ 目录下。
```
cd /Users/wupeiqi/PycharmProjects/CodeRepository/
python demo.py
```

方式2，文件会创建在 /Users/wupeiqi目录下。

cd /Users/wupeiqi
python /Users/wupeiqi/PycharmProjects/CodeRepository/demo.py

import os

"""
# 1.获取当前运行的py脚本所在路径
abs = os.path.abspath(__file__)
print(abs) # /Users/wupeiqi/PycharmProjects/luffyCourse/day09/20.路径相关.py
path = os.path.dirname(abs)
print(path) # /Users/wupeiqi/PycharmProjects/luffyCourse/day09
"""
base_dir = os.path.dirname(os.path.abspath(__file__))
file_path = os.path.join(base_dir, 'files', 'info.txt')
print(file_path)
if os.path.exists(file_path):
    file_object = open(file_path, mode='r', encoding='utf-8')
    data = file_object.read()
    file_object.close()

    print(data)
else:
    print('文件路径不存在')

7.3 文件和路径相关

import shutil
import os

# 1. 获取当前脚本绝对路径
"""
abs_path = os.path.abspath(__file__)
print(abs_path)
"""

# 2. 获取当前文件的上级目录
"""
base_path = os.path.dirname( os.path.dirname(路径) ）
print(base_path)
"""

# 3. 路径拼接
"""
p1 = os.path.join(base_path, 'xx')
print(p1)

p2 = os.path.join(base_path, 'xx', 'oo', 'a1.png')
print(p2)
"""

# 4. 判断路径是否存在
"""
exists = os.path.exists(p1)
print(exists)
"""

# 5. 创建文件夹
"""
os.makedirs(路径)
"""
"""
path = os.path.join(base_path, 'xx', 'oo', 'uuuu')
if not os.path.exists(path):
    os.makedirs(path)
"""

# 6. 是否是文件夹
"""
file_path = os.path.join(base_path, 'xx', 'oo', 'uuuu.png')
is_dir = os.path.isdir(file_path)
print(is_dir) # False

folder_path = os.path.join(base_path, 'xx', 'oo', 'uuuu')
is_dir = os.path.isdir(folder_path)
print(is_dir) # True

"""

# 7. 删除文件或文件夹
"""
os.remove("文件路径")
"""
"""
path = os.path.join(base_path, 'xx')
shutil.rmtree(path)
"""

# 8. 拷贝文件夹
"""
shutil.copytree("/Users/wupeiqi/Desktop/图/csdn/","/Users/wupeiqi/PycharmProjects/CodeRepository/files")
"""

# 9.拷贝文件
"""
shutil.copy("/Users/wupeiqi/Desktop/图/csdn/WX20201123-112406@2x.png","/Users/wupeiqi/PycharmProjects/CodeRepository/")
shutil.copy("/Users/wupeiqi/Desktop/图/csdn/WX20201123-112406@2x.png","/Users/wupeiqi/PycharmProjects/CodeRepository/x.png")
"""

# 10.文件或文件夹重命名
"""
shutil.move("/Users/wupeiqi/PycharmProjects/CodeRepository/x.png","/Users/wupeiqi/PycharmProjects/CodeRepository/xxxx.png")
shutil.move("/Users/wupeiqi/PycharmProjects/CodeRepository/files","/Users/wupeiqi/PycharmProjects/CodeRepository/images")
"""

我是小青菜

关注

7
点赞
踩
0

收藏

觉得还不错? 一键收藏
1
评论
十、文件操作

文件操作文件夹和路径csv格式文件ini格式文件xml格式文件excel文件压缩文件1.文件操作字符串类型（str），在程序中用于表示文字信息，本质上是unicode编码中的二进制。name = "王昭君"字节类型（bytes）可表示文字信息，本质上是utf-8/gbk等编码的二进制（对unicode进行压缩，方便文件存储和网络传输。）Python open() 方法用于打开一个文件，并返回文件对象，在对文件进行处理过程都需要使用到这个函数，如果该文件无法被打开.
复制链接

扫一扫