[Python] File (12)

bryant_meng

Last modified 2024-09-11 19:47:48

First published 2019-03-05 22:38:39

Article link: https://blog.csdn.net/bryant_meng/article/details/88204855

References for study

For more posts in this series, see [Python].

Last revised: 2020-10-30

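Table of contents

1 Absolute and relative paths
- Adding paths
- Getting the current path
2 File encoding
3 Opening and closing files
- File attributes
- Creating folders
- Deleting folders
- Listing files
4 Writing and reading files
5 pickle
- Compressing files
- Extracting files
6 Excel
- 6.1 Writing txt data to Excel
- 6.2 Reading Excel
7 os / shutil
- os.sep
- os.listdir()
- os.mkdir() and os.makedirs()
- os.path.basename()
- os.path.splitext()
- os.path.getsize()
- os.path.expanduser(path)
- os.path.isfile(path) / os.path.isdir(path)
- os.path.abspath(path)
- os.cpu_count()
- os.path.islink(path) / os.path.ismount(path)
8 File search with glob.glob()
9 Comparing text differences
10 File modification time
11 yaml
12 md5sum
A.1 Practice
- Deleting all specified files under a directory
- Deleting all specified folders under a directory
- Getting all file names under a directory
- Reading a txt file that contains Chinese
- heic to jpeg
A.2 How to clear the Win+R search history
A.3 Disks and file systems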

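Working with a file generally follows these steps:

Find the path where the file is stored and open the file
Read or modify the file
Close the file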

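1 Absolute and relative paths

Absolute paths:
E:/学习资料/文件系统.py or E:\\学习资料\\文件系统.py (Windows)
/home/document/file.py (Linux)
Relative path (resolved against the current working directory):
文件系统.py

To combine an absolute path with a relative one, use the path-join function:

import os
os.path.join(A, B, ...)

It inserts the separator automatically: "/" on Linux, "\" on Windows.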
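A small sketch of that separator behavior. It imports posixpath and ntpath directly (the modules os.path resolves to on Linux and Windows, respectively) so both results can be shown on one machine; the paths are only illustrative.

```python
import ntpath
import posixpath

# os.path is posixpath on Linux/macOS and ntpath on Windows
linux_path = posixpath.join("/home/document", "file.py")
windows_path = ntpath.join("E:\\docs", "file.py")
print(linux_path)    # /home/document/file.py
print(windows_path)  # E:\docs\file.py

# Pitfall: if a later component is absolute, everything before it is discarded
reset_path = posixpath.join("/home/document", "/etc/hosts")
print(reset_path)    # /etc/hosts
```

The last call is a common trap: if a later argument starts with "/", os.path.join() throws away the earlier components.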

Adding paths

To add a directory to Python's module search path (sys.path), use:

import sys
sys.path.append(...)
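A minimal sketch of what sys.path.append() buys you, assuming a throwaway temporary directory and a hypothetical module named helper_mod: once the directory is on sys.path, the module inside it becomes importable.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Throwaway directory containing a tiny module (helper_mod is a hypothetical name)
tmp_dir = Path(tempfile.mkdtemp())
(tmp_dir / "helper_mod.py").write_text("GREETING = 'hello'\n")

# Without this line the import below would fail with ModuleNotFoundError
sys.path.append(str(tmp_dir))
importlib.invalidate_caches()  # make sure the import system sees the new file

helper_mod = importlib.import_module("helper_mod")
print(helper_mod.GREETING)  # hello
```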

Getting the current path

To get the current working directory, use either of the following:

from pathlib import Path
print("Current working directory:", Path.cwd())

import os
print(os.getcwd())

output

/home/data

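Alternatively, to get the location of the running script itself (which may differ from the working directory):

import os
path1 = os.path.abspath(__file__)
path2 = os.path.dirname(os.path.abspath(__file__))  # directory of the script: drops the file name and keeps only the directory
print(path1)
print(path2)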

output

/home/data/2.py
/home/data

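2 File encoding

By encoding, file content can be viewed either as text characters or as binary bytes:

Text characters, such as Chinese characters, English letters, digits, and punctuation, exist for display (common character encodings include ASCII and Unicode encodings such as UTF-8).
Binary bytes are the form in which a computer stores data; inside a computer, all data is ultimately a string of 0s and 1s.

In Python 3, all strings are Unicode text. encode() turns a string into bytes (UTF-8 by default), and decode() turns bytes back into text (also UTF-8 by default).

encode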

s = '哈哈哈哈，Enchanter is my girl friend'
s1 = s.encode()
s1

The output is of type bytes:

b'\xe5\x93\x88\xe5\x93\x88\xe5\x93\x88\xe5\x93\x88\xef\xbc\x8cEnchanter is my girl friend'

decode

s2 = s1.decode()
s2

output

'哈哈哈哈，Enchanter is my girl friend'

Other encodings can be used as well. Note that bytes must be decoded with the same encoding that was used to encode them:

s = '哈哈哈哈，Enchanter is my girl friend'
s1 = s.encode('gbk')
s1

output

b'\xb9\xfe\xb9\xfe\xb9\xfe\xb9\xfe\xa3\xacEnchanter is my girl friend'

Try decoding it:

s2 = s1.decode()
s2

This raises an error, because decode() uses UTF-8 by default:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb9 in position 0: invalid start byte

Decode with gbk instead:

s2 = s1.decode('gbk')
s2

output

'哈哈哈哈，Enchanter is my girl friend'

As the Chinese saying goes, whoever tied the bell must untie it. That's exactly the vibe here.
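A quick sketch to make the "same text, different bytes" point concrete; the string is a shortened version of the one above. Note how errors="replace" trades the exception for a placeholder character.

```python
s = "哈哈，Enchanter"

utf8_bytes = s.encode("utf-8")  # each Chinese character / full-width comma takes 3 bytes
gbk_bytes = s.encode("gbk")     # the same characters take 2 bytes each in GBK
print(len(s), len(utf8_bytes), len(gbk_bytes))  # 12 18 15

# Decoding with the wrong codec raises by default; errors="replace" inserts U+FFFD instead
lossy = gbk_bytes.decode("utf-8", errors="replace")
print("\ufffd" in lossy)  # True

# A matching encode/decode pair always round-trips
assert gbk_bytes.decode("gbk") == s
```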

3 Opening and closing files

Opening a file:
file_object = open(file_name[, mode])
Closing a file:
file_object.close()

The modes are as follows:

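w     open for writing; creates the file if it does not exist and truncates (empties) it if it does
r     open read-only
a     open for writing; new content is appended to the end of the file (like list.append())
b     binary mode
+     open for updating (reading and writing)
r+    open for reading and writing (the file must exist)
w+    open for reading and writing (see w)
a+    open for reading and writing (see a)
rb    open for reading in binary mode
wb    open for writing in binary mode (see w)
ab    open for appending in binary mode (see a)
rb+   open for reading and writing in binary mode (see r+)
wb+   open for reading and writing in binary mode (see w+)
ab+   open for reading and writing in binary mode (see a+)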
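A minimal sketch of how the most common modes differ in practice, assuming a scratch file in a temporary directory (the name demo.txt is arbitrary). It mainly shows that w wipes existing content while a keeps it.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.txt")

with open(path, "w", encoding="utf-8") as f:   # w: create (or truncate) and write
    f.write("first\n")
with open(path, "w", encoding="utf-8") as f:   # w again: the previous content is wiped
    f.write("second\n")
with open(path, "a", encoding="utf-8") as f:   # a: append to the end
    f.write("third\n")

with open(path, "r+", encoding="utf-8") as f:  # r+: read and write; the file must already exist
    content = f.read()

with open(path, "rb") as f:                    # rb: raw bytes, no decoding
    raw = f.read()

print(repr(content))  # 'second\nthird\n'
print(raw)            # b'second\nthird\n' (on Linux; Windows text mode writes \r\n)
```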

How to remember them?

w = write
r = read
b = bytes (binary)
a = append

If no mode is given, the default is r (read-only, text mode):

f = open('C:/Users/Administrator/Desktop/Enchanter.txt')
f

output

<_io.TextIOWrapper name='C:/Users/Administrator/Desktop/Enchanter.txt' mode='r' encoding='cp936'>

Close the file:

f.close()
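Calling close() by hand is easy to forget. A with block closes the file automatically, even if an exception is raised inside it; a minimal sketch using a temporary file:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "Bryant.txt")

# Leaving the with block calls f.close() automatically, even on an exception
with open(path, "w", encoding="utf-8") as f:
    f.write("my name is Bryant!")
    closed_inside = f.closed

print(closed_inside, f.closed)  # False True
```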

If you open a nonexistent file in r mode:

f = open('C:/Users/Administrator/Desktop/Bryant.txt')
f

an error is raised:

FileNotFoundError: [Errno 2] No such file or directory: 'C:/Users/Administrator/Desktop/Bryant.txt'

Switch to w mode:

f = open('C:/Users/Administrator/Desktop/Bryant.txt','w')
f

output

<_io.TextIOWrapper name='C:/Users/Administrator/Desktop/Bryant.txt' mode='w' encoding='cp936'>

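A Bryant.txt file is created at that path.

Addendum:

Suppose a 1.txt containing Chinese text was created on Windows, where a Chinese system saves text as GBK (cp936) by default, and is then opened directly with open() on Ubuntu, where the default is UTF-8. Iterating over its lines raises an error: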

f = open("./1.txt")

for line in f.readlines():
    print(line.strip())

output

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd1 in position 0: invalid continuation byte

In that case, open the file with an explicit encoding:

f = open("./1.txt", encoding="gbk")

for line in f.readlines():
    print(line.strip())

output

严
123
456
abcde

Reference: python读文件出现中文乱码 (garbled Chinese text when reading files in Python)
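When you don't know whether a file was saved as UTF-8 or GBK, one option is to try the encodings in order. The helper below, read_text_any, is a hypothetical sketch rather than a library function; trying UTF-8 first is only a heuristic, since some GBK byte sequences also happen to be valid UTF-8.

```python
import os
import tempfile


def read_text_any(path, encodings=("utf-8", "gbk")):
    """Hypothetical helper: return (text, encoding) for the first encoding that decodes cleanly."""
    with open(path, "rb") as f:
        raw = f.read()
    for enc in encodings:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue
    raise ValueError(f"none of {encodings} could decode {path}")


# Simulate a file saved on Chinese Windows (GBK-encoded)
path = os.path.join(tempfile.mkdtemp(), "1.txt")
with open(path, "wb") as f:
    f.write("严\n123\n".encode("gbk"))

text, used = read_text_any(path)
print(used)               # gbk
print(text.splitlines())  # ['严', '123']
```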

File attributes
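As a sketch, assuming this subsection refers to the attributes of the object returned by open(), these are the most commonly used ones (shown on a temporary file):

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "Enchanter.txt")
f = open(path, "w", encoding="utf-8")
print(f.name == path)  # True: the name passed to open()
print(f.mode)          # w
print(f.encoding)      # utf-8
print(f.closed)        # False
f.close()
print(f.closed)        # True
```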

Creating folders

from pathlib import Path
Path("test_folder").mkdir(parents=True, exist_ok=True)

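Deleting folders

Delete a folder with os.rmdir(); it only works on an empty folder.

With the pathlib module, use unlink() to delete a file and rmdir() to delete an (empty) directory.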
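A minimal sketch of the difference, using scratch directories under a temporary folder: os.rmdir() refuses a non-empty directory, so you either delete the contents first or use shutil.rmtree(), which removes the whole tree permanently (so use it with care).

```python
import os
import shutil
import tempfile
from pathlib import Path

base = Path(tempfile.mkdtemp())

# os.rmdir() / Path.rmdir() only remove EMPTY directories
full_dir = base / "full"
full_dir.mkdir()
(full_dir / "a.txt").write_text("x")
try:
    os.rmdir(full_dir)
    rmdir_failed = False
except OSError:  # raised because the directory still contains a.txt
    rmdir_failed = True

# Option 1: delete the file with Path.unlink(), then the now-empty directory
(full_dir / "a.txt").unlink()
full_dir.rmdir()

# Option 2: shutil.rmtree() deletes a directory together with everything inside it
tree_dir = base / "tree" / "sub"
tree_dir.mkdir(parents=True)
(tree_dir / "b.txt").write_text("y")
shutil.rmtree(base / "tree")

print(rmdir_failed, full_dir.exists(), (base / "tree").exists())  # True False False
```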

Listing files

import os
os.listdir()

Or

from pathlib import Path
list(Path('.').glob("*.txt"))

Using the glob module directly is also convenient (glob() already returns a list):

from glob import glob
files = glob('h*')

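4 Writing and reading files

Set C:/Users/Administrator/Desktop/ as the current working directory; after that, the files from Section 3 can be opened with relative paths.

import os
os.chdir('C:/Users/Administrator/Desktop/')

Manually type a few lines of lyrics into Enchanter.txt, then open the file for reading: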

f = open('Enchanter.txt')
f.readlines()

output

["Wind's in the east\n",
 'mist coming in\n',
 'like something is brewing\n',
 'about to begin\n',
 "Can't put my finger\n",
 'on what lies in store\n',
 "but I feel what's to happen\n",
 'all happened before']

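Here \n is a newline. You can also read with f.read(); readlines() reads all lines into a list, and readline() reads one line at a time. Now let's append the line "yours Enchanter" with code: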

import os 
os.chdir('C:/Users/Administrator/Desktop/')
f = open('Enchanter.txt','a')
f.write('\nyours Enchanter')
f.close()

Read the file again:

f = open('Enchanter.txt')
f.readlines()

output

["Wind's in the east\n",
 'mist coming in\n',
 'like something is brewing\n',
 'about to begin\n',
 "Can't put my finger\n",
 'on what lies in store\n',
 "but I feel what's to happen\n",
 'all happened before\n',
 'yours Enchanter']

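Note that read() reads everything from the current position to the end. Once the content has been read, the file position (like a bookmark) sits at the end of the file, so reading again naturally returns nothing.

For example, suppose Bryant.txt contains two lines (see the output below):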

import os 
os.chdir('C:/Users/Administrator/Desktop/')
f = open('Bryant.txt')
f.read()

output

'my name is Bryant!\nmy girl friend is Enchanter!'

Read it again:

f.read()

output

''

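readline() reads one line at a time and readlines() reads all remaining lines; both are subject to the same caveat. Each line keeps its trailing \n and print() adds another, which is why the output below has a blank line between the two lines: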

import os 
os.chdir('C:/Users/Administrator/Desktop/')
f = open('Bryant.txt')
for i in f.readlines():
    print(i)

output

my name is Bryant!

my girl friend is Enchanter!
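To read the same file again without reopening it, move the position back with seek(0). A minimal sketch, recreating the Bryant.txt content in a temporary directory:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "Bryant.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("my name is Bryant!\nmy girl friend is Enchanter!")

with open(path, encoding="utf-8") as f:
    first = f.read()
    second = f.read()  # the position is already at the end of the file -> ''
    f.seek(0)          # move the position back to the start
    third = f.read()
    f.seek(0)
    # Iterating over the file object yields lines lazily; rstrip("\n") drops the newline,
    # so printing them does not produce the extra blank line seen above
    lines = [line.rstrip("\n") for line in f]

print(repr(second), first == third)  # '' True
print(lines)  # ['my name is Bryant!', 'my girl friend is Enchanter!']
```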

5 pickle

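pickle is Python's module for serializing ("saving") Python objects to a file and deserializing ("extracting") them again; it does not compress the data.

1) Serialize and save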

import pickle

dict_1 = {'my': 'bryant', 'you': 'enchanter', 'together': 'hahaha'}

# pickle a variable to a file
file = open('demo.pickle', 'wb')
pickle.dump(dict_1, file)
file.close()

Opened in WordPad, the file looks like gibberish, because it is binary data.

2) Load

with open('demo.pickle', 'rb') as file:
    dict2 = pickle.load(file)
print(dict2)

output

{'my': 'bryant', 'you': 'enchanter', 'together': 'hahaha'}
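Besides dump() and load(), which work with files, pickle also offers dumps() and loads(), which work with bytes in memory; a minimal sketch with the same dictionary. As the official docs warn, only unpickle data you trust, because loading a pickle can execute arbitrary code.

```python
import pickle

dict_1 = {'my': 'bryant', 'you': 'enchanter', 'together': 'hahaha'}

# dumps()/loads() behave like dump()/load() but use a bytes object instead of a file
blob = pickle.dumps(dict_1)
restored = pickle.loads(blob)

print(type(blob).__name__)  # bytes
print(restored == dict_1)   # True
```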

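Addendum

Compressing files

from pathlib import Path
from zipfile import ZipFile

# Create a zip archive containing every .txt file in the current directory
with ZipFile('text_files.zip', 'w') as file:
    for txt_file in Path().glob('*.txt'):
        print(f"*Adding file: {txt_file.name} to the archive")
        file.write(txt_file)

*Adding file: hello3.txt to the archive
*Adding file: hello2.txt to the archive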

Extracting files

# Extract the archive
with ZipFile('text_files.zip') as zip_file:
    zip_file.printdir()    # list the archive contents
    zip_file.extractall()  # extract everything into the current directory

File Name                                             Modified             Size
hello3.txt                                     2020-07-30 20:29:50           51
hello2.txt                                     2020-07-30 18:29:52           26
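One detail worth knowing: ZipFile uses ZIP_STORED by default, which bundles files without actually compressing them. A minimal sketch that passes compression=zipfile.ZIP_DEFLATED, assuming a temporary folder and a generated hello2.txt:

```python
import tempfile
import zipfile
from pathlib import Path

work = Path(tempfile.mkdtemp())
(work / "hello2.txt").write_text("hello " * 100)  # repetitive text compresses well

archive = work / "text_files.zip"
# ZIP_DEFLATED really compresses; the default ZIP_STORED only bundles the files
with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    # arcname stores just the file name instead of the full temporary path
    zf.write(work / "hello2.txt", arcname="hello2.txt")

with zipfile.ZipFile(archive) as zf:
    info = zf.getinfo("hello2.txt")
    print(info.file_size, info.compress_size < info.file_size)  # 600 True
    zf.extractall(work / "out")

print((work / "out" / "hello2.txt").read_text() == "hello " * 100)  # True
```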

6 Excel

6.1 Writing txt data to Excel

Reference: 用python读写excel（xlrd、xlwt） (reading and writing Excel with Python via xlrd and xlwt)

import xlwt
workbook = xlwt.Workbook(encoding = 'ascii')
worksheet = workbook.add_sheet('My Worksheet')
worksheet.write(0,0, label = 'bryant')
workbook.save('Excel_Workbook.xls')

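The code above writes bryant into the cell at row 0, column 0 of the sheet.

Next, we read data from a txt file and lay it out in columns for 20, 21, 22, ..., 50, with 9 values per column.

Approach: first collect the values in a list, then write them into the sheet column by column.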

data = []
with open('test.txt') as file:
    for line in file:
        curLine = line.strip().split(" ")
        data.append(curLine[0])  # keep only the first field of each line

import xlwt
workbook = xlwt.Workbook(encoding = 'ascii')
worksheet = workbook.add_sheet('My Worksheet')
num = 0
for i in range(30):
    for j in range(9):
        worksheet.write(j,i, label = data[num])
        num+=1
workbook.save('Excel_Workbook.xls')
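The nested loop above fills the sheet column by column: value number num lands in row j of column i, with num = i * 9 + j. As a library-free sketch of the same layout, swapping xlwt for the standard csv module (Excel can open CSV files too) and using made-up values in place of test.txt:

```python
import csv
import io

ROWS_PER_COL = 9


def to_rows(values, rows_per_col=ROWS_PER_COL):
    """Lay a flat list out column by column: values[i * rows_per_col + j] -> row j, column i."""
    n_cols = len(values) // rows_per_col
    return [[values[i * rows_per_col + j] for i in range(n_cols)]
            for j in range(rows_per_col)]


data = [str(v) for v in range(27)]  # made-up stand-in for the values read from test.txt
rows = to_rows(data)

buf = io.StringIO()  # for a real file, use open("out.csv", "w", newline="") instead
csv.writer(buf).writerows(rows)
print(rows[0])  # ['0', '9', '18']
print(rows[8])  # ['8', '17', '26']
```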

The result is a sheet with 30 columns of 9 values each.

6.2 Reading Excel

From: python3解析excel文件 (parsing Excel files with Python 3)

# coding=utf-8
import xlrd

'''
Read the first and second columns of every sheet in an Excel file,
join them into lines of the form "key" = "value"; and write them to a file.
Note: xlrd >= 2.0 only reads .xls files; .xlsx needs an older xlrd or another library.
'''

EXCEL_PATH = "/you/excel/location/?.xlsx"  # placeholder path from the original
OUTPUT_PATH = "/aim/file/location/?.txt"   # placeholder path from the original


def resolveExcel():
    # Open the Excel workbook
    data = xlrd.open_workbook(EXCEL_PATH, encoding_override='utf-8')
    # Names of all sheets in the workbook
    sheetNames = data.sheet_names()
    print(sheetNames)

    # 'w' mode empties the target file first, and the with block closes it at the end
    with open(OUTPUT_PATH, 'w', encoding='utf-8') as out:
        # Iterate over the sheets
        for name in sheetNames:
            sheet = data.sheet_by_name(name)
            nrows = sheet.nrows  # total number of rows
            ncols = sheet.ncols  # total number of columns
            print(nrows, ncols)

            # Values of a whole row:  sheet.row_values(i)
            # Value of a single cell: sheet.cell(0, 1).value
            key = sheet.col_values(0)      # first column
            chinese = sheet.col_values(1)  # second column

            # Skip row 0 (the header) and build one line per remaining row
            chineseStr = ""
            for row in range(1, nrows):
                chineseStr += f'"{key[row]}" = "{chinese[row]}";\n'
            out.write(chineseStr)


if __name__ == '__main__':
    resolveExcel()

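7 os / shutil

Python OS file/directory methods

Python os.path module

os and shutil make file operations much more flexible.

os.path.exists(...) checks whether a file or folder exists.

Reference: https://docs.python.org/3/library/os.path.html

Combined with os.makedirs(...), it makes creating folders a breeze, e.g.: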

if not os.path.exists(path):
    os.makedirs(path)

Using exists() from the pathlib module:

from pathlib import Path
Path('directory_path').exists()

os.rename(A, B) renames file A to B.

After import shutil, shutil.move(source, target) and shutil.copy(source, target) move and copy files, respectively.
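A minimal sketch of the three calls together, on scratch files in a temporary directory (the names A.txt, B.txt, C.txt and backup are illustrative):

```python
import os
import shutil
import tempfile
from pathlib import Path

base = Path(tempfile.mkdtemp())
(base / "A.txt").write_text("hello")
(base / "backup").mkdir()

os.rename(base / "A.txt", base / "B.txt")               # rename A.txt -> B.txt
shutil.copy(base / "B.txt", base / "backup")            # copy into a directory, keeping the name
shutil.move(base / "B.txt", base / "backup" / "C.txt")  # move (and rename) the original

print(sorted(p.name for p in base.rglob("*.txt")))  # ['B.txt', 'C.txt']
```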

os.sep

The path separator: \ on Windows and / on Linux.

os.sep

output

'/'

os.listdir()

Returns the names of all files and folders under the path as a list (in arbitrary order).

From: Python os.listdir() method

import os
files = os.listdir(path)  # path: the directory to list
for file in files:
    pass

os.mkdir() and os.makedirs()

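Say you want to create "A/B/C/D/E":

os.mkdir() creates only the last directory in the path, i.e. only E. If any of the parent directories does not exist, it raises an error; it also raises an error if E already exists.

os.makedirs() creates directories recursively: whichever of A, B, C, D, E do not exist are created automatically.

os.makedirs() has an exist_ok parameter that defaults to False: if the target directory already exists, FileExistsError is raised. With exist_ok=True, no exception is raised and the existing directory is left as it is (not overwritten).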
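A minimal sketch of the three cases described above, run inside a temporary directory:

```python
import os
import tempfile

base = tempfile.mkdtemp()
deep = os.path.join(base, "A", "B", "C", "D", "E")

mkdir_error = makedirs_error = None
try:
    os.mkdir(deep)  # fails: the parents A..D do not exist yet
except FileNotFoundError:
    mkdir_error = "FileNotFoundError"

os.makedirs(deep)  # creates A, B, C, D and E in one call
try:
    os.makedirs(deep)  # exist_ok defaults to False
except FileExistsError:
    makedirs_error = "FileExistsError"

os.makedirs(deep, exist_ok=True)  # no error; the existing directory is left untouched
print(mkdir_error, makedirs_error, os.path.isdir(deep))  # FileNotFoundError FileExistsError True
```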

References

Usage of os.mkdir() and os.makedirs()

Differences between os.mkdir() and os.makedirs() in Python and how to use them

os.path.basename()

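Returns the final component of path (the file name).

If path ends with a separator (/, or also \ on Windows), an empty string is returned.

import os

print(os.path.basename("/home/bryant/1.txt"))
print(repr(os.path.basename("/home/bryant/1.txt/")))

output

1.txt
''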

os.path.splitext()

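Splits the path and returns a tuple of (root, extension).

Index 0 is the path without the extension, and index 1 is the extension (including the dot).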
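A few examples (on Linux); note that only the last extension is split off, and a leading dot as in .bashrc is not treated as an extension:

```python
import os.path

p1 = os.path.splitext("/home/bryant/1.txt")
p2 = os.path.splitext("archive.tar.gz")
p3 = os.path.splitext(".bashrc")

print(p1)  # ('/home/bryant/1', '.txt')
print(p2)  # ('archive.tar', '.gz')
print(p3)  # ('.bashrc', '')
```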

os.path.getsize()

Returns an int in bytes; divide by 1024 to get KB, or by 1024 twice to get MB.
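A small sketch of the unit conversion, assuming a 3 MiB scratch file written to a temporary directory:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "blob.bin")
with open(path, "wb") as f:
    f.write(b"\0" * 3 * 1024 * 1024)  # 3 MiB of zero bytes

size = os.path.getsize(path)  # int, in bytes
print(size, size / 1024, size / 1024 / 1024)  # 3145728 3072.0 3.0
```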

os.path.expanduser(path)

os.path.expanduser() expands an initial ~ (tilde) or ~user component of the given path to that user's home directory.
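A minimal sketch of the Linux/macOS behavior, where expanduser() reads the HOME environment variable; the home directory /home/bryant is made up for the demo and the original value is restored afterwards. posixpath is used explicitly so the result does not depend on the host OS.

```python
import os
import posixpath

old_home = os.environ.get("HOME")
os.environ["HOME"] = "/home/bryant"  # hypothetical home directory for the demo
try:
    expanded = posixpath.expanduser("~/data/1.txt")
    untouched = posixpath.expanduser("data/~/1.txt")  # ~ is only expanded at the start
finally:
    # Restore the real environment
    if old_home is None:
        del os.environ["HOME"]
    else:
        os.environ["HOME"] = old_home

print(expanded)   # /home/bryant/data/1.txt
print(untouched)  # data/~/1.txt
```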

os.path.isfile(path) / os.path.isdir(path)

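os.path.isfile(path): checks whether the path is a file
os.path.isdir(path): checks whether the path is a directory, i.e. a folder

import os
from pathlib import Path

# Is it a directory?
os.path.isdir('path/to/check')
Path('path/to/check').is_dir()

# Is it a file?
os.path.isfile('path/to/check')
Path('path/to/check').is_file()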

os.path.abspath(path)

Returns the absolute path of a file.

os.path.abspath("1.py")

os.cpu_count()

Gets the number of CPUs in the system (returns None if it cannot be determined).

import os
print(os.cpu_count())

os.path.islink(path) / os.path.ismount(path)

os.path.islink(path)   # check whether the path is a symbolic link
os.path.ismount(path)  # check whether the path is a mount point
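A minimal sketch on Linux/macOS, assuming a scratch file and a symbolic link created in a temporary directory (on Windows, creating symlinks may require extra privileges):

```python
import os
import tempfile

base = tempfile.mkdtemp()
target = os.path.join(base, "real.txt")
link = os.path.join(base, "link.txt")
open(target, "w").close()
os.symlink(target, link)  # link.txt -> real.txt

print(os.path.islink(link), os.path.islink(target))  # True False
print(os.path.isfile(link))  # True: isfile() follows the link to its target
print(os.path.ismount("/"))  # True on Linux/macOS: the root is always a mount point
```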

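8 File search with glob.glob()

glob.glob() takes a pattern that includes the path to search; the path can be relative or absolute.
What it does: finds all files matching the pattern and returns them as a list, much like file search in Windows.

"*" matches zero or more characters;
"?" matches exactly one character;
"[]" matches one character from the given set or range, e.g. [0-9] matches a digit.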

import glob
print(glob.glob("/home/*/*.jpg"))
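A minimal sketch of the three wildcards plus recursive search, assuming a few empty files created in a temporary directory. With recursive=True, ** matches any number of subdirectories (including none).

```python
import glob
import os
import tempfile

base = tempfile.mkdtemp()
for name in ["a1.jpg", "a2.jpg", "b10.jpg", "notes.txt", os.path.join("sub", "c3.jpg")]:
    path = os.path.join(base, name)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    open(path, "w").close()


def names(pattern):
    # glob() returns full paths in arbitrary order; sort the base names for stable output
    return sorted(os.path.basename(p)
                  for p in glob.glob(os.path.join(base, pattern), recursive=True))


print(names("*.jpg"))       # ['a1.jpg', 'a2.jpg', 'b10.jpg']
print(names("a?.jpg"))      # ['a1.jpg', 'a2.jpg']
print(names("a[0-9].jpg"))  # ['a1.jpg', 'a2.jpg']
print(names("**/*.jpg"))    # ['a1.jpg', 'a2.jpg', 'b10.jpg', 'c3.jpg']
```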

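9 Comparing text differences

From: 2行Python就能实现 "文本文件" 差异比较，太强了！ (diffing text files in just 2 lines of Python)

This requires the filestools library, described at https://pypi.org/project/filestools/

Note that it is

pip install filestools

not

pip install filetools

Haha.

Below we use filestools to compare two txt files, 1.txt and 2.txt, whose contents differ slightly.

A two-line diff: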

from filediff.diff import file_diff_compare
file_diff_compare('1.txt', '2.txt', show_all=True, no_browser=True)

For more filestools features, see https://pypi.org/project/filestools/

Note: some desktop tools also have powerful comparison features, e.g. Beyond Compare.
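If you'd rather avoid a third-party package, the standard library's difflib can produce a similar comparison. A minimal sketch with two made-up versions of a file; this is a swapped-in standard-library alternative, not filestools itself:

```python
import difflib

old = ["my name is Bryant!\n", "my girl friend is Enchanter!\n"]
new = ["my name is Bryant!\n", "my wife is Enchanter!\n"]

# Text diff in the familiar unified format (lines starting with - and +)
diff = list(difflib.unified_diff(old, new, fromfile="1.txt", tofile="2.txt"))
print("".join(diff))

# Side-by-side HTML report; write the string to a .html file to view it in a browser
html = difflib.HtmlDiff().make_file(old, new, "1.txt", "2.txt")
```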

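10 File modification time

This uses the time module:

import os
import time

path = r"D:\VPN\代码\数据仓库存储过程修改备份"  # raw string so the backslashes are not treated as escapes
for root, dirs, files in os.walk(path):
    for file in files:
        full_path = os.path.join(root, file)
        mtime = os.stat(full_path).st_mtime  # modification time in seconds since the epoch
        file_modify_time = time.strftime('%Y-%m-%d %H:%M:%S', time.localtime(mtime))
        print("{0} was last modified at: {1}".format(full_path, file_modify_time))

Reference: python 遍历文件并获得文件的修改时间 (traversing files and getting their modification times in Python)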
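os.path.getmtime() is a shortcut for os.stat(path).st_mtime. A minimal sketch that sets a known modification time with os.utime() and reads it back, assuming a scratch file in a temporary directory:

```python
import datetime
import os
import tempfile
import time

path = os.path.join(tempfile.mkdtemp(), "demo.sql")
open(path, "w").close()

stamp = 1_600_000_000           # an arbitrary fixed timestamp (seconds since the epoch)
os.utime(path, (stamp, stamp))  # set access and modification time

mtime = os.path.getmtime(path)  # same value as os.stat(path).st_mtime
print(mtime == stamp)           # True
print(time.strftime('%Y-%m-%d %H:%M:%S', time.localtime(mtime)))  # local-time string
print(datetime.datetime.fromtimestamp(mtime))                     # same moment as a datetime
```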

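11 yaml

Usage of yaml.load and yaml.dump

import yaml

# Write to a yaml file
with open(r"E:\rename.yaml", 'w') as f:
    project = {'Kobe': "Bryant", "Simon": "Tom", "Alex": 'Bob'}
    yaml.dump(project, f)  # yaml.dump sorts keys alphabetically by default

# Read the yaml file back; safe_load avoids constructing arbitrary Python objects,
# and PyYAML >= 6 requires an explicit Loader if yaml.load is used instead
with open(r"E:\rename.yaml") as ff:
    temp = yaml.safe_load(ff)
for key, values in temp.items():
    print(key, " ", values)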

output

Alex   Bob
Kobe   Bryant
Simon   Tom

12 md5sum

In a terminal, run md5sum xxx to get it.

The Python code:

import hashlib
with open("1.docx", "rb") as f:
    data = f.read()
    print(hashlib.md5(data).hexdigest())
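The snippet above reads the whole file into memory at once. For large files, a common alternative is to feed the hash in chunks; a minimal sketch with a hypothetical helper md5_of_file and a small test file:

```python
import hashlib
import os
import tempfile


def md5_of_file(path, chunk_size=1024 * 1024):
    """Hypothetical helper: hash a file in fixed-size chunks so it never has to fit in memory."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


path = os.path.join(tempfile.mkdtemp(), "1.docx")  # any file works; the name is illustrative
with open(path, "wb") as f:
    f.write(b"hello")

print(md5_of_file(path))  # 5d41402abc4b2a76b9719d911017c592
```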

A.1 Practice

Deleting all specified files under a directory

Code reference: https://blog.csdn.net/lishenluo/article/details/79244210

# coding=utf-8
import os


def delete_files_endswith(file_dir, suffix):
    # os.walk() already descends into every subdirectory, so no manual recursion is needed
    for root, dirs, files in os.walk(file_dir):
        for name in files:
            if name.endswith(suffix):  # file(s) to delete
                print('delete:', name)
                os.remove(os.path.join(root, name))


delete_files_endswith(u"D://urynet_4.0/", 'val_predict.ipynb')  # target directory

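Deleting all specified folders under a directory

Code reference: https://www.jb51.net/article/168933.htm

import os
import shutil


########## Batch-delete same-named folders under different directories ##########
def arrange_file(dir_path0, target_name='inspect_data.ipynb'):  # name of the folder to delete
    for dirpath, dirnames, filenames in os.walk(dir_path0):
        for name in list(dirnames):  # iterate over a copy so dirnames can be modified
            if name == target_name:
                full_path = os.path.join(dirpath, name)
                print(full_path)
                shutil.rmtree(full_path)
                dirnames.remove(name)  # stop os.walk from descending into the deleted folder


arrange_file(u"D://urynet_4.0/")  # target directory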

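Getting all file names under a directory

Reference: https://www.cnblogs.com/bigtreei/p/9316369.html

import os
def file_name(file_dir):
    for root, dirs, files in os.walk(file_dir):
        print('root_dir:', root)  # path of the current directory
        print('sub_dirs:', dirs)  # all subdirectories of the current directory
        print('files:', files)  # all non-directory files in the current directory
file_name('D://test')

Note: the loop body runs once for every directory os.walk visits (so only once if there are no subdirectories); files is a list whose elements are str.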

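Reading a txt file that contains Chinese

Reference: 【Python】对含有中文的txt进行读取 (reading a txt file that contains Chinese)

with open("test.txt", 'r', encoding='utf8') as f:
    lines = f.readlines()

When writing, encoding='utf8' can be left out, but then the platform default is used (e.g. cp936 on Chinese Windows), so passing it explicitly is safer.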

heic to jpeg

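Install the libraries:

pip install whatimage
pip install pyheif

Batch-convert HEIC images to JPEG:

import os
import traceback

import pyheif
import whatimage
from PIL import Image
from tqdm import tqdm


def decodeImage(src_path, tgt_path):
    os.makedirs(tgt_path, exist_ok=True)  # make sure the output folder exists
    for img in tqdm(os.listdir(src_path)):
        file_path = os.path.join(src_path, img)

        with open(file_path, "rb") as f:
            bytesIo = f.read()

        try:
            fmt = whatimage.identify_image(bytesIo)
            if fmt in ['heic']:
                i = pyheif.read_heif(bytesIo)
                # Pass the row stride too, since decoded rows may be padded
                pi = Image.frombytes(i.mode, i.size, i.data, "raw", i.mode, i.stride)
                # JPEG has no alpha channel, so convert RGBA (or other modes) to RGB
                pi = pi.convert("RGB")
                # splitext also handles the upper-case .HEIC extension used by iPhones
                out_name = os.path.splitext(img)[0] + ".jpg"
                pi.save(os.path.join(tgt_path, out_name), format="jpeg")
        except Exception:
            traceback.print_exc()


if __name__ == "__main__":
    src_path = "../heic/"
    tgt_path = "../heic-output/"
    decodeImage(src_path, tgt_path)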