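Unpacking zip, rar, tar and similar archives in Python can leave the extracted file names garbled. In addition, the program may finish running before an archive has been fully unpacked, which leaves the extraction incomplete. This post presents a working solution to both problems.
The solution consists of the following steps:
(1) Modify the Python library file shutil.py
(2) Install rarfile
(3) Code example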
---------------------------------------------------------------------
(1) The key change is to the statement name = info.filename inside the _unpack_zipfile function in shutil.py. The modified function is as follows:
def _unpack_zipfile(filename, extract_dir):
    """Unpack zip `filename` to `extract_dir`
    """
    import zipfile  # late import for breaking circular dependency
    if not zipfile.is_zipfile(filename):
        raise ReadError("%s is not a zip file" % filename)
    zip = zipfile.ZipFile(filename)
    try:
        for info in zip.infolist():
            # name = info.filename
            if info.flag_bits & 0x800:  # flag bit 11 set: the name is already UTF-8
                name = info.filename
            else:
                # zipfile decodes names without the UTF-8 flag as cp437,
                # so recover the raw bytes and try UTF-8 first
                raw = info.filename.encode('cp437')
                try:
                    name = raw.decode('utf-8')
                except UnicodeDecodeError:
                    name = raw.decode('gbk')  # fall back to GBK, which is ASCII-compatible
            # don't extract absolute paths or ones with .. in them
            if name.startswith('/') or '..' in name:
                continue
            target = os.path.join(extract_dir, *name.split('/'))
            if not target:
                continue
            _ensure_directory(target)
            if not name.endswith('/'):
                # file
                data = zip.read(info.filename)
                f = open(target, 'wb')
                try:
                    f.write(data)
                finally:
                    f.close()
                    del data
    finally:
        zip.close()
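The decoding step can also be tried on its own, without editing the standard library. The following is a minimal sketch of that step only; the helper name decode_zip_name is hypothetical and not part of shutil or zipfile, and the GBK fallback assumes the archive was created on a Chinese-locale system, as in the patch above.

```python
import zipfile


def decode_zip_name(info):
    """Return a readable file name for a zipfile.ZipInfo entry (hypothetical helper)."""
    if info.flag_bits & 0x800:
        # Flag bit 11 is set: zipfile has already decoded the name as UTF-8.
        return info.filename
    # Without the flag, zipfile decoded the raw bytes as cp437; undo that first.
    raw = info.filename.encode('cp437')
    try:
        return raw.decode('utf-8')
    except UnicodeDecodeError:
        # Assumption: names that are not UTF-8 are GBK.
        return raw.decode('gbk')


# Simulate how zipfile presents a GBK-encoded name that lacks the UTF-8 flag.
legacy = zipfile.ZipInfo('测试.txt'.encode('gbk').decode('cp437'))
print(decode_zip_name(legacy) == '测试.txt')
```

A name that is neither UTF-8 nor GBK would still raise UnicodeDecodeError in the fallback branch, so archives from other locales would need a different fallback encoding.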
(2) shutil supports the zip and tar archive formats. To also unpack rar archives, install rarfile:
pip install rarfile
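Make sure the unrar command runs correctly on the Linux system; otherwise, also run:
pip install unrar
Note that rarfile calls an external extraction tool such as the unrar executable, so if the command is still missing it may have to be installed through the system package manager.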
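Whether the unrar command is actually reachable can be checked from Python before any rar archive is opened. This is a small sketch using shutil.which; the function name find_rar_tools and its candidates parameter are hypothetical, and only unrar is checked by default because it is the only tool the text above mentions.

```python
import shutil


def find_rar_tools(candidates=('unrar',)):
    """Map each candidate tool name to its full path, for tools found on PATH."""
    found = {}
    for tool in candidates:
        path = shutil.which(tool)  # None when the executable is not on PATH
        if path:
            found[tool] = path
    return found


tools = find_rar_tools()
if tools:
    print('extraction tools found:', tools)
else:
    print('no extraction tool found on PATH; rar extraction will likely fail')
```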
(3) The example program is as follows:
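import os
import shutil
import rarfile
# old_path and new_path are not defined in the original post; the values below are placeholders
old_path = '/path/to/archives'  # directory to scan
new_path = '/path/to/pdf_output'  # directory that receives the matching PDF files
def FindFile(path):
    for ipath in os.listdir(path):
        fulldir = os.path.join(path, ipath)  # join into a full path
        # unpack archives
        if '.zip' in ipath:
            try:
                new_folder = os.path.join(path, ipath.split(".")[0])
                pid = os.fork()  # unpack in a child process; the parent walks the new folder only after the child has finished
                if pid:
                    os.wait()
                    FindFile(new_folder)
                else:
                    try:
                        shutil.unpack_archive(fulldir, new_folder, 'zip')
                    finally:
                        os._exit(0)  # end the child here so it does not continue the parent's loop
            except Exception as e:
                pass
        elif '.tar' in ipath:
            try:
                new_folder = os.path.join(path, ipath.split(".")[0])
                pid = os.fork()
                if pid:
                    os.wait()
                    FindFile(new_folder)
                else:
                    try:
                        shutil.unpack_archive(fulldir, new_folder, 'tar')
                    finally:
                        os._exit(0)
            except Exception as e:
                pass
        elif '.rar' in ipath:
            try:
                new_folder = os.path.join(path, ipath.split(".")[0])
                print(fulldir, new_folder)
                # path of the archive
                pid = os.fork()
                if pid:
                    os.wait()
                    FindFile(new_folder)
                else:
                    try:
                        rar = rarfile.RarFile(fulldir)
                        # extract into the target directory
                        rar.extractall(new_folder)
                    finally:
                        os._exit(0)
            except Exception as e:
                pass
        else:
            # print('fulldir', fulldir)  # print the path and name of files with other suffixes
            pass
        if os.path.isfile(fulldir) and fulldir.endswith('.pdf'):  # file that matches: copy it
            shutil.copy(fulldir, new_path)
        if os.path.isdir(fulldir):  # directory: recurse
            FindFile(fulldir)
os.makedirs(new_path, exist_ok=True)
FindFile(old_path)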
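The fork-and-wait pattern that the three branches above repeat can be pulled into one helper. The sketch below is one possible way to write it, not part of the original program; the name run_in_child is hypothetical, and os.fork is only available on Unix-like systems.

```python
import os
import shutil


def run_in_child(func, *args):
    """Run func(*args) in a forked child; return True if the child succeeded."""
    pid = os.fork()
    if pid == 0:
        # Child process: do the work, then exit without returning to the caller.
        exit_code = 1
        try:
            func(*args)
            exit_code = 0
        except Exception:
            exit_code = 1
        finally:
            os._exit(exit_code)
    # Parent process: block until this particular child has finished.
    _, status = os.waitpid(pid, 0)
    return os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0
```

With such a helper, the zip branch could become a call like run_in_child(shutil.unpack_archive, fulldir, new_folder, 'zip'), and the parent would walk new_folder only when the call returns True. Because the child always leaves through os._exit, it never falls back into the parent's loop.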