Using zipfile in Python and fixing garbled Chinese filenames

紫色蜘蛛爬啊爬

Article link: https://blog.csdn.net/zzphapy/article/details/81703539

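In the previous post I backed up files by compressing them with 7-Zip. When I then inspected the archive with zipfile, the Chinese filenames came out garbled. It took me a long time to track this down.

Up front: when compressing with 7-Zip, remember to pass -mcu, which tells it to store filenames as UTF-8. That avoids all of the trouble described below.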

zip_command = '"D:\\Program Files (x86)\\7-Zip\\7z.exe" a -tzip -mcu {0} {1} '.format(target, ' '.join(source))
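
As a hedged alternative to formatting a single command string, the same invocation can be handed to subprocess as an argument list, which sidesteps quoting problems for paths that contain spaces. This is only a sketch, not the code from the previous post: build_7z_command and run_backup are hypothetical helper names, and the 7z.exe path is simply the one shown above.

```python
import subprocess

# Install location taken from the command above; adjust for your machine.
SEVEN_ZIP = r"D:\Program Files (x86)\7-Zip\7z.exe"


def build_7z_command(target, sources, exe=SEVEN_ZIP):
    """Return the argument list for: 7z a -tzip -mcu <target> <sources...>."""
    return [exe, "a", "-tzip", "-mcu", str(target), *(str(s) for s in sources)]


def run_backup(target, sources):
    """Run 7-Zip and raise CalledProcessError if it exits with an error."""
    return subprocess.run(build_7z_command(target, sources), check=True)
```

Calling run_backup(target, source) with the same target and source values as above should produce the same archive, assuming 7-Zip is installed at that path.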

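My debugging process went roughly like this:

1. Is it an encoding problem? If so, is it Python's default Unicode handling, the Windows encoding, or the PyCharm encoding?

2. Does the problem also occur on the CMD command line?

3. Check the various encodings (sys.get..., locale.get...).

4. Try all kinds of encode / decode calls on i.filename, including gbk, utf-8/-16, big5, and gb2312, in both PyCharm and CMD.

5. None of that worked. I finally found this sentence in the articles below: "WinZip interprets all filenames as encoded in CP437, also known as DOS Latin." Python's zipfile behaves the same way for entries that do not have the UTF-8 flag set: it decodes their names as cp437. Re-encoding the name as cp437 and then decoding it as gbk solved the problem. References: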

https://blog.csdn.net/xinxinNoGiveUp/article/details/80342044

https://blog.csdn.net/mp9105/article/details/80288549

print('\nfilename', i.filename.encode('cp437').decode('gbk'))

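6. Note: when reading a file from the archive with ZipFile.open followed by read, write the member path with "/" as the separator. There is no need for the double backslash "\\".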

f.open('fold/New Text Document 文档.txt').read().decode('gbk')
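
The fix from step 5 can be wrapped in a small helper. The sketch below assumes the archive was written on a system whose legacy code page is GBK; recover_name and ZIP_UTF8_FLAG are hypothetical names introduced here for illustration. It checks bit 11 of flag_bits (0x800), which marks a name stored as UTF-8, so that names zipfile has already decoded correctly are left alone.

```python
ZIP_UTF8_FLAG = 0x800  # general purpose bit 11: filename is stored as UTF-8


def recover_name(name, flag_bits, legacy_encoding="gbk"):
    """Undo zipfile's cp437 decoding for names stored in a legacy code page."""
    if flag_bits & ZIP_UTF8_FLAG:
        return name  # zipfile already decoded this name as UTF-8
    try:
        return name.encode("cp437").decode(legacy_encoding)
    except (UnicodeEncodeError, UnicodeDecodeError):
        return name  # the guess does not fit; keep what zipfile returned


# Simulate what zipfile returns for a GBK-encoded name without the UTF-8 flag.
original = "文档.txt"
garbled = original.encode("gbk").decode("cp437")
recovered = recover_name(garbled, 0)  # equal to original again
```

Inside a loop over f.infolist() this would be called as recover_name(i.filename, i.flag_bits). On Python 3.11 and later, ZipFile also accepts a metadata_encoding argument in read mode, which may make the manual round trip unnecessary.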

After compressing with -mcu, the Chinese filenames are no longer garbled, so the encode('cp437').decode('gbk') calls are already commented out in the code below.

import sys
import locale
import zipfile
# sys.setdefaultencoding('utf8')  # Python 2 only; not available in Python 3
print(sys.getdefaultencoding())
print(sys.getfilesystemencoding())
print(sys.stdout.encoding)
print(sys.stdout.isatty())
print(locale.getpreferredencoding())


t = 'D:\\test\\Backup\\20180815140640.zip'
print(zipfile.is_zipfile(t))
f = zipfile.ZipFile(t)
print('filename is', f.filename)
print('namelist is', f.namelist())
# If the archive was not created with UTF-8 filenames, zipfile decodes the names as CP437 (also known as DOS Latin)
# To fix garbled Chinese names returned by zipfile, encode('cp437') and then decode('gbk')
# print('first item is', f.namelist()[0].encode('cp437').decode('gbk'))
# print('second item is', f.namelist()[1].encode('cp437').decode('gbk'))

print('first item is', f.namelist()[0])
print('second item is', f.namelist()[1])

print('infolist is', f.infolist())


for i in f.infolist():
    # print('\nfilename', i.filename.encode('cp437').decode('gbk'))
    print('\nfilename', i.filename)
    print('\tComment:', i.comment)
    print('\tModified:', i.date_time)
    print('\tSystem:', i.create_system, '(0 = Windows, 3 = Unix)')
    print('\tCompressed:', i.compress_size, 'bytes')
    print('\tUncompressed:', i.file_size, 'bytes')
    print('\textra:', i.extra)
    print('\tcreate_system:', i.create_system)
    print('\tcreate_version:', i.create_version)
    print('\textract_version:', i.extract_version)
    print('\treserved:', i.reserved)
    print('\tflag_bits:', i.flag_bits)
    print('\tvolume:', i.volume)
    print('\tinternal_attr:', i.internal_attr)
    print('\texternal_attr:', i.external_attr)
    print('\theader_offset:', i.header_offset)
    print('\tCRC:', i.CRC)




# Read "New Text Document 文档.txt" from the fold directory inside the archive
print('\nTxt content is:', f.open('fold/New Text Document 文档.txt').read().decode('gbk'))
print('\nTxt content is:', f.open(f.namelist()[3]).read().decode('gbk'))


# Extract every member
f.extractall('D:\\test\\extract')
# Extract the second member (the first one is the directory fold)
f.extract(f.namelist()[1], 'D:\\test\\extract')


f.close()
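
For readers who want to try the "/" member path rule without the backup archive, here is a minimal sketch that builds a small archive in memory and reads a member back. The member name, its content, and the helper names build_demo_archive and read_member are assumptions made for this example, not files or functions from the post. It also uses with blocks so the archive and the member handle are closed automatically.

```python
import io
import zipfile


def build_demo_archive():
    """Create an in-memory zip with one GBK-encoded text file in a folder."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("fold/文档.txt", "hello 你好".encode("gbk"))
    return buffer.getvalue()


def read_member(archive_bytes, member, encoding="gbk"):
    """Open a member by its '/'-separated name and decode its content."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        with archive.open(member) as handle:
            return handle.read().decode(encoding)


demo_bytes = build_demo_archive()
text = read_member(demo_bytes, "fold/文档.txt")
```

The decode('gbk') step concerns the file content only and is independent of how the filename is encoded in the archive.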

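Appendix:

# ZipFile.getinfo(name) returns a ZipInfo object that describes the corresponding member of the zip archive. It has the following attributes:

# ZipInfo.filename: name of the file in the archive.
# ZipInfo.date_time: last modification time, as a 6-element tuple: (year, month, day, hour, minute, second)
# ZipInfo.compress_type: compression type.
# ZipInfo.comment: comment for the individual member.
# ZipInfo.extra: extension field data.
# ZipInfo.create_system: system that created the zip archive.
# ZipInfo.create_version: PKZIP version that created the zip archive.
# ZipInfo.extract_version: PKZIP version needed to extract the zip archive.
# ZipInfo.reserved: reserved field; the current implementation always returns 0.
# ZipInfo.flag_bits: zip flag bits.
# ZipInfo.volume: volume number of the file header.
# ZipInfo.internal_attr: internal attributes.
# ZipInfo.external_attr: external attributes.
# ZipInfo.header_offset: byte offset of the file header.
# ZipInfo.CRC: CRC-32 of the uncompressed file.
# ZipInfo.compress_size: size of the compressed data.
# ZipInfo.file_size: size of the uncompressed file.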
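
A few of these attributes can be checked against known data. The sketch below uses an in-memory archive with made-up member names (an assumption for illustration) and compares file_size and CRC with values computed directly from the payload. It also shows that zipfile sets the UTF-8 bit (0x800) in flag_bits when it writes a non-ASCII name, which is why such archives read back without garbling.

```python
import io
import zipfile
import zlib

ZIP_UTF8_FLAG = 0x800  # general purpose bit 11: filename is stored as UTF-8

payload = b"hello zipfile"
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("fold/文档.txt", payload)   # non-ASCII member name
    archive.writestr("fold/plain.txt", payload)  # ASCII-only member name

# Re-open the bytes so the attributes come from the archive itself.
with zipfile.ZipFile(io.BytesIO(buffer.getvalue())) as archive:
    info = archive.getinfo("fold/文档.txt")
    plain = archive.getinfo("fold/plain.txt")

size_matches = info.file_size == len(payload)
crc_matches = info.CRC == zlib.crc32(payload)
utf8_name = bool(info.flag_bits & ZIP_UTF8_FLAG)
ascii_name_flagged = bool(plain.flag_bits & ZIP_UTF8_FLAG)
```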
