python3 character encoding: character encoding conversion in python3

Latest recommended article published 2024-08-08 14:52:18

weixin_39857876

Article tags: python3 character encoding

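I needed to do this myself, and the naive approach is:

    import sys
    import zipfile

    def unzip(file, dir):
        zips = zipfile.ZipFile(file)
        for info in zips.infolist():
            # Recover the raw name bytes (ZipFile decoded them as cp437),
            # then decode them with the encoding the archive really used.
            info.filename = info.filename.encode("cp437").decode("shift-jis")
            # Replace characters the console cannot show, so print() never fails.
            print("Extracting: " + info.filename.encode(sys.stdout.encoding, errors='replace').decode(sys.stdout.encoding))
            zips.extract(info, dir)
        print("")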

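ZipFile appears to treat all filenames internally as DOS (code page 437) names, at least for entries that are not flagged as UTF-8. Unlike Python 2, Python 3 stores every string internally as some form of Unicode. So we encode the filename back into bytes as cp437, then decode those raw bytes as Shift-JIS to get the real filename.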
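To make the round trip concrete, here is a minimal sketch. The sample name `テスト.txt` and the helper name `recover_name` are made up for illustration; the sketch only simulates what happens when raw Shift-JIS bytes are read as cp437, without touching a real archive.

```python
def recover_name(mangled, real_encoding="shift-jis"):
    """Undo a cp437 decode and re-decode the raw bytes with the real encoding."""
    return mangled.encode("cp437").decode(real_encoding)

# Simulate a name stored in the archive as Shift-JIS bytes (illustrative sample).
raw = "テスト.txt".encode("shift-jis")
# Roughly what you get back when those bytes are interpreted as cp437.
mangled = raw.decode("cp437")
# Encoding as cp437 restores the original bytes, which then decode correctly.
restored = recover_name(mangled)
```

This works because cp437 assigns a character to every byte value, so the encode step gives back exactly the bytes that were in the archive.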

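The print line does something similar, but it encodes to the default stdout encoding and back again. This prevents errors on Windows, whose terminal almost never supports Unicode. (If it does, the name should display correctly.)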
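The same encode-and-decode trick can be pulled out into a small helper. This is only a sketch: the name `console_safe` and the fallback to ASCII when stdout reports no encoding are assumptions, not part of the original code.

```python
import sys

def console_safe(text, encoding=None):
    """Return text with characters the console cannot encode replaced by '?'."""
    # Assumption: fall back to ASCII if stdout does not report an encoding.
    encoding = encoding or sys.stdout.encoding or "ascii"
    return text.encode(encoding, errors="replace").decode(encoding)

# Never raises UnicodeEncodeError, whatever the console encoding is.
print("Extracting: " + console_safe("テスト.txt"))
```

On a console that can encode the characters, the text passes through unchanged; otherwise each unsupported character becomes a question mark.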

This worked fine for two zip files, until, bam... decoding failed with a UnicodeDecodeError ("illegal multibyte sequence").

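Bonus content! It took some head-scratching to figure this out, but the problem is that some valid Shift-JIS characters contain a backslash byte, which ZipFile converts to a forward slash! For example, 十 is encoded in Shift-JIS as 8F 5C. This gets converted to 8F 2F, which is an illegal sequence. If an error occurs, the (probably overly complicated) code below checks for this case and tries to fix it. But there may be other characters where this happens and the resulting sequence is still valid, in which case you get a wrong character instead of an error. :(

    import sys

    def convert_filename(inname):
        err_ctr = 0
        # Recover the raw name bytes that ZipFile decoded as cp437.
        trans_filename = bytearray(inname.encode("cp437"))
        while True:
            try:
                return trans_filename.decode("shift-jis")
            except UnicodeDecodeError as e:
                if e.reason != "illegal multibyte sequence":
                    raise
                # e.start is the lead byte; the trail byte is the one after it.
                p0, p1 = e.start, e.start + 1
                fixable = (p1 < len(trans_filename)
                           and trans_filename[p0] > 127
                           and trans_filename[p1] == 0x2f)
                if not fixable:
                    print("Don't know how to fix this error. Quitting. :(")
                    raise
                print("Trying to fix encoding error at positions " + str(p0) + ", " + str(p1) + " caused by shift-jis sequence " + hex(trans_filename[p0]) + ", " + hex(trans_filename[p1]))
                # Restore the backslash byte that was turned into a forward slash.
                trans_filename[p1] = 0x5c
                err_ctr += 1
                print("This is error #" + str(err_ctr) + " for this filename.")
                if err_ctr > 50:
                    print("More than 50 iterations. Are we stuck in an endless loop? Quitting...")
                    sys.exit(1)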

    def unzip(file, dir):
        zips = zipfile.ZipFile(file)
        for info in zips.infolist():
            info.filename = convert_filename(info.filename)
            print("Extracting: " + info.filename.encode(sys.stdout.encoding, errors='replace').decode(sys.stdout.encoding))
            zips.extract(info, dir)
        print("")
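To get a feel for how many names are at risk from the backslash problem described above, the following sketch lists the two-byte Shift-JIS characters whose second byte is 0x5C and reproduces the 十 example in isolation. It is an illustration only: the helper name `chars_with_backslash_trail` is hypothetical, and the lead-byte ranges are the ones Python's `shift-jis` codec is assumed to accept.

```python
def chars_with_backslash_trail():
    """List characters whose Shift-JIS encoding ends in the byte 0x5C."""
    found = []
    # Assumed lead-byte ranges for two-byte Shift-JIS characters.
    for lead in list(range(0x81, 0xA0)) + list(range(0xE0, 0xFD)):
        try:
            found.append(bytes([lead, 0x5C]).decode("shift-jis"))
        except UnicodeDecodeError:
            # No character is assigned to this lead byte with trail 0x5C.
            pass
    return found

affected = chars_with_backslash_trail()

# Reproduce the damage: the backslash byte becomes a forward slash.
original = "十".encode("shift-jis")            # the bytes 8F 5C
damaged = original.replace(b"\x5c", b"\x2f")   # the bytes 8F 2F

try:
    damaged.decode("shift-jis")
except UnicodeDecodeError as e:
    print("decode failed:", e.reason)
```

Common characters such as 十, ソ and 表 all show up in `affected`, so archives with Japanese filenames can hit this case quite easily.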
