Python os.walk crashes on Japanese filenames


I have a folder with a filename "01 - ナナナン塊.txt"

I open Python at the interactive prompt in the same folder as the file and attempt to walk the folder hierarchy:

Python 3.1.2 (r312:79149, Mar 21 2010, 00:41:52) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> for x in os.walk('.'):
...     print(x)
...
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "C:\dev\Python31\lib\encodings\cp850.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_map)[0]
UnicodeEncodeError: 'charmap' codec can't encode characters in position 17-21: character maps to <undefined>

Clearly the encoding I'm using isn't able to deal with Japanese characters. Fine. But Python 3.1 is meant to be unicode all the way down, as I understand it, so I'm at a loss as to what I'm meant to do with this. Anyone have any ideas?
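The failure can be reproduced without a console at all, since it comes from the cp850 codec rather than from os.walk. The following is a minimal sketch; the helper names can_encode and console_safe are made up for illustration, and the escaping shown is only a stopgap for getting readable output, not a fix for the console itself.

```python
name = "01 - ナナナン塊.txt"

def can_encode(text, encoding):
    """Return True if every character of text exists in the given codec."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True

def console_safe(text, encoding):
    """Replace characters the codec lacks with \\uXXXX escapes (hypothetical helper)."""
    return text.encode(encoding, errors="backslashreplace").decode(encoding)

print(can_encode(name, "cp850"))    # cp850 has no Japanese characters
print(can_encode(name, "utf-8"))    # UTF-8 can represent any code point
print(console_safe(name, "cp850"))  # ASCII-only text that cp850 can print
```

So the strings returned by os.walk are fine; only the step that encodes them for the console fails.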

Solution

It seems like all answers so far are from Unix people who assume the Windows console is like a Unix terminal, which it is not.

The problem is that you can't write Unicode output to the Windows console using the normal underlying file I/O functions. The Windows API WriteConsole needs to be used. Python should probably be doing this transparently, but it isn't.

There's a different problem if you redirect the output to a file: Windows text files are historically in the ANSI codepage, not Unicode. You can fairly safely write UTF-8 to text files in Windows these days, but Python doesn't do that by default.
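For the file case, one way to sidestep the default is to open the output file yourself and name the encoding explicitly. This is a small sketch, assuming a reasonably modern Python 3; write_names_utf8 and the temporary file are illustrative names, not part of the code below.

```python
import os
import tempfile

def write_names_utf8(names, out_path):
    """Write one name per line, always encoded as UTF-8 (hypothetical helper)."""
    with open(out_path, "w", encoding="utf-8") as out:
        for name in names:
            out.write(name + "\n")

names = ["01 - ナナナン塊.txt"]

with tempfile.TemporaryDirectory() as tmp:
    listing = os.path.join(tmp, "listing.txt")
    write_names_utf8(names, listing)
    # Read the raw bytes back to confirm what actually landed on disk.
    with open(listing, "rb") as f:
        data = f.read()

round_trip = data.decode("utf-8").splitlines()
```

Because the encoding is passed to open(), the result does not depend on the ANSI code page of the machine that runs it.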

I think it should do these things, but here's some code to make it happen. You don't have to worry about the details if you don't want to; just call ConsoleFile.wrap_standard_handles(). You do need PyWin (the pywin32 package) installed to get access to the necessary APIs.

import os, sys, io, win32api, win32console, pywintypes

def change_file_encoding(f, encoding):
    """
    TextIOWrapper is missing a way to change the file encoding, so we have to
    do it by creating a new one.
    """
    errors = f.errors
    line_buffering = f.line_buffering
    # f.newlines is not the same as the newline parameter to TextIOWrapper.
    # newlines = f.newlines
    buf = f.detach()
    # TextIOWrapper defaults newline to \r\n on Windows, even though the underlying
    # file object is already doing that for us. We need to explicitly say "\n" to
    # make sure we don't output \r\r\n; this is the same as the internal function
    # create_stdio.
    return io.TextIOWrapper(buf, encoding, errors, "\n", line_buffering)

class ConsoleFile:
    class FileNotConsole(Exception): pass

    def __init__(self, handle):
        handle = win32api.GetStdHandle(handle)
        self.screen = win32console.PyConsoleScreenBufferType(handle)
        try:
            self.screen.GetConsoleMode()
        except pywintypes.error as e:
            raise ConsoleFile.FileNotConsole

    def write(self, s):
        self.screen.WriteConsole(s)

    def close(self): pass
    def flush(self): pass
    def isatty(self): return True

    @staticmethod
    def wrap_standard_handles():
        sys.stdout.flush()
        try:
            # There seems to be no binding for _get_osfhandle.
            sys.stdout = ConsoleFile(win32api.STD_OUTPUT_HANDLE)
        except ConsoleFile.FileNotConsole:
            sys.stdout = change_file_encoding(sys.stdout, "utf-8")

        sys.stderr.flush()
        try:
            sys.stderr = ConsoleFile(win32api.STD_ERROR_HANDLE)
        except ConsoleFile.FileNotConsole:
            sys.stderr = change_file_encoding(sys.stderr, "utf-8")

ConsoleFile.wrap_standard_handles()

print("English 漢字 Кири́ллица")

This is a little tricky: if stdout or stderr is the console, we need to output with WriteConsole; but if it's not (e.g. foo.py > file), that's not going to work, and we need to change the file's encoding to UTF-8 instead.
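For the redirected-file branch, later Python versions (3.7 and up) added TextIOWrapper.reconfigure(), which changes the encoding of an existing stream in place. The sketch below uses an in-memory buffer as a stand-in for a redirected stdout; use_utf8 is an illustrative name, and none of this applies to the Python 3.1 used in the question.

```python
import io

# Stand-in for a redirected stdout: a byte buffer wrapped with a legacy code page.
raw = io.BytesIO()
stream = io.TextIOWrapper(raw, encoding="cp850", newline="\n")

def use_utf8(text_stream):
    """Switch an existing text stream to UTF-8 in place (needs Python 3.7+)."""
    text_stream.reconfigure(encoding="utf-8", errors="strict")
    return text_stream

text = "English 漢字 Кири́ллица\n"

use_utf8(stream)
stream.write(text)   # would raise UnicodeEncodeError under cp850
stream.flush()
encoded = raw.getvalue()
```

On a real script the same call would be sys.stdout.reconfigure(encoding="utf-8"), ideally only when sys.stdout.isatty() is false.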

The opposite in either case will not work. You can't output to a regular file with WriteConsole (it's not actually a byte API, but a UTF-16 one; PyWin hides this detail), and you can't write UTF-8 to a Windows console.

Also, it really should be using _get_osfhandle to get the handle to stdout and stderr, rather than assuming they're assigned to the standard handles, but that API doesn't seem to have any PyWin binding.
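The standard library does expose this call, as msvcrt.get_osfhandle(), although only on Windows. A hedged sketch of looking up the OS handle behind a stream follows; os_handle_for is a hypothetical helper, and wiring its result into the PyWin console objects is left as an untested assumption.

```python
import sys

def os_handle_for(stream):
    """Return the Windows OS handle behind a stream, or None if unavailable."""
    if sys.platform != "win32":
        return None  # msvcrt only exists on Windows
    import msvcrt
    try:
        return msvcrt.get_osfhandle(stream.fileno())
    except (OSError, ValueError):
        # The stream has no real file descriptor (for example, it was replaced).
        return None

handle = os_handle_for(sys.stdout)
```

This follows whatever sys.stdout currently points at, rather than assuming it is still attached to the standard output handle.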
