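I think this is a deeper issue than you realize. Simply converting the file from Unicode to ASCII is easy; getting every Unicode character to translate into a reasonable ASCII counterpart (many letters are not available in both encodings) is another matter.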
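To make the difficulty concrete, here is a minimal Python 3 sketch of one common best-effort approach: decompose accented letters with `unicodedata.normalize` and drop whatever is still not ASCII. The helper name `to_ascii` is hypothetical, and this is only one possible strategy, not a complete transliteration scheme.

```python
import unicodedata


def to_ascii(text: str) -> str:
    """Best-effort ASCII approximation of text (hypothetical helper)."""
    # NFKD splits accented letters into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    # Dropping everything non-ASCII removes the combining marks, but it also
    # silently deletes characters that have no decomposition (e.g. 'ß', 'ø').
    return decomposed.encode("ascii", "ignore").decode("ascii")


print(to_ascii("André"))   # Andre
print(to_ascii("Straße"))  # Strae
```

The second call shows why "reasonable counterparts" is the hard part: letters without a decomposition simply vanish, so a real solution would need an explicit mapping table for them.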
Here's a useful quote from the site:

Python 1.6 also gets a "unicode" built-in function, to which you can specify the encoding:

    >>> unicode('hello')
    u'hello'
    >>> unicode('hello', 'ascii')
    u'hello'
    >>> unicode('hello', 'iso-8859-1')
    u'hello'

All three of these return the same thing, since the characters in 'Hello' are common to all three encodings.

Now let's encode something with a European accent, which is outside of ASCII. What you see at a console may depend on your operating system locale; Windows lets me type in ISO-Latin-1.

    >>> a = unicode('André','latin-1')
    >>> a
    u'Andr\202'

If you can't type an acute letter e, you can enter the string 'Andr\202', which is unambiguous.

Unicode supports all the common operations such as iteration and splitting. We won't run over them here.
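The quoted session uses the Python 1.6/2.x `unicode` built-in, which no longer exists in Python 3. A rough Python 3 equivalent, offered as a sketch rather than a definitive translation, decodes `bytes` into `str` and then shows what happens when the result is pushed back to ASCII. One assumption on my part: the `\202` in the quoted output (byte 0x82) looks like the DOS code page encoding of 'é' rather than ISO-8859-1, where 'é' is byte 0xE9.

```python
# Python 3 sketch: bytes are decoded into str (the role of the old unicode type).
latin1_bytes = b"Andr\xe9"             # 'é' is byte 0xE9 in ISO-8859-1
name = latin1_bytes.decode("latin-1")
print(ascii(name))                     # 'Andr\xe9'

# Assumption: octal \202 is byte 0x82, which is 'é' in the cp437 code page.
dos_bytes = b"Andr\202"
print(dos_bytes.decode("cp437") == name)  # True

# Encoding to ASCII raises an error unless an error handler is chosen.
try:
    name.encode("ascii")
except UnicodeEncodeError as exc:
    print("strict encoding failed:", exc.reason)

print(name.encode("ascii", "ignore"))            # b'Andr'
print(name.encode("ascii", "replace"))           # b'Andr?'
print(name.encode("ascii", "backslashreplace"))  # b'Andr\\xe9'
```

None of the three error handlers yields 'Andre', which is the gap between converting the encoding and finding a sensible ASCII counterpart.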