Python 2 Source File Encoding

Latest recommended article published on 2024-06-21 17:36:15

hzhj

Category: Python. Tags: Python, encoding, garbled Chinese text

Article link: https://blog.csdn.net/hzhj2007/article/details/79203716

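By default, Python 2 assumes that source files are encoded in ASCII. This often causes problems in practice: exchanging data with programs written in other languages becomes tedious because of encoding mismatches, Chinese output appears garbled, and the output of ASCII-encoded content is hard to read. To avoid these encoding problems, Python source files are usually set to UTF-8.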

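Setting the encoding is simple: add the following comment on the first or second line of the file. The space after "#" is optional; it is only there for readability.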

# coding=utf-8

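Alternatively, use the following form, which popular editors also recognize. The spaces before and after "-*-", "coding:", and "utf-8" may all be omitted.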

# -*- coding: utf-8 -*-

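More generally, any comment that matches the following regular expression works. (An explanation of the regular expression was linked from the original post.)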

^[ \t\v]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)

That is why the following form is also commonly seen.

# encoding: utf-8
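As a quick check, the sketch below runs the declaration styles shown above through that regular expression with the standard `re` module. The names `CODING_RE` and `declared_encoding` are made up for this illustration and are not part of Python itself.

```python
import re

# The pattern quoted above, compiled as-is.
CODING_RE = re.compile(r'^[ \t\v]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)')


def declared_encoding(line):
    """Return the encoding named in a declaration comment, or None."""
    match = CODING_RE.match(line)
    return match.group(1) if match else None


for line in ('# coding=utf-8',
             '# -*- coding: utf-8 -*-',
             '# encoding: utf-8',
             '#coding:utf-8',
             '# coding : utf-8'):   # space before the colon: not recognized
    print('%-26r -> %r' % (line, declared_encoding(line)))
```

The last line is the one form that does not match: the pattern allows spaces after the ":" or "=" but not between "coding" and that character.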

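The approaches above work without trouble on Linux, but on Windows they sometimes appear to have no effect. The reason is that declaring "utf-8" in the source does not change how the file is stored: the file on disk may not actually be saved as UTF-8, but in the default encoding of the Windows environment instead. To investigate, check Python's default encoding with the commands below, or open the file in Notepad++ to see the encoding it was saved in. A typical symptom is an error such as:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xe6 in position 90: ordinal not in range(128)

Note that the setdefaultencoding function only appears in the sys module after reload(sys), as the first session below shows.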

Python 2.7.16 |Anaconda, Inc.| (default, Mar 14 2019, 15:42:17) [MSC v.1500 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> dir(sys)
['__displayhook__', '__doc__', '__excepthook__', '__name__', '__package__', '__stderr__', '__stdin__', '__stdout__', '_clear_type_cache', '_current_frames', '_getframe', '_git', 'api_version', 'argv', 'builtin_module_names', 'byteorder', 'call_tracing', 'callstats', 'copyright', 'displayhook', 'dllhandle', 'dont_write_bytecode', 'exc_clear', 'exc_info', 'exc_type', 'excepthook', 'exec_prefix', 'executable', 'exit', 'exitfunc', 'flags', 'float_info', 'float_repr_style', 'getcheckinterval', 'getdefaultencoding', 'getfilesystemencoding', 'getprofile', 'getrecursionlimit', 'getrefcount', 'getsizeof', 'gettrace', 'getwindowsversion', 'hexversion', 'long_info', 'maxint', 'maxsize', 'maxunicode', 'meta_path', 'modules', 'path', 'path_hooks', 'path_importer_cache', 'platform', 'prefix', 'ps1', 'ps2', 'py3kwarning', 'setcheckinterval', 'setprofile', 'setrecursionlimit', 'settrace', 'stderr', 'stdin', 'stdout', 'subversion', 'version', 'version_info', 'warnoptions', 'winver']
>>> reload(sys)
<module 'sys' (built-in)>
>>> dir(sys)
['__displayhook__', '__doc__', '__excepthook__', '__name__', '__package__', '__stderr__', '__stdin__', '__stdout__', '_clear_type_cache', '_current_frames', '_getframe', '_git', 'api_version', 'argv', 'builtin_module_names', 'byteorder', 'call_tracing', 'callstats', 'copyright', 'displayhook', 'dllhandle', 'dont_write_bytecode', 'exc_clear', 'exc_info', 'exc_type', 'excepthook', 'exec_prefix', 'executable', 'exit', 'exitfunc', 'flags', 'float_info', 'float_repr_style', 'getcheckinterval', 'getdefaultencoding', 'getfilesystemencoding', 'getprofile', 'getrecursionlimit', 'getrefcount', 'getsizeof', 'gettrace', 'getwindowsversion', 'hexversion', 'long_info', 'maxint', 'maxsize', 'maxunicode', 'meta_path', 'modules', 'path', 'path_hooks', 'path_importer_cache', 'platform', 'prefix', 'ps1', 'ps2', 'py3kwarning', 'setcheckinterval', 'setdefaultencoding', 'setprofile', 'setrecursionlimit', 'settrace', 'stderr', 'stdin', 'stdout', 'subversion', 'version', 'version_info', 'warnoptions', 'winver']
>>>

>>> import sys
>>> sys.getdefaultencoding()
'utf-8'
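To make the failure concrete, here is a small sketch (not from the original post) of what happens when bytes are decoded with the wrong codec. The sample text and the GBK encoding are assumptions chosen for illustration, and `try_decode` is a hypothetical helper. The byte 0xe6 that shows up in the UTF-8 data is the same kind of non-ASCII byte reported in the error message above.

```python
# -*- coding: utf-8 -*-
text = u'\u4e2d\u6587'             # two Chinese characters, written as escapes

utf8_bytes = text.encode('utf-8')  # b'\xe4\xb8\xad\xe6\x96\x87'
gbk_bytes = text.encode('gbk')     # b'\xd6\xd0\xce\xc4'


def try_decode(data, encoding):
    """Return the decoded text, or the name of the error that was raised."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        return type(exc).__name__


# UTF-8 bytes round-trip cleanly through the UTF-8 codec ...
assert try_decode(utf8_bytes, 'utf-8') == text
# ... but the ASCII codec rejects every byte above 0x7f,
print(try_decode(utf8_bytes, 'ascii'))
# and bytes that were saved as GBK are not valid UTF-8 either.
print(try_decode(gbk_bytes, 'utf-8'))
```

Both print calls report a UnicodeDecodeError, which is why a file declared as UTF-8 but saved in another encoding still fails.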

If the default encoding is not 'utf-8', either of the following approaches can be used.

1. Use Notepad++ to convert the file to "UTF-8 without BOM" encoding.

2. Set Python's default encoding.

import sys
reload(sys)
sys.setdefaultencoding('utf-8')

>>> import sys
>>> sys.getdefaultencoding()
'ascii'
>>> reload(sys)
<module 'sys' (built-in)>
>>> sys.setdefaultencoding('utf-8')
>>> sys.getdefaultencoding()
'utf-8'

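Note: changing the default encoding can cause other problems. For example, in a Jupyter environment, print output goes to the terminal instead of appearing below the cell, because reload(sys) resets the standard input and output objects. Other issues are described in the article and the Stack Overflow post linked from the original. Because the default encoding in Python 3 is 'utf-8' rather than 'ascii', it is better to use the Python 3 interpreter directly; the end of official support for Python 2.7 was also announced (it took effect on January 1, 2020).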
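The first option above (re-saving the file as UTF-8 without a BOM) can also be scripted instead of done by hand in Notepad++. The following is only a sketch: the helper name `reencode_to_utf8` is invented here, and the source encoding (`gbk`) is an assumption that has to be replaced with whatever encoding the file was really saved in. `io.open` is used because it accepts an `encoding` argument in both Python 2.6+ and Python 3.

```python
import io


def reencode_to_utf8(path, source_encoding='gbk'):
    """Rewrite the file at `path` in place as UTF-8 without a BOM."""
    # Decode using the encoding the file was actually saved in (an assumption).
    with io.open(path, 'r', encoding=source_encoding, newline='') as src:
        text = src.read()
    # The plain 'utf-8' codec never writes a BOM ('utf-8-sig' would).
    with io.open(path, 'w', encoding='utf-8', newline='') as dst:
        dst.write(text)
```

Passing `newline=''` leaves the existing line endings untouched. Since the file is overwritten in place, it is worth keeping a backup copy in case the assumed source encoding turns out to be wrong.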

I once had trouble reading a file whose path contained Chinese characters. The code below fixed it, so I am noting it here.

# Python 2: decode the UTF-8 byte-string path to unicode before opening it
filepath = unicode(filepath, 'utf8')
fobj = open(filepath, "r")
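A slightly more defensive variant of the snippet above is sketched below. `to_text` is a hypothetical helper, and the assumption that the byte path is UTF-8 encoded is carried over from the original snippet; on Windows, a path obtained from the console may be in the system code page instead.

```python
def to_text(path, encoding='utf-8'):
    """Decode a byte-string path to text; leave text paths unchanged."""
    if isinstance(path, bytes):      # `bytes` is an alias of `str` in Python 2
        return path.decode(encoding)
    return path


# b'\xe4\xb8\xad\xe6\x96\x87.txt' is the UTF-8 form of u'\u4e2d\u6587.txt'
filepath = to_text(b'\xe4\xb8\xad\xe6\x96\x87.txt')
print(repr(filepath))
```

The decoded path can then be passed to open() as before. Checking the type first avoids decoding a path that is already text.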
