[python] Encoding Summary

小狼女做笔记

Article link: https://blog.csdn.net/lyjlyj3277/article/details/114534843

1. Type differences

1.1 Python 2
Unicode string ==> unicode type
Non-Unicode (byte) string ==> str type
1.2 Python 3
Unicode string ==> str type
Non-Unicode (byte) string ==> bytes type
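
The Python 3 half of this mapping can be checked directly in an interpreter. The snippet below is a minimal sketch, not part of the original notes; the variable names `text` and `data` are chosen only for illustration.

```python
# Python 3: str holds Unicode code points, bytes holds encoded data.
text = "中文"                   # str (Unicode)
data = text.encode("utf-8")     # bytes (non-Unicode)

print(type(text).__name__, len(text))   # str 2   (two characters)
print(type(data).__name__, len(data))   # bytes 6 (three utf-8 bytes per character)
```

The different lengths show that `len()` counts characters for `str` and bytes for `bytes`.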

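2. Encoding conversion

A byte string in any encoding, for any language or platform, can be decoded to Unicode as long as its encoding is known, and Unicode text can be encoded into any encoding that can represent its characters.

    utf8_str = unicode_str.encode("utf-8")   # Unicode -> utf-8 bytes
    gbk_str = unicode_str.encode("gbk")      # Unicode -> gbk bytes

    unicode_str = utf8_str.decode("utf-8")   # utf-8 bytes -> Unicode
    unicode_str = gbk_str.decode("gbk")      # gbk bytes -> Unicode

Summary: Unicode can be encoded into other encodings, and other encodings can be decoded into Unicode.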
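
Since Unicode sits in the middle, converting between two byte encodings goes through a decode followed by an encode. The following Python 3 sketch illustrates that round trip; the sample text and the names `utf8_bytes`, `gbk_bytes` and `converted` are assumptions made for the example.

```python
# Python 3 sketch: Unicode is the hub between byte encodings.
unicode_str = "深圳"

utf8_bytes = unicode_str.encode("utf-8")   # 6 bytes
gbk_bytes = unicode_str.encode("gbk")      # 4 bytes

# gbk -> utf-8 always goes through Unicode: decode first, then encode.
converted = gbk_bytes.decode("gbk").encode("utf-8")
print(converted == utf8_bytes)             # True

# Decoding with a codec that cannot represent the data fails loudly.
try:
    utf8_bytes.decode("ascii")
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True
print(decode_failed)                       # True
```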

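3. Terminal encoding

Applies to: strings created in the interpreter's interactive terminal
Differences:
Python 2: a plain string literal is a byte string in the terminal's encoding, typically utf-8 on Linux and gbk on Chinese-locale Windows
Python 3: string literals are always Unicode, so they can be encoded into any other encoding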
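
To see which encodings are in play on a given machine, the standard `sys` and `locale` modules can be queried. This is a rough Python 3 sketch; the printed terminal and locale values depend on the system, so no output is shown for them.

```python
import locale
import sys

literal = "中文"  # always a Unicode str in Python 3, whatever the terminal uses

# Encoding the terminal stream uses when printing (may be None if redirected).
print(getattr(sys.stdout, "encoding", None))
# Encoding open() falls back to in text mode when encoding= is omitted.
print(locale.getpreferredencoding(False))
# Default str/bytes conversion encoding of the interpreter.
print(sys.getdefaultencoding())  # utf-8

# Because the literal is Unicode, it can be encoded into either byte encoding.
as_utf8 = literal.encode("utf-8")
as_gbk = literal.encode("gbk")
```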

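4. File encoding

4.1 The problem
When a string is written to a newly created file, the file's encoding is simply the encoding of the string that was written. If a string in a different encoding is written to the same file later, the file no longer has one consistent encoding, and the original content shows up as garbled text.
4.2 How data is written to a file
Mode "w" writes strings (text); mode "wb" writes non-strings (bytes)
4.2.1 Manual conversion
Python 2: str is non-Unicode, so a unicode object must be encoded before it is written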

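with open("xxx.txt", "w") as f:
    f.write(unicode_str.encode("utf-8"))

with open("xxx.txt", "wb") as f:
    # encode here as well: a unicode object written directly is implicitly encoded as ascii
    f.write(unicode_str.encode("utf-8"))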

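Python 3: str is Unicode

with open("xxx.txt", "w") as f:
    # text mode: open() uses the locale's preferred encoding when encoding= is omitted
    f.write(unicode_str)

with open("xxx.txt", "wb") as f:
    # binary mode: only bytes are accepted, so encode first
    f.write(unicode_str.encode("utf-8"))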
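
The text/bytes split in Python 3 can be verified with a throwaway file. The sketch below assumes a temporary directory rather than the `xxx.txt` path used above, and the names `path`, `error` and `raw` are illustrative only.

```python
import os
import tempfile

unicode_str = "中文"
path = os.path.join(tempfile.mkdtemp(), "xxx.txt")  # temporary file for the demo

# Binary mode rejects str: the conversion has to be done by hand.
error = None
try:
    with open(path, "wb") as f:
        f.write(unicode_str)
except TypeError as exc:
    error = exc

# Manual conversion: encode to bytes, then write in binary mode.
with open(path, "wb") as f:
    f.write(unicode_str.encode("utf-8"))

# Reading the raw bytes back shows exactly what is stored on disk.
with open(path, "rb") as f:
    raw = f.read()

print(raw)                  # b'\xe4\xb8\xad\xe6\x96\x87'
print(raw.decode("utf-8"))  # 中文
```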

4.2.2 Using the encoding parameter of open()
Python 2:

import codecs
with codecs.open("xxx.txt", "w", encoding="utf-8") as f:
    f.write("中文")  # a byte str in Python 2; the writer expects unicode, e.g. u"中文"

Python 3:

with open("xxx.txt", "w", encoding="utf-8") as f:
    f.write(unicode_str)

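Note: writing Chinese with the Python 2 code above raises: UnicodeDecodeError: 'ascii' codec can't decode byte 0xe4 in position 0: ordinal not in range(128)

Cause: the codecs writer expects unicode, so Python 2 first decodes the byte str with its default encoding, ascii, which does not cover Chinese
Fix: change the Python 2 interpreter's default encoding to utf-8

import sys
reload(sys)  # site.py removes setdefaultencoding at startup; reload() restores it
sys.setdefaultencoding("utf-8")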
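
In Python 3 the same class of error appears when a file is read with a codec that does not match the one it was written with. The following is a small sketch under that assumption; it uses a temporary file, and the names `path`, `text` and `message` are illustrative.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "xxx.txt")  # temporary file for the demo

# Write with an explicit encoding so the result does not depend on the locale.
with open(path, "w", encoding="utf-8") as f:
    f.write("中文")

# Reading with the same encoding gives the original text back.
with open(path, "r", encoding="utf-8") as f:
    text = f.read()
print(text)  # 中文

# Reading with ascii fails on the first byte of "中" (0xe4 in utf-8).
message = ""
try:
    with open(path, "r", encoding="ascii") as f:
        f.read()
except UnicodeDecodeError as exc:
    message = str(exc)
print(message)
```

Passing `encoding=` on both the write and the read side keeps the two ends in agreement, which is the point of this subsection.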

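5. JSON file encoding (web scraping)

Problem: how to write Chinese text to a JSON file so that it is still displayed as Chinese
Solution: always write utf-8 data
Python 2: to write utf-8, either convert manually or change the interpreter's default encoding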

import sys
import json
reload(sys)
sys.setdefaultencoding('utf-8')

unicode_str = u'\u6df1\u5733'  # 深圳
json_str = json.dumps(unicode_str, ensure_ascii=False)

with open("json_file.json", "w") as f:
    f.write(json_str)

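Python 3: the interpreter's default encoding is utf-8

import json

s = {"name": "中国"}
json_str = json.dumps(s, ensure_ascii=False)
# no encoding= given, so open() uses the locale's preferred encoding (gbk on Chinese-locale Windows)
with open("json_file.json", "w") as f:
    f.write(json_str)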

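Notes:

ensure_ascii=False makes json.dumps output non-ASCII characters as they are instead of escaping them as \uXXXX sequences
If json_file.json written by the Python 3 code looks garbled when opened, the cause is that the default encoding on Windows is gbk; reloading the file as gbk in the editor fixes the display, and passing encoding="utf-8" to open() avoids the mismatch
For web scraping, it is recommended to store data as Unicode, which makes it easy to re-encode when the data is retrieved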
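
The effect of `ensure_ascii` is easiest to see side by side. Here is a minimal Python 3 sketch that also writes the file with an explicit utf-8 encoding; the temporary path and the names `escaped`, `readable` and `loaded` are assumptions for the example.

```python
import json
import os
import tempfile

data = {"name": "中国"}

escaped = json.dumps(data)                       # default: ensure_ascii=True
readable = json.dumps(data, ensure_ascii=False)  # keep non-ASCII characters as-is
print(escaped)   # {"name": "\u4e2d\u56fd"}
print(readable)  # {"name": "中国"}

# Write and read with an explicit encoding so the file is utf-8 on every platform.
path = os.path.join(tempfile.mkdtemp(), "json_file.json")
with open(path, "w", encoding="utf-8") as f:
    json.dump(data, f, ensure_ascii=False)
with open(path, "r", encoding="utf-8") as f:
    loaded = json.load(f)
print(loaded == data)  # True
```

Both forms load back to the same object; the flag only changes how the text looks in the file.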

6. urlencode encoding and decoding

Python 2:

import urllib
query_str = urllib.urlencode({"wd": "你好"})  # encode
query_obj = urllib.unquote(query_str)  # decode

Python 3:

from urllib import parse
query_str = parse.urlencode({"wd": "你好"})  # encode
query_obj = parse.unquote(query_str)  # decode
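
For reference, this is roughly what the Python 3 calls produce. The sketch also uses `parse.parse_qs` and the `encoding` parameter, which are not in the original notes but are part of the standard `urllib.parse` module.

```python
from urllib import parse

query_str = parse.urlencode({"wd": "你好"})   # percent-encodes the utf-8 bytes
print(query_str)                              # wd=%E4%BD%A0%E5%A5%BD

decoded = parse.unquote(query_str)            # back to a readable string
print(decoded)                                # wd=你好

params = parse.parse_qs(query_str)            # back to a dict of lists
print(params)                                 # {'wd': ['你好']}

# Some sites expect gbk instead of utf-8; both directions accept encoding=.
gbk_query = parse.urlencode({"wd": "你好"}, encoding="gbk")
gbk_decoded = parse.unquote(gbk_query, encoding="gbk")
```

Note that `unquote` returns a string, while `parse_qs` is the call that recovers the key/value structure.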

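7. Converting Unicode-escape text to Unicode

Python 2 (each \u escape needs exactly four hex digits):

s = '\u6df1\u5733'  # byte str holding literal backslash-u escapes
u_str = s.decode("unicode-escape")
print(u_str)  # 深圳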
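
In Python 3, `str` has no `decode` method, so the escape text has to be turned into bytes first. One possible sketch, using the 深圳 escapes from section 5 as sample data and a raw string so the backslashes stay literal:

```python
# Python 3 sketch: text containing literal backslash-u escapes.
escaped = r"\u6df1\u5733"
print(len(escaped))  # 12 (six characters per escape, not yet decoded)

# str -> bytes (pure ASCII), then interpret the escapes.
u_str = escaped.encode("ascii").decode("unicode_escape")
print(u_str)         # 深圳
print(len(u_str))    # 2
```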
