[155] Python: understanding decode, encode, and Unicode conversions

Latest recommended article published on 2025-04-01 13:00:00

周小董


Article link: https://blog.csdn.net/xc_zhou/article/details/80810024


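The decode() method decodes a byte string using the codec registered for the given encoding. If no encoding is passed, Python 2 falls back to the default string encoding (ASCII unless it has been changed), while bytes.decode() in Python 3 defaults to UTF-8.
In Python 2, decode() turns an ordinary str (a byte string) into a unicode object.
It interprets the bytes according to the encoding named in the argument and builds the corresponding unicode object.
For example, if the source file is saved as UTF-8, a string is converted to unicode like this (Python 2):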

s2='哈'.decode('utf-8')

s2 is now a unicode object holding the character '哈'. It is equivalent to unicode('哈','utf-8') and to u'哈'.
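In Python 3 the same conversion starts from a bytes object, because str is already Unicode there. A minimal sketch of that case (the variable names are only illustrative):

```python
# Python 3: decode turns bytes into str
raw = b'\xe5\x93\x88'        # the UTF-8 bytes of '哈'
s2 = raw.decode('utf-8')     # interpret the bytes as UTF-8

print(s2)                                      # 哈
print(type(raw).__name__, type(s2).__name__)   # bytes str
```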

str.decode(encoding='UTF-8', errors='strict')

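decode parameters

encoding: the encoding to use.
errors: an optional error-handling scheme. The default is 'strict', which raises a UnicodeError (in practice a UnicodeDecodeError) on invalid input. Other values that work for decoding are 'ignore' and 'replace', 'backslashreplace' on Python 3.5 and later, and any other name registered through codecs.register_error(). Note that 'xmlcharrefreplace' is only supported when encoding.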
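To make the errors parameter concrete, here is a small Python 3 sketch that decodes the same invalid input with several handlers. The sample bytes are made up for illustration.

```python
data = b'abc\xffdef'   # 0xff can never appear in valid UTF-8

strict_error = None
try:
    data.decode('utf-8')   # errors='strict' is the default
except UnicodeDecodeError as exc:
    strict_error = exc
    print('strict raised:', exc.reason)

ignored = data.decode('utf-8', errors='ignore')             # drops the bad byte
replaced = data.decode('utf-8', errors='replace')           # inserts U+FFFD
escaped = data.decode('utf-8', errors='backslashreplace')   # Python 3.5+

print(ignored)         # abcdef
print(repr(replaced))  # 'abc\ufffddef'
print(escaped)         # abc\xffdef
```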

encode() does the opposite: it converts a unicode object into an ordinary byte string in the encoding named in the argument.

str.encode(encoding='utf-8', errors='strict')
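Since encode and decode are inverses, decoding with the same codec that produced the bytes gives the original text back. A short Python 3 sketch, using UTF-8 and GBK as two example codecs:

```python
text = '哈'

utf8_bytes = text.encode('utf-8')   # 3 bytes in UTF-8
gbk_bytes = text.encode('gbk')      # 2 bytes in GBK

# Decoding with the codec that produced the bytes restores the text
roundtrip_utf8 = utf8_bytes.decode('utf-8')
roundtrip_gbk = gbk_bytes.decode('gbk')

print(len(utf8_bytes), len(gbk_bytes))                  # 3 2
print(roundtrip_utf8 == text, roundtrip_gbk == text)    # True True
```

The byte sequences differ between codecs, which is why decoding with the wrong codec produces garbage or an error.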

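encode parameters

encoding: the encoding to use. For the full list of supported encodings, see "Standard Encodings" in the codecs module documentation.
errors: an optional error-handling scheme. The default is 'strict', which raises a UnicodeError (in practice a UnicodeEncodeError) when a character cannot be encoded. Other possible values are 'ignore', 'replace', 'xmlcharrefreplace', 'backslashreplace', and any other name registered through codecs.register_error().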
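The following Python 3 sketch shows what each built-in handler produces when encoding to ASCII, and how a custom handler can be registered. The handler name 'underscore' and the function underscore_handler are hypothetical, chosen only for this example.

```python
import codecs

text = '你好 ok'

ignored = text.encode('ascii', errors='ignore')               # b' ok'
replaced = text.encode('ascii', errors='replace')             # b'?? ok'
xml_refs = text.encode('ascii', errors='xmlcharrefreplace')   # b'&#20320;&#22909; ok'
escaped = text.encode('ascii', errors='backslashreplace')     # b'\\u4f60\\u597d ok'

def underscore_handler(err):
    """Hypothetical handler: one '_' per character that could not be encoded."""
    if not isinstance(err, UnicodeEncodeError):
        raise err
    # Return the replacement text and the position where encoding resumes
    return ('_' * (err.end - err.start), err.end)

codecs.register_error('underscore', underscore_handler)
custom = text.encode('ascii', errors='underscore')            # b'__ ok'

for result in (ignored, replaced, xml_refs, escaped, custom):
    print(result)
```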

Converting Unicode escapes to Chinese characters (string with the u prefix)

# Convert Unicode escapes to Chinese characters (string with the u prefix)
---------------------------------Python 2 output----------------------------------------------------
>>> s = u'\u4eac\u4e1c\u653e\u517b\u7684\u722c\u866b'
>>> print(s.encode('utf-8'))
京东放养的爬虫
>>> print(s.encode('utf-8'),type(s))
('\xe4\xba\xac\xe4\xb8\x9c\xe6\x94\xbe\xe5\x85\xbb\xe7\x9a\x84\xe7\x88\xac\xe8\x99\xab', <type 'unicode'>)
>>>
>>> str2='京东放养的爬虫'
>>> uu=str2.decode('utf-8')
>>> print(uu)
京东放养的爬虫
>>> print(uu,type(uu))
(u'\u4eac\u4e1c\u653e\u517b\u7684\u722c\u866b', <type 'unicode'>)
>>> 

---------------------------------Python 3 output----------------------------------------------------
>>> s = u'\u4eac\u4e1c\u653e\u517b\u7684\u722c\u866b'
>>> print(s.encode('utf-8'))
b'\xe4\xba\xac\xe4\xb8\x9c\xe6\x94\xbe\xe5\x85\xbb\xe7\x9a\x84\xe7\x88\xac\xe8\x99\xab'
>>> print(s.encode('utf-8'),type(s))
b'\xe4\xba\xac\xe4\xb8\x9c\xe6\x94\xbe\xe5\x85\xbb\xe7\x9a\x84\xe7\x88\xac\xe8\x99\xab' <class 'str'>
>>>
>>> str2='京东放养的爬虫'
>>> uu=str2.decode('utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'str' object has no attribute 'decode'
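The AttributeError appears because a Python 3 str is already Unicode, so only bytes objects have decode(). When code may receive either type, one option is a small helper like the sketch below. The name to_text is hypothetical, not a standard function.

```python
def to_text(value, encoding='utf-8'):
    """Return str for either bytes or str input (hypothetical helper)."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value

str2 = '京东放养的爬虫'
data = str2.encode('utf-8')      # bytes, as read from a file or socket

print(hasattr(str2, 'decode'))   # False: str has no decode() in Python 3
print(to_text(data) == str2)     # True
print(to_text(str2) == str2)     # True
```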

Converting Unicode escapes to Chinese characters (string without the u prefix)

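# Convert Unicode escapes to Chinese characters (string without the u prefix)
---------------------------------Python 2 code----------------------------------------------------
import json

# In Python 2 this is a byte string holding literal backslash-u sequences
s = '\u4eac\u4e1c\u653e\u517b\u7684\u722c\u866b'

# Method 1: decode with unicode_escape
print (s.decode('unicode_escape'))
print (unicode(s, 'unicode_escape'))

# Method 2: if the text is in JSON format, decode it with json.loads
print(json.loads('"%s"' % s))

# Method 3: use eval (trusted input only, because eval executes arbitrary code)
print (eval('u"%s"' % s))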
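The three methods above rely on Python 2 behavior. A possible Python 3 port is sketched below. It uses a raw string so the backslashes stay literal, and it swaps eval for ast.literal_eval, which is a substitution made here rather than something from the original code.

```python
import ast
import json

# Literal backslash-u sequences, not real Chinese characters
s = r'\u4eac\u4e1c\u653e\u517b\u7684\u722c\u866b'

# Method 1: go through bytes, then decode with unicode_escape
m1 = s.encode('ascii').decode('unicode_escape')

# Method 2: treat the text as the body of a JSON string
m2 = json.loads('"%s"' % s)

# Method 3: ast.literal_eval parses a string literal without running code
m3 = ast.literal_eval('"%s"' % s)

print(m1, m2, m3)
```

Methods 2 and 3 wrap the text in double quotes, so they fail if the text itself contains an unescaped double quote.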

Converting u'\u810f\u4e71' to '\u810f\u4e71'

# Convert u'\u810f\u4e71' to '\u810f\u4e71'
---------------------------------Python 2 output----------------------------------------------------
>>> s_unicode = u'\u810f\u4e71'  
>>> s_str = s_unicode.encode('unicode-escape').decode('string_escape')   
>>> print(s_str)
\u810f\u4e71
>>> print(type(s_unicode),type(s_str))
(<type 'unicode'>, <type 'str'>)
>>>
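Python 3 has no string_escape codec, so the transcript above does not carry over directly. A rough Python 3 equivalent, assuming the goal is a str containing the literal escape text, could look like this:

```python
s_unicode = '\u810f\u4e71'   # two real characters

# unicode_escape yields ASCII-only bytes; decode them to get a str
s_str = s_unicode.encode('unicode_escape').decode('ascii')

print(s_str)                        # \u810f\u4e71
print(len(s_unicode), len(s_str))   # 2 12
```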

Chinese to Unicode escapes

---------------------------------Python 3 output----------------------------------------------------
>>> chinese = "你好"
>>> escaped = chinese.encode("unicode_escape")
>>> print(escaped)
b'\\u4f60\\u597d'
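The unicode_escape codec does not always produce the \uXXXX form. The sketch below, with sample characters picked for illustration, shows the different escape shapes it uses depending on the code point:

```python
samples = [
    'a',            # ASCII stays as-is
    '\n',           # control characters use their short escape
    '\xe9',         # Latin-1 range (é) becomes \xNN
    '\u4f60',       # BMP characters (你) become \uNNNN
    '\U0001f600',   # characters above U+FFFF become \UNNNNNNNN
]

results = [ch.encode('unicode_escape') for ch in samples]
for ch, out in zip(samples, results):
    print(repr(ch), out)
```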

Unicode escapes to Chinese

Method 1

escaped = b'\\u4f60\\u597d'
text = escaped.decode("unicode_escape")
print(text)

Output: 你好

Method 2

escaped = '\\u4f60\\u597d'
text = escaped.encode('utf-8').decode('unicode_escape')
print(text)

Output: 你好
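One caveat with Method 2: unicode_escape treats every byte as Latin-1, so any real non-ASCII characters already in the string get mangled. If the text mixes real characters with escapes, a regex that touches only the \uXXXX sequences is one possible workaround. The helper name unescape_u is hypothetical, and this sketch does not combine surrogate pairs.

```python
import re

mixed = '价格: \\u4f60\\u597d'   # real Chinese characters plus escapes

# Method 2 mangles the characters that were already real
broken = mixed.encode('utf-8').decode('unicode_escape')

def unescape_u(text):
    """Hypothetical helper: replace only backslash-u escapes with 4 hex digits."""
    return re.sub(r'\\u([0-9a-fA-F]{4})',
                  lambda m: chr(int(m.group(1), 16)), text)

fixed = unescape_u(mixed)

print(broken == '价格: 你好')   # False
print(fixed)                    # 价格: 你好
```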

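Method 3
When the escaped text comes from a page fetched with requests, you can do the same thing (url and headers are whatever your request needs):

import requests

response = requests.post(url,headers=headers)
print(response.text.encode('utf-8').decode('unicode_escape'))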
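A common reason for seeing \uXXXX in a response is that the body is JSON. If that assumption holds, parsing the JSON decodes the escapes for you. The sample body below stands in for response.text so that the sketch runs without a network call.

```python
import json

# Sample body standing in for response.text (assumed to be JSON)
body = '{"msg": "\\u4f60\\u597d"}'

# With requests, response.json() performs the same parsing
data = json.loads(body)

print(data['msg'])   # 你好
```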

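Method 4
Method 3 may raise an error; in that case you can also try the following. Note that eval executes whatever the server sends back, so use it only on responses you trust:

response = requests.get(url,headers=headers)
text = eval("u"+"\'"+response.text+"\'")
print(text)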
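Because eval runs arbitrary code, a safer variant of Method 4 is to parse the text with ast.literal_eval, which accepts only literals. This is a substitute technique, not the original author's method, and the helper name unescape_literal is hypothetical.

```python
import ast

def unescape_literal(text):
    """Hypothetical helper: parse text as the body of a string literal."""
    return ast.literal_eval("'" + text + "'")

decoded = unescape_literal('\\u4f60\\u597d')
print(decoded)   # 你好

# Input that tries to break out of the quotes and call a function
hostile = "' + __import__('os').getcwd() + '"
try:
    unescape_literal(hostile)
    rejected = False
except (ValueError, SyntaxError):
    rejected = True

print(rejected)   # True: the call is refused instead of executed
```

Like the eval version, this still fails on text that contains an unescaped single quote, but it raises an exception instead of running code.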

Reference: https://blog.csdn.net/weixin_45418194/article/details/105182185