AttributeError: 'str' object has no attribute 'decode'

小龙狗

Article link: https://blog.csdn.net/ShyLoneGirl/article/details/116234533

Problem description

While writing a Python 3 program that handles character encoding and decoding, the following code raised an error:

text = '你好啊树哥'
text.decode()  # str has no decode() method in Python 3

The error is as follows:

AttributeError: 'str' object has no attribute 'decode'

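Solution

Solution first, since few readers want to wade through a long cause analysis. The short version of the cause is that Python 3 changed how strings work, for the better, and dropped the old behavior. In Python 3, to get the Unicode escape sequence of a str, encode it with unicode_escape: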

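text = '你好啊树哥'
# unicode_escape turns non-ASCII characters into backslash escapes such as \u4f60 (as bytes)
str_encode_unicode = text.encode('unicode_escape')
print('str_encode_unicode:', str_encode_unicode)
# the escaped bytes are pure ASCII, so decoding them as UTF-8 gives an ordinary str
str_encode_unicode_decode_utf_8 = str_encode_unicode.decode('utf-8')
print('str_encode_unicode_decode_utf_8:', str_encode_unicode_decode_utf_8)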

Here is the result:

str_encode_unicode: b'\\u4f60\\u597d\\u554a\\u6811\\u54e5'
str_encode_unicode_decode_utf_8: \u4f60\u597d\u554a\u6811\u54e5

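When writing crawlers, the downloaded data often has encoding problems and shows up as b'' or u'' values. Choosing the right combination of encode() and decode() should solve most of these problems. Use Python 3 wherever possible.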
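
One way to handle data that may arrive as either bytes or str is to normalize it once at the boundary. The following is a minimal sketch, not a library API: `to_text` is a hypothetical helper name, and the UTF-8 default is an assumption about the incoming data.

```python
def to_text(value, encoding='utf-8'):
    """Return value as str, decoding only when it is bytes-like.

    to_text is a hypothetical helper for illustration; the utf-8
    default is an assumption about the incoming data.
    """
    if isinstance(value, (bytes, bytearray)):
        return bytes(value).decode(encoding)
    return value


raw = '你好啊树哥'.encode('utf-8')    # bytes, as a network read would return
print(to_text(raw))                   # bytes are decoded to str
print(to_text('你好啊树哥'))           # str passes through unchanged
```

If the source uses another encoding, pass it explicitly, for example `to_text(raw_gbk, 'gbk')`.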

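Cause analysis

Differences between Python 2 and Python 3

The error above comes from a major difference in how Python 2 and Python 3 handle the encoding and decoding of character sequences.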

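Python 3 represents character sequences with bytes and str: bytes holds raw 8-bit values, and str holds Unicode characters.
Python 2 uses str and unicode: str holds raw 8-bit values, and unicode holds Unicode characters.
A Python 2 str is really a byte sequence rather than what we usually think of as a string, for example '\xc4\xe3\xba\xc3\xb0\xa1\xca\xf7\xb8\xe7' (the GBK bytes of '你好啊树哥'). It corresponds to bytes in Python 3.
A Python 2 unicode is the string we usually have in mind, for example u'\u4f60\u597d\u554a\u6811\u54e5'. It corresponds to str in Python 3.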
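
The Python 3 side of this distinction can be checked directly. A short sketch follows; the variable names are illustrative only.

```python
text = '你好啊树哥'             # str: a sequence of Unicode characters
data = text.encode('gbk')       # bytes: a sequence of raw 8-bit values

print(type(text), len(text))    # length counts characters
print(type(data), len(data))    # length counts bytes
print(text[0])                  # indexing a str gives a one-character str
print(data[0])                  # indexing bytes gives an int in range(256)
```

The two lengths differ because GBK stores each of these characters in two bytes.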

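Encoding and decoding

A word on encode and decode.

Encode: convert a string of Unicode characters into a byte sequence using the specified encoding.
Decode: convert a byte sequence back into a string of Unicode characters using the specified encoding.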
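
A small sketch of these two definitions, and of what happens when the codec used to decode does not match the one used to encode. The `errors='replace'` handler is a standard option of `bytes.decode`; the sample text is the same one used elsewhere in this article.

```python
text = '你好啊树哥'
data = text.encode('utf-8')             # encode: str -> bytes

# Decoding with the matching codec restores the original text.
assert data.decode('utf-8') == text     # decode: bytes -> str

# Decoding with a codec that cannot represent these bytes raises an error.
try:
    data.decode('ascii')
except UnicodeDecodeError as exc:
    print('decode failed:', exc.reason)

# errors='replace' substitutes U+FFFD for each undecodable byte instead of raising.
lossy = data.decode('ascii', errors='replace')
print(lossy)
```

The replacement is lossy: once bytes have been turned into U+FFFD, the original text cannot be recovered from the result.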

Conversions in Python 2

str -> decode() -> unicode
unicode -> encode() -> str

Conversions in Python 3

bytes -> decode() -> str
str -> encode() -> bytes
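
The Python 3 relationship can be verified with a few lines. This sketch only uses built-ins and shows that each type carries just one direction of the conversion, which is why the call from the problem description fails.

```python
# In Python 3, str can only be encoded and bytes can only be decoded.
print(hasattr(str, 'encode'), hasattr(str, 'decode'))
print(hasattr(bytes, 'decode'), hasattr(bytes, 'encode'))

text = '你好啊树哥'
try:
    text.decode()                 # same call as in the problem description
except AttributeError as exc:
    print(exc)

# The supported round trip: str -> bytes -> str.
round_trip = text.encode('utf-8').decode('utf-8')
print(round_trip == text)
```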

Worked example

When a string's encoding does not match what you expect, the following code (Python 3) can serve as a reference.

if __name__ == '__main__':
    # Named text rather than str so the built-in type is not shadowed.
    text = '你好啊树哥'
    print('str:', text)
    print('str-Type:', type(text))
    print('---')
    # str -> UTF-8 bytes, then decode with the matching and a non-matching codec.
    str_encode_utf8 = text.encode('utf-8')
    print('str_encode_utf8:', str_encode_utf8)
    print('str_encode_utf8_Type:', type(str_encode_utf8))
    str_encode_utf8_decode_utf8 = str_encode_utf8.decode('utf-8')
    print('str_encode_utf8_decode_utf8:', str_encode_utf8_decode_utf8)
    # unicode_escape reads non-escape bytes as Latin-1, so this prints mojibake.
    str_encode_utf8_decode_unicode = str_encode_utf8.decode('unicode_escape')
    print('str_encode_utf8_decode_unicode:', str_encode_utf8_decode_unicode)
    print('---')
    # str -> GBK bytes, then decode with the matching and a non-matching codec.
    str_encode_gbk = text.encode('gbk')
    print('str_encode_gbk:', str_encode_gbk)
    print('str_encode_gbk_Type:', type(str_encode_gbk))
    str_encode_gbk_decode_gbk = str_encode_gbk.decode('gbk')
    print('str_encode_gbk_decode_gbk:', str_encode_gbk_decode_gbk)
    # Mojibake again, for the same reason as above.
    str_encode_gbk_decode_unicode = str_encode_gbk.decode('unicode_escape')
    print('str_encode_gbk_decode_unicode:', str_encode_gbk_decode_unicode)
    print('---')
    # str -> backslash-escaped ASCII bytes.
    str_encode_unicode = text.encode('unicode_escape')
    print('str_encode_unicode:', str_encode_unicode)
    print('str_encode_unicode_Type:', type(str_encode_unicode))
    # The escaped bytes are pure ASCII, so UTF-8 and GBK both keep the escapes as text.
    str_encode_unicode_decode_utf_8 = str_encode_unicode.decode('utf-8')
    print('str_encode_unicode_decode_utf_8:', str_encode_unicode_decode_utf_8)
    str_encode_unicode_decode_gbk = str_encode_unicode.decode('gbk')
    print('str_encode_unicode_decode_gbk:', str_encode_unicode_decode_gbk)
    # Only unicode_escape interprets the escapes and restores the original text.
    str_encode_unicode_decode_unicode = str_encode_unicode.decode('unicode_escape')
    print('str_encode_unicode_decode_unicode:', str_encode_unicode_decode_unicode)
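
Two of the results above are garbled because the bytes were decoded with the wrong codec. The sketch below shows how such cases can sometimes be undone, assuming you know (or can guess) which codec was used by mistake. It covers only the Latin-1 mix-up and the literal-escape case, not every kind of mojibake.

```python
text = '你好啊树哥'

# Case 1: UTF-8 bytes were decoded as Latin-1 by mistake.
garbled = text.encode('utf-8').decode('latin-1')
print(ascii(garbled))                       # ascii() keeps the output printable
# Reverse the wrong step, then decode with the right codec.
repaired = garbled.encode('latin-1').decode('utf-8')
print(repaired == text)

# Case 2: a str that contains literal backslash escapes instead of characters.
escaped = '\\u4f60\\u597d\\u554a\\u6811\\u54e5'
restored = escaped.encode('ascii').decode('unicode_escape')
print(restored == text)
```

The first repair works because Latin-1 maps every byte value to exactly one character, so encoding back to Latin-1 reproduces the original bytes without loss.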