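Crawler font obfuscation: garbled text (mojibake)
Obfuscation with a custom font can only cover a limited set of characters, so when an entire page is garbled, the cause is usually a changed character encoding rather than a font trick.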
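When a whole page comes back garbled, a quick check is to decode the raw response bytes strictly with a few candidate encodings and keep the first one that succeeds. The sketch below assumes the raw bytes are available; `decode_page` and its candidate list are hypothetical names chosen for illustration, not part of any library.

```python
def decode_page(raw: bytes, candidates=("utf-8", "gb18030")) -> str:
    """Return raw decoded with the first candidate codec that accepts it."""
    for codec in candidates:
        try:
            return raw.decode(codec)  # strict mode: raises on invalid bytes
        except UnicodeDecodeError:
            continue
    # Latin-1 maps every byte to a code point, so this never fails,
    # but the result is mojibake if the page was not really Latin-1.
    return raw.decode("latin-1")


gbk_page = "王者荣耀".encode("gbk")      # simulated GBK response body
utf8_page = "王者荣耀".encode("utf-8")   # simulated UTF-8 response body
print(decode_page(gbk_page))
print(decode_page(utf8_page))
```

UTF-8 is tried first because it rejects most byte sequences produced by other encodings, whereas the GB codecs accept a much wider range of bytes.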
Several disguises of 王者荣耀 (Honor of Kings)
Encode as GBK, decode with unicode_escape
a = '王者荣耀'
print(a.encode('gbk').decode('unicode_escape'))
>>>ÍõÕßÈÙÒ«
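No information is lost in that step, so the garbling can be reversed: encode the garbled text back to single bytes with Latin-1, then decode those bytes as GBK. A minimal sketch, assuming the garbled text really came from GBK bytes that were read one byte per character:

```python
garbled = "ÍõÕßÈÙÒ«"

# Latin-1 maps code points U+0000..U+00FF one-to-one onto bytes,
# so this recovers the original GBK byte sequence.
raw = garbled.encode("latin-1")
print(raw)

# Decoding those bytes with the correct codec restores the text.
fixed = raw.decode("gbk")
print(fixed)
```

The second print shows 王者荣耀 again.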
Encode with raw_unicode_escape, decode as UTF-16
print(a.encode('raw_unicode_escape').decode('utf16'))
>>>畜㌷戸畜〸㔰畜㌸㌶畜〸〰
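This garbling is reversible too. The sketch below assumes the garbled text was produced by reading `\uXXXX` escape text as little-endian UTF-16, which is what `decode('utf16')` does without a byte order mark on a little-endian machine. The codec is spelled `utf-16-le` here so that no byte order mark is added when encoding.

```python
garbled = "畜㌷戸畜〸㔰畜㌸㌶畜〸〰"

# Each garbled character packs two ASCII bytes of the escape text.
escaped = garbled.encode("utf-16-le")
print(escaped)

# Interpret the \uXXXX escapes to get the original characters back.
fixed = escaped.decode("unicode_escape")
print(fixed)
```

The first print shows the escape text for the four characters, and the second shows 王者荣耀.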
unicode_escape and raw_unicode_escape give different results
# the garbled form of 王者荣耀
a = "ÍõÕßÈÙÒ«"
print(a.encode('unicode_escape'))
print(a.encode('raw_unicode_escape'))
>>>b'\\xcd\\xf5\\xd5\\xdf\\xc8\\xd9\\xd2\\xab'
>>>b'\xcd\xf5\xd5\xdf\xc8\xd9\xd2\xab'
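The difference lies in which characters get escaped. `unicode_escape` escapes every non-ASCII character, while `raw_unicode_escape` writes code points up to U+00FF as single Latin-1 bytes and only escapes characters above that. A short sketch on a mixed string (the sample string is an arbitrary choice for illustration):

```python
mixed = "Í王"  # one Latin-1 character followed by one CJK character

print(mixed.encode("unicode_escape"))      # both characters escaped
print(mixed.encode("raw_unicode_escape"))  # only the CJK character escaped

# For this garbled string, raw_unicode_escape gives the same bytes as Latin-1.
latin = "ÍõÕßÈÙÒ«"
print(latin.encode("raw_unicode_escape") == latin.encode("latin-1"))
```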
GB-family encodings (gb18030, gb2312, gbk)
a = "王者荣耀"
print(a.encode('gb18030'))
print(a.encode('gb2312'))
print(a.encode('gbk'))
>>>b'\xcd\xf5\xd5\xdf\xc8\xd9\xd2\xab'
>>>b'\xcd\xf5\xd5\xdf\xc8\xd9\xd2\xab'
>>>b'\xcd\xf5\xd5\xdf\xc8\xd9\xd2\xab'
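The three codecs agree here only because all four characters belong to GB2312, the smallest of the three character sets. GBK extends GB2312, and GB18030 extends GBK to cover all of Unicode. The sketch below probes this with two extra characters chosen for illustration (喆, which is outside GB2312, and an emoji, which is outside GBK); `try_encode` is a hypothetical helper, not a library function.

```python
def try_encode(text: str, codec: str):
    """Return the encoded bytes, or None if the codec cannot represent text."""
    try:
        return text.encode(codec)
    except UnicodeEncodeError:
        return None


for sample in ("王", "喆", "😀"):
    for codec in ("gb2312", "gbk", "gb18030"):
        print(repr(sample), codec, try_encode(sample, codec))
```

A `None` in the output marks a character that the codec cannot represent, which is a useful hint when guessing which GB encoding a page actually uses.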
# With the garbled string instead, gb18030 still succeeds but gb2312 fails
a = "ÍõÕßÈÙÒ«"
print(a.encode('gb18030'))
print(a.encode('gb2312'))
>>> b'\x810\x881\x810\x8b1\x810\x889\x810\x898\x810\x876\x810\x892\x810\x886\x810\x850'
>>>UnicodeEncodeError: 'gb2312' codec can't encode character '\xcd' in position 0: illegal multibyte sequence
Unicode escapes to Chinese
a = u'\u516d\u795e\u6e05\u51c9\u723d\u80a4\u6c90\u6d74\u9732'
b = a.encode("GBK").decode("GBK")
# b = a.encode("utf8").decode("utf8") gives the same result
print(b)
>>>六神清凉爽肤沐浴露
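In Python 3 the literal above is already the Chinese string, so the encode/decode round trip leaves it unchanged. In scraped data the escapes more often arrive as literal backslash text, for example inside a JSON response. A minimal sketch for that case, assuming the input contains only ASCII characters and `\uXXXX` escapes:

```python
import json

# Raw string: the backslashes are real characters, as in a scraped payload.
escaped = r'\u516d\u795e\u6e05\u51c9\u723d\u80a4\u6c90\u6d74\u9732'

# Option 1: let the unicode_escape codec interpret the escapes.
text = escaped.encode('ascii').decode('unicode_escape')
print(text)

# Option 2: if the escapes sit inside a JSON string, let json decode them.
text_from_json = json.loads('"' + escaped + '"')
print(text_from_json)
```

Both options print 六神清凉爽肤沐浴露. The JSON route is usually the safer choice when the payload is JSON, since it also handles the other JSON escape sequences.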
To be continued.