Notes: python-lib-chardet
1. chardet
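chardet is a very good character-encoding detection module. It is a third-party Python library, so it has to be downloaded and installed separately.
Documentation: https://pypi.org/project/chardet/
It cannot recognize every encoding, though; see the documentation for the list of encodings it can detect.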
1.1. Installation
pip install chardet
1.2. Usage
1.2.1. Calling it from Python code
import chardet
rawdata = b'sdfwe'
res = chardet.detect(rawdata)
print(res)
Output:
{'encoding': 'ascii', 'confidence': 1.0, 'language': ''}
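The usual next step is to decode the bytes with the detected encoding. The following is a minimal sketch, not part of chardet itself: the helper name decode_with_detection, the min_confidence threshold of 0.5, and the utf-8 fallback are all assumptions chosen for illustration. The optional detect parameter is also hypothetical; it defaults to chardet.detect and only exists so that a stand-in detector can be passed in.

```python
def decode_with_detection(raw, detect=None, min_confidence=0.5, fallback="utf-8"):
    """Decode bytes using the encoding guessed by a chardet-style detector.

    Returns a (text, encoding_used) tuple.
    """
    if detect is None:
        # Imported lazily so a stand-in detector can be supplied instead.
        import chardet
        detect = chardet.detect

    guess = detect(raw)
    encoding = guess.get("encoding")
    confidence = guess.get("confidence") or 0.0

    # chardet reports encoding=None when it cannot make a guess at all.
    # The 0.5 threshold is an arbitrary choice for this sketch.
    if encoding is None or confidence < min_confidence:
        encoding = fallback

    # errors="replace" avoids an exception if the guess turns out to be wrong.
    return raw.decode(encoding, errors="replace"), encoding
```

Called as decode_with_detection(rawdata) it uses chardet.detect, so for the rawdata above the result shown in the output would lead to decoding as ascii.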
1.2.2. Command-line mode
chardet comes with a command-line script which reports on the encodings of one or more files:
% chardetect somefile someotherfile
somefile: windows-1252 with confidence 0.5
someotherfile: ascii with confidence 1.0
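For comparison, a similar per-file report can be produced from Python. This is only a rough sketch and not how chardetect is implemented: the function name report_encodings and its detect parameter are hypothetical, and it reads each file fully into memory, which may be unsuitable for very large files.

```python
def report_encodings(paths, detect=None):
    """Return one 'path: encoding with confidence X' line per file."""
    if detect is None:
        # Imported lazily so a stand-in detector can be supplied instead.
        import chardet
        detect = chardet.detect

    lines = []
    for path in paths:
        # Detection works on raw bytes, so open the file in binary mode.
        with open(path, "rb") as f:
            guess = detect(f.read())
        lines.append(
            f"{path}: {guess['encoding']} with confidence {guess['confidence']}"
        )
    return lines
```

Printing the returned lines for a list of file names gives output in the same shape as the command-line report above.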
1.3. How encoding detection works
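Encoding detection means taking a sequence of bytes in an unknown character encoding, and attempting to determine the encoding so you can read the text. It’s like cracking a code when you don’t have the decryption key.
Put simply, chardet examines a sample of the input bytes and guesses the encoding from their characteristics, which is why every result comes with a confidence value.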
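The "sample only part of the input" idea can be sketched with chardet's incremental interface, UniversalDetector, which is fed data piece by piece and sets its done attribute once it is confident enough. The wrapper below is an illustration under assumptions: the name detect_incrementally and the optional detector parameter are hypothetical and only there so a stand-in object with the same feed/done/close/result interface can be used.

```python
def detect_incrementally(chunks, detector=None):
    """Feed byte chunks to a detector until it is confident.

    Returns (result_dict, number_of_chunks_fed).
    """
    if detector is None:
        # Imported lazily so a stand-in detector can be supplied instead.
        from chardet.universaldetector import UniversalDetector
        detector = UniversalDetector()

    fed = 0
    for chunk in chunks:
        detector.feed(chunk)
        fed += 1
        # done becomes True as soon as the detector has a confident answer,
        # so the remaining chunks never need to be read.
        if detector.done:
            break

    # close() finalizes the guess when the input ended before done was set.
    detector.close()
    return detector.result, fed
```

A typical call would pass the lines of a file opened in binary mode, for example detect_incrementally(open(path, "rb")), where path is a placeholder for a real file name.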