python通用编码_Chardet 是一款通用的Python 2/3字符编码检测器

Chardet: The Universal Character Encoding Detector

Detects

ASCII, UTF-8, UTF-16 (2 variants), UTF-32 (4 variants)

Big5, GB2312, EUC-TW, HZ-GB-2312, ISO-2022-CN (Traditional and Simplified Chinese)

EUC-JP, SHIFT_JIS, CP932, ISO-2022-JP (Japanese)

EUC-KR, ISO-2022-KR (Korean)

KOI8-R, MacCyrillic, IBM855, IBM866, ISO-8859-5, windows-1251 (Cyrillic)

ISO-8859-5, windows-1251 (Bulgarian)

ISO-8859-1, windows-1252 (Western European languages)

ISO-8859-7, windows-1253 (Greek)

ISO-8859-8, windows-1255 (Visual and Logical Hebrew)

TIS-620 (Thai)

Note

Our ISO-8859-2 and windows-1250 (Hungarian) probers have been temporarily disabled until we can retrain the models.

Requires Python 2.7 or 3.5+.

Installation

Install from PyPI:

pip install chardet

Documentation

For users, docs are now available at https://chardet.readthedocs.io/.

Command-line Tool

chardet comes with a command-line script which reports on the encodings of one or more files:

% chardetect somefile someotherfile

somefile: windows-1252 with confidence 0.5

someotherfile: ascii with confidence 1.0

About

This is a continuation of Mark Pilgrim's excellent chardet. Previously, two versions needed to be maintained: one that supported python 2.x and one that supported python 3.x. We've recently merged with Ian Cordasco's charade fork, so now we have one coherent version that works for Python 2.7+ and 3.4+.

maintainer:

Dan Blanchard

  • 0
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值