# Universal Character Set Detector (UCSD)
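A text-encoding auto-detection module extracted from the Mozilla and siuying code. Detection is based mainly on character-frequency analysis. It is fairly accurate for CJK text in ANSI encodings, but it performs poorly on UTF-16 data streams that have no BOM.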
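To illustrate why a missing BOM matters for UTF-16 input, here is a minimal Python sketch of BOM sniffing. It is not the detector's own code: `sniff_bom` and the `_BOMS` table are hypothetical names used only for this example. When a BOM is present, the encoding can be read directly from the first few bytes; when it is absent, nothing in the header helps and a detector has to fall back on statistical guessing.

```python
import codecs

# BOM signatures, checked longest first so that UTF-32LE (FF FE 00 00)
# is not mistaken for UTF-16LE (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_BE, "UTF-32BE"),
    (codecs.BOM_UTF32_LE, "UTF-32LE"),
    (codecs.BOM_UTF8, "UTF-8"),
    (codecs.BOM_UTF16_BE, "UTF-16BE"),
    (codecs.BOM_UTF16_LE, "UTF-16LE"),
]


def sniff_bom(data: bytes):
    """Return the charset name implied by a leading BOM, or None if there is none."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None
```

For example, `sniff_bom(b"\xff\xfeh\x00i\x00")` identifies the data as UTF-16LE, while the same text encoded without a BOM (`"hi".encode("utf-16-le")`) yields `None`.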
The code is taken from [siuying/UniversalDetector][1] and [Mozilla][2]. Thanks to both.
## Known character sets
As of the most recent update, the library can return the following character sets:
Big5
EUC-JP
EUC-KR
GB18030
gb18030
HZ-GB-2312
IBM855
IBM866
ISO-2022-CN
ISO-2022-JP
ISO-2022-KR
ISO-8859-2
ISO-8859-5
ISO-8859-7
ISO-8859-8
KOI8-R
Shift_JIS
TIS-620
UTF-8
UTF-16BE
UTF-16LE
UTF-32BE
UTF-32LE
windows-1250
windows-1251
windows-1252
windows-1253
windows-1255
x-euc-tw
X-ISO-10646-UCS-4-2143
X-ISO-10646-UCS-4-3412
x-mac-cyrillic
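The names above vary in letter case (both `GB18030` and `gb18030` appear), so callers may want to normalize them before comparing. The following is a small sketch that assumes the result is consumed from Python. `to_python_codec` is a hypothetical helper, not part of this library. It relies on the standard `codecs.lookup`, which matches names case-insensitively, and it returns `None` for names Python has no codec for.

```python
import codecs


def to_python_codec(detected_name: str):
    """Map a detected charset name to Python's canonical codec name, or None."""
    try:
        return codecs.lookup(detected_name).name
    except LookupError:
        # Python has no codec registered under this name.
        return None
```

With this helper, `GB18030` and `gb18030` resolve to the same codec, while a name such as `X-ISO-10646-UCS-4-2143` yields `None` and needs separate handling.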
## Licensing
The license depends on that of the Mozilla UCSD code, so it is probably [MPL 2.0][3].
[1]: https://github.com/siuying/UniversalDetector
[2]: http://www-archive.mozilla.org/projects/intl/detectorsrc.html
[3]: http://mozilla.org/MPL/2.0/
2014-08-22