Translated from https://www.npmjs.com/package/iconv-lite
iconv-lite
Convert character encodings in pure JavaScript.
Pure JS character encoding conversion
- Doesn't need native code compilation. Works on Windows and in sandboxed environments like Cloud9.
- Used in popular projects like Express.js (body_parser), Grunt, Nodemailer, Yeoman and others.
- Faster than node-iconv (see below for performance comparison).
- Intuitive encode/decode API.
- Streaming support for Node v0.10+.
- [Deprecated] Can extend Node.js primitives (buffers, streams) to support all iconv-lite encodings.
- In-browser usage via Browserify (~180k gzip compressed with Buffer shim included). Translator's note: webpack is a better fit.
- TypeScript type definition file included.
- License: MIT.
Installation
$ npm install iconv-lite
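About the MIT license
Compared with other common software licenses (such as the GPL, LGPL and BSD), the MIT license is relatively permissive.
- Licensee rights
The licensee may use, copy, modify, merge, publish, distribute, sublicense and sell the software and copies of the software. The licensee may adapt the license terms to suit the needs of the program.
- Licensee obligations
The copyright notice and the permission notice must be included in the software and in all copies of it.
- Other notable properties
This license is not a copyleft free-software license; it may be used in free/open-source software as well as in proprietary software. The text of the MIT license can be changed to suit the copyright holder's needs, which is also an essential difference between MIT and BSD (the BSD license, 3-clause BSD license). The MIT license can coexist with other licenses. It is also recognized by the Free Software Foundation (FSF) as a free-software license and is compatible with the GPL.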
Usage
Basic API
var iconv = require('iconv-lite');
// Convert from an encoded buffer to a JS string.
var str = iconv.decode(Buffer.from([0x68, 0x65, 0x6c, 0x6c, 0x6f]), 'win1251');
// Convert from a JS string to an encoded buffer.
var buf = iconv.encode("Sample input string", 'win1251');
// Check whether an encoding is supported.
iconv.encodingExists("us-ascii");
Note: windows-1251 is a single-byte encoding for Cyrillic-script languages such as Russian.
Streaming API (Node v0.10+)
var http = require('http');
var fs = require('fs');
var assert = require('assert');
var iconv = require('iconv-lite');
// Decode stream (from binary stream to JS strings)
http.createServer(function(req, res) {
    var converterStream = iconv.decodeStream('win1251');
    req.pipe(converterStream);
    converterStream.on('data', function(str) {
        console.log(str); // Do something with decoded strings, chunk-by-chunk.
    });
});
// Convert encoding streaming example
fs.createReadStream('file-in-win1251.txt')
    .pipe(iconv.decodeStream('win1251'))
    .pipe(iconv.encodeStream('ucs2'))
    .pipe(fs.createWriteStream('file-in-ucs2.txt'));
// Sugar: all encode/decode streams have a .collect(cb) method to accumulate data.
http.createServer(function(req, res) {
    req.pipe(iconv.decodeStream('win1251')).collect(function(err, body) {
        assert(typeof body == 'string');
        console.log(body); // full request body string
    });
});
[Deprecated] Extend Node.js own encodings
NOTE: This doesn't work on the latest Node versions. For details, see https://github.com/ashtuchkin/iconv-lite/wiki/Node-v4-compatibility
// After this call all Node basic primitives will understand iconv-lite encodings.
iconv.extendNodeEncodings();
// Examples:
buf = new Buffer(str, 'win1251');
buf.write(str, 'gbk');
str = buf.toString('latin1');
assert(Buffer.isEncoding('iso-8859-15'));
Buffer.byteLength(str, 'us-ascii');
http.createServer(function(req, res) {
    req.setEncoding('big5');
    req.collect(function(err, body) {
        console.log(body);
    });
});
fs.createReadStream("file.txt", "shift_jis");
// External modules are also supported (if they use Node primitives, which they probably do).
request = require('request');
request({
    url: "http://github.com/",
    encoding: "cp932"
});
// To remove extensions
iconv.undoExtendNodeEncodings();
Supported encodings
- All node.js native encodings: utf8, ucs2 / utf16-le, ascii, binary, base64, hex.
- Additional unicode encodings: utf16, utf16-be, utf-7, utf-7-imap.
- All widespread singlebyte encodings: Windows 125x family, ISO-8859 family, IBM/DOS codepages, Macintosh family, KOI8 family, all others supported by iconv library. Aliases like 'latin1', 'us-ascii' also supported.
- All widespread multibyte encodings: CP932, CP936, CP949, CP950, GB2312, GBK, GB18030, Big5, Shift_JIS, EUC-JP.
See all supported encodings on the wiki.
Most singlebyte encodings are generated automatically from node-iconv. Thank you Ben Noordhuis and libiconv authors!
Multibyte encodings are generated from Unicode.org mappings and WHATWG Encoding Standard mappings. Thank you, respective authors!
Encoding/decoding speed
Comparison with the node-iconv module (1000x256kb, on MacBook Pro, Core i5/2.6 GHz, Node v0.12.0). Note: your results may vary, so please always check on your own hardware.
operation | iconv@2.1.4 | iconv-lite@0.4.7 |
---|---|---|
encode('win1251') | ~96 Mb/s | ~320 Mb/s |
decode('win1251') | ~95 Mb/s | ~246 Mb/s |
BOM handling
- Decoding: the BOM (byte order mark) is stripped by default, unless overridden by passing stripBOM: false in options (e.g. iconv.decode(buf, enc, {stripBOM: false})). A callback may also be given as the stripBOM parameter; it is called if a BOM character was actually found.
- If you want to detect a UTF-8 BOM when decoding other encodings, use the node-autodetect-decoder-stream module.
- Encoding: no BOM is added, unless overridden by the addBOM: true option.
UTF-16 Encodings
This library supports the UTF-16LE, UTF-16BE and UTF-16 encodings. The first two are straightforward, but UTF-16 tries to be smart about endianness in the following ways:
- Decoding: uses the BOM and a 'spaces heuristic' to determine input endianness. The default is UTF-16LE, but it can be overridden with the defaultEncoding: 'utf-16be' option. Strips the BOM unless stripBOM: false is passed.
- Encoding: uses UTF-16LE and writes a BOM by default. Use addBOM: false to override.
Other notes
When decoding, be sure to supply a Buffer to the decode() method, otherwise bad things usually happen.
Untranslatable characters are set to � or ?. No transliteration is currently supported.
Node versions 0.10.31 and 0.11.13 are buggy, don't use them (see #65, #77).
Testing
$ git clone git@github.com:ashtuchkin/iconv-lite.git
$ cd iconv-lite
$ npm install
$ npm test
$ # To view performance:
$ node test/performance.js
$ # To view test coverage:
$ npm run coverage
$ open coverage/lcov-report/index.html