Node.js npm iconv-lite

福州司马懿

Translated from https://www.npmjs.com/package/iconv-lite

iconv-lite

Convert character encodings in pure JavaScript.

Pure JS character encoding conversion

Doesn't need native code compilation. Works on Windows and in sandboxed environments like Cloud9.
Used in popular projects like Express.js (body_parser), Grunt, Nodemailer, Yeoman and others.
Faster than node-iconv (see below for performance comparison).
Intuitive encode/decode API.
Streaming support for Node v0.10+.
[Deprecated] Can extend Node.js primitives (buffers, streams) to support all iconv-lite encodings.
In-browser usage via Browserify (~180k gzip compressed with Buffer shim included). Translator's note: webpack is a better choice.
TypeScript type definition file included.
License: MIT.

Installation: npm install iconv-lite

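About the MIT license

Compared with other common software licenses (such as GPL, LGPL, and BSD), MIT is a relatively permissive license.

Licensee rights
The licensee may use, copy, modify, merge, publish, distribute, sublicense, and sell the software and copies of the software. The licensee may adapt the license terms as appropriate to the needs of the program.
Licensee obligations
The copyright notice and the permission notice must be included in the software and in all copies of the software.
Other important characteristics
This license is not a copyleft free software license; it permits use in free/open-source software as well as in non-free (proprietary) software. The text of the MIT license may be adapted to the needs of the program's copyright holder, which is also an essential difference between MIT and BSD (the BSD license, 3-clause BSD license). The MIT license can coexist with other licenses. It is also recognized by the Free Software Foundation (FSF) as a free software license and is compatible with the GPL.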

Usage

Basic API

var iconv = require('iconv-lite');

// Convert from an encoded buffer to a JS string.
// Buffer.from() replaces the deprecated new Buffer() constructor.
var str = iconv.decode(Buffer.from([0x68, 0x65, 0x6c, 0x6c, 0x6f]), 'win1251');

// Convert from a JS string to an encoded buffer.
var buf = iconv.encode("Sample input string", 'win1251');

// Check if an encoding is supported.
iconv.encodingExists("us-ascii");

Translator's note: windows-1251 is a single-byte encoding for Russian (Cyrillic script).

Streaming API (Node v0.10+)

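var http = require('http');
var fs = require('fs');
var assert = require('assert');
var iconv = require('iconv-lite');

// Decode stream (from binary stream to JS strings).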
http.createServer(function(req, res) {
    var converterStream = iconv.decodeStream('win1251');
    req.pipe(converterStream);

    converterStream.on('data', function(str) {
        console.log(str); // Do something with decoded strings, chunk-by-chunk. 
    });
});

// Convert encoding streaming example 
fs.createReadStream('file-in-win1251.txt')
    .pipe(iconv.decodeStream('win1251'))
    .pipe(iconv.encodeStream('ucs2'))
    .pipe(fs.createWriteStream('file-in-ucs2.txt'));

// Sugar: all encode/decode streams have .collect(cb) method to accumulate data. 
http.createServer(function(req, res) {
    req.pipe(iconv.decodeStream('win1251')).collect(function(err, body) {
        assert(typeof body == 'string');
        console.log(body); // full request body string 
    });
});
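
One reason to pipe data through a decoding stream, rather than decoding each chunk on its own, is that a multibyte character can be split across two chunks. The sketch below illustrates the problem using Node's built-in string_decoder module and UTF-8, not iconv-lite itself; the sample bytes and the helper names decodeNaively and decodeStatefully are assumptions made for illustration.

```javascript
const { StringDecoder } = require('string_decoder');

// "caf\u00e9" in UTF-8, with the two-byte sequence for U+00E9 (0xC3 0xA9)
// split across two chunks.
const chunks = [Buffer.from([0x63, 0x61, 0x66, 0xc3]), Buffer.from([0xa9])];

// Naive approach: decode every chunk independently.
function decodeNaively(parts) {
  return parts.map((chunk) => chunk.toString('utf8')).join('');
}

// Stateful approach: the decoder holds back incomplete byte sequences
// until the rest of the character arrives.
function decodeStatefully(parts) {
  const decoder = new StringDecoder('utf8');
  return parts.map((chunk) => decoder.write(chunk)).join('') + decoder.end();
}

console.log(decodeNaively(chunks) === 'caf\u00e9');    // false
console.log(decodeStatefully(chunks) === 'caf\u00e9'); // true
```

A streaming converter has to keep this kind of state between chunks, which is why decoding chunk by chunk through a single stream is safer than calling a one-shot decode on each piece.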

[Deprecated] Extend Node.js own encodings

NOTE: This doesn't work on the latest Node versions. For details, see https://github.com/ashtuchkin/iconv-lite/wiki/Node-v4-compatibility

// After this call all Node basic primitives will understand iconv-lite encodings. 
iconv.extendNodeEncodings();

// Examples: 
buf = new Buffer(str, 'win1251');
buf.write(str, 'gbk');
str = buf.toString('latin1');
assert(Buffer.isEncoding('iso-8859-15'));
Buffer.byteLength(str, 'us-ascii');

http.createServer(function(req, res) {
    req.setEncoding('big5');
    req.collect(function(err, body) {
        console.log(body);
    });
});

fs.createReadStream("file.txt", "shift_jis");

// External modules are also supported (if they use Node primitives, which they probably do). 
request = require('request');
request({
    url: "http://github.com/", 
    encoding: "cp932"
});

// To remove extensions 
iconv.undoExtendNodeEncodings();

Supported encodings

All node.js native encodings: utf8, ucs2 / utf16-le, ascii, binary, base64, hex.
Additional unicode encodings: utf16, utf16-be, utf-7, utf-7-imap.
All widespread singlebyte encodings: Windows 125x family, ISO-8859 family, IBM/DOS codepages, Macintosh family, KOI8 family, all others supported by iconv library. Aliases like 'latin1', 'us-ascii' also supported.
All widespread multibyte encodings: CP932, CP936, CP949, CP950, GB2312, GBK, GB18030, Big5, Shift_JIS, EUC-JP.

See all supported encodings on the wiki.

Most singlebyte encodings are generated automatically from node-iconv. Thank you Ben Noordhuis and libiconv authors!

Multibyte encodings are generated from Unicode.org mappings and WHATWG Encoding Standard mappings. Thank you, respective authors!

Encoding/decoding speed

Comparison with the node-iconv module (1000x256kb, on MacBook Pro, Core i5/2.6 GHz, Node v0.12.0). Note: your results may vary, so please always check on your own hardware.

operation	iconv@2.1.4	iconv-lite@0.4.7
encode('win1251')	~96 Mb/s	~320 Mb/s
decode('win1251')	~95 Mb/s	~246 Mb/s
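
Since results vary by machine, a small timing harness can help check throughput locally. The following is only a rough sketch: measureThroughput is a hypothetical helper, and the workload uses Node's built-in latin1 decoder as a stand-in so the snippet runs without extra dependencies.

```javascript
// Hypothetical helper: applies `fn` to `input` `iterations` times and
// returns the observed throughput in megabytes per second.
function measureThroughput(fn, input, iterations) {
  const start = process.hrtime.bigint();
  for (let i = 0; i < iterations; i++) {
    fn(input);
  }
  const elapsedNs = Number(process.hrtime.bigint() - start);
  const totalBytes = input.length * iterations;
  return (totalBytes / (1024 * 1024)) / (elapsedNs / 1e9);
}

// 256 KB of single-byte data, matching the chunk size used in the comparison.
const sample = Buffer.alloc(256 * 1024, 0x61);

// Stand-in workload: Node's built-in latin1 decoder.
const mbPerSec = measureThroughput((buf) => buf.toString('latin1'), sample, 100);
console.log('decode(latin1): ~' + mbPerSec.toFixed(0) + ' MB/s');
```

To time iconv-lite itself, the callback could be swapped for a call such as iconv.decode(buf, 'win1251'), with more iterations for a steadier number.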

BOM handling

Decoding: the BOM (byte order mark) is stripped by default, unless overridden by passing stripBOM: false in options (e.g. iconv.decode(buf, enc, {stripBOM: false})). A callback may also be given as the stripBOM parameter; it is called if a BOM character was actually found.
If you want to detect a UTF-8 BOM when decoding other encodings, use the node-autodetect-decoder-stream module.
Encoding: no BOM is added, unless overridden by the addBOM: true option.
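
To make the stripBOM and addBOM defaults concrete, here is a minimal sketch of the same behavior for UTF-8 written with Node's built-in Buffer only. The helpers decodeUtf8 and encodeUtf8 are hypothetical and are not part of iconv-lite; they only mirror the option names described above.

```javascript
const BOM = '\uFEFF';

// Hypothetical helper: decodes UTF-8 and strips a leading BOM by default.
function decodeUtf8(buf, options) {
  const stripBOM = !options || options.stripBOM !== false;
  const text = buf.toString('utf8'); // Buffer#toString keeps the BOM character
  return stripBOM && text.startsWith(BOM) ? text.slice(1) : text;
}

// Hypothetical helper: encodes to UTF-8 and adds a BOM only when asked to.
function encodeUtf8(text, options) {
  const addBOM = Boolean(options && options.addBOM);
  return Buffer.from((addBOM ? BOM : '') + text, 'utf8');
}

const withBom = Buffer.from([0xef, 0xbb, 0xbf, 0x68, 0x69]); // BOM + "hi"
console.log(decodeUtf8(withBom).length);                      // 2
console.log(decodeUtf8(withBom, { stripBOM: false }).length); // 3
console.log(encodeUtf8('hi', { addBOM: true }).length);       // 5
```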

UTF-16 Encodings

This library supports the UTF-16LE, UTF-16BE and UTF-16 encodings. The first two are straightforward, but UTF-16 tries to be smart about endianness in the following ways:

Decoding: uses the BOM and a 'spaces heuristic' to determine input endianness. The default is UTF-16LE, but it can be overridden with the defaultEncoding: 'utf-16be' option. Strips the BOM unless stripBOM: false is passed.
Encoding: uses UTF-16LE and writes a BOM by default. Use addBOM: false to override.
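
The BOM-based part of that detection can be sketched with plain Node Buffers. This is an illustration under stated assumptions, not iconv-lite's implementation: decodeUtf16 is a hypothetical helper, it always strips the BOM, and it leaves out the 'spaces heuristic' entirely, falling back to the given default when no BOM is present.

```javascript
// Hypothetical helper: picks the byte order from the BOM, or falls back to
// `defaultEncoding` ('utf-16le' unless specified) when there is no BOM.
function decodeUtf16(buf, defaultEncoding) {
  let encoding = defaultEncoding || 'utf-16le';
  let body = buf;
  if (buf.length >= 2) {
    if (buf[0] === 0xff && buf[1] === 0xfe) {
      encoding = 'utf-16le'; // BOM bytes FF FE mean little-endian
      body = buf.subarray(2);
    } else if (buf[0] === 0xfe && buf[1] === 0xff) {
      encoding = 'utf-16be'; // BOM bytes FE FF mean big-endian
      body = buf.subarray(2);
    }
  }
  if (encoding === 'utf-16be') {
    // Buffer has no big-endian UTF-16 encoding, so swap byte pairs on a copy.
    // swap16() throws if the byte length is odd.
    body = Buffer.from(body).swap16();
  }
  return body.toString('utf16le');
}

console.log(decodeUtf16(Buffer.from([0xfe, 0xff, 0x00, 0x68, 0x00, 0x69]))); // hi
console.log(decodeUtf16(Buffer.from([0xff, 0xfe, 0x68, 0x00, 0x69, 0x00]))); // hi
```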

Other notes

When decoding, be sure to supply a Buffer to the decode() method, otherwise bad things usually happen.

Untranslatable characters are set to � or ?. No transliteration is currently supported.

Node versions 0.10.31 and 0.11.13 are buggy, don't use them (see #65, #77).
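
The notes above on untranslatable characters and on passing a Buffer can be illustrated with Node built-ins alone. In this sketch, the replacement character comes from Node's own UTF-8 decoder rather than from iconv-lite, and assertBuffer is a hypothetical guard that a caller might place in front of a decode call.

```javascript
// 0xFF can never occur in valid UTF-8, so the decoder substitutes U+FFFD.
const decoded = Buffer.from([0x68, 0x69, 0xff]).toString('utf8');
console.log(decoded === 'hi\uFFFD'); // true

// Hypothetical guard: fail fast when the input is not a Buffer
// instead of decoding something unexpected.
function assertBuffer(input) {
  if (!Buffer.isBuffer(input)) {
    throw new TypeError('Expected a Buffer, got ' + typeof input);
  }
  return input;
}

try {
  assertBuffer('already a string');
} catch (err) {
  console.log(err.message); // Expected a Buffer, got string
}
```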

Testing

$ git clone git@github.com:ashtuchkin/iconv-lite.git
$ cd iconv-lite
$ npm install
$ npm test

$ # To view performance:
$ node test/performance.js

$ # To view test coverage:
$ npm run coverage
$ open coverage/lcov-report/index.html
