Read a file one line at a time in node.js?

This article is translated from: Read a file one line at a time in node.js?

I am trying to read a large file one line at a time. I found a question on Quora that dealt with the subject, but I'm missing some connections to make the whole thing fit together.

 var Lazy=require("lazy");
 new Lazy(process.stdin)
     .lines
     .forEach(
          function(line) { 
              console.log(line.toString()); 
          }
 );
 process.stdin.resume();

The bit that I'd like to figure out is how I might read one line at a time from a file instead of STDIN, as in this sample.

I tried:

 fs.open('./VeryBigFile.csv', 'r', '0666', Process);

 function Process(err, fd) {
    if (err) throw err;
    // DO lazy read 
 }

but it's not working. I know that in a pinch I could fall back to using something like PHP, but I would like to figure this out.

I don't think the other answer would work, as the file is much larger than the server I'm running it on has memory for.


#1

Reference: https://stackoom.com/question/PpaP/在node-js中一次读取一行文件


#2

Edit:

Use a transform stream.
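
A minimal sketch of such a line-splitting transform stream, using only Node's built-in stream module (the LineSplitter name and the file name in the usage comment are illustrative, not part of the original answer):

var Transform = require('stream').Transform;
var util = require('util');

function LineSplitter() {
    Transform.call(this, { readableObjectMode: true });    // emit whole lines, not byte chunks
    this._buffer = '';
}
util.inherits(LineSplitter, Transform);

LineSplitter.prototype._transform = function (chunk, encoding, done) {
    this._buffer += chunk.toString('utf8');
    var lines = this._buffer.split('\n');
    this._buffer = lines.pop();    // keep the trailing partial line for the next chunk
    for (var i = 0; i < lines.length; i++) this.push(lines[i]);
    done();
};

LineSplitter.prototype._flush = function (done) {
    if (this._buffer) this.push(this._buffer);    // last line may lack a newline
    done();
};

// Usage:
// require('fs').createReadStream('./VeryBigFile.csv')
//     .pipe(new LineSplitter())
//     .on('data', function (line) { console.log(line); });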


With a BufferedReader you can read lines.

// BufferedReader comes from the third-party "buffered-reader" module;
// this example follows its older API.
var BufferedReader = require ("buffered-reader");

new BufferedReader ("lorem ipsum", { encoding: "utf8" })
    .on ("error", function (error){
        console.log ("error: " + error);
    })
    .on ("line", function (line){
        console.log ("line: " + line);
    })
    .on ("end", function (){
        console.log ("EOF");
    })
    .read ();

#3

There is a very nice module for reading a file line by line; it's called line-reader.

With it you simply write:

var lineReader = require('line-reader');

lineReader.eachLine('file.txt', function(line, last) {
  console.log(line);
  // do whatever you want with line...
  if(last){
    // or check if it's the last one
  }
});

You can even iterate over the file with a "java-style" interface, if you need more control:

lineReader.open('file.txt', function(reader) {
  if (reader.hasNextLine()) {
    reader.nextLine(function(line) {
      console.log(line);
    });
  }
});
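
If you want every line through that "java-style" interface, the reader can be drained with a small recursive loop; a hedged sketch, assuming the open/hasNextLine/nextLine API exactly as shown above:

lineReader.open('file.txt', function(reader) {
    (function next() {
        if (reader.hasNextLine()) {
            reader.nextLine(function(line) {
                console.log(line);
                next();    // keep pulling lines until none remain
            });
        }
    })();
});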

#4

You can always roll your own line reader. I haven't benchmarked this snippet yet, but it correctly splits the incoming stream of chunks into lines without the trailing '\n':

var last = "";

process.stdin.on('data', function(chunk) {
    var lines, i;

    lines = (last+chunk).split("\n");
    for(i = 0; i < lines.length - 1; i++) {
        console.log("line: " + lines[i]);
    }
    last = lines[i];
});

process.stdin.on('end', function() {
    console.log("line: " + last);
});

process.stdin.resume();
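
For the original question (a file rather than STDIN), the same splitting logic can hang off a file read stream; a minimal sketch, reusing the file name from the question:

var fs = require('fs');

var last = "";
var rs = fs.createReadStream('./VeryBigFile.csv', { encoding: 'utf8' });

rs.on('data', function(chunk) {
    var lines = (last + chunk).split("\n");
    for (var i = 0; i < lines.length - 1; i++) {
        console.log("line: " + lines[i]);
    }
    last = lines[lines.length - 1];    // carry the partial line over
});

rs.on('end', function() {
    console.log("line: " + last);    // final line may lack a trailing newline
});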

I came up with this while working on a quick log-parsing script that needed to accumulate data during the parsing, and I felt it would be nice to try doing this with js and node instead of perl or bash.

Anyway, I do feel that small nodejs scripts should be self-contained and not rely on third-party modules, so after reading all the answers to this question, each using various modules to handle line parsing, a 13-SLOC native nodejs solution might be of interest.


#5

I wanted to tackle this same problem, basically what in Perl would be:

while (<>) {
    process_line($_);
}

My use case was just a standalone script, not a server, so synchronous was fine. These were my criteria:

  • The minimal synchronous code that could be reused in many projects.
  • No limits on file size or number of lines.
  • No limits on length of lines.
  • Able to handle full Unicode in UTF-8, including characters beyond the BMP.
  • Able to handle *nix and Windows line endings (old-style Mac endings not needed for me).
  • Line-ending character(s) included in each line.
  • Able to handle the last line with or without end-of-line characters.
  • No external libraries beyond what is included in the node.js distribution.

This is a project for me to get a feel for low-level scripting-type code in node.js and decide how viable it is as a replacement for other scripting languages like Perl.

After a surprising amount of effort and a couple of false starts, this is the code I came up with. It's pretty fast but less trivial than I would've expected: (fork it on GitHub)

var fs            = require('fs'),
    StringDecoder = require('string_decoder').StringDecoder,
    util          = require('util');

// NOTE: processLine(line) is assumed to be supplied by the caller.
function lineByLine(fd) {
  var blob = '';
  var blobStart = 0;
  var blobEnd = 0;

  var decoder = new StringDecoder('utf8');

  var CHUNK_SIZE = 16384;
  var chunk = new Buffer(CHUNK_SIZE);

  var eolPos = -1;
  var lastChunk = false;

  var moreLines = true;
  var readMore = true;

  // each line
  while (moreLines) {

    readMore = true;
    // append more chunks from the file onto the end of our blob of text until we have an EOL or EOF
    while (readMore) {

      // do we have a whole line? (with LF)
      eolPos = blob.indexOf('\n', blobStart);

      if (eolPos !== -1) {
        blobEnd = eolPos;
        readMore = false;

      // do we have the last line? (no LF)
      } else if (lastChunk) {
        blobEnd = blob.length;
        readMore = false;

      // otherwise read more
      } else {
        var bytesRead = fs.readSync(fd, chunk, 0, CHUNK_SIZE, null);

        lastChunk = bytesRead !== CHUNK_SIZE;

        blob += decoder.write(chunk.slice(0, bytesRead));
      }
    }

    if (blobStart < blob.length) {
      processLine(blob.substring(blobStart, blobEnd + 1));

      blobStart = blobEnd + 1;

      if (blobStart >= CHUNK_SIZE) {
        // blobStart is in characters, CHUNK_SIZE is in octets
        var freeable = blobStart / CHUNK_SIZE;

        // keep blob from growing indefinitely, not as deterministic as I'd like
        blob = blob.substring(CHUNK_SIZE);
        blobStart -= CHUNK_SIZE;
        blobEnd -= CHUNK_SIZE;
      }
    } else {
      moreLines = false;
    }
  }
}

It could probably be cleaned up further; it was the result of trial and error.
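
A hedged usage sketch: processLine is supplied by the caller, and fs.openSync provides the synchronous file descriptor the loop expects (the file name is from the question):

function processLine(line) {
    process.stdout.write(line);    // line endings are included, so write rather than log
}

var fd = fs.openSync('./VeryBigFile.csv', 'r');
lineByLine(fd);
fs.closeSync(fd);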


#6

I use this:

function emitLines(stream, re){
    re = re || /\n/;    // default to splitting on LF
    var buffer = '';

    stream.on('data', stream_data);
    stream.on('end', stream_end);

    function stream_data(data){
        buffer += data;
        flush();
    }//stream_data

    function stream_end(){
        if(buffer) stream.emit('line', buffer);    // flush any trailing partial line
    }//stream_end


    function flush(){
        var match;
        while(match = re.exec(buffer)){
            var index = match.index + match[0].length;
            stream.emit('line', buffer.substring(0, index));
            buffer = buffer.substring(index);
            re.lastIndex = 0;    // reset in case a global regex was passed in
        }
    }//flush

}//emitLines

Use this function on a stream and listen for the 'line' events it will emit.
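
A hedged usage sketch (the file name is taken from the question):

var fs = require('fs');

var stream = fs.createReadStream('./VeryBigFile.csv', { encoding: 'utf8' });
emitLines(stream);

stream.on('line', function(line) {
    console.log('line: ' + line);    // each emitted line keeps its trailing newline
});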

gr-
