How do I get the line separator in Java, and how can I process a file that uses a different line separator?

I have a huge file (more than 3GB) that contains a single long line in the following format.

"1243@818@9287@543"

The data I want to analyze is separated with "@". My idea is to change the default end-of-line character used by Java and set it to "@".

I'm trying the following code, which calls System.setProperty("line.separator", "@"), but it is not working: it prints the complete line. For this test I'd like the following output:

1243
818
9287
543

How can I change the default line separator to "@"?

package test;

import java.io.BufferedReader;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;

public class Test {

    public static void main(String[] args) throws FileNotFoundException, IOException {
        System.setProperty("line.separator", "@");
        File testFile = new File("./Mypath/myfile");
        BufferedReader br = new BufferedReader(new FileReader(testFile));
        for (String line; (line = br.readLine()) != null; ) {
            // Process each line.
            System.out.println(line);
        }
    }
}

Thanks in advance for any help.

Solution

The data I want to analyze is separated with "@". My idea is to change the default end-of-line character used by Java and set it to "@".

I wouldn't do that as it might break God knows what else that is depending on line.separator.

As for why this doesn't work, I'm sorry to say this is a case of RTFM not being done. This is what the Javadocs for BufferedReader.readLine have to say:

public String readLine() throws IOException

Reads a line of text. A line is considered to be terminated by any one of a line feed ('\n'), a carriage return ('\r'), or a carriage return followed immediately by a linefeed.

Returns: A String containing the contents of the line, not including any line-termination characters, or null if the end of the stream has been reached

Throws: IOException - If an I/O error occurs

The API docs for the readLine() method clearly say that it looks for '\n' or '\r'. They do not say that it depends on line.separator.

The line.separator property exists only so that code which needs a portable, platform-independent way to identify the line separator can look it up. That is all. This system property is not for controlling the internal mechanisms of Java's IO classes.
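To make this concrete, here is a minimal sketch (the class and method names are made up for illustration) that sets line.separator to "@" and then reads from an in-memory string with readLine(). Under the Javadoc contract quoted above, the '@' characters are treated as ordinary data, while '\n', '\r' and "\r\n" still end a line.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class, not part of the original question or answer.
class ReadLineDemo {

    // Collects every "line" exactly as BufferedReader.readLine() returns it.
    static List<String> readLines(String text) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(new StringReader(text))) {
            for (String line; (line = br.readLine()) != null; ) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        String original = System.getProperty("line.separator");
        System.setProperty("line.separator", "@");
        try {
            // '@' is not a line terminator, so this is a single line.
            System.out.println(readLines("1243@818@9287@543").size()); // 1
            // '\n', '\r' and "\r\n" each terminate a line.
            System.out.println(readLines("a\nb\rc\r\nd").size());      // 4
        } finally {
            // Restore the property so nothing else in the JVM is affected.
            System.setProperty("line.separator", original);
        }
    }
}
```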

I think you are over-complicating things. You could do it the old-fashioned way by reading a fixed number of characters (say, 1024 KB) into a buffer and scanning it for each '@' delimiter, but that introduces complications, such as the perfectly normal case where the data between two '@' delimiters gets split across buffers.

So I would suggest just reading one character at a time from the buffered reader (this is not that bad and does not typically hit IO excessively, since the buffered reader does... ta-da... the buffering for you).

Pump each character into a string builder, and every time you find an '@' delimiter, flush the content of the string builder to standard output or wherever (since that content represents one datum from your '@'-separated file).
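For illustration only, here is roughly what the manual-buffer approach has to deal with. This is a sketch with made-up names (ChunkedSplitDemo, split) and a deliberately tiny buffer so that tokens straddle chunk boundaries. It collects tokens in a list to keep the example short; for a 3GB file you would process each token as it is found instead.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class, not part of the original answer.
class ChunkedSplitDemo {

    // Splits the reader's content on `delimiter`, reading `bufferSize` chars at a time.
    // Assumes bufferSize > 0.
    static List<String> split(Reader reader, char delimiter, int bufferSize) throws IOException {
        List<String> tokens = new ArrayList<>();
        char[] buffer = new char[bufferSize];
        StringBuilder carry = new StringBuilder(); // partial token left over from earlier chunks
        int read;
        while ((read = reader.read(buffer, 0, buffer.length)) != -1) {
            int start = 0;
            for (int i = 0; i < read; i++) {
                if (buffer[i] == delimiter) {
                    // Complete the token: carried-over prefix plus this chunk's part.
                    carry.append(buffer, start, i - start);
                    tokens.add(carry.toString());
                    carry.setLength(0);
                    start = i + 1;
                }
            }
            // Whatever follows the last delimiter in this chunk may continue in the next one.
            carry.append(buffer, start, read - start);
        }
        if (carry.length() > 0) {
            tokens.add(carry.toString()); // the final token has no trailing delimiter
        }
        return tokens;
    }

    public static void main(String[] args) throws IOException {
        // With a 5-char buffer, "9287" and "543" are each split across two chunks.
        System.out.println(split(new StringReader("1243@818@9287@543"), '@', 5));
        // prints [1243, 818, 9287, 543]
    }
}
```

The carry-over builder is the extra bookkeeping that the one-character-at-a-time loop avoids.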

Get the algorithm to work correctly first; optimize later. The code below is a sketch of that approach. You should be able to flesh it out into a complete program without much trouble:

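// Needs java.io.BufferedReader, java.io.File and java.io.FileReader;
// the enclosing method must declare or handle IOException.
File testFile = new File("./Mypath/myfile");
int bufferSize = 1024 * 1024;

// try-with-resources closes the reader when the block exits.
try (BufferedReader br = new BufferedReader(new FileReader(testFile), bufferSize)) {
    StringBuilder bld = new StringBuilder();
    int c;
    // read() returns -1 at the end of the stream.
    while ((c = br.read()) != -1) {
        char z = (char) c;
        if (z == '@') {
            // Delimiter found: emit the datum and reset the builder.
            System.out.println(bld);
            bld.setLength(0);
        } else {
            bld.append(z);
        }
    }
    // The last datum has no trailing '@', so flush whatever is left.
    if (bld.length() > 0) {
        System.out.println(bld);
    }
}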
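As an alternative that the answer above does not mention, java.util.Scanner lets you set a custom delimiter with useDelimiter. The sketch below uses made-up names (ScannerDelimiterDemo, tokens) and reads from an in-memory string. Scanner treats the delimiter as a regular expression and may be noticeably slower than the character loop on a multi-gigabyte file, so treat it as a convenience option rather than a recommendation.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical demo class, not part of the original answer.
class ScannerDelimiterDemo {

    // Returns the '@'-separated tokens from the given source.
    static List<String> tokens(Readable source) {
        List<String> out = new ArrayList<>();
        try (Scanner scanner = new Scanner(source)) {
            scanner.useDelimiter("@"); // the argument is a regex; "@" matches a literal '@'
            while (scanner.hasNext()) {
                out.add(scanner.next());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        for (String token : tokens(new StringReader("1243@818@9287@543"))) {
            System.out.println(token);
        }
    }
}
```

For a real file, you could pass a BufferedReader wrapped around a FileReader instead of the StringReader, and handle each token inside the loop rather than collecting them all in a list.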
