How to search for a string in a large file in Java: searching for a string in a very large formatted text file in Java...

Here is the thing:

I have a really big text file and it has a format like this:

    0007476|000011434982|00249626000|R|2008-01-11 00:00:00|9999-12-31 23:59:59|000019.99
    0007476|000014017887|00313865000|R|2011-04-19 00:00:00|9999-12-31 23:59:59|000599.99
    ...

I need to find out whether a particular pattern exists in the file, for example

    0007476|whatever|00313865000|whatever

All I need is a boolean saying yes or no.

What I have done so far is read the file line by line and match each line against a regular expression:

    Pattern pattern = Pattern.compile(regex);
    // try-with-resources closes the Scanner whether or not a match is found
    try (Scanner scanner = new Scanner(new File(fileName))) {
        String line;
        while (scanner.hasNextLine()) {
            line = scanner.nextLine();
            if (pattern.matcher(line).matches()) {
                return true;
            }
        }
    }
    return false;

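where the regex has the form

    0007476\|\d{12}\|0031386500.*

(inside a Java string literal the backslashes have to be doubled: "0007476\\|\\d{12}\\|0031386500.*").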

This works, but it usually takes 15 seconds to find a string that is far from the start of the file. Is there a faster way to do this? Thanks
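To make the pattern concrete, here is a small sketch that compiles the regex as a Java literal and checks it against the two sample lines with matches(). The class SampleLineMatch and its method matchesRecord are hypothetical names, not from the question. Only the second sample line has 00313865000 in the third field, so only that line should match.

```java
import java.util.regex.Pattern;

public class SampleLineMatch {
    // Inside a Java string literal every regex backslash has to be doubled
    private static final Pattern RECORD =
            Pattern.compile("0007476\\|\\d{12}\\|0031386500.*");

    // matches() requires the whole line to match, hence the trailing .*
    public static boolean matchesRecord(String line) {
        return RECORD.matcher(line).matches();
    }

    public static void main(String[] args) {
        String first = "0007476|000011434982|00249626000|R|2008-01-11 00:00:00|9999-12-31 23:59:59|000019.99";
        String second = "0007476|000014017887|00313865000|R|2011-04-19 00:00:00|9999-12-31 23:59:59|000599.99";
        System.out.println(matchesRecord(first));  // false: third field starts with 0024...
        System.out.println(matchesRecord(second)); // true
    }
}
```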

Solution

I assume you are reading with a Scanner because the file is too big to load into a single String?

If that is not the case, i.e. the file fits in memory, you can read it into a single String and use one regular expression to find the match directly. Depending on whether or not you care about the specific text at the start of the line, you can use something along the lines of:

    (?m)^0007476\|\d{12}\|0031386500.*$

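A minimal sketch of that whole-file approach, assuming Java 11+ (for Files.readString, which decodes the file as UTF-8) and a file small enough to fit in memory. The class name WholeFileSearch, the method containsRecord, and the path data.txt are hypothetical. Because of the (?m) flag, ^ and $ match at every line boundary, so a single find() call scans the whole text once.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.regex.Pattern;

public class WholeFileSearch {
    // (?m) = MULTILINE: ^ and $ match at the start and end of every line
    private static final Pattern RECORD =
            Pattern.compile("(?m)^0007476\\|\\d{12}\\|0031386500.*$");

    // Loads the entire file into memory (Java 11+), so only use this when it fits
    public static boolean containsRecord(Path file) throws IOException {
        String content = Files.readString(file);
        return RECORD.matcher(content).find();
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical path; replace with the real file
        System.out.println(containsRecord(Path.of("data.txt")));
    }
}
```

This trades memory for speed: the whole file becomes one String, so for a multi-gigabyte file reading it in chunks is the safer choice.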
If you do need to break the file into smaller chunks because of memory usage, I would suggest not reading it line by line (the lines are rather short) but processing bigger chunks of many lines at a time, for example through a BufferedReader.

I fiddled around a bit with a 1.25GB file and the following is about 2.5 times faster than your implementation:

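    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;
    import java.util.regex.Pattern;

    // FILENAME is a String constant defined elsewhere in the class
    private static boolean matches() throws IOException {
        // Backslashes are doubled inside the Java string literal
        String regex = "(?m)^0007476\\|\\d{12}\\|0031386500.*$";
        Pattern pattern = Pattern.compile(regex);
        try (BufferedReader br = new BufferedReader(new FileReader(FILENAME))) {
            // Run the regex over blocks of 10,000 lines instead of single lines
            for (String lines; (lines = readLines(br, 10000)) != null; ) {
                if (pattern.matcher(lines).find()) {
                    return true;
                }
            }
        }
        return false;
    }

    // Joins up to `amount` lines into one String; returns null at end of file
    private static String readLines(BufferedReader br, int amount) throws IOException {
        StringBuilder builder = new StringBuilder();
        int lineCounter = 0;
        // Test the counter first so that no line is read and then silently dropped
        for (String line; lineCounter < amount && (line = br.readLine()) != null; lineCounter++) {
            builder.append(line).append(System.lineSeparator());
        }
        return lineCounter > 0 ? builder.toString() : null;
    }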
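A different option, not part of the answer above: since the pattern only constrains fixed, |-separated fields, the regex engine can be skipped entirely and the fields compared with plain String methods. This is a sketch under the assumption that the layout is exactly as shown in the question (first field 0007476, a 12-digit second field, a third field starting with 0031386500); FieldSearch and its methods are hypothetical names, and whether it beats the chunked regex on your data would have to be measured.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class FieldSearch {
    private static final String FIELD1 = "0007476|";
    private static final String FIELD3_PREFIX = "0031386500";

    // Checks the same field layout as the regex 0007476\|\d{12}\|0031386500.*
    public static boolean lineMatches(String line) {
        if (!line.startsWith(FIELD1)) {
            return false; // cheap rejection for lines with another first field
        }
        int start = FIELD1.length(); // first character of field 2
        int end = start + 12;        // field 2 must be exactly 12 ASCII digits
        if (line.length() <= end || line.charAt(end) != '|') {
            return false;
        }
        for (int i = start; i < end; i++) {
            char c = line.charAt(i);
            if (c < '0' || c > '9') {
                return false;
            }
        }
        // Field 3 must start with the prefix; the rest of the line is ignored
        return line.startsWith(FIELD3_PREFIX, end + 1);
    }

    // Stops at the first matching line
    public static boolean containsRecord(BufferedReader reader) throws IOException {
        for (String line; (line = reader.readLine()) != null; ) {
            if (lineMatches(line)) {
                return true;
            }
        }
        return false;
    }

    // Assumption: the file is ASCII-compatible; ISO-8859-1 decodes any byte without errors
    public static boolean containsRecord(Path file) throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.ISO_8859_1)) {
            return containsRecord(reader);
        }
    }
}
```

The early startsWith check rejects a non-matching line after comparing only a few characters, which is where any savings would come from if most lines have a different first field.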
