Java performance issue compared to Perl

I have written a Perl script that processes a large number of CSV files and produces a summary. It takes 0.8326 seconds to complete.

my $opname = $ARGV[0];

# Find matching CSV files modified within the last 10 days.
my @files = `find . -name "*${opname}*.csv" -mtime -10 -type f`;

my %hash;

foreach my $file (@files) {
    chomp $file;

    # The timestamp is the text between the last "~" and the next dot.
    my $time = $file;
    $time =~ s/.*\~(.*?)\..*/$1/;

    open(IN, $file) or print "Can't open $file\n";
    while (<IN>) {
        my $line = $_;
        chomp $line;

        # The severity is the 7th comma-separated field.
        my $severity = (split(",", $line))[6];
        next if $severity =~ m/NORMAL/i;
        $hash{$time}{$severity}++;
    }
    close(IN);
}

# Print the counts, newest timestamp first.
foreach my $time (sort {$b <=> $a} keys %hash) {
    foreach my $severity ( keys %{$hash{$time}} ) {
        print $time . ',' . $severity . ',' . $hash{$time}{$severity} . "\n";
    }
}

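I then wrote the same logic in Java, but it takes 2600 ms (2.6 seconds) to complete. Why does Java take so much longer, and how can I reach the same speed as Perl?

Note: I excluded VM initialization and class loading time from the measurement.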

import java.io.BufferedReader;
import java.io.File;
import java.io.FileFilter;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

public class MonitoringFileReader {

    // timestamp -> (severity -> count)
    static Map<String, Map<String, Integer>> store = new TreeMap<String, Map<String, Integer>>();
    static String opname;

    public static void testRead(String filepath) throws IOException
    {
        File file = new File(filepath);
        FileFilter fileFilter = new FileFilter() {
            @Override
            public boolean accept(File pathname) {
                // 86400000 ms is one day, so despite its name this value is an age in days.
                int timediffinhr = (int) ((System.currentTimeMillis() - pathname.lastModified()) / 86400000);
                if (timediffinhr < 10 && pathname.getName().endsWith(".csv") && pathname.getName().contains(opname)) {
                    return true;
                }
                else
                    return false;
            }
        };
        File[] listoffiles = file.listFiles(fileFilter);
        long time = System.currentTimeMillis();
        for (File mf : listoffiles) {
            String timestamp = mf.getName().split("~")[5].replace(".csv", "");
            BufferedReader br = new BufferedReader(new FileReader(mf), 1024 * 500);
            String line;
            Map<String, Integer> tmp = store.containsKey(timestamp) ? store.get(timestamp) : new HashMap<String, Integer>();
            while ((line = br.readLine()) != null)
            {
                String severity = line.split(",")[6];
                if (!severity.equals("NORMAL"))
                {
                    tmp.put(severity, tmp.containsKey(severity) ? tmp.get(severity) + 1 : 1);
                }
            }
            store.put(timestamp, tmp);
        }
        time = System.currentTimeMillis() - time;
        System.out.println(time + "ms");
        System.out.println(store);
    }

    public static void main(String[] args) throws IOException
    {
        opname = args[0];
        long time = System.currentTimeMillis();
        testRead("./SMF/data/analyser/archive");
        time = System.currentTimeMillis() - time;
        System.out.println(time + "ms");
    }
}

Input file name format: A~B~C~D~E~20150715080000.csv. There are about 500 files of roughly 1 MB each. Sample content:

A,B,C,D,E,F,CRITICAL,G

A,B,C,D,E,F,NORMAL,G

A,B,C,D,E,F,INFO,G

A,B,C,D,E,F,MEDIUM,G

A,B,C,D,E,F,CRITICAL,G
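To make the intended result concrete, here is a minimal sketch that applies the same counting rule to the five sample lines above, held in memory instead of read from disk. The class and method names (`SeverityCountSketch`, `timestampOf`, `count`) are made up for illustration, and the `~` separator in the file name is an assumption based on the `split("~")` call in the Java code.

```java
import java.util.Map;
import java.util.TreeMap;

public class SeverityCountSketch {

    // Returns the text between the last '~' and the last '.' of a file name.
    // Assumes names shaped like A~B~C~D~E~20150715080000.csv.
    public static String timestampOf(String fileName) {
        int tilde = fileName.lastIndexOf('~');
        int dot = fileName.lastIndexOf('.');
        return fileName.substring(tilde + 1, dot);
    }

    // Adds the severity (7th field) of every non-NORMAL line to counts.
    public static void count(String[] lines, Map<String, Integer> counts) {
        for (String line : lines) {
            String severity = line.split(",")[6];
            if (severity.equals("NORMAL")) {
                continue;
            }
            Integer old = counts.get(severity);
            counts.put(severity, old == null ? 1 : old + 1);
        }
    }

    public static void main(String[] args) {
        String[] sample = {
            "A,B,C,D,E,F,CRITICAL,G",
            "A,B,C,D,E,F,NORMAL,G",
            "A,B,C,D,E,F,INFO,G",
            "A,B,C,D,E,F,MEDIUM,G",
            "A,B,C,D,E,F,CRITICAL,G"
        };
        // timestamp -> (severity -> count); TreeMap keeps the output order stable.
        Map<String, Map<String, Integer>> store = new TreeMap<String, Map<String, Integer>>();
        Map<String, Integer> perSeverity = new TreeMap<String, Integer>();
        count(sample, perSeverity);
        store.put(timestampOf("A~B~C~D~E~20150715080000.csv"), perSeverity);
        System.out.println(store);
        // prints {20150715080000={CRITICAL=2, INFO=1, MEDIUM=1}}
    }
}
```

The sketch uses sorted maps only so that the printed order is predictable; it says nothing about the performance of the real program.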

Java version: 1.7

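Update

Following the comments below, I replaced split with a regex, and performance improved considerably.

I now run the whole thing in a loop, and after 3 to 10 iterations the performance is quite acceptable.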

import java.io.BufferedReader;
import java.io.File;
import java.io.FileFilter;
import java.io.FileReader;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MonitoringFileReader {

    // timestamp -> (severity -> count)
    static Map<String, Map<String, Integer>> store = new HashMap<String, Map<String, Integer>>();
    static String opname = "Etis_Egypt";

    // Matches the run of digits followed by a dot in the file name (the timestamp).
    static Pattern pattern1 = Pattern.compile("(\\d+\\.)");
    // Matches one quoted or unquoted field followed by a comma.
    static Pattern pattern2 = Pattern.compile("(?:\"([^\"]*)\"|([^,]*))(?:[,])");
    static long currentsystime = System.currentTimeMillis();

    public static void testRead(String filepath) throws IOException
    {
        File file = new File(filepath);
        FileFilter fileFilter = new FileFilter() {
            @Override
            public boolean accept(File pathname) {
                // 86400000 ms is one day, so despite its name this value is an age in days.
                int timediffinhr = (int) ((currentsystime - pathname.lastModified()) / 86400000);
                if (timediffinhr < 10 && pathname.getName().endsWith(".csv") && pathname.getName().contains(opname)) {
                    return true;
                }
                else
                    return false;
            }
        };
        File[] listoffiles = file.listFiles(fileFilter);
        long time = System.currentTimeMillis();
        for (File mf : listoffiles) {
            Matcher matcher = pattern1.matcher(mf.getName());
            matcher.find();
            //String timestamp=mf.getName().split("~")[5].replace(".csv", "");
            String timestamp = matcher.group();
            BufferedReader br = new BufferedReader(new FileReader(mf));
            String line;
            Map<String, Integer> tmp = store.containsKey(timestamp) ? store.get(timestamp) : new HashMap<String, Integer>();
            while ((line = br.readLine()) != null)
            {
                matcher = pattern2.matcher(line);
                // Advance to the 7th field.
                matcher.find();matcher.find();matcher.find();matcher.find();matcher.find();matcher.find();matcher.find();
                //String severity=line.split(",")[6];
                // Note: group() returns the whole match, including the trailing comma
                // (for example "NORMAL,"), so the equals("NORMAL") check below does not
                // match it; group(2) holds the unquoted field text.
                String severity = matcher.group();
                if (!severity.equals("NORMAL"))
                {
                    tmp.put(severity, tmp.containsKey(severity) ? tmp.get(severity) + 1 : 1);
                }
            }
            br.close();
            store.put(timestamp, tmp);
        }
        time = System.currentTimeMillis() - time;
        //System.out.println(time+"ms");
        //System.out.println(store);
    }

    public static void main(String[] args) throws IOException
    {
        //opname = args[0];
        for (int i = 0; i < 20; i++) {
            long time = System.currentTimeMillis();
            testRead("./SMF/data/analyser/archive");
            time = System.currentTimeMillis() - time;
            System.out.println("Time taken for " + i + " is " + time + "ms");
        }
    }
}
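The update swaps split for a regex. For comparison, here is a sketch of a third way to pull out one field, which is not from the original post: walking the line with `String.indexOf` so that no array of fields and no matcher is created. The class name `FieldScanSketch` and the method `field` are hypothetical, quoted fields are not handled, and the approach has not been timed against the data set described here.

```java
public class FieldScanSketch {

    // Returns the field at the given zero-based position in a comma-separated
    // line, or null if the line has fewer fields. Quoted fields are not handled.
    public static String field(String line, int index) {
        int start = 0;
        // Skip past `index` commas to reach the start of the wanted field.
        for (int i = 0; i < index; i++) {
            int comma = line.indexOf(',', start);
            if (comma < 0) {
                return null;
            }
            start = comma + 1;
        }
        // The field ends at the next comma, or at the end of the line.
        int end = line.indexOf(',', start);
        return end < 0 ? line.substring(start) : line.substring(start, end);
    }

    public static void main(String[] args) {
        System.out.println(field("A,B,C,D,E,F,CRITICAL,G", 6)); // prints CRITICAL
    }
}
```

In the reading loop this would stand in for `line.split(",")[6]`. Unlike `matcher.group()`, the returned value does not include the trailing comma.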

But now I have another question.

Look at the results when running on a small data set.

Time taken for 0 is 218ms

Time taken for 1 is 134ms

Time taken for 2 is 127ms

Time taken for 3 is 98ms

Time taken for 4 is 90ms

Time taken for 5 is 77ms

Time taken for 6 is 71ms

Time taken for 7 is 72ms

Time taken for 8 is 62ms

Time taken for 9 is 57ms

Time taken for 10 is 53ms

Time taken for 11 is 58ms

Time taken for 12 is 59ms

Time taken for 13 is 46ms

Time taken for 14 is 44ms

Time taken for 15 is 45ms

Time taken for 16 is 53ms

Time taken for 17 is 45ms

Time taken for 18 is 61ms

Time taken for 19 is 42ms

The first iterations take longer, and then the time drops.

Why?

Thanks,
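One likely contributor to this pattern, which nothing in this post confirms, is JIT warm-up: HotSpot starts by interpreting bytecode and compiles frequently executed methods to native code only after they have run often enough. The operating system's file cache may also make later passes over the same files cheaper. The sketch below is a way to look at the first effect in isolation, with no file I/O at all. The class `WarmupSketch` and its `work` method are made up for illustration and stand in for the parsing loop.

```java
public class WarmupSketch {

    // A small CPU-bound stand-in for the parsing work:
    // counts the commas in a line, repeated many times.
    public static int work(String line, int repeats) {
        int commas = 0;
        for (int r = 0; r < repeats; r++) {
            for (int i = 0; i < line.length(); i++) {
                if (line.charAt(i) == ',') {
                    commas++;
                }
            }
        }
        return commas;
    }

    public static void main(String[] args) {
        String line = "A,B,C,D,E,F,CRITICAL,G";
        // Time the same workload 20 times, as the loop in the post does.
        for (int i = 0; i < 20; i++) {
            long start = System.nanoTime();
            int result = work(line, 200000);
            long micros = (System.nanoTime() - start) / 1000;
            // Printing the result keeps the computation from being optimized away.
            System.out.println("Run " + i + ": " + micros + " us (result " + result + ")");
        }
    }
}
```

If the early runs of this loop are also slower than the later ones, that points at the JVM rather than the disk. The exact numbers depend on the machine and the JVM version.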
