Performance of Java's StringTokenizer class vs. the String.split method

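If your data is already in a database and you need to parse the strings of words, I would suggest calling indexOf repeatedly. In the benchmark below it is roughly two to three times faster than either StringTokenizer or Pattern.split. However, fetching the data from the database is still likely to cost much more than parsing it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;
import java.util.regex.Pattern;

public class SplitBenchmark {
    public static void main(String[] args) {
        // Sample line: 60 space-separated six-digit numbers, ending with a space.
        StringBuilder sb = new StringBuilder();
        for (int i = 100000; i < 100000 + 60; i++)
            sb.append(i).append(' ');
        String sample = sb.toString();

        int runs = 100000;
        // Repeat the comparison 5 times so later rounds run on JIT-compiled code.
        // The lists are never used afterwards; a harness such as JMH guards
        // better against the JIT optimizing work away.
        for (int i = 0; i < 5; i++) {
            {
                long start = System.nanoTime();
                for (int r = 0; r < runs; r++) {
                    StringTokenizer st = new StringTokenizer(sample);
                    List<String> list = new ArrayList<>();
                    while (st.hasMoreTokens())
                        list.add(st.nextToken());
                }
                long time = System.nanoTime() - start;
                System.out.printf("StringTokenizer took an average of %.1f us%n", time / 1000.0 / runs);
            }
            {
                long start = System.nanoTime();
                // Compile the pattern once, outside the timed loop.
                Pattern spacePattern = Pattern.compile(" ");
                for (int r = 0; r < runs; r++) {
                    List<String> list = Arrays.asList(spacePattern.split(sample, 0));
                }
                long time = System.nanoTime() - start;
                System.out.printf("Pattern.split took an average of %.1f us%n", time / 1000.0 / runs);
            }
            {
                long start = System.nanoTime();
                for (int r = 0; r < runs; r++) {
                    List<String> list = new ArrayList<>();
                    int pos = 0, end;
                    // Cut out each substring up to the next space. The sample ends
                    // with a space, so no token is left over after the loop.
                    while ((end = sample.indexOf(' ', pos)) >= 0) {
                        list.add(sample.substring(pos, end));
                        pos = end + 1;
                    }
                }
                long time = System.nanoTime() - start;
                System.out.printf("indexOf loop took an average of %.1f us%n", time / 1000.0 / runs);
            }
        }
    }
}
```

Prints

```
StringTokenizer took an average of 5.8 us
Pattern.split took an average of 4.8 us
indexOf loop took an average of 1.8 us
StringTokenizer took an average of 4.9 us
Pattern.split took an average of 3.7 us
indexOf loop took an average of 1.7 us
StringTokenizer took an average of 5.2 us
Pattern.split took an average of 3.9 us
indexOf loop took an average of 1.8 us
StringTokenizer took an average of 5.1 us
Pattern.split took an average of 4.1 us
indexOf loop took an average of 1.6 us
StringTokenizer took an average of 5.0 us
Pattern.split took an average of 3.8 us
indexOf loop took an average of 1.6 us
```

Opening a file costs about 8 ms. Because the files are so small, caching may improve this by a factor of 2 to 5, but even so, about 10 hours will be spent just opening files. By comparison, each split or StringTokenizer call costs far less than 0.01 ms (about 5 us in the benchmark above).

Parsing 19 million x 30 words x 8 letters per word (about 4.6 GB of text) should take about 10 seconds, at roughly 1 GB every 2 seconds.

If you want to improve performance, I suggest using far fewer files, for example by storing the data in a database. If you don't want to use an SQL database, consider one of the options listed at http://nosql-database.org/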
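
As a rough check of the parsing estimate in the answer, the sketch below multiplies out the figures given there (19 million x 30 words x 8 letters per word, at about 1 GB every 2 seconds). It assumes one byte per letter (plain ASCII) and ignores the separators between words; the class and method names are hypothetical, and the text does not say what unit the 19 million counts.

```java
import java.util.Locale;

// Back-of-envelope check of the parsing estimate (hypothetical helper names).
// Inputs from the text: 19 million x 30 words x 8 letters per word,
// parsed at roughly 1 GB every 2 seconds (0.5 GB/s).
public class ParseCostEstimate {

    // Total bytes, assuming one byte per letter (ASCII) and no separators.
    public static long totalBytes(long items, long wordsPerItem, long bytesPerWord) {
        return items * wordsPerItem * bytesPerWord;
    }

    // Seconds needed to process the given number of bytes at a fixed throughput.
    public static double secondsAt(long bytes, double bytesPerSecond) {
        return bytes / bytesPerSecond;
    }

    public static void main(String[] args) {
        long bytes = totalBytes(19_000_000L, 30, 8);   // 4,560,000,000 bytes
        double seconds = secondsAt(bytes, 1e9 / 2);    // 1 GB per 2 seconds
        // Locale.ROOT keeps '.' as the decimal separator regardless of system settings.
        System.out.println(String.format(Locale.ROOT,
                "%.2f GB of text, about %.1f s to parse", bytes / 1e9, seconds));
        // prints "4.56 GB of text, about 9.1 s to parse"
    }
}
```

Counting one space per word (9 bytes per word) raises the total to about 5.1 GB, or roughly 10.3 s, which is in line with the "about 10 seconds" figure.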

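The answer's main recommendation, calling indexOf repeatedly, can be packaged as a small reusable method. This is a minimal sketch: the name IndexOfSplitter.split is hypothetical, it splits on a single character rather than a regular expression, and unlike the benchmark loop (which relies on the sample ending with a space) it also keeps the final token when the input does not end with the separator.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: the indexOf loop from the benchmark, made reusable.
public class IndexOfSplitter {

    // Splits text on one separator character using repeated indexOf calls.
    // Adjacent separators yield empty tokens (as in the benchmark loop);
    // text after the last separator is added only if it is non-empty.
    public static List<String> split(String text, char separator) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        int end;
        while ((end = text.indexOf(separator, pos)) >= 0) {
            tokens.add(text.substring(pos, end));
            pos = end + 1;
        }
        // Keep the trailing token when the input does not end with the separator.
        if (pos < text.length()) {
            tokens.add(text.substring(pos));
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(split("100000 100001 100002", ' ')); // [100000, 100001, 100002]
        System.out.println(split("a b ", ' '));                 // [a, b]
    }
}
```

Because it takes a char instead of a regex, it avoids Pattern overhead entirely; for multi-character or pattern-based delimiters, Pattern.split remains the simpler choice.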