Performance comparison: java.lang.String.split vs. org.apache.commons.lang3.StringUtils.split

The test files are:

1. 1.csv (85 KB)

2. 2.csv (186.5 MB)

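Test results (elapsed time as measured by the test code with System.currentTimeMillis(), i.e. milliseconds):

| Separator | Method | 1.csv | 2.csv | 1.csv | 2.csv |
|---|---|---|---|---|---|
| , | java.lang.String.split | 22 | 1763 | 15 | 1596 |
| , | org.apache.commons.lang3.StringUtils.split | 43 | 1340 | 38 | 1402 |
| 0. | java.lang.String.split | 18 | 2542 | 17 | 2449 |
| 0. | org.apache.commons.lang3.StringUtils.split | 46 | 1749 | 72 | 2280 |
| 0.0 | java.lang.String.split | 28 | 2094 | 25 | 2020 |
| 0.0 | org.apache.commons.lang3.StringUtils.split | 54 | 2583 | 67 | 2140 |
| 0.01721102 | java.lang.String.split | 33 | 1661 | 46 | 1697 |
| 0.01721102 | org.apache.commons.lang3.StringUtils.split | 54 | 3554 | 54 | 2897 |
| More than 20 characters | java.lang.String.split | 16 | 1677 | 30 | 2352 |
| More than 20 characters | org.apache.commons.lang3.StringUtils.split | 64 | 2252 | 71 | 2009 |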

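Conclusions:

  1. When the separator is short and the string being split is short, prefer java.lang.String.split.
  2. When the separator is short but the string being split is long, prefer org.apache.commons.lang3.StringUtils.split.
  3. When the separator is long and the string being split is short, prefer java.lang.String.split.
  4. When the separator is long and the string being split is also long, the gap between the two is small, but java.lang.String.split is still the recommended choice.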

Code analysis:

java.lang.String.split

    public String[] split(String regex, int limit) {
        /* fastpath if the regex is a
         (1)one-char String and this character is not one of the
            RegEx's meta characters ".$|()[{^?*+\\", or
         (2)two-char String and the first char is the backslash and
            the second is not the ascii digit or ascii letter.
         */
        char ch = 0;
        // Check whether the regex is a plain character that can take the fastpath
        if (((regex.value.length == 1 &&
             ".$|()[{^?*+\\".indexOf(ch = regex.charAt(0)) == -1) ||
             (regex.length() == 2 &&
              regex.charAt(0) == '\\' &&
              (((ch = regex.charAt(1))-'0')|('9'-ch)) < 0 &&
              ((ch-'a')|('z'-ch)) < 0 &&
              ((ch-'A')|('Z'-ch)) < 0)) &&
            (ch < Character.MIN_HIGH_SURROGATE ||
             ch > Character.MAX_LOW_SURROGATE))
        {
            int off = 0;
            int next = 0;
            boolean limited = limit > 0;
            ArrayList<String> list = new ArrayList<>();
            while ((next = indexOf(ch, off)) != -1) {
                if (!limited || list.size() < limit - 1) {
                    list.add(substring(off, next));
                    off = next + 1;
                } else {    // last one
                    //assert (list.size() == limit - 1);
                    list.add(substring(off, value.length));
                    off = value.length;
                    break;
                }
            }
            // If no match was found, return this
            if (off == 0)
                return new String[]{this};

            // Add remaining segment
            if (!limited || list.size() < limit)
                list.add(substring(off, value.length));

            // Construct result
            int resultSize = list.size();
            if (limit == 0) {
                while (resultSize > 0 && list.get(resultSize - 1).length() == 0) {
                    resultSize--;
                }
            }
            String[] result = new String[resultSize];
            return list.subList(0, resultSize).toArray(result);
        }
        return Pattern.compile(regex).split(this, limit);
    }
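
To see which of the separators from the test table actually reach the fastpath, the condition at the top of the method can be restated on its own. This is only a sketch of the check as quoted above (other JDK versions may differ); the class name `SplitFastPath` and the method `takesFastPath` are made up for illustration and are not part of the JDK.

```java
public class SplitFastPath {

    // Restates the fastpath condition of String.split(String, int) quoted above.
    // Hypothetical helper for illustration; not a JDK API.
    public static boolean takesFastPath(String regex) {
        char ch;
        if (regex.length() == 1) {
            // (1) one char that is not a regex meta character
            ch = regex.charAt(0);
            if (".$|()[{^?*+\\".indexOf(ch) != -1) {
                return false;
            }
        } else if (regex.length() == 2 && regex.charAt(0) == '\\') {
            // (2) backslash followed by something that is not an ASCII letter or digit
            ch = regex.charAt(1);
            boolean asciiAlnum = (ch >= '0' && ch <= '9')
                    || (ch >= 'a' && ch <= 'z')
                    || (ch >= 'A' && ch <= 'Z');
            if (asciiAlnum) {
                return false;
            }
        } else {
            return false;
        }
        // Surrogate chars are excluded from the fastpath.
        return ch < Character.MIN_HIGH_SURROGATE || ch > Character.MAX_LOW_SURROGATE;
    }

    public static void main(String[] args) {
        String[] separators = {",", "0.", "0.0", "0.01721102", ".", "\\."};
        for (String sep : separators) {
            System.out.println(sep + " -> fastpath: " + takesFastPath(sep));
        }
    }
}
```

Under this check only "," (and an escaped "\\.") takes the fastpath. "0.", "0.0" and "0.01721102" are longer than one character, so they fall through to Pattern.compile(regex).split(this, limit), where the "." is a regex wildcard.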

org.apache.commons.lang3.StringUtils.split

    private static String[] splitWorker(final String str, final String separatorChars, final int max, final boolean preserveAllTokens) {
        // Performance tuned for 2.0 (JDK1.4)
        // Direct code is quicker than StringTokenizer.
        // Also, StringTokenizer uses isSpace() not isWhitespace()

        if (str == null) {
            return null;
        }
        final int len = str.length();
        if (len == 0) {
            return ArrayUtils.EMPTY_STRING_ARRAY;
        }
        final List<String> list = new ArrayList<>();
        int sizePlus1 = 1;
        int i = 0, start = 0;
        boolean match = false;
        boolean lastMatch = false;
        if (separatorChars == null) {
            // Null separator means use whitespace
            while (i < len) {
                if (Character.isWhitespace(str.charAt(i))) {
                    if (match || preserveAllTokens) {
                        lastMatch = true;
                        if (sizePlus1++ == max) {
                            i = len;
                            lastMatch = false;
                        }
                        list.add(str.substring(start, i));
                        match = false;
                    }
                    start = ++i;
                    continue;
                }
                lastMatch = false;
                match = true;
                i++;
            }
        } else if (separatorChars.length() == 1) {
            // Optimise 1 character case
            final char sep = separatorChars.charAt(0);
            // Scan the string one character at a time using index i
            while (i < len) {
                if (str.charAt(i) == sep) {
                    if (match || preserveAllTokens) {
                        lastMatch = true;
                        if (sizePlus1++ == max) {
                            i = len;
                            lastMatch = false;
                        }
                        list.add(str.substring(start, i));
                        match = false;
                    }
                    start = ++i;
                    continue;
                }
                lastMatch = false;
                match = true;
                i++;
            }
        } else {
            // standard case
            while (i < len) {
                if (separatorChars.indexOf(str.charAt(i)) >= 0) {
                    if (match || preserveAllTokens) {
                        lastMatch = true;
                        if (sizePlus1++ == max) {
                            i = len;
                            lastMatch = false;
                        }
                        list.add(str.substring(start, i));
                        match = false;
                    }
                    start = ++i;
                    continue;
                }
                lastMatch = false;
                match = true;
                i++;
            }
        }
        if (match || preserveAllTokens && lastMatch) {
            list.add(str.substring(start, i));
        }
        return list.toArray(ArrayUtils.EMPTY_STRING_ARRAY);
    }
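
Note that the "standard case" branch tests `separatorChars.indexOf(str.charAt(i))`, so a multi-character separator is treated as a set of individual characters, not as one string. The sketch below is a simplified stand-in for that branch (no `max`, `preserveAllTokens` assumed false) written with the JDK only, so that it runs without commons-lang3; `SeparatorSemantics` and `splitOnAnyChar` are hypothetical names, not library APIs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SeparatorSemantics {

    // Simplified stand-in for the "standard case" loop of splitWorker:
    // every character contained in separatorChars acts as a separator,
    // and empty tokens between adjacent separators are dropped.
    public static String[] splitOnAnyChar(String str, String separatorChars) {
        List<String> tokens = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < str.length(); i++) {
            if (separatorChars.indexOf(str.charAt(i)) >= 0) {
                if (i > start) {
                    tokens.add(str.substring(start, i));
                }
                start = i + 1;
            }
        }
        if (start < str.length()) {
            tokens.add(str.substring(start));
        }
        return tokens.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String line = "a0xb.c";
        // String.split reads "0." as a regex: '0' followed by any character.
        System.out.println(Arrays.toString(line.split("0.")));            // [a, b.c]
        // The character-set approach splits on '0' and on '.' separately.
        System.out.println(Arrays.toString(splitOnAnyChar(line, "0."))); // [a, xb, c]
    }
}
```

So for separators longer than one character, the two methods in the table are not doing the same work, which is worth keeping in mind when reading the timings.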

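Conclusions from reading the source:

  1. java.lang.String.split: it first checks whether the argument needs regex handling. If you only want to split on a literal character rather than a regular expression, that check is not really needed, so it adds some cost.
  2. org.apache.commons.lang3.StringUtils.split: it has no regex-related checks, but it walks the string one character at a time with i++, which hurts performance. It does not jump straight to the separator's index the way java.lang.String.split does with indexOf.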
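
The indexOf-based approach can also be applied to a literal separator of any length, with no regex involved. The following is a minimal sketch under that assumption, not code from either library; `LiteralSplit` and `splitLiteral` are hypothetical names. Unlike String.split with the default limit, it keeps trailing empty strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class LiteralSplit {

    // Splits str on every occurrence of the literal separator,
    // jumping from match to match with String.indexOf(String, int).
    public static String[] splitLiteral(String str, String separator) {
        if (separator.isEmpty()) {
            throw new IllegalArgumentException("separator must not be empty");
        }
        List<String> parts = new ArrayList<>();
        int off = 0;
        int next;
        while ((next = str.indexOf(separator, off)) != -1) {
            parts.add(str.substring(off, next));
            off = next + separator.length();
        }
        parts.add(str.substring(off)); // remaining segment (may be empty)
        return parts.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String line = "a0.0b0x0c";
        // Regex semantics: "0.0" also matches "0x0".
        System.out.println(Arrays.toString(line.split("0.0")));                // [a, b, c]
        // Literal semantics, two ways: quoting the regex, or using indexOf directly.
        System.out.println(Arrays.toString(line.split(Pattern.quote("0.0")))); // [a, b0x0c]
        System.out.println(Arrays.toString(splitLiteral(line, "0.0")));        // [a, b0x0c]
    }
}
```

Whether this is faster than either method in the table would have to be measured on the same files; the sketch only shows the technique.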

Test code:

    import java.io.BufferedReader;
    import java.io.File;
    import java.io.FileReader;
    import java.io.IOException;

    import org.apache.commons.lang3.StringUtils;

    // The class name is a placeholder; the original listing did not include one.
    public class SplitTest {

        public static void main(String[] args) {
            String path1 = "/Users/Desktop/csdn/1.csv";
            String path2 = "/Users/Desktop/csdn/2.csv";

            String splitStr = ",";

            String path = path1; // switch to path2 to test the large file

            // 1 = java.lang.String.split
            // 2 = org.apache.commons.lang3.StringUtils.split
            // anything else = org.springframework.util.StringUtils.split
            int choose = 1;

            long splitStart = System.currentTimeMillis();
            getFileInfo(path, choose, splitStr);
            long splitEnd = System.currentTimeMillis();
            System.out.println("choose=" + choose + ":" + path + ":" + (splitEnd - splitStart));
        }

        public static StringBuilder getFileInfo(String path, int choose, String splitStr) {
            StringBuilder sb = new StringBuilder();
            File file = new File(path);
            if (!file.exists()) {
                System.out.println("The file does not exist!");
                return sb;
            }
            // try-with-resources closes the readers even if reading fails
            try (BufferedReader bufferedReader = new BufferedReader(new FileReader(file))) {
                String line;
                while ((line = bufferedReader.readLine()) != null) {
                    sb.append(line);

                    String[] strArrays;
                    if (1 == choose) {
                        strArrays = line.split(splitStr);
                    } else if (2 == choose) {
                        strArrays = StringUtils.split(line, splitStr);
                    } else {
                        // Spring's split cuts only at the first occurrence and
                        // returns null when the separator is not found.
                        strArrays = org.springframework.util.StringUtils.split(line, splitStr);
                    }
                    if (strArrays == null) {
                        continue;
                    }

                    // Note: this console output is inside the timed region.
                    for (int i = 0; i < strArrays.length; i++) {
                        System.out.print(strArrays[i]);
                        System.out.print(" ");
                    }
                    System.out.println();
                }
            } catch (IOException e) {
                System.out.println("An exception occurred: " + e.getMessage());
            }

            return sb;
        }
    }

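Closing note:

I am still learning, and everything above is my own understanding. If anything is incorrect, please point it out. Thank you.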
