Performance comparison: java.lang.String.split vs. org.apache.commons.lang3.StringUtils.split

慕姊伯

Category: Java

Article link: https://blog.csdn.net/qq_41974251/article/details/119857565

The test files are:

1.csv (85 KB)

2.csv (186.5 MB)

Test results (elapsed time in milliseconds, as measured with System.currentTimeMillis() in the test code below):

Split on	Method	1.csv	2.csv	1.csv	2.csv
,	java.lang.String.split	22	1763	15	1596
,	org.apache.commons.lang3.StringUtils.split	43	1340	38	1402
0.	java.lang.String.split	18	2542	17	2449
0.	org.apache.commons.lang3.StringUtils.split	46	1749	72	2280
0.0	java.lang.String.split	28	2094	25	2020
0.0	org.apache.commons.lang3.StringUtils.split	54	2583	67	2140
0.01721102	java.lang.String.split	33	1661	46	1697
0.01721102	org.apache.commons.lang3.StringUtils.split	54	3554	54	2897
more than 20 characters	java.lang.String.split	16	1677	30	2352
more than 20 characters	org.apache.commons.lang3.StringUtils.split	64	2252	71	2009

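Conclusions:

If the delimiter is short and the input is small (1.csv), prefer java.lang.String.split.
If the delimiter is short but the input is large (2.csv), prefer org.apache.commons.lang3.StringUtils.split.
If the delimiter is long and the input is small, prefer java.lang.String.split.
If the delimiter is long and the input is large, the two are close, but java.lang.String.split is still the recommended choice.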

Code analysis:

java.lang.String.split

    public String[] split(String regex, int limit) {
        /* fastpath if the regex is a
         (1)one-char String and this character is not one of the
            RegEx's meta characters ".$|()[{^?*+\\", or
         (2)two-char String and the first char is the backslash and
            the second is not the ascii digit or ascii letter.
         */
        char ch = 0;
        // regex check: decides whether the fast path below applies
        if (((regex.value.length == 1 &&
             ".$|()[{^?*+\\".indexOf(ch = regex.charAt(0)) == -1) ||
             (regex.length() == 2 &&
              regex.charAt(0) == '\\' &&
              (((ch = regex.charAt(1))-'0')|('9'-ch)) < 0 &&
              ((ch-'a')|('z'-ch)) < 0 &&
              ((ch-'A')|('Z'-ch)) < 0)) &&
            (ch < Character.MIN_HIGH_SURROGATE ||
             ch > Character.MAX_LOW_SURROGATE))
        {
            int off = 0;
            int next = 0;
            boolean limited = limit > 0;
            ArrayList<String> list = new ArrayList<>();
            while ((next = indexOf(ch, off)) != -1) {
                if (!limited || list.size() < limit - 1) {
                    list.add(substring(off, next));
                    off = next + 1;
                } else {    // last one
                    //assert (list.size() == limit - 1);
                    list.add(substring(off, value.length));
                    off = value.length;
                    break;
                }
            }
            // If no match was found, return this
            if (off == 0)
                return new String[]{this};

            // Add remaining segment
            if (!limited || list.size() < limit)
                list.add(substring(off, value.length));

            // Construct result
            int resultSize = list.size();
            if (limit == 0) {
                while (resultSize > 0 && list.get(resultSize - 1).length() == 0) {
                    resultSize--;
                }
            }
            String[] result = new String[resultSize];
            return list.subList(0, resultSize).toArray(result);
        }
        return Pattern.compile(regex).split(this, limit);
    }
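To make the two code paths above concrete, here is a small sketch that uses only the JDK. The class name SplitPathDemo and the sample inputs are made up for illustration. Which branch is taken follows from the JDK 8 source quoted above; the exact fast-path conditions may differ in other JDK versions.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

final class SplitPathDemo {

    // "," is a single non-metacharacter, so the quoted source takes the indexOf fast path.
    static String[] splitOnComma(String s) {
        return s.split(",");
    }

    // "." is a regex metacharacter: unescaped, it matches every character.
    static String[] splitOnRawDot(String s) {
        return s.split(".");
    }

    // "\\." is the two-char escaped form, which the quoted source also handles on the fast path.
    static String[] splitOnEscapedDot(String s) {
        return s.split("\\.");
    }

    // Pattern.quote turns an arbitrary delimiter into a literal regex.
    // The quoted form is longer than two chars, so it still goes through Pattern.compile.
    static String[] splitOnLiteral(String s, String delimiter) {
        return s.split(Pattern.quote(delimiter));
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitOnComma("a,b,c")));         // [a, b, c]
        System.out.println(Arrays.toString(splitOnRawDot("a.b.c")));        // []
        System.out.println(Arrays.toString(splitOnEscapedDot("a.b.c")));    // [a, b, c]
        // As a regex, "0." means '0' followed by ANY character, so it matches "0x".
        System.out.println(Arrays.toString("10x5".split("0.")));            // [1, 5]
        System.out.println(Arrays.toString(splitOnLiteral("10x5", "0.")));  // [10x5]
    }
}
```

The "0." case matters for the table above: passed to String.split, that delimiter is a pattern, not the literal two-character string.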

org.apache.commons.lang3.StringUtils.split

    private static String[] splitWorker(final String str, final String separatorChars, final int max, final boolean preserveAllTokens) {
        // Performance tuned for 2.0 (JDK1.4)
        // Direct code is quicker than StringTokenizer.
        // Also, StringTokenizer uses isSpace() not isWhitespace()

        if (str == null) {
            return null;
        }
        final int len = str.length();
        if (len == 0) {
            return ArrayUtils.EMPTY_STRING_ARRAY;
        }
        final List<String> list = new ArrayList<>();
        int sizePlus1 = 1;
        int i = 0, start = 0;
        boolean match = false;
        boolean lastMatch = false;
        if (separatorChars == null) {
            // Null separator means use whitespace
            while (i < len) {
                if (Character.isWhitespace(str.charAt(i))) {
                    if (match || preserveAllTokens) {
                        lastMatch = true;
                        if (sizePlus1++ == max) {
                            i = len;
                            lastMatch = false;
                        }
                        list.add(str.substring(start, i));
                        match = false;
                    }
                    start = ++i;
                    continue;
                }
                lastMatch = false;
                match = true;
                i++;
            }
        } else if (separatorChars.length() == 1) {
            // Optimise 1 character case
            final char sep = separatorChars.charAt(0);
            // loop by incrementing i one character at a time
            while (i < len) {
                if (str.charAt(i) == sep) {
                    if (match || preserveAllTokens) {
                        lastMatch = true;
                        if (sizePlus1++ == max) {
                            i = len;
                            lastMatch = false;
                        }
                        list.add(str.substring(start, i));
                        match = false;
                    }
                    start = ++i;
                    continue;
                }
                lastMatch = false;
                match = true;
                i++;
            }
        } else {
            // standard case
            while (i < len) {
                if (separatorChars.indexOf(str.charAt(i)) >= 0) {
                    if (match || preserveAllTokens) {
                        lastMatch = true;
                        if (sizePlus1++ == max) {
                            i = len;
                            lastMatch = false;
                        }
                        list.add(str.substring(start, i));
                        match = false;
                    }
                    start = ++i;
                    continue;
                }
                lastMatch = false;
                match = true;
                i++;
            }
        }
        if (match || preserveAllTokens && lastMatch) {
            list.add(str.substring(start, i));
        }
        return list.toArray(ArrayUtils.EMPTY_STRING_ARRAY);
    }
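One consequence of the code above is easy to miss: separatorChars is treated as a set of characters, not as one multi-character delimiter, and when preserveAllTokens is false (which is what the public StringUtils.split(str, separatorChars) passes) empty tokens are dropped. The sketch below is a simplified re-implementation of just the "standard case" branch, without max, null handling, or preserveAllTokens, written against the JDK only so that it runs without Commons Lang. CharSetSplitSketch and splitOnAnyChar are hypothetical names, not part of the library.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class CharSetSplitSketch {

    // Simplified version of the "standard case" loop above:
    // every character of separatorChars is a separator on its own,
    // and empty tokens between adjacent separators are skipped.
    static String[] splitOnAnyChar(String str, String separatorChars) {
        List<String> tokens = new ArrayList<>();
        int start = 0;
        boolean inToken = false;
        for (int i = 0; i < str.length(); i++) {
            if (separatorChars.indexOf(str.charAt(i)) >= 0) {
                if (inToken) {
                    tokens.add(str.substring(start, i));
                    inToken = false;
                }
                start = i + 1;
            } else {
                inToken = true;
            }
        }
        if (inToken) {
            tokens.add(str.substring(start));
        }
        return tokens.toArray(new String[0]);
    }

    public static void main(String[] args) {
        // '0' and '.' each act as a separator, so the line breaks at every '0' and every '.'.
        System.out.println(Arrays.toString(splitOnAnyChar("10.5,200", "0.")));  // [1, 5,2]
        // Adjacent commas do not produce empty strings.
        System.out.println(Arrays.toString(splitOnAnyChar("a,,b", ",")));       // [a, b]
        // String.split keeps the empty field in the middle.
        System.out.println(Arrays.toString("a,,b".split(",")));                 // [a, , b]
    }
}
```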

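Conclusions from reading the source:

java.lang.String.split: the argument is always treated as a regular expression. Only a single character that is not a regex metacharacter (or a backslash-escaped non-alphanumeric character) takes the indexOf-based fast path; every other delimiter, including the multi-character ones tested above, falls through to Pattern.compile(regex).split(this, limit), so the pattern is compiled again on every call. When you only want to split on a literal string, that regex work is pure overhead.
org.apache.commons.lang3.StringUtils.split: there is no regex handling, but the loop advances one character at a time with i++ and charAt instead of jumping straight to the next delimiter with indexOf as the String.split fast path does, which limits its performance. Note also that separatorChars is treated as a set of individual characters and empty tokens are dropped, so for delimiters such as "0." the two methods do not return the same tokens, and the timings above are not a strict like-for-like comparison.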
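If the goal is to split on a literal delimiter of any length, one option that combines the two observations above (no regex, and jumping between delimiters with indexOf instead of stepping with i++) is a small hand-written splitter. This is only a sketch: LiteralSplitSketch.splitLiteral is a hypothetical helper, it keeps empty fields (unlike String.split with limit 0, which drops trailing ones), and it has not been benchmarked against the two methods compared here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class LiteralSplitSketch {

    // Splits str on every occurrence of the literal delimiter, keeping empty fields.
    static String[] splitLiteral(String str, String delimiter) {
        if (delimiter.isEmpty()) {
            throw new IllegalArgumentException("delimiter must not be empty");
        }
        List<String> parts = new ArrayList<>();
        int off = 0;
        int next;
        // indexOf locates the next delimiter directly, as the String.split fast path does.
        while ((next = str.indexOf(delimiter, off)) != -1) {
            parts.add(str.substring(off, next));
            off = next + delimiter.length();
        }
        parts.add(str.substring(off));  // remaining segment after the last delimiter
        return parts.toArray(new String[0]);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitLiteral("1.0.02.0.03", ".0.0")));  // [1, 2, 3]
        System.out.println(Arrays.toString(splitLiteral("a,b,,", ",")));           // [a, b, , ]
    }
}
```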

Test code:

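import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;

import org.apache.commons.lang3.StringUtils;

// SplitTest is a placeholder class name.
public class SplitTest {

    public static void main(String[] args) {

        String path1 = "/Users/Desktop/csdn/1.csv";
        String path2 = "/Users/Desktop/csdn/2.csv";

        String splitStr = ",";

        // Switch to path2 to test the large file.
        String path = path1;

        // 1 = java.lang.String.split, 2 = commons-lang3 StringUtils.split, other = Spring StringUtils.split
        int choose = 1;
        String label = choose == 1 ? "java.lang.String"
                : choose == 2 ? "org.apache.commons.lang3.StringUtils"
                : "org.springframework.util.StringUtils";

        long start = System.currentTimeMillis();
        getFileInfo(path, choose, splitStr);
        long end = System.currentTimeMillis();
        System.out.println(label + ":" + path + ":" + (end - start));
    }

    // Reads the file line by line, splits each line with the chosen method, and prints the tokens.
    // Note: file I/O and console output happen inside the region timed by main.
    public static StringBuilder getFileInfo(String path, int choose, String splitStr) {
        StringBuilder sb = new StringBuilder();
        File file = new File(path);
        if (!file.exists()) {
            System.out.println("File does not exist: " + path);
            return sb;
        }
        // try-with-resources closes the reader even if an exception is thrown
        try (BufferedReader bufferedReader = new BufferedReader(new FileReader(file))) {
            String line;
            while ((line = bufferedReader.readLine()) != null) {
                sb.append(line);

                String[] strArrays;
                if (1 == choose) {
                    // the argument is interpreted as a regular expression
                    strArrays = line.split(splitStr);
                } else if (2 == choose) {
                    // the argument is interpreted as a set of separator characters
                    strArrays = StringUtils.split(line, splitStr);
                } else {
                    // Spring's split cuts only at the first occurrence and returns null if there is none
                    strArrays = org.springframework.util.StringUtils.split(line, splitStr);
                }

                if (strArrays != null) {
                    for (String token : strArrays) {
                        System.out.print(token);
                        System.out.print(" ");
                    }
                }
                System.out.println();
            }
        } catch (Exception e) {
            System.out.println("An exception occurred: " + e.getMessage());
        }

        return sb;
    }
}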
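The program above times file reading and console printing together with the split call, and a single currentTimeMillis measurement without warm-up is sensitive to JIT compilation. As a rough alternative, the sketch below times only the split calls over lines already held in memory, after a few warm-up rounds. Everything in it is an assumption for illustration: the class SplitTimingSketch, the generated sample data, and the round counts are not from the original test. For serious measurements a dedicated harness such as JMH is the usual choice.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

final class SplitTimingSketch {

    // Written to on every round so the JIT cannot discard the split work as unused.
    static volatile long sink;

    // Builds synthetic CSV-like lines so the sketch does not depend on external files.
    static List<String> sampleLines(int count) {
        List<String> lines = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            lines.add(i + ",0.01721102," + (i * 31) + ",abc,def");
        }
        return lines;
    }

    // Runs the splitter over all lines and returns the total number of tokens.
    static long countTokens(List<String> lines, Function<String, String[]> splitter) {
        long tokens = 0;
        for (String line : lines) {
            tokens += splitter.apply(line).length;
        }
        return tokens;
    }

    // Returns the elapsed nanoseconds of the last round; earlier rounds act as warm-up.
    static long timeNanos(List<String> lines, Function<String, String[]> splitter, int rounds) {
        long elapsed = 0;
        for (int r = 0; r < rounds; r++) {
            long start = System.nanoTime();
            sink += countTokens(lines, splitter);
            elapsed = System.nanoTime() - start;
        }
        return elapsed;
    }

    public static void main(String[] args) {
        List<String> lines = sampleLines(100_000);
        long nanos = timeNanos(lines, line -> line.split(","), 5);
        System.out.println("String.split(\",\"): " + nanos / 1_000_000.0 + " ms");
    }
}
```

With Commons Lang on the classpath, the same harness can time the other method by passing `line -> StringUtils.split(line, ",")` as the splitter.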

Closing:

I am still learning, and everything above is my personal understanding. If anything is wrong, please point it out. Thanks.
