String.split(), StringTokenizer, and StringUtils.split(): source code analysis and performance comparison


zhangshk_


Category: Java basics

Article link: https://blog.csdn.net/zhangshk_/article/details/82685612


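All three are used to split strings.

String.split():

Available since JDK 1.4, it splits a string around matches of a regular expression. It may throw PatternSyntaxException if the regex is invalid, and it returns the resulting pieces as a String array.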

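String has two overloaded split methods:

public String[] split(String regex, int limit)

public String[] split(String regex)

The single-argument overload simply delegates to the two-argument overload with limit = 0.

What difference does limit make? Here is what the Javadoc says about it: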

The limit parameter controls the number of times the
pattern is applied and therefore affects the length of the resulting
array. If the limit n is greater than zero then the pattern
will be applied at most n-1 times, the array's
length will be no greater than n, and the array's last entry
will contain all input beyond the last matched delimiter. If n
is non-positive then the pattern will be applied as many times as
possible and the array can have any length. If n is zero then
the pattern will be applied as many times as possible, the array can
have any length, and trailing empty strings will be discarded.

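In short, limit controls how many times the string is split:

If n > 0, the pattern is applied at most n-1 times, so the array has at most n elements and the last element holds the rest of the input unsplit.

If n < 0, the pattern is applied as many times as possible, and trailing empty strings are kept.

If n = 0, the pattern is applied as many times as possible, but trailing empty strings are discarded.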

Next, let's step through the splitting process in a debugger for different values of limit:
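The three cases can also be checked without a debugger. The following is a minimal sketch; the class name SplitLimitSketch, the helper splitWith, and the input "a,b,c,," are made up for illustration and are not from the original article.

```java
import java.util.Arrays;

public class SplitLimitSketch {

    // Hypothetical helper: splits on "," with the given limit.
    static String[] splitWith(String input, int limit) {
        return input.split(",", limit);
    }

    public static void main(String[] args) {
        String input = "a,b,c,,"; // illustrative input with two trailing empty fields

        // limit > 0: at most limit-1 splits, the last element keeps the remainder
        System.out.println(Arrays.toString(splitWith(input, 2)));  // [a, b,c,,]

        // limit < 0: split everywhere, trailing empty strings are kept
        System.out.println(Arrays.toString(splitWith(input, -1))); // [a, b, c, , ]

        // limit == 0: split everywhere, trailing empty strings are dropped
        System.out.println(Arrays.toString(splitWith(input, 0)));  // [a, b, c]
    }
}
```

The array lengths are 2, 5, and 3 respectively, which matches the Javadoc wording above.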

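The source of split shows that it matches the string against regex as a regular expression. In the general case this means compiling the regex into a Pattern on every call, which is relatively expensive. (JDK 7 and later contain a fast path that skips the regex engine when the delimiter is a single character that is not a regex metacharacter.)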
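Since compilation is the costly part, one common way to avoid repeating it is to compile the pattern once and reuse it. The sketch below assumes a comma separator with optional surrounding whitespace; the class name, the SEPARATOR constant, and the splitLine helper are illustrative, not from the original article.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class PrecompiledSplitSketch {

    // Compiled once and reused for every call (assumed separator: comma with optional spaces).
    private static final Pattern SEPARATOR = Pattern.compile("\\s*,\\s*");

    // Pattern.split(CharSequence) behaves like String.split(regex) with limit 0.
    static String[] splitLine(String line) {
        return SEPARATOR.split(line);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitLine("a , b,c"))); // [a, b, c]
    }
}
```

Whether this is measurably faster depends on the regex and the JDK version, so it is worth benchmarking before relying on it.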

StringUtils.split()

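StringUtils (from Apache Commons Lang) has four overloaded split methods:

This one splits on whitespace. If the input is null it returns null; if the input is the empty string it returns an empty array.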

public static String[] split(final String str) {
    return split(str, null, -1);
}

* StringUtils.split(null)       = null
* StringUtils.split("")         = []
* StringUtils.split("abc def")  = ["abc", "def"]
* StringUtils.split("abc  def") = ["abc", "def"]
* StringUtils.split(" abc ")    = ["abc"]

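This one splits on the given character. The last argument passed to the private splitWorker method is preserveAllTokens; this overload fixes it to false and the caller cannot change it. With false, adjacent separators are treated as a single separator; with true, adjacent separators produce empty tokens between them.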

public static String[] split(final String str, final char separatorChar) {
    return splitWorker(str, separatorChar, false);
}

* StringUtils.split(null, *)         = null
* StringUtils.split("", *)           = []
* StringUtils.split("a.b.c", '.')    = ["a", "b", "c"]
* StringUtils.split("a..b.c", '.')   = ["a", "b", "c"]
* StringUtils.split("a:b:c", '.')    = ["a:b:c"]
* StringUtils.split("a b c", ' ')    = ["a", "b", "c"]

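This one takes a String of separator characters, each of which acts as a separator on its own (the string is not matched as a whole). preserveAllTokens is again fixed to false. Note that if separatorChars is null, the string is split on whitespace.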

public static String[] split(final String str, final String separatorChars) {
    return splitWorker(str, separatorChars, -1, false);
}

* StringUtils.split(null, *)         = null
* StringUtils.split("", *)           = []
* StringUtils.split("abc def", null) = ["abc", "def"]
* StringUtils.split("abc def", " ")  = ["abc", "def"]
* StringUtils.split("abc  def", " ") = ["abc", "def"]
* StringUtils.split("ab:cd:ef", ":") = ["ab", "cd", "ef"]

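This one also takes a String of separator characters, with preserveAllTokens fixed to false and a null separatorChars meaning whitespace. The extra max parameter sets the maximum number of elements in the returned array; as the examples below show, passing 0 imposes no limit.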

public static String[] split(final String str, final String separatorChars, final int max) {
    return splitWorker(str, separatorChars, max, false);
}

* StringUtils.split(null, *, *)            = null
* StringUtils.split("", *, *)              = []
* StringUtils.split("ab cd ef", null, 0)   = ["ab", "cd", "ef"]
* StringUtils.split("ab   cd ef", null, 0) = ["ab", "cd", "ef"]
* StringUtils.split("ab:cd:ef", ":", 0)    = ["ab", "cd", "ef"]
* StringUtils.split("ab:cd:ef", ":", 2)    = ["ab", "cd:ef"]
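To make the rules above concrete (any character in separatorChars splits, adjacent separators collapse, null means whitespace, max caps the array length), here is a simplified stand-in written against the plain JDK. It is a sketch and not the Commons Lang source: the class SimpleSplitSketch and its methods are hypothetical, and it only aims to reproduce the documented examples.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SimpleSplitSketch {

    // Hypothetical helper, NOT the Commons Lang implementation.
    // max <= 0 means "no limit" in this sketch.
    static String[] split(String str, String separatorChars, int max) {
        if (str == null) {
            return null;
        }
        List<String> tokens = new ArrayList<>();
        int len = str.length();
        int i = 0;
        while (i < len) {
            // Skip a run of separators so adjacent separators count as one.
            while (i < len && isSeparator(str.charAt(i), separatorChars)) {
                i++;
            }
            if (i == len) {
                break;
            }
            // The last allowed token takes the rest of the string unchanged.
            if (max > 0 && tokens.size() == max - 1) {
                tokens.add(str.substring(i));
                break;
            }
            int start = i;
            while (i < len && !isSeparator(str.charAt(i), separatorChars)) {
                i++;
            }
            tokens.add(str.substring(start, i));
        }
        return tokens.toArray(new String[0]);
    }

    // A null separator set means "split on whitespace".
    static boolean isSeparator(char c, String separatorChars) {
        return separatorChars == null
                ? Character.isWhitespace(c)
                : separatorChars.indexOf(c) >= 0;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(split("ab   cd ef", null, 0))); // [ab, cd, ef]
        System.out.println(Arrays.toString(split("a..b.c", ".", 0)));      // [a, b, c]
        System.out.println(Arrays.toString(split("ab:cd:ef", ":", 2)));    // [ab, cd:ef]
    }
}
```

Unlike String.split, nothing here touches the regex engine, which is the main reason character-based splitting tends to be cheaper.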

StringTokenizer

In fact, the Javadoc for StringUtils.split opens with this sentence:

* <p>Splits the provided text into an array, separators specified.
* This is an alternative to using StringTokenizer.</p>

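So StringUtils.split is meant as an alternative to StringTokenizer, and for plain tokenizing either one can be used in place of the other.

StringTokenizer has three overloaded constructors:

This one splits on the default delimiters, " \t\n\r\f" (space, tab, newline, carriage return, and form feed).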

public StringTokenizer(String str) {
    this(str, " \t\n\r\f", false);
}

This one splits on the delimiter characters given in delim.

public StringTokenizer(String str, String delim) {
    this(str, delim, false);
}

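This one splits on the given delimiter characters; returnDelims controls whether the delimiters themselves are returned as tokens. It also initializes the tokenizer's internal state.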

public StringTokenizer(String str, String delim, boolean returnDelims) {
    currentPosition = 0;
    newPosition = -1;
    delimsChanged = false;
    this.str = str;
    maxPosition = str.length();
    delimiters = delim;
    retDelims = returnDelims;
    setMaxDelimCodePoint();
}
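The effect of returnDelims is easiest to see side by side. A minimal sketch, assuming the input "a,b;c" and the delimiter set ",;" (both made up for illustration); the class TokenizerSketch and the tokens helper are hypothetical names:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class TokenizerSketch {

    // Collects every token into a list; the arguments are passed straight
    // to the three-argument StringTokenizer constructor.
    static List<String> tokens(String str, String delim, boolean returnDelims) {
        StringTokenizer st = new StringTokenizer(str, delim, returnDelims);
        List<String> out = new ArrayList<>();
        while (st.hasMoreTokens()) {
            out.add(st.nextToken());
        }
        return out;
    }

    public static void main(String[] args) {
        // returnDelims = false: delimiters are skipped
        System.out.println(tokens("a,b;c", ",;", false)); // [a, b, c]

        // returnDelims = true: each delimiter comes back as a one-character token
        System.out.println(tokens("a,b;c", ",;", true));  // [a, ,, b, ;, c]
    }
}
```

Note that the constructor calls str.length(), so passing a null string throws a NullPointerException, whereas StringUtils.split(null) returns null.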

Below is a StringTokenizer demo: