项目介绍
许可证:Apache License 2.0
文档:Lang – Home (apache.org)https://commons.apache.org/proper/commons-lang/
javadoc:Overview (Apache Commons Lang 3.13.0 API)https://commons.apache.org/proper/commons-lang/apidocs/
社区:Apache Commons – Apache Commons Maillisthttps://commons.apache.org/mail-lists.html
代码信息介绍:Apache Commons Lang, a package of Java utility classes for the classes that are in java.lang's hierarchy, or are considered to be so standard as to justify existence in java.lang.
代码获取(github):apache/commons-lang: Apache Commons Langhttps://github.com/apache/commons-lang
基本介绍:Apache Commons Lang是对java.lang的扩展,基本上是commons中最常用的工具包,其中有字符串、列表、数学、异常等等便于使用的实用方法,将这些方法引入项目中可以简化我们代码中很多繁琐的底层逻辑。
项目架构:Maven
引入项目的方式(pom.xml):
<dependency>
<groupId>org.apache.commons</groupId>
<artifactId>commons-lang3</artifactId>
<version>3.13.0</version>
</dependency>
测试框架:JUnit
代码阅读(StringUtils.java split系列方法)
该部分位于包org.apache.commons.lang3中的StringUtils类(字符串实用类)。
在仓库中的位置为commons-lang-master\src\main\java\org\apache\commons\lang3\StringUtils.java,7248-7949行。
split()方法介绍:将字符串按照分隔符分割为字符串数组。例如,split(“hello world hello apache”)=[“hello”, “world”, “hello”, “apache”]
基本介绍
速览
以下是该系列一共18个方法的速览(省略实现逻辑)
public static String[] split(final String str) {
return split(str, null, -1);
}
public static String[] split(final String str, final char separatorChar) {
return splitWorker(str, separatorChar, false);
}
public static String[] split(final String str, final String separatorChars) {
return splitWorker(str, separatorChars, -1, false);
}
public static String[] split(final String str, final String separatorChars, final int max) {
return splitWorker(str, separatorChars, max, false);
}
public static String[] splitByCharacterType(final String str) {
return splitByCharacterType(str, false);
}
private static String[] splitByCharacterType(final String str, final boolean camelCase) {
...
}
public static String[] splitByCharacterTypeCamelCase(final String str) {
return splitByCharacterType(str, true);
}
public static String[] splitByWholeSeparator(final String str, final String separator) {
return splitByWholeSeparatorWorker(str, separator, -1, false);
}
public static String[] splitByWholeSeparator( final String str, final String separator, final int max) {
return splitByWholeSeparatorWorker(str, separator, max, false);
}
public static String[] splitByWholeSeparatorPreserveAllTokens(final String str, final String separator) {
return splitByWholeSeparatorWorker(str, separator, -1, true);
}
public static String[] splitByWholeSeparatorPreserveAllTokens(final String str, final String separator, final int max) {
return splitByWholeSeparatorWorker(str, separator, max, true);
}
private static String[] splitByWholeSeparatorWorker(final String str, final String separator, final int max, final boolean preserveAllTokens) {
...
if (separator == null || EMPTY.equals(separator)) {
// Split on whitespace.
return splitWorker(str, null, max, preserveAllTokens);
}
...
}
public static String[] splitPreserveAllTokens(final String str) {
return splitWorker(str, null, -1, true);
}
public static String[] splitPreserveAllTokens(final String str, final char separatorChar) {
return splitWorker(str, separatorChar, true);
}
public static String[] splitPreserveAllTokens(final String str, final String separatorChars) {
return splitWorker(str, separatorChars, -1, true);
}
public static String[] splitPreserveAllTokens(final String str, final String separatorChars, final int max) {
return splitWorker(str, separatorChars, max, true);
}
private static String[] splitWorker(final String str, final char separatorChar, final boolean preserveAllTokens) {
...
}
private static String[] splitWorker(final String str, final String separatorChars, final int max, final boolean preserveAllTokens) {
...
}
参数解释
String str—被分割的字符串。
boolean camelCase—是否按照“驼峰”方式分割字符串,即将大写字母和后面连续的小写字母视为一个令牌(token,或者叫分割单元),仅出现在splitByCharacterType方法中。
String separator—字符串形式的分隔符。如传入null,则视作分隔符为空白符,空白符为使得Character.isWhitespace(char)返回true的字符。
char separatorChar—字符形式的分隔符。
String separatorChars—字符形式的分隔符,多种字符分隔符组成的字符串。如传入null,则视作分隔符为空白符。
int max—返回的String[]数组的最大长度,如达到最大长度仍未分割完,则数组的最后一个元素的内容为str剩余未分割的子串。如传入-1,则视为不设长度上限。
boolean preserveAllTokens—是否保留空的令牌(空的分割单元),空的令牌是指相邻的分隔符之间的空串。(if true, adjacent separators are treated as empty token separators; if false, adjacent separators are treated as one separator.)。
代码结构特点
该系列一共18个方法,调用关系看似复杂,实则真正实现具体逻辑的只有4个方法,且权限均为private,其他方法均复用以下4个方法:
private static String[] splitByCharacterType(final String str, final boolean camelCase);
private static String[] splitByWholeSeparatorWorker(final String str, final String separator, final int max, final boolean preserveAllTokens);
private static String[] splitWorker(final String str, final char separatorChar, final boolean preserveAllTokens);
private static String[] splitWorker(final String str, final String separatorChars, final int max, final boolean preserveAllTokens);
在调用逻辑较为复杂的情况,可以将实现真正功能的方法的权限设为private,方法名取为”xxxWorker()”,例如:
private static String[] splitWorker(final String str, final char separatorChar, final boolean preserveAllTokens) {
// Performance tuned for 2.0 (JDK1.4)
if (str == null) {
return null;
}
final int len = str.length();
if (len == 0) {
return ArrayUtils.EMPTY_STRING_ARRAY;
}
final List<String> list = new ArrayList<>();
int i = 0;
int start = 0;
boolean match = false;
boolean lastMatch = false;
while (i < len) {
if (str.charAt(i) == separatorChar) {
if (match || preserveAllTokens) {
list.add(str.substring(start, i));
match = false;
lastMatch = true;
}
start = ++i;
continue;
}
lastMatch = false;
match = true;
i++;
}
if (match || preserveAllTokens && lastMatch) {
list.add(str.substring(start, i));
}
return list.toArray(ArrayUtils.EMPTY_STRING_ARRAY);
}
其余开放(public)的方法只需简单调用这些方法(一句return),例如:
public static String[] split(final String str, final char separatorChar) {
return splitWorker(str, separatorChar, false);
}
这样做的好处:将项目的实现逻辑与复用逻辑分开,能够快速找到源码中负责实现真正逻辑的Worker方法,便于对项目的梳理。当我们阅读源码时,可以先观察其他方法对这些Worker方法的调用关系,再去详细阅读这些Worker方法的逻辑。
常使用final进行保护
Java中用来修饰属性或变量的final相当于c语言的const,它代表该变量不可修改(基本类型变量的值不可修改,引用类型变量的引用不可修改),如果进行了修改,则会在编译过程报错,也可被IDE检查出。这样就保护了这个变量,防止代码中的操作不小心修改了这个变量。
例如:
(1)修饰函数形参
public static String[] split(final String str);
public static String[] split(final String str, final char separatorChar);
public static String[] split(final String str, final String separatorChars);
public static String[] split(final String str, final String separatorChars, final int max);
public static String[] splitByCharacterType(final String str);
private static String[] splitByCharacterType(final String str, final boolean camelCase);
public static String[] splitByCharacterTypeCamelCase(final String str);
public static String[] splitByWholeSeparator(final String str, final String separator);
(2)修饰局部变量
private static String[] splitByCharacterType(final String str, final boolean camelCase) {
...
final char[] c = str.toCharArray();
final List<String> list = new ArrayList<>();
int tokenStart = 0;
int currentType = Character.getType(c[tokenStart]);
for (int pos = tokenStart + 1; pos < c.length; pos++) {
final int type = Character.getType(c[pos]);
if (type == currentType) {
continue;
}
...
}
注释规范(Javadoc)
在编写代码注释时,可以遵循Javadoc注释规范。
代码实践:词频统计
该简单项目使用Maven架构,引入Apache_Commons_lang依赖,仿照Apache_Commons_lang代码结构编写了词频统计功能的系列方法,包括以上提到的Worker方法、final保护、注释规范,并使用JUnit编写一个测试类测试其正确性。
附录:该部分完整源码
在线阅读:
/**
* Splits the provided text into an array, using whitespace as the
* separator.
* Whitespace is defined by {@link Character#isWhitespace(char)}.
*
* <p>The separator is not included in the returned String array.
* Adjacent separators are treated as one separator.
* For more control over the split use the StrTokenizer class.</p>
*
* <p>A {@code null} input String returns {@code null}.</p>
*
* <pre>
* StringUtils.split(null) = null
* StringUtils.split("") = []
* StringUtils.split("abc def") = ["abc", "def"]
* StringUtils.split("abc def") = ["abc", "def"]
* StringUtils.split(" abc ") = ["abc"]
* </pre>
*
* @param str the String to parse, may be null
* @return an array of parsed Strings, {@code null} if null String input
*/
public static String[] split(final String str) {
return split(str, null, -1);
}
/**
* Splits the provided text into an array, separator specified.
* This is an alternative to using StringTokenizer.
*
* <p>The separator is not included in the returned String array.
* Adjacent separators are treated as one separator.
* For more control over the split use the StrTokenizer class.</p>
*
* <p>A {@code null} input String returns {@code null}.</p>
*
* <pre>
* StringUtils.split(null, *) = null
* StringUtils.split("", *) = []
* StringUtils.split("a.b.c", '.') = ["a", "b", "c"]
* StringUtils.split("a..b.c", '.') = ["a", "b", "c"]
* StringUtils.split("a:b:c", '.') = ["a:b:c"]
* StringUtils.split("a b c", ' ') = ["a", "b", "c"]
* </pre>
*
* @param str the String to parse, may be null
* @param separatorChar the character used as the delimiter
* @return an array of parsed Strings, {@code null} if null String input
* @since 2.0
*/
public static String[] split(final String str, final char separatorChar) {
return splitWorker(str, separatorChar, false);
}
/**
* Splits the provided text into an array, separators specified.
* This is an alternative to using StringTokenizer.
*
* <p>The separator is not included in the returned String array.
* Adjacent separators are treated as one separator.
* For more control over the split use the StrTokenizer class.</p>
*
* <p>A {@code null} input String returns {@code null}.
* A {@code null} separatorChars splits on whitespace.</p>
*
* <pre>
* StringUtils.split(null, *) = null
* StringUtils.split("", *) = []
* StringUtils.split("abc def", null) = ["abc", "def"]
* StringUtils.split("abc def", " ") = ["abc", "def"]
* StringUtils.split("abc def", " ") = ["abc", "def"]
* StringUtils.split("ab:cd:ef", ":") = ["ab", "cd", "ef"]
* </pre>
*
* @param str the String to parse, may be null
* @param separatorChars the characters used as the delimiters,
* {@code null} splits on whitespace
* @return an array of parsed Strings, {@code null} if null String input
*/
public static String[] split(final String str, final String separatorChars) {
return splitWorker(str, separatorChars, -1, false);
}
/**
* Splits the provided text into an array with a maximum length,
* separators specified.
*
* <p>The separator is not included in the returned String array.
* Adjacent separators are treated as one separator.</p>
*
* <p>A {@code null} input String returns {@code null}.
* A {@code null} separatorChars splits on whitespace.</p>
*
* <p>If more than {@code max} delimited substrings are found, the last
* returned string includes all characters after the first {@code max - 1}
* returned strings (including separator characters).</p>
*
* <pre>
* StringUtils.split(null, *, *) = null
* StringUtils.split("", *, *) = []
* StringUtils.split("ab cd ef", null, 0) = ["ab", "cd", "ef"]
* StringUtils.split("ab cd ef", null, 0) = ["ab", "cd", "ef"]
* StringUtils.split("ab:cd:ef", ":", 0) = ["ab", "cd", "ef"]
* StringUtils.split("ab:cd:ef", ":", 2) = ["ab", "cd:ef"]
* </pre>
*
* @param str the String to parse, may be null
* @param separatorChars the characters used as the delimiters,
* {@code null} splits on whitespace
* @param max the maximum number of elements to include in the
* array. A zero or negative value implies no limit
* @return an array of parsed Strings, {@code null} if null String input
*/
public static String[] split(final String str, final String separatorChars, final int max) {
return splitWorker(str, separatorChars, max, false);
}
/**
* Splits a String by Character type as returned by
* {@code java.lang.Character.getType(char)}. Groups of contiguous
* characters of the same type are returned as complete tokens.
* <pre>
* StringUtils.splitByCharacterType(null) = null
* StringUtils.splitByCharacterType("") = []
* StringUtils.splitByCharacterType("ab de fg") = ["ab", " ", "de", " ", "fg"]
* StringUtils.splitByCharacterType("ab de fg") = ["ab", " ", "de", " ", "fg"]
* StringUtils.splitByCharacterType("ab:cd:ef") = ["ab", ":", "cd", ":", "ef"]
* StringUtils.splitByCharacterType("number5") = ["number", "5"]
* StringUtils.splitByCharacterType("fooBar") = ["foo", "B", "ar"]
* StringUtils.splitByCharacterType("foo200Bar") = ["foo", "200", "B", "ar"]
* StringUtils.splitByCharacterType("ASFRules") = ["ASFR", "ules"]
* </pre>
* @param str the String to split, may be {@code null}
* @return an array of parsed Strings, {@code null} if null String input
* @since 2.4
*/
public static String[] splitByCharacterType(final String str) {
return splitByCharacterType(str, false);
}
/**
* <p>Splits a String by Character type as returned by
* {@code java.lang.Character.getType(char)}. Groups of contiguous
* characters of the same type are returned as complete tokens, with the
* following exception: if {@code camelCase} is {@code true},
* the character of type {@code Character.UPPERCASE_LETTER}, if any,
* immediately preceding a token of type {@code Character.LOWERCASE_LETTER}
* will belong to the following token rather than to the preceding, if any,
* {@code Character.UPPERCASE_LETTER} token.
* @param str the String to split, may be {@code null}
* @param camelCase whether to use so-called "camel-case" for letter types
* @return an array of parsed Strings, {@code null} if null String input
* @since 2.4
*/
private static String[] splitByCharacterType(final String str, final boolean camelCase) {
if (str == null) {
return null;
}
if (str.isEmpty()) {
return ArrayUtils.EMPTY_STRING_ARRAY;
}
final char[] c = str.toCharArray();
final List<String> list = new ArrayList<>();
int tokenStart = 0;
int currentType = Character.getType(c[tokenStart]);
for (int pos = tokenStart + 1; pos < c.length; pos++) {
final int type = Character.getType(c[pos]);
if (type == currentType) {
continue;
}
if (camelCase && type == Character.LOWERCASE_LETTER && currentType == Character.UPPERCASE_LETTER) {
final int newTokenStart = pos - 1;
if (newTokenStart != tokenStart) {
list.add(new String(c, tokenStart, newTokenStart - tokenStart));
tokenStart = newTokenStart;
}
} else {
list.add(new String(c, tokenStart, pos - tokenStart));
tokenStart = pos;
}
currentType = type;
}
list.add(new String(c, tokenStart, c.length - tokenStart));
return list.toArray(ArrayUtils.EMPTY_STRING_ARRAY);
}
/**
* <p>Splits a String by Character type as returned by
* {@code java.lang.Character.getType(char)}. Groups of contiguous
* characters of the same type are returned as complete tokens, with the
* following exception: the character of type
* {@code Character.UPPERCASE_LETTER}, if any, immediately
* preceding a token of type {@code Character.LOWERCASE_LETTER}
* will belong to the following token rather than to the preceding, if any,
* {@code Character.UPPERCASE_LETTER} token.
* <pre>
* StringUtils.splitByCharacterTypeCamelCase(null) = null
* StringUtils.splitByCharacterTypeCamelCase("") = []
* StringUtils.splitByCharacterTypeCamelCase("ab de fg") = ["ab", " ", "de", " ", "fg"]
* StringUtils.splitByCharacterTypeCamelCase("ab de fg") = ["ab", " ", "de", " ", "fg"]
* StringUtils.splitByCharacterTypeCamelCase("ab:cd:ef") = ["ab", ":", "cd", ":", "ef"]
* StringUtils.splitByCharacterTypeCamelCase("number5") = ["number", "5"]
* StringUtils.splitByCharacterTypeCamelCase("fooBar") = ["foo", "Bar"]
* StringUtils.splitByCharacterTypeCamelCase("foo200Bar") = ["foo", "200", "Bar"]
* StringUtils.splitByCharacterTypeCamelCase("ASFRules") = ["ASF", "Rules"]
* </pre>
* @param str the String to split, may be {@code null}
* @return an array of parsed Strings, {@code null} if null String input
* @since 2.4
*/
public static String[] splitByCharacterTypeCamelCase(final String str) {
return splitByCharacterType(str, true);
}
/**
* <p>Splits the provided text into an array, separator string specified.
*
* <p>The separator(s) will not be included in the returned String array.
* Adjacent separators are treated as one separator.</p>
*
* <p>A {@code null} input String returns {@code null}.
* A {@code null} separator splits on whitespace.</p>
*
* <pre>
* StringUtils.splitByWholeSeparator(null, *) = null
* StringUtils.splitByWholeSeparator("", *) = []
* StringUtils.splitByWholeSeparator("ab de fg", null) = ["ab", "de", "fg"]
* StringUtils.splitByWholeSeparator("ab de fg", null) = ["ab", "de", "fg"]
* StringUtils.splitByWholeSeparator("ab:cd:ef", ":") = ["ab", "cd", "ef"]
* StringUtils.splitByWholeSeparator("ab-!-cd-!-ef", "-!-") = ["ab", "cd", "ef"]
* </pre>
*
* @param str the String to parse, may be null
* @param separator String containing the String to be used as a delimiter,
* {@code null} splits on whitespace
* @return an array of parsed Strings, {@code null} if null String was input
*/
public static String[] splitByWholeSeparator(final String str, final String separator) {
return splitByWholeSeparatorWorker(str, separator, -1, false);
}
/**
* Splits the provided text into an array, separator string specified.
* Returns a maximum of {@code max} substrings.
*
* <p>The separator(s) will not be included in the returned String array.
* Adjacent separators are treated as one separator.</p>
*
* <p>A {@code null} input String returns {@code null}.
* A {@code null} separator splits on whitespace.</p>
*
* <pre>
* StringUtils.splitByWholeSeparator(null, *, *) = null
* StringUtils.splitByWholeSeparator("", *, *) = []
* StringUtils.splitByWholeSeparator("ab de fg", null, 0) = ["ab", "de", "fg"]
* StringUtils.splitByWholeSeparator("ab de fg", null, 0) = ["ab", "de", "fg"]
* StringUtils.splitByWholeSeparator("ab:cd:ef", ":", 2) = ["ab", "cd:ef"]
* StringUtils.splitByWholeSeparator("ab-!-cd-!-ef", "-!-", 5) = ["ab", "cd", "ef"]
* StringUtils.splitByWholeSeparator("ab-!-cd-!-ef", "-!-", 2) = ["ab", "cd-!-ef"]
* </pre>
*
* @param str the String to parse, may be null
* @param separator String containing the String to be used as a delimiter,
* {@code null} splits on whitespace
* @param max the maximum number of elements to include in the returned
* array. A zero or negative value implies no limit.
* @return an array of parsed Strings, {@code null} if null String was input
*/
public static String[] splitByWholeSeparator( final String str, final String separator, final int max) {
return splitByWholeSeparatorWorker(str, separator, max, false);
}
/**
* Splits the provided text into an array, separator string specified.
*
* <p>The separator is not included in the returned String array.
* Adjacent separators are treated as separators for empty tokens.
* For more control over the split use the StrTokenizer class.</p>
*
* <p>A {@code null} input String returns {@code null}.
* A {@code null} separator splits on whitespace.</p>
*
* <pre>
* StringUtils.splitByWholeSeparatorPreserveAllTokens(null, *) = null
* StringUtils.splitByWholeSeparatorPreserveAllTokens("", *) = []
* StringUtils.splitByWholeSeparatorPreserveAllTokens("ab de fg", null) = ["ab", "de", "fg"]
* StringUtils.splitByWholeSeparatorPreserveAllTokens("ab de fg", null) = ["ab", "", "", "de", "fg"]
* StringUtils.splitByWholeSeparatorPreserveAllTokens("ab:cd:ef", ":") = ["ab", "cd", "ef"]
* StringUtils.splitByWholeSeparatorPreserveAllTokens("ab-!-cd-!-ef", "-!-") = ["ab", "cd", "ef"]
* </pre>
*
* @param str the String to parse, may be null
* @param separator String containing the String to be used as a delimiter,
* {@code null} splits on whitespace
* @return an array of parsed Strings, {@code null} if null String was input
* @since 2.4
*/
public static String[] splitByWholeSeparatorPreserveAllTokens(final String str, final String separator) {
return splitByWholeSeparatorWorker(str, separator, -1, true);
}
/**
* Splits the provided text into an array, separator string specified.
* Returns a maximum of {@code max} substrings.
*
* <p>The separator is not included in the returned String array.
* Adjacent separators are treated as separators for empty tokens.
* For more control over the split use the StrTokenizer class.</p>
*
* <p>A {@code null} input String returns {@code null}.
* A {@code null} separator splits on whitespace.</p>
*
* <pre>
* StringUtils.splitByWholeSeparatorPreserveAllTokens(null, *, *) = null
* StringUtils.splitByWholeSeparatorPreserveAllTokens("", *, *) = []
* StringUtils.splitByWholeSeparatorPreserveAllTokens("ab de fg", null, 0) = ["ab", "de", "fg"]
* StringUtils.splitByWholeSeparatorPreserveAllTokens("ab de fg", null, 0) = ["ab", "", "", "de", "fg"]
* StringUtils.splitByWholeSeparatorPreserveAllTokens("ab:cd:ef", ":", 2) = ["ab", "cd:ef"]
* StringUtils.splitByWholeSeparatorPreserveAllTokens("ab-!-cd-!-ef", "-!-", 5) = ["ab", "cd", "ef"]
* StringUtils.splitByWholeSeparatorPreserveAllTokens("ab-!-cd-!-ef", "-!-", 2) = ["ab", "cd-!-ef"]
* </pre>
*
* @param str the String to parse, may be null
* @param separator String containing the String to be used as a delimiter,
* {@code null} splits on whitespace
* @param max the maximum number of elements to include in the returned
* array. A zero or negative value implies no limit.
* @return an array of parsed Strings, {@code null} if null String was input
* @since 2.4
*/
public static String[] splitByWholeSeparatorPreserveAllTokens(final String str, final String separator, final int max) {
return splitByWholeSeparatorWorker(str, separator, max, true);
}
/**
* Performs the logic for the {@code splitByWholeSeparatorPreserveAllTokens} methods.
*
* @param str the String to parse, may be {@code null}
* @param separator String containing the String to be used as a delimiter,
* {@code null} splits on whitespace
* @param max the maximum number of elements to include in the returned
* array. A zero or negative value implies no limit.
* @param preserveAllTokens if {@code true}, adjacent separators are
* treated as empty token separators; if {@code false}, adjacent
* separators are treated as one separator.
* @return an array of parsed Strings, {@code null} if null String input
* @since 2.4
*/
private static String[] splitByWholeSeparatorWorker(
final String str, final String separator, final int max, final boolean preserveAllTokens) {
if (str == null) {
return null;
}
final int len = str.length();
if (len == 0) {
return ArrayUtils.EMPTY_STRING_ARRAY;
}
if (separator == null || EMPTY.equals(separator)) {
// Split on whitespace.
return splitWorker(str, null, max, preserveAllTokens);
}
final int separatorLength = separator.length();
final ArrayList<String> substrings = new ArrayList<>();
int numberOfSubstrings = 0;
int beg = 0;
int end = 0;
while (end < len) {
end = str.indexOf(separator, beg);
if (end > -1) {
if (end > beg) {
numberOfSubstrings += 1;
if (numberOfSubstrings == max) {
end = len;
substrings.add(str.substring(beg));
} else {
// The following is OK, because String.substring( beg, end ) excludes
// the character at the position 'end'.
substrings.add(str.substring(beg, end));
// Set the starting point for the next search.
// The following is equivalent to beg = end + (separatorLength - 1) + 1,
// which is the right calculation:
beg = end + separatorLength;
}
} else {
// We found a consecutive occurrence of the separator, so skip it.
if (preserveAllTokens) {
numberOfSubstrings += 1;
if (numberOfSubstrings == max) {
end = len;
substrings.add(str.substring(beg));
} else {
substrings.add(EMPTY);
}
}
beg = end + separatorLength;
}
} else {
// String.substring( beg ) goes from 'beg' to the end of the String.
substrings.add(str.substring(beg));
end = len;
}
}
return substrings.toArray(ArrayUtils.EMPTY_STRING_ARRAY);
}
/**
* Splits the provided text into an array, using whitespace as the
* separator, preserving all tokens, including empty tokens created by
* adjacent separators. This is an alternative to using StringTokenizer.
* Whitespace is defined by {@link Character#isWhitespace(char)}.
*
* <p>The separator is not included in the returned String array.
* Adjacent separators are treated as separators for empty tokens.
* For more control over the split use the StrTokenizer class.</p>
*
* <p>A {@code null} input String returns {@code null}.</p>
*
* <pre>
* StringUtils.splitPreserveAllTokens(null) = null
* StringUtils.splitPreserveAllTokens("") = []
* StringUtils.splitPreserveAllTokens("abc def") = ["abc", "def"]
* StringUtils.splitPreserveAllTokens("abc def") = ["abc", "", "def"]
* StringUtils.splitPreserveAllTokens(" abc ") = ["", "abc", ""]
* </pre>
*
* @param str the String to parse, may be {@code null}
* @return an array of parsed Strings, {@code null} if null String input
* @since 2.1
*/
public static String[] splitPreserveAllTokens(final String str) {
return splitWorker(str, null, -1, true);
}
/**
* Splits the provided text into an array, separator specified,
* preserving all tokens, including empty tokens created by adjacent
* separators. This is an alternative to using StringTokenizer.
*
* <p>The separator is not included in the returned String array.
* Adjacent separators are treated as separators for empty tokens.
* For more control over the split use the StrTokenizer class.</p>
*
* <p>A {@code null} input String returns {@code null}.</p>
*
* <pre>
* StringUtils.splitPreserveAllTokens(null, *) = null
* StringUtils.splitPreserveAllTokens("", *) = []
* StringUtils.splitPreserveAllTokens("a.b.c", '.') = ["a", "b", "c"]
* StringUtils.splitPreserveAllTokens("a..b.c", '.') = ["a", "", "b", "c"]
* StringUtils.splitPreserveAllTokens("a:b:c", '.') = ["a:b:c"]
* StringUtils.splitPreserveAllTokens("a\tb\nc", null) = ["a", "b", "c"]
* StringUtils.splitPreserveAllTokens("a b c", ' ') = ["a", "b", "c"]
* StringUtils.splitPreserveAllTokens("a b c ", ' ') = ["a", "b", "c", ""]
* StringUtils.splitPreserveAllTokens("a b c ", ' ') = ["a", "b", "c", "", ""]
* StringUtils.splitPreserveAllTokens(" a b c", ' ') = ["", a", "b", "c"]
* StringUtils.splitPreserveAllTokens(" a b c", ' ') = ["", "", a", "b", "c"]
* StringUtils.splitPreserveAllTokens(" a b c ", ' ') = ["", a", "b", "c", ""]
* </pre>
*
* @param str the String to parse, may be {@code null}
* @param separatorChar the character used as the delimiter,
* {@code null} splits on whitespace
* @return an array of parsed Strings, {@code null} if null String input
* @since 2.1
*/
public static String[] splitPreserveAllTokens(final String str, final char separatorChar) {
return splitWorker(str, separatorChar, true);
}
/**
* Splits the provided text into an array, separators specified,
* preserving all tokens, including empty tokens created by adjacent
* separators. This is an alternative to using StringTokenizer.
*
* <p>The separator is not included in the returned String array.
* Adjacent separators are treated as separators for empty tokens.
* For more control over the split use the StrTokenizer class.</p>
*
* <p>A {@code null} input String returns {@code null}.
* A {@code null} separatorChars splits on whitespace.</p>
*
* <pre>
* StringUtils.splitPreserveAllTokens(null, *) = null
* StringUtils.splitPreserveAllTokens("", *) = []
* StringUtils.splitPreserveAllTokens("abc def", null) = ["abc", "def"]
* StringUtils.splitPreserveAllTokens("abc def", " ") = ["abc", "def"]
* StringUtils.splitPreserveAllTokens("abc def", " ") = ["abc", "", def"]
* StringUtils.splitPreserveAllTokens("ab:cd:ef", ":") = ["ab", "cd", "ef"]
* StringUtils.splitPreserveAllTokens("ab:cd:ef:", ":") = ["ab", "cd", "ef", ""]
* StringUtils.splitPreserveAllTokens("ab:cd:ef::", ":") = ["ab", "cd", "ef", "", ""]
* StringUtils.splitPreserveAllTokens("ab::cd:ef", ":") = ["ab", "", cd", "ef"]
* StringUtils.splitPreserveAllTokens(":cd:ef", ":") = ["", cd", "ef"]
* StringUtils.splitPreserveAllTokens("::cd:ef", ":") = ["", "", cd", "ef"]
* StringUtils.splitPreserveAllTokens(":cd:ef:", ":") = ["", cd", "ef", ""]
* </pre>
*
* @param str the String to parse, may be {@code null}
* @param separatorChars the characters used as the delimiters,
* {@code null} splits on whitespace
* @return an array of parsed Strings, {@code null} if null String input
* @since 2.1
*/
public static String[] splitPreserveAllTokens(final String str, final String separatorChars) {
return splitWorker(str, separatorChars, -1, true);
}
/**
* Splits the provided text into an array with a maximum length,
* separators specified, preserving all tokens, including empty tokens
* created by adjacent separators.
*
* <p>The separator is not included in the returned String array.
* Adjacent separators are treated as separators for empty tokens.
* Adjacent separators are treated as one separator.</p>
*
* <p>A {@code null} input String returns {@code null}.
* A {@code null} separatorChars splits on whitespace.</p>
*
* <p>If more than {@code max} delimited substrings are found, the last
* returned string includes all characters after the first {@code max - 1}
* returned strings (including separator characters).</p>
*
* <pre>
* StringUtils.splitPreserveAllTokens(null, *, *) = null
* StringUtils.splitPreserveAllTokens("", *, *) = []
* StringUtils.splitPreserveAllTokens("ab de fg", null, 0) = ["ab", "de", "fg"]
* StringUtils.splitPreserveAllTokens("ab de fg", null, 0) = ["ab", "", "", "de", "fg"]
* StringUtils.splitPreserveAllTokens("ab:cd:ef", ":", 0) = ["ab", "cd", "ef"]
* StringUtils.splitPreserveAllTokens("ab:cd:ef", ":", 2) = ["ab", "cd:ef"]
* StringUtils.splitPreserveAllTokens("ab de fg", null, 2) = ["ab", " de fg"]
* StringUtils.splitPreserveAllTokens("ab de fg", null, 3) = ["ab", "", " de fg"]
* StringUtils.splitPreserveAllTokens("ab de fg", null, 4) = ["ab", "", "", "de fg"]
* </pre>
*
* @param str the String to parse, may be {@code null}
* @param separatorChars the characters used as the delimiters,
* {@code null} splits on whitespace
* @param max the maximum number of elements to include in the
* array. A zero or negative value implies no limit
* @return an array of parsed Strings, {@code null} if null String input
* @since 2.1
*/
public static String[] splitPreserveAllTokens(final String str, final String separatorChars, final int max) {
return splitWorker(str, separatorChars, max, true);
}
/**
* Performs the logic for the {@code split} and
* {@code splitPreserveAllTokens} methods that do not return a
* maximum array length.
*
* @param str the String to parse, may be {@code null}
* @param separatorChar the separate character
* @param preserveAllTokens if {@code true}, adjacent separators are
* treated as empty token separators; if {@code false}, adjacent
* separators are treated as one separator.
* @return an array of parsed Strings, {@code null} if null String input
*/
private static String[] splitWorker(final String str, final char separatorChar, final boolean preserveAllTokens) {
// Performance tuned for 2.0 (JDK1.4)
if (str == null) {
return null;
}
final int len = str.length();
if (len == 0) {
return ArrayUtils.EMPTY_STRING_ARRAY;
}
final List<String> list = new ArrayList<>();
int i = 0;
int start = 0;
boolean match = false;
boolean lastMatch = false;
while (i < len) {
if (str.charAt(i) == separatorChar) {
if (match || preserveAllTokens) {
list.add(str.substring(start, i));
match = false;
lastMatch = true;
}
start = ++i;
continue;
}
lastMatch = false;
match = true;
i++;
}
if (match || preserveAllTokens && lastMatch) {
list.add(str.substring(start, i));
}
return list.toArray(ArrayUtils.EMPTY_STRING_ARRAY);
}
/**
* Performs the logic for the {@code split} and
* {@code splitPreserveAllTokens} methods that return a maximum array
* length.
*
* @param str the String to parse, may be {@code null}
* @param separatorChars the separate character
* @param max the maximum number of elements to include in the
* array. A zero or negative value implies no limit.
* @param preserveAllTokens if {@code true}, adjacent separators are
* treated as empty token separators; if {@code false}, adjacent
* separators are treated as one separator.
* @return an array of parsed Strings, {@code null} if null String input
*/
private static String[] splitWorker(final String str, final String separatorChars, final int max, final boolean preserveAllTokens) {
// Performance tuned for 2.0 (JDK1.4)
// Direct code is quicker than StringTokenizer.
// Also, StringTokenizer uses isSpace() not isWhitespace()
if (str == null) {
return null;
}
final int len = str.length();
if (len == 0) {
return ArrayUtils.EMPTY_STRING_ARRAY;
}
final List<String> list = new ArrayList<>();
int sizePlus1 = 1;
int i = 0;
int start = 0;
boolean match = false;
boolean lastMatch = false;
if (separatorChars == null) {
// Null separator means use whitespace
while (i < len) {
if (Character.isWhitespace(str.charAt(i))) {
if (match || preserveAllTokens) {
lastMatch = true;
if (sizePlus1++ == max) {
i = len;
lastMatch = false;
}
list.add(str.substring(start, i));
match = false;
}
start = ++i;
continue;
}
lastMatch = false;
match = true;
i++;
}
} else if (separatorChars.length() == 1) {
// Optimise 1 character case
final char sep = separatorChars.charAt(0);
while (i < len) {
if (str.charAt(i) == sep) {
if (match || preserveAllTokens) {
lastMatch = true;
if (sizePlus1++ == max) {
i = len;
lastMatch = false;
}
list.add(str.substring(start, i));
match = false;
}
start = ++i;
continue;
}
lastMatch = false;
match = true;
i++;
}
} else {
// standard case
while (i < len) {
if (separatorChars.indexOf(str.charAt(i)) >= 0) {
if (match || preserveAllTokens) {
lastMatch = true;
if (sizePlus1++ == max) {
i = len;
lastMatch = false;
}
list.add(str.substring(start, i));
match = false;
}
start = ++i;
continue;
}
lastMatch = false;
match = true;
i++;
}
}
if (match || preserveAllTokens && lastMatch) {
list.add(str.substring(start, i));
}
return list.toArray(ArrayUtils.EMPTY_STRING_ARRAY);
}