Test files:
- 1.csv (85 KB)
- 2.csv (186.5 MB)
Test results:
Delimiter | Method | 1.csv | 2.csv | 1.csv | 2.csv
---|---|---|---|---|---
`,` | java.lang.String.split | 22 | 1763 | 15 | 1596
`,` | org.apache.commons.lang3.StringUtils.split | 43 | 1340 | 38 | 1402
`0.` | java.lang.String.split | 18 | 2542 | 17 | 2449
`0.` | org.apache.commons.lang3.StringUtils.split | 46 | 1749 | 72 | 2280
`0.0` | java.lang.String.split | 28 | 2094 | 25 | 2020
`0.0` | org.apache.commons.lang3.StringUtils.split | 54 | 2583 | 67 | 2140
`0.01721102` | java.lang.String.split | 33 | 1661 | 46 | 1697
`0.01721102` | org.apache.commons.lang3.StringUtils.split | 54 | 3554 | 54 | 2897
More than 20 characters | java.lang.String.split | 16 | 1677 | 30 | 2352
More than 20 characters | org.apache.commons.lang3.StringUtils.split | 64 | 2252 | 71 | 2009
Conclusions:
- Short delimiter, short input string: prefer java.lang.String.split.
- Short delimiter, long input string: prefer org.apache.commons.lang3.StringUtils.split.
- Long delimiter, short input string: prefer java.lang.String.split.
- Long delimiter, long input string: the two are close, but java.lang.String.split is still the recommended choice.
Code analysis:
java.lang.String.split
    public String[] split(String regex, int limit) {
        /* fastpath if the regex is a
         (1)one-char String and this character is not one of the
            RegEx's meta characters ".$|()[{^?*+\\", or
         (2)two-char String and the first char is the backslash and
            the second is not the ascii digit or ascii letter.
         */
        char ch = 0;
        // Regex check: decide whether the fast path (plain character split) applies
        if (((regex.value.length == 1 &&
             ".$|()[{^?*+\\".indexOf(ch = regex.charAt(0)) == -1) ||
             (regex.length() == 2 &&
              regex.charAt(0) == '\\' &&
              (((ch = regex.charAt(1))-'0')|('9'-ch)) < 0 &&
              ((ch-'a')|('z'-ch)) < 0 &&
              ((ch-'A')|('Z'-ch)) < 0)) &&
            (ch < Character.MIN_HIGH_SURROGATE ||
             ch > Character.MAX_LOW_SURROGATE))
        {
            int off = 0;
            int next = 0;
            boolean limited = limit > 0;
            ArrayList<String> list = new ArrayList<>();
            while ((next = indexOf(ch, off)) != -1) {
                if (!limited || list.size() < limit - 1) {
                    list.add(substring(off, next));
                    off = next + 1;
                } else {    // last one
                    //assert (list.size() == limit - 1);
                    list.add(substring(off, value.length));
                    off = value.length;
                    break;
                }
            }
            // If no match was found, return this
            if (off == 0)
                return new String[]{this};
            // Add remaining segment
            if (!limited || list.size() < limit)
                list.add(substring(off, value.length));
            // Construct result
            int resultSize = list.size();
            if (limit == 0) {
                while (resultSize > 0 && list.get(resultSize - 1).length() == 0) {
                    resultSize--;
                }
            }
            String[] result = new String[resultSize];
            return list.subList(0, resultSize).toArray(result);
        }
        return Pattern.compile(regex).split(this, limit);
    }
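The effect of the fast-path condition above can be seen from the outside. The following is a minimal sketch, not part of the original test: the class name `SplitPathDemo` and its helper methods are hypothetical, and only `String.split` and `Pattern.quote` from the JDK are used. It shows that a single non-meta character is split literally, a bare `.` is treated as a regex, and a multi-character separator such as `0.0` has to be quoted if it is meant literally.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class SplitPathDemo {
    // "," is one character and not a regex meta character: fast path, literal split.
    static String[] byComma(String s) {
        return s.split(",");
    }

    // "." is a regex meta character, so it goes to Pattern and matches every character.
    static String[] byRawDot(String s) {
        return s.split(".");
    }

    // "\\." is a backslash followed by a non-alphanumeric character: fast path on a literal dot.
    static String[] byEscapedDot(String s) {
        return s.split("\\.");
    }

    // A multi-character separator falls through to Pattern.compile;
    // Pattern.quote makes the pattern match the separator text literally.
    static String[] byLiteral(String s, String sep) {
        return s.split(Pattern.quote(sep));
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(byComma("a,b,c")));
        System.out.println(Arrays.toString(byRawDot("1.5")));
        System.out.println(Arrays.toString(byEscapedDot("1.5")));
        System.out.println(Arrays.toString(byLiteral("10.0,20.0", "0.0")));
    }
}
```

This matters for the table above: delimiters such as `0.` and `0.0` contain a regex meta character, so `String.split` interprets them as patterns unless they are quoted.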
org.apache.commons.lang3.StringUtils.split
    private static String[] splitWorker(final String str, final String separatorChars, final int max, final boolean preserveAllTokens) {
        // Performance tuned for 2.0 (JDK1.4)
        // Direct code is quicker than StringTokenizer.
        // Also, StringTokenizer uses isSpace() not isWhitespace()
        if (str == null) {
            return null;
        }
        final int len = str.length();
        if (len == 0) {
            return ArrayUtils.EMPTY_STRING_ARRAY;
        }
        final List<String> list = new ArrayList<>();
        int sizePlus1 = 1;
        int i = 0, start = 0;
        boolean match = false;
        boolean lastMatch = false;
        if (separatorChars == null) {
            // Null separator means use whitespace
            while (i < len) {
                if (Character.isWhitespace(str.charAt(i))) {
                    if (match || preserveAllTokens) {
                        lastMatch = true;
                        if (sizePlus1++ == max) {
                            i = len;
                            lastMatch = false;
                        }
                        list.add(str.substring(start, i));
                        match = false;
                    }
                    start = ++i;
                    continue;
                }
                lastMatch = false;
                match = true;
                i++;
            }
        } else if (separatorChars.length() == 1) {
            // Optimise 1 character case
            final char sep = separatorChars.charAt(0);
            // Walk the string one character at a time by incrementing i
            while (i < len) {
                if (str.charAt(i) == sep) {
                    if (match || preserveAllTokens) {
                        lastMatch = true;
                        if (sizePlus1++ == max) {
                            i = len;
                            lastMatch = false;
                        }
                        list.add(str.substring(start, i));
                        match = false;
                    }
                    start = ++i;
                    continue;
                }
                lastMatch = false;
                match = true;
                i++;
            }
        } else {
            // standard case
            while (i < len) {
                if (separatorChars.indexOf(str.charAt(i)) >= 0) {
                    if (match || preserveAllTokens) {
                        lastMatch = true;
                        if (sizePlus1++ == max) {
                            i = len;
                            lastMatch = false;
                        }
                        list.add(str.substring(start, i));
                        match = false;
                    }
                    start = ++i;
                    continue;
                }
                lastMatch = false;
                match = true;
                i++;
            }
        }
        if (match || preserveAllTokens && lastMatch) {
            list.add(str.substring(start, i));
        }
        return list.toArray(ArrayUtils.EMPTY_STRING_ARRAY);
    }
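One detail in the "standard case" branch above is easy to miss: `separatorChars` is treated as a set of separator characters, not as one separator string, and with `preserveAllTokens` set to false adjacent separators are collapsed. The sketch below is a simplified re-implementation of that branch using only the JDK (no `max` limit, `preserveAllTokens` fixed to false). It is not the library code, and the names `CharSetSplitSketch` and `splitByAnyChar` are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CharSetSplitSketch {
    // Simplified version of the "standard case" loop: every character contained
    // in separatorChars is a separator, and empty tokens are dropped.
    static String[] splitByAnyChar(String str, String separatorChars) {
        List<String> list = new ArrayList<>();
        final int len = str.length();
        int i = 0, start = 0;
        boolean match = false; // true while we are inside a token
        while (i < len) {
            if (separatorChars.indexOf(str.charAt(i)) >= 0) {
                if (match) {
                    list.add(str.substring(start, i));
                    match = false;
                }
                start = ++i;
                continue;
            }
            match = true;
            i++;
        }
        if (match) {
            list.add(str.substring(start, i));
        }
        return list.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String line = "a0b.c0.d";
        // Character-set semantics: splits on every '0' and every '.'.
        System.out.println(Arrays.toString(splitByAnyChar(line, "0.")));
        // Regex semantics: "0." means '0' followed by any one character.
        System.out.println(Arrays.toString(line.split("0.")));
        // Empty fields: dropped by the sketch, kept by String.split.
        System.out.println(splitByAnyChar("a,,b", ",").length);
        System.out.println("a,,b".split(",").length);
    }
}
```

So for multi-character delimiters the two methods do not necessarily produce the same tokens, which is worth keeping in mind when comparing their timings.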
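Conclusions from reading the source:
- java.lang.String.split: it first checks whether the argument needs regex handling. If you only split on a literal character and never need a regex, that check does nothing for you, so it adds some overhead.
- org.apache.commons.lang3.StringUtils.split: it has no regex-related checks, but it walks the string one character at a time with i++, which limits its performance. It does not look up the index of the next separator directly and cut there, the way java.lang.String.split does with indexOf.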
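The "look up the next separator index directly" idea from the second point can also be applied to a literal multi-character separator, without any regex. The following is only an illustrative sketch under that assumption; `IndexOfSplitSketch` and `splitLiteral` are hypothetical names, and it was not part of the measurements above. Unlike `String.split` with the default limit, it keeps trailing empty strings.

```java
import java.util.ArrayList;
import java.util.List;

public class IndexOfSplitSketch {
    // Splits str on every occurrence of the literal separator sep,
    // jumping from match to match with String.indexOf.
    static List<String> splitLiteral(String str, String sep) {
        if (sep.isEmpty()) {
            throw new IllegalArgumentException("separator must not be empty");
        }
        List<String> parts = new ArrayList<>();
        int off = 0;
        int next;
        while ((next = str.indexOf(sep, off)) != -1) {
            parts.add(str.substring(off, next));
            off = next + sep.length();
        }
        // Remaining segment after the last separator (may be empty).
        parts.add(str.substring(off));
        return parts;
    }

    public static void main(String[] args) {
        System.out.println(splitLiteral("1,2,,3", ","));
        System.out.println(splitLiteral("10.0,20.0", "0.0"));
        System.out.println(splitLiteral("abc", ","));
    }
}
```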
Code:
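    import java.io.BufferedReader;
    import java.io.File;
    import java.io.FileReader;
    import org.apache.commons.lang3.StringUtils;

    // The original listing omitted the class declaration; the name SplitBenchmark is illustrative.
    public class SplitBenchmark {
        public static void main(String[] args) {
            String path1 = "/Users/Desktop/csdn/1.csv";
            String path2 = "/Users/Desktop/csdn/2.csv";
            String splitStr = ",";
            String path = path1; // switch to path2 to test the large file
            // 1 = java.lang.String.split, 2 = commons-lang3 StringUtils.split,
            // anything else = org.springframework.util.StringUtils.split
            int choose = 3;
            long splitStart = System.currentTimeMillis();
            getFileInfo(path, choose, splitStr);
            long splitEnd = System.currentTimeMillis();
            System.out.println("choose=" + choose + ":" + path + ":" + (splitEnd - splitStart) + " ms");
        }

        public static StringBuilder getFileInfo(String path, int choose, String splitStr) {
            StringBuilder sb = new StringBuilder();
            File file = new File(path);
            if (!file.exists()) {
                System.out.println("File does not exist: " + path);
                return sb;
            }
            // try-with-resources closes the readers even if an exception is thrown
            try (BufferedReader bufferedReader = new BufferedReader(new FileReader(file))) {
                String line;
                while ((line = bufferedReader.readLine()) != null) {
                    sb.append(line);
                    String[] strArrays;
                    if (1 == choose) {
                        strArrays = line.split(splitStr);
                    } else if (2 == choose) {
                        strArrays = StringUtils.split(line, splitStr);
                    } else {
                        // Spring's split cuts only at the first occurrence of the delimiter
                        // and returns null when the delimiter is not found.
                        strArrays = org.springframework.util.StringUtils.split(line, splitStr);
                    }
                    if (strArrays == null) {
                        continue;
                    }
                    for (int i = 0; i < strArrays.length; i++) {
                        System.out.print(strArrays[i]);
                        System.out.print(" ");
                    }
                    System.out.println();
                }
            } catch (Exception e) {
                System.err.println("Exception while reading or splitting: " + e);
            }
            // Release the accumulated content; as in the original, the caller receives null.
            sb = null;
            return sb;
        }
    }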
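The listing above times file reading and console output together with the split calls. If only the split cost is of interest, one possible variation is to split lines that are already in memory and to count tokens instead of printing them. The sketch below makes those assumptions explicit: the class `SplitTimingSketch`, its methods, the sample line, and the warm-up count are all hypothetical, and it only uses the JDK, so other splitters would have to be passed in as additional functions.

```java
import java.util.Arrays;
import java.util.function.Function;

public class SplitTimingSketch {
    // Written after each run so the split results are actually used,
    // which makes it less likely that the JIT optimizes the work away.
    static volatile long sink;

    // Applies the splitter to every line and returns the total number of tokens.
    static long countTokens(String[] lines, Function<String, String[]> splitter) {
        long total = 0;
        for (String line : lines) {
            total += splitter.apply(line).length;
        }
        return total;
    }

    // Runs a few untimed warm-up rounds, then times one pass in milliseconds.
    static long timeMillis(String[] lines, Function<String, String[]> splitter, int warmupRounds) {
        for (int r = 0; r < warmupRounds; r++) {
            sink = countTokens(lines, splitter);
        }
        long start = System.nanoTime();
        sink = countTokens(lines, splitter);
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        // Synthetic input; replace with lines read from 1.csv or 2.csv.
        String[] lines = new String[100_000];
        Arrays.fill(lines, "0.01721102,0.5,1.25,42");
        long ms = timeMillis(lines, s -> s.split(","), 5);
        System.out.println("String.split(\",\"): " + ms + " ms, tokens=" + sink);
    }
}
```

For more reliable numbers a dedicated benchmark harness would be preferable; this sketch only separates the split cost from I/O.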
Closing:
I am still learning, and everything above is my own understanding. If anything is incorrect, please point it out. Thank you.