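A note for future reference: it is worth reading the source before using this method. I did not, and ended up with a bug. I was parsing a file line by line and splitting each line on \t, and the last three fields of a line were empty strings ("").
After the split, the String[] I stored the result in had silently lost those three trailing fields, so my code failed with an array index out of bounds error. The cause: I called line.split("\t"), which is the single-argument overload, and its default limit is zero. For example, given String str = "a,b,c,,,"; String[] strs = str.split(","); strs contains only three elements. The three trailing empty strings are dropped, so the array length is not what I expected.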
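The official documentation explains it as follows:
limit controls how many times the pattern is applied. limit > 0: the pattern is applied at most limit - 1 times, the array length will be no greater than limit, and the last entry holds everything after the last matched delimiter.
limit < 0: the pattern is applied as many times as possible, and the array can have any length.
limit = 0 (the default when no limit argument is passed): the pattern is applied as many times as possible, and trailing empty strings are discarded.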
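A minimal sketch of the three limit cases, using the same "a,b,c,,," input as above. The class name SplitLimitDemo and the helper fieldCount are made up for illustration; only String.split itself comes from the JDK.

```java
import java.util.Arrays;

public class SplitLimitDemo {
    // Hypothetical helper: number of fields produced for a given limit.
    static int fieldCount(String s, int limit) {
        return s.split(",", limit).length;
    }

    public static void main(String[] args) {
        String str = "a,b,c,,,";

        // limit == 0 (same as str.split(",")): trailing empty strings are dropped.
        System.out.println(Arrays.toString(str.split(",", 0)));   // [a, b, c]

        // limit > 0: at most limit - 1 splits; the last entry keeps the rest of the input.
        System.out.println(Arrays.toString(str.split(",", 2)));   // [a, b,c,,,]

        // limit < 0: split as many times as possible and keep trailing empty strings.
        System.out.println(fieldCount(str, -1));                  // 6
    }
}
```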
Solution:
String[] strs = str.split(",", -1); This takes the negative-limit path: the string is split as many times as possible, and empty strings are kept in the array.
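Applied to the tab-separated file case, the fix could look roughly like the sketch below. The class TsvRow, the method parseRow, and the five-column sample line are assumptions made for illustration; they are not the original parsing code. Checking the column count up front turns a later out-of-bounds access into an immediate, descriptive error.

```java
public class TsvRow {
    // Hypothetical helper: split a tab-separated line and verify the column count.
    static String[] parseRow(String line, int expectedColumns) {
        // A negative limit keeps trailing empty fields instead of dropping them.
        String[] fields = line.split("\t", -1);
        if (fields.length != expectedColumns) {
            throw new IllegalArgumentException(
                "expected " + expectedColumns + " columns but got " + fields.length);
        }
        return fields;
    }

    public static void main(String[] args) {
        // Made-up sample line: five columns, the last three are empty.
        String line = "id\tname\t\t\t";

        System.out.println(line.split("\t").length);   // 2, trailing empties dropped
        String[] fields = parseRow(line, 5);
        System.out.println(fields.length);             // 5
        System.out.println(fields[4].isEmpty());       // true
    }
}
```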
Source code:
public String[] split(String regex) {
    return split(regex, 0);
}
public String[] split(String regex, int limit) {
    /* fastpath if the regex is a
     (1)one-char String and this character is not one of the
        RegEx's meta characters ".$|()[{^?*+\\", or
     (2)two-char String and the first char is the backslash and
        the second is not the ascii digit or ascii letter.
     */
    char ch = 0;
    if (((regex.value.length == 1 &&
         ".$|()[{^?*+\\".indexOf(ch = regex.charAt(0)) == -1) ||
         (regex.length() == 2 &&
          regex.charAt(0) == '\\' &&
          (((ch = regex.charAt(1))-'0')|('9'-ch)) < 0 &&
          ((ch-'a')|('z'-ch)) < 0 &&
          ((ch-'A')|('Z'-ch)) < 0)) &&
        (ch < Character.MIN_HIGH_SURROGATE ||
         ch > Character.MAX_LOW_SURROGATE))
    {
        int off = 0;
        int next = 0;
        boolean limited = limit > 0;
        ArrayList<String> list = new ArrayList<>();
        while ((next = indexOf(ch, off)) != -1) {
            if (!limited || list.size() < limit - 1) {
                list.add(substring(off, next));
                off = next + 1;
            } else {    // last one
                //assert (list.size() == limit - 1);
                list.add(substring(off, value.length));
                off = value.length;
                break;
            }
        }
        // If no match was found, return this
        if (off == 0)
            return new String[]{this};
        // Add remaining segment
        if (!limited || list.size() < limit)
            list.add(substring(off, value.length));
        // Construct result
        int resultSize = list.size();
        if (limit == 0) {
            while (resultSize > 0 && list.get(resultSize - 1).length() == 0) {
                resultSize--;
            }
        }
        String[] result = new String[resultSize];
        return list.subList(0, resultSize).toArray(result);
    }
    return Pattern.compile(regex).split(this, limit);
}
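Two branches in the source above lead to edge cases that are easy to miss: the "no match was found" branch returns the whole string as a single element, and the limit == 0 loop can shrink the result all the way to an empty array. A small sketch for observing this; the class name SplitEdgeCases is made up for illustration.

```java
public class SplitEdgeCases {
    public static void main(String[] args) {
        // No delimiter found: the original string comes back as the only element,
        // so even an empty input yields an array of length 1.
        System.out.println("".split(",").length);        // 1

        // Every field is empty: with limit == 0 the trimming loop removes them all.
        System.out.println(",,,".split(",").length);     // 0

        // Only trailing empty strings are trimmed; a leading one is kept.
        System.out.println(",a".split(",").length);      // 2

        // Same input with a negative limit: nothing is trimmed.
        System.out.println(",,,".split(",", -1).length); // 4
    }
}
```

Code that indexes into the result of split should therefore check the array length first, or pass a negative limit when the number of fields matters.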