Java: the array loses trailing empty strings after split

ccmedu


Article link: https://blog.csdn.net/ccmedu/article/details/104835776


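A note to self: read the source before using this method. I hit a bug because I did not. I was parsing a file line by line and splitting each line on \t, and the last three fields of a line were empty strings ("").

After the split, the resulting String[] had silently dropped those last three fields, so indexing into them threw an ArrayIndexOutOfBoundsException. The cause is that I called line.split("\t"), the one-argument overload, which passes a default limit of zero. For example, with String str = "a,b,c,,,"; String[] strs = str.split(","); the array strs holds only three elements. The three trailing empty strings are discarded, so the array is shorter than expected.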
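To make the failure concrete, here is a minimal sketch of the behavior described above. The class name TrailingEmptyDemo and the helper splitDefault are hypothetical names chosen for illustration; only String.split comes from the JDK.

```java
import java.util.Arrays;

// Hypothetical demo class; only String.split is JDK API.
final class TrailingEmptyDemo {

    // Splits on a comma with the one-argument overload (limit defaults to 0).
    static String[] splitDefault(String s) {
        return s.split(",");
    }

    public static void main(String[] args) {
        String str = "a,b,c,,,";
        String[] strs = splitDefault(str);

        // The three trailing empty strings are gone.
        System.out.println(strs.length);            // 3
        System.out.println(Arrays.toString(strs));  // [a, b, c]

        try {
            // Code that expects six fields fails here.
            String sixth = strs[5];
            System.out.println(sixth);
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("out of bounds: the array has only "
                    + strs.length + " elements");
        }
    }
}
```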

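The official documentation explains it as follows:

limit caps the number of times the pattern is applied. limit > 0: the pattern is applied at most limit - 1 times, the array's length will be no greater than limit, and the last element contains all input beyond the last matched delimiter.

limit < 0: the pattern is applied as many times as possible, and the array can have any length.

limit = 0: this is the default when no limit argument is given. The pattern is applied as many times as possible, and trailing empty strings are discarded.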
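The three cases can be compared side by side on the same input. This is only an illustrative sketch: SplitLimitDemo and splitWithLimit are hypothetical names, and the comma delimiter is an arbitrary choice.

```java
import java.util.Arrays;

// Hypothetical demo class comparing the three limit cases on one input.
final class SplitLimitDemo {

    // Thin wrapper around String.split(regex, limit) with a comma delimiter.
    static String[] splitWithLimit(String s, int limit) {
        return s.split(",", limit);
    }

    public static void main(String[] args) {
        String str = "a,b,c,,,";

        // limit > 0: at most limit - 1 splits; the remainder stays in the last element.
        System.out.println(Arrays.toString(splitWithLimit(str, 2)));   // [a, b,c,,,]

        // limit < 0: split as often as possible and keep trailing empty strings.
        System.out.println(Arrays.toString(splitWithLimit(str, -1)));  // [a, b, c, , , ]

        // limit == 0: split as often as possible, then drop trailing empty strings.
        System.out.println(Arrays.toString(splitWithLimit(str, 0)));   // [a, b, c]
    }
}
```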

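Solution:

String[] strs = str.split(",", -1); This takes the negative-limit path, so the string is split as many times as possible and the trailing empty strings are kept in the array.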
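Applied to the tab-separated file case, the fix might look like the sketch below. TsvLineParser, parseLine, and the expectedColumns check are assumptions added for illustration and are not part of the original code; the check simply turns a short line into an explicit error instead of a later out-of-bounds access.

```java
// Hypothetical helper for tab-separated lines; the column-count check is an assumption.
final class TsvLineParser {

    // Splits one line on tabs, keeping trailing empty fields, and verifies the column count.
    static String[] parseLine(String line, int expectedColumns) {
        String[] fields = line.split("\t", -1);
        if (fields.length != expectedColumns) {
            throw new IllegalArgumentException(
                    "expected " + expectedColumns + " columns but found " + fields.length);
        }
        return fields;
    }

    public static void main(String[] args) {
        // Three values followed by three empty columns.
        String line = "a\tb\tc\t\t\t";
        String[] fields = parseLine(line, 6);
        System.out.println(fields.length);        // 6
        System.out.println(fields[5].isEmpty());  // true
    }
}
```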

Source code:
public String[] split(String regex) {
    return split(regex, 0);
}

public String[] split(String regex, int limit) {
        /* fastpath if the regex is a
         (1)one-char String and this character is not one of the
            RegEx's meta characters ".$|()[{^?*+\\", or
         (2)two-char String and the first char is the backslash and
            the second is not the ascii digit or ascii letter.
         */
        char ch = 0;
        if (((regex.value.length == 1 &&
             ".$|()[{^?*+\\".indexOf(ch = regex.charAt(0)) == -1) ||
             (regex.length() == 2 &&
              regex.charAt(0) == '\\' &&
              (((ch = regex.charAt(1))-'0')|('9'-ch)) < 0 &&
              ((ch-'a')|('z'-ch)) < 0 &&
              ((ch-'A')|('Z'-ch)) < 0)) &&
            (ch < Character.MIN_HIGH_SURROGATE ||
             ch > Character.MAX_LOW_SURROGATE))
        {
            int off = 0;
            int next = 0;
            boolean limited = limit > 0;
            ArrayList<String> list = new ArrayList<>();
            while ((next = indexOf(ch, off)) != -1) {
                if (!limited || list.size() < limit - 1) {
                    list.add(substring(off, next));
                    off = next + 1;
                } else {    // last one
                    //assert (list.size() == limit - 1);
                    list.add(substring(off, value.length));
                    off = value.length;
                    break;
                }
            }
            // If no match was found, return this
            if (off == 0)
                return new String[]{this};

            // Add remaining segment
            if (!limited || list.size() < limit)
                list.add(substring(off, value.length));

            // Construct result
            int resultSize = list.size();
            if (limit == 0) {
                while (resultSize > 0 && list.get(resultSize - 1).length() == 0) {
                    resultSize--;
                }
            }
            String[] result = new String[resultSize];
            return list.subList(0, resultSize).toArray(result);
        }
        return Pattern.compile(regex).split(this, limit);
    }
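The part of the source responsible for the surprise is the final loop under limit == 0. The following is a simplified sketch of that step alone, not the JDK implementation; TrailingTrimSketch and dropTrailingEmpty are hypothetical names. It also shows two edge cases that follow from the same loop: a leading empty string is kept, and an input made only of delimiters yields a zero-length array.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the limit == 0 trimming step only.
final class TrailingTrimSketch {

    // Shrinks the result while its last element is empty, as the limit == 0 branch does.
    static String[] dropTrailingEmpty(List<String> parts) {
        int resultSize = parts.size();
        while (resultSize > 0 && parts.get(resultSize - 1).isEmpty()) {
            resultSize--;
        }
        return parts.subList(0, resultSize).toArray(new String[0]);
    }

    public static void main(String[] args) {
        // The same pieces that "a,b,c,,,".split(",", -1) produces.
        List<String> parts = Arrays.asList("a", "b", "c", "", "", "");
        System.out.println(Arrays.toString(dropTrailingEmpty(parts)));  // [a, b, c]

        // Only trailing empty strings are removed; a leading one survives.
        System.out.println(Arrays.toString(",a,,".split(",")));         // [, a]

        // A string of nothing but delimiters shrinks to a zero-length array.
        System.out.println(",,,".split(",").length);                    // 0
    }
}
```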
