Splitting strings in Java: how to use split

Latest recommended article published on 2024-08-21 02:26:18

code_farmer_ahua


System.out.println(":ab:cd:ef::".split(":").length); // trailing empty strings are dropped, the leading one is kept
System.out.println(":ab:cd:ef::".split(":", -1).length); // no empty string is dropped
System.out.println(StringUtils.split(":ab:cd:ef::", ":").length); // Apache Commons: leading, trailing and adjacent separators yield no empty tokens
System.out.println(StringUtils.splitPreserveAllTokens(":ab:cd:ef::", ":").length); // Apache Commons: no empty token is dropped
Output:
4
6
3
6

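I took a look at public String[] split(String regex, int limit) in the JDK's String class and realized I rarely use it. The reason is that when the regex matches the last character of the input, the result contains a trailing empty string. For example, "o".split("o", 5) and "o".split("o", -2) both return two empty strings, "" and "" (the part marked with a red box in the screenshot in the original post). So I normally use split(String regex), which is equivalent to split(regex, 0) and discards trailing empty strings.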
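The effect of the limit argument is easy to check with a few lines of JDK-only code. This is a minimal sketch; the class name SplitLimitDemo and the helper show are made up for illustration, and show only adds quotes so that empty strings stay visible in the output.

```java
class SplitLimitDemo {
    // Renders an array with quotes around each element so "" is visible.
    static String show(String[] parts) {
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                sb.append(", ");
            }
            sb.append('"').append(parts[i]).append('"');
        }
        return sb.append(']').toString();
    }

    public static void main(String[] args) {
        // limit 0 (the one-argument form): trailing empty strings are removed
        System.out.println(show("o".split("o")));             // []
        // positive limit: at most 5 elements, trailing empty strings kept
        System.out.println(show("o".split("o", 5)));          // ["", ""]
        // negative limit: no cap on the length, trailing empty strings kept
        System.out.println(show("o".split("o", -2)));         // ["", ""]
        // positive limit smaller than the token count: the rest stays unsplit
        System.out.println(show(":ab:cd:ef::".split(":", 3))); // ["", "ab", "cd:ef::"]
    }
}
```

Note that with limit 0 the result for "o" is an empty array, because every element is a trailing empty string.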

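The split method of String takes a regular expression. That is powerful but sometimes error-prone, and String offers no simpler non-regex variant. The split method in org.apache.commons.lang.StringUtils changes this: its separator argument is taken literally, as a string of separator characters, rather than as a regex. As for StringTokenizer, the JDK class with similar functionality, the internal splitWorker method carries the comment "Direct code is quicker than StringTokenizer.", which suggests this is the faster tool.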
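A typical way the regex argument goes wrong is a separator that happens to be a regex metacharacter. The sketch below uses only the JDK; the class name RegexSeparatorDemo and the helper splitLiteral are hypothetical names chosen for this example.

```java
import java.util.regex.Pattern;

class RegexSeparatorDemo {
    // Splits on the separator taken literally by quoting it for the regex engine.
    // The limit -1 keeps every empty token.
    static String[] splitLiteral(String input, String separator) {
        return input.split(Pattern.quote(separator), -1);
    }

    public static void main(String[] args) {
        // "." as a regex matches any character, so every token is empty
        // and all of them are removed as trailing empty strings.
        System.out.println("a.b.c".split(".").length);         // 0
        // Escaping the dot by hand gives the intended result.
        System.out.println("a.b.c".split("\\.").length);       // 3
        // Pattern.quote does the escaping for any separator.
        System.out.println(splitLiteral("a.b.c", ".").length); // 3
    }
}
```

Pattern.quote treats the whole separator as one literal delimiter, which is not the same as StringUtils.split, where each character of the separator string is a delimiter on its own.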

Both split and splitPreserveAllTokens in StringUtils are implemented on top of splitWorker.
Let's go through the two private splitWorker methods in turn:

    Java code
    
  
    private static String[] splitWorker(String str, char separatorChar, boolean preserveAllTokens) {
        // Performance tuned for 2.0 (JDK1.4)

        if (str == null) {
            return null;
        }
        int len = str.length();
        if (len == 0) {
            return ArrayUtils.EMPTY_STRING_ARRAY;
        }
        List list = new ArrayList();
        int i = 0, start = 0;
        boolean match = false;
        boolean lastMatch = false;
        while (i < len) {
            if (str.charAt(i) == separatorChar) {
                if (match || preserveAllTokens) {
                    list.add(str.substring(start, i));
                    match = false;
                    lastMatch = true;
                }
                start = ++i;
                continue;
            }
            lastMatch = false;
            match = true;
            i++;
        }
        if (match || (preserveAllTokens && lastMatch)) {
            list.add(str.substring(start, i));
        }
        return (String[]) list.toArray(new String[list.size()]);
    }

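This is the first core method for splitting a string. separatorChar is the delimiter, and the boolean preserveAllTokens controls whether empty tokens are kept. When it is true, a separator at the start or end of the string, or two adjacent separators in the middle, each yield an empty string ""; when it is false, all empty tokens are dropped. The method works as follows:
If the string is null, return null.
If the string is "", return an empty array (ArrayUtils.EMPTY_STRING_ARRAY), not "".
i is the cursor that walks the string. match records that the current token contains at least one non-separator character, and lastMatch records that the most recently processed character was a separator that closed a token.
If the very first character is separatorChar, the result depends on preserveAllTokens: when it is true, "" is added to the result array. Any non-separator character sets match to true and lastMatch to false, meaning there is content waiting to be emitted.
Whenever separatorChar is encountered and either match or preserveAllTokens is true, the substring from start up to (but not including) i is added to the result array, match is set to false and lastMatch to true. In every case start moves to the position just after the separator.
When the loop ends, the remaining part is added if match is true (the string does not end with separatorChar), or if lastMatch and preserveAllTokens are both true (the last character is separatorChar, in which case the added token is "").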
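The steps above can be tried out without Commons Lang on the classpath. The following is a stand-alone copy of the char-based method, adapted for illustration only: it uses generics and new String[0] in place of the raw List and ArrayUtils.EMPTY_STRING_ARRAY, and the class name CharSplitSketch is made up. It is a sketch of the logic, not the library's actual source file.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class CharSplitSketch {
    static String[] splitWorker(String str, char separatorChar, boolean preserveAllTokens) {
        if (str == null) {
            return null;
        }
        int len = str.length();
        if (len == 0) {
            return new String[0];
        }
        List<String> list = new ArrayList<>();
        int i = 0, start = 0;
        boolean match = false;      // current token has at least one character
        boolean lastMatch = false;  // previous character was a separator that closed a token
        while (i < len) {
            if (str.charAt(i) == separatorChar) {
                if (match || preserveAllTokens) {
                    list.add(str.substring(start, i)); // may be "" when preserving
                    match = false;
                    lastMatch = true;
                }
                start = ++i; // next token starts right after the separator
                continue;
            }
            lastMatch = false;
            match = true;
            i++;
        }
        if (match || (preserveAllTokens && lastMatch)) {
            list.add(str.substring(start, i)); // tail token, "" after a trailing separator
        }
        return list.toArray(new String[0]);
    }

    public static void main(String[] args) {
        System.out.println(splitWorker(":ab:cd:ef::", ':', false).length);         // 3
        System.out.println(splitWorker(":ab:cd:ef::", ':', true).length);          // 6
        System.out.println(Arrays.toString(splitWorker("a::b", ':', false)));      // [a, b]
        System.out.println(Arrays.toString(splitWorker("a::b", ':', true)));       // [a, , b]
    }
}
```

The last two calls show that preserveAllTokens also matters for adjacent separators in the middle of the string, not only at its ends.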

    Java code
    
  
    private static String[] splitWorker(String str, String separatorChars, int max, boolean preserveAllTokens) {
        // Performance tuned for 2.0 (JDK1.4)
        // Direct code is quicker than StringTokenizer.
        // Also, StringTokenizer uses isSpace() not isWhitespace()

        if (str == null) {
            return null;
        }
        int len = str.length();
        if (len == 0) {
            return ArrayUtils.EMPTY_STRING_ARRAY;
        }
        List list = new ArrayList();
        int sizePlus1 = 1;
        int i = 0, start = 0;
        boolean match = false;
        boolean lastMatch = false;
        if (separatorChars == null) {
            // Null separator means use whitespace
            while (i < len) {
                if (Character.isWhitespace(str.charAt(i))) {
                    if (match || preserveAllTokens) {
                        lastMatch = true;
                        if (sizePlus1++ == max) {
                            i = len;
                            lastMatch = false;
                        }
                        list.add(str.substring(start, i));
                        match = false;
                    }
                    start = ++i;
                    continue;
                }
                lastMatch = false;
                match = true;
                i++;
            }
        } else if (separatorChars.length() == 1) {
            // Optimise 1 character case
            char sep = separatorChars.charAt(0);
            while (i < len) {
                if (str.charAt(i) == sep) {
                    if (match || preserveAllTokens) {
                        lastMatch = true;
                        if (sizePlus1++ == max) {
                            i = len;
                            lastMatch = false;
                        }
                        list.add(str.substring(start, i));
                        match = false;
                    }
                    start = ++i;
                    continue;
                }
                lastMatch = false;
                match = true;
                i++;
            }
        } else {
            // standard case
            while (i < len) {
                if (separatorChars.indexOf(str.charAt(i)) >= 0) {
                    if (match || preserveAllTokens) {
                        lastMatch = true;
                        if (sizePlus1++ == max) {
                            i = len;
                            lastMatch = false;
                        }
                        list.add(str.substring(start, i));
                        match = false;
                    }
                    start = ++i;
                    continue;
                }
                lastMatch = false;
                match = true;
                i++;
            }
        }
        if (match || (preserveAllTokens && lastMatch)) {
            list.add(str.substring(start, i));
        }
        return (String[]) list.toArray(new String[list.size()]);
    }

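This is the second core method for splitting a string. It differs from the previous one in that the separator is a string interpreted as a set of characters, and an extra parameter max caps the length of the returned array (zero or a negative value means no limit). preserveAllTokens behaves as before: when it is true, empty tokens are kept at the start, at the end, and between adjacent separators. Note that when the separator string has more than one character, each of its characters is a separator in its own right, so a multi-character delimiter such as ", " yields extra "" tokens (one fewer than the number of consecutive separator characters) when preserveAllTokens is true. The method works as follows:
If the string is null, return null.
If the string is "", return an empty array (ArrayUtils.EMPTY_STRING_ARRAY), not "".
The rest is handled in three cases: the separator string is null, which means splitting on whitespace; the separator string has length 1; any other separator string. The three branches differ only in how the current character is tested:
    1. Character.isWhitespace decides whether each character is whitespace.
    2. The single separator character is extracted as a char, after which the loop mirrors the previous splitWorker method.
    3. indexOf checks whether the current character occurs in the separator string; the rest again mirrors the previous splitWorker method.
    Note that once the number of tokens reaches max, the cursor jumps straight to the end of the string, so the remainder of the input is added as the last token and the loop exits at the next check. Because lastMatch and match are both false at that point, no further "" is added afterwards.
   When the loop ends, the remaining part is added if match is true (the string does not end with a separator character), or if lastMatch and preserveAllTokens are both true (the last character is in the separator string, in which case the added token is "").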
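To see max and the set-of-characters semantics in action, the three branches can be folded into one loop with a small predicate. This is a condensed sketch, not the library code: the class name MaxSplitSketch and the helper isSeparator are assumptions made for this example, and the real method keeps three separate loops for speed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class MaxSplitSketch {
    // null means "any whitespace"; otherwise every character of separatorChars is a separator.
    static boolean isSeparator(char c, String separatorChars) {
        return separatorChars == null
                ? Character.isWhitespace(c)
                : separatorChars.indexOf(c) >= 0;
    }

    static String[] split(String str, String separatorChars, int max, boolean preserveAllTokens) {
        if (str == null) {
            return null;
        }
        int len = str.length();
        if (len == 0) {
            return new String[0];
        }
        List<String> list = new ArrayList<>();
        int sizePlus1 = 1; // number of tokens emitted so far, plus one
        int i = 0, start = 0;
        boolean match = false;
        boolean lastMatch = false;
        while (i < len) {
            if (isSeparator(str.charAt(i), separatorChars)) {
                if (match || preserveAllTokens) {
                    lastMatch = true;
                    if (sizePlus1++ == max) {
                        i = len;           // limit reached: the rest becomes the last token
                        lastMatch = false; // and no trailing "" is added after the loop
                    }
                    list.add(str.substring(start, i));
                    match = false;
                }
                start = ++i;
                continue;
            }
            lastMatch = false;
            match = true;
            i++;
        }
        if (match || (preserveAllTokens && lastMatch)) {
            list.add(str.substring(start, i));
        }
        return list.toArray(new String[0]);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(split("ab:cd:ef", ":", 2, false))); // [ab, cd:ef]
        System.out.println(Arrays.toString(split("a, b", ", ", -1, false)));   // [a, b]
        System.out.println(Arrays.toString(split("a, b", ", ", -1, true)));    // [a, , b]
        System.out.println(Arrays.toString(split("a  b\tc", null, 0, false))); // [a, b, c]
    }
}
```

The third call shows why a two-character delimiter produces an extra "" when empty tokens are preserved: the comma and the space are treated as two separate separators.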