Java: Splitting a String That Contains Chinese Characters by Byte Length

lang_niu

Category: java; Tags: java

Article link: https://blog.csdn.net/lang_niu/article/details/124111806

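The number of bytes a Chinese character occupies depends on the encoding. In GBK a Chinese character takes 2 bytes; in UTF-8 a common Chinese character takes 3 bytes. A string therefore cannot be cut at a fixed character count when the limit is expressed in bytes: the cut position has to be adjusted so that no character is split and no piece exceeds the byte limit.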
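The byte counts above can be checked directly with `String.getBytes`. The following is a minimal sketch; the class name `ByteLengthDemo` and the helper `byteLength` are hypothetical names chosen for illustration, and the GBK lines assume that the running JDK ships the GBK charset (standard full JDK builds do, but a stripped-down runtime may not).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ByteLengthDemo {
    // Hypothetical helper: number of bytes `text` occupies in `charset`.
    public static int byteLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        // Assumption: GBK is available in this runtime.
        Charset gbk = Charset.forName("GBK");

        System.out.println(byteLength("中", gbk));                               // 2
        System.out.println(byteLength("中", StandardCharsets.UTF_8));            // 3
        System.out.println(byteLength("a", StandardCharsets.UTF_8));             // 1
        System.out.println(byteLength("attr统计时间", gbk));                      // 12
        System.out.println(byteLength("attr统计时间", StandardCharsets.UTF_8));   // 16
    }
}
```

The same 8-character string is 12 bytes in GBK and 16 bytes in UTF-8, which is why the byte limit and the encoding have to be considered together.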

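import java.io.UnsupportedEncodingException;
import java.util.ArrayList;
import java.util.List;

public class ChineseSplit {

    /**
     * Splits src into consecutive pieces, each at most `bytes` bytes long
     * when encoded with the charset named byteType (for example "GBK" or "UTF-8").
     */
    public static List<String> chineseSplitFunction(String src, String byteType, int bytes) {
        if (src == null) {
            return null;
        }
        if (bytes <= 0) {
            throw new IllegalArgumentException("bytes must be positive: " + bytes);
        }
        List<String> splitList = new ArrayList<String>();
        int startIndex = 0; // char index where the current piece starts
        // Every char encodes to at least one byte, so a piece never holds more than `bytes` chars.
        int endIndex = Math.min(bytes, src.length()); // char index where the current piece ends (exclusive)
        try {
            while (startIndex < src.length()) {
                // If the candidate piece is longer than `bytes` bytes, it contains multi-byte
                // characters (2 bytes per Chinese character in GBK, 3 in UTF-8), so back off
                // one char at a time. Also back off if the cut would split a surrogate pair.
                while (endIndex > startIndex
                        && (cutsSurrogatePair(src, endIndex)
                            || src.substring(startIndex, endIndex).getBytes(byteType).length > bytes)) {
                    --endIndex;
                }
                if (endIndex == startIndex) {
                    // Without this check the outer loop would add empty strings forever.
                    throw new IllegalArgumentException(
                            "bytes=" + bytes + " is too small for the character at index " + startIndex);
                }
                splitList.add(src.substring(startIndex, endIndex));
                startIndex = endIndex;
                // Clamp against the string length in chars (src.length()), not its length in bytes;
                // comparing against the byte length caused an out-of-bounds exception.
                endIndex = src.length() - startIndex > bytes ? startIndex + bytes : src.length();
            }
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unsupported encoding: " + byteType, e);
        }
        return splitList;
    }

    // True if cutting src at index would separate a high surrogate from its low surrogate.
    private static boolean cutsSurrogatePair(String src, int index) {
        return index > 0 && index < src.length()
                && Character.isHighSurrogate(src.charAt(index - 1))
                && Character.isLowSurrogate(src.charAt(index));
    }

    public static void main(String[] args) {
        String chineseString = "attr统计时间";
        // The string is 16 bytes in UTF-8 (4 ASCII bytes + 4 x 3), so a 60-byte limit yields one piece.
        List<String> splitStringList = chineseSplitFunction(chineseString, "UTF-8", 60);
        for (String split : splitStringList) {
            System.out.println(split);
        }
    }
}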
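
The method above re-encodes the candidate substring each time it backs off by one character. As a possible alternative, the sketch below walks the string one code point at a time and keeps a running byte count, so each character is encoded only once. This is not the article's method: the names `ByteSplitter` and `splitByBytes` are hypothetical, and the approach assumes a charset in which the encoded length of a string equals the sum of the encoded lengths of its characters (true for UTF-8 and GBK, not for encodings that emit a byte-order mark or shift sequences).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class ByteSplitter {
    /**
     * Hypothetical helper: splits src into pieces of at most maxBytes bytes each.
     * Assumes a stateless charset such as UTF-8 or GBK (see the note above).
     */
    public static List<String> splitByBytes(String src, Charset charset, int maxBytes) {
        List<String> pieces = new ArrayList<>();
        int pieceStart = 0; // char index where the current piece begins
        int pieceBytes = 0; // encoded size of the current piece so far
        int i = 0;
        while (i < src.length()) {
            // Work on whole code points so surrogate pairs are never separated.
            int codePoint = src.codePointAt(i);
            int charCount = Character.charCount(codePoint);
            int size = src.substring(i, i + charCount).getBytes(charset).length;
            if (size > maxBytes) {
                throw new IllegalArgumentException(
                        "maxBytes=" + maxBytes + " is too small for the character at index " + i);
            }
            if (pieceBytes + size > maxBytes) {
                // The next character does not fit: close the current piece before it.
                pieces.add(src.substring(pieceStart, i));
                pieceStart = i;
                pieceBytes = 0;
            }
            pieceBytes += size;
            i += charCount;
        }
        if (pieceStart < src.length()) {
            pieces.add(src.substring(pieceStart)); // trailing piece
        }
        return pieces;
    }

    public static void main(String[] args) {
        // With a 6-byte limit in UTF-8, "attr" (4 bytes) cannot take another
        // 3-byte character, and each later piece holds two Chinese characters.
        System.out.println(splitByBytes("attr统计时间", StandardCharsets.UTF_8, 6));
        // prints [attr, 统计, 时间]
    }
}
```

Taking a `Charset` instead of a charset name also avoids the checked `UnsupportedEncodingException`, since `String.getBytes(Charset)` does not declare it.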