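First, why I am writing this down: I needed to join several strings into one line. Each string is a fixed-width column, the widths differ from column to column, and the columns are separated by "|". Text that exceeds its column width continues on the next line, where it has to line up with the same column position as on the line above.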
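The content may mix Chinese characters, English letters, digits, punctuation, and so on, so I call String.getBytes("GBK") to measure every character by the number of bytes it occupies. When the last byte of a row cannot hold a whole Chinese character (the third column in my case), a space takes its place. Enough talk, here is the code: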
import java.io.UnsupportedEncodingException;
import java.util.ArrayList;
import java.util.List;

public class GbkSplitDemo {
    public static void main(String[] args) {
        List<String> outData = new ArrayList<String>();
        String data = "AFDASFA212132啊发发达二等分舒服舒服发得分为-发(发多少)";
        int size = 45; // column width in bytes; must be at least 2
        try {
            byte[] bytes = data.getBytes("GBK");
            int num = 0; // offset of the first byte that has not been emitted yet
            // Emit full rows while more than one row's worth of bytes remains.
            while (bytes.length - num > size) {
                // Walk the row character by character. In GBK a byte with the high bit set
                // (negative in Java) starts a two-byte character; any other byte is one character.
                int end = num;
                while (end < num + size) {
                    int width = bytes[end] < 0 ? 2 : 1;
                    if (end + width > num + size) {
                        break; // the next character would straddle the row boundary
                    }
                    end += width;
                }
                String splitString = new String(bytes, num, end - num, "GBK");
                // The last byte could not hold a whole character: pad with a space instead.
                if (end - num < size) {
                    splitString += " ";
                }
                outData.add(splitString);
                num = end;
            }
            // Whatever is left fits in a single row.
            if (bytes.length - num <= size) {
                String splitString = new String(bytes, num, bytes.length - num, "GBK");
                outData.add(splitString);
            }
            System.out.println(outData);
        } catch (UnsupportedEncodingException e) {
            e.printStackTrace();
        }
    }
}
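To tie the splitting back to the original goal, here is a minimal sketch of how several wrapped columns could be padded and joined with "|". The class name GbkColumns, the helpers wrap, pad, and layout, and the sample cells and widths are all made up for illustration; it assumes every width is at least 2 bytes and that GBK is available in the JDK.

```java
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;

public class GbkColumns {
    private static final Charset GBK = Charset.forName("GBK");

    /** Splits text into rows of at most width GBK bytes without cutting a two-byte character. */
    public static List<String> wrap(String text, int width) {
        if (width < 2) {
            throw new IllegalArgumentException("width must be at least 2");
        }
        byte[] bytes = text.getBytes(GBK);
        List<String> rows = new ArrayList<>();
        int start = 0;
        while (start < bytes.length) {
            int end = start;
            while (end < bytes.length) {
                int w = bytes[end] < 0 ? 2 : 1; // high bit set: lead byte of a two-byte character
                if (end + w - start > width) {
                    break; // this character does not fit in the current row
                }
                end += w;
            }
            rows.add(new String(bytes, start, end - start, GBK));
            start = end;
        }
        return rows;
    }

    /** Appends spaces until the row occupies exactly width bytes in GBK. */
    public static String pad(String row, int width) {
        StringBuilder sb = new StringBuilder(row);
        for (int n = row.getBytes(GBK).length; n < width; n++) {
            sb.append(' ');
        }
        return sb.toString();
    }

    /** Lays the cells out side by side, separated by "|", wrapping long cells onto extra lines. */
    public static List<String> layout(String[] cells, int[] widths) {
        List<List<String>> wrapped = new ArrayList<>();
        int lineCount = 1;
        for (int i = 0; i < cells.length; i++) {
            List<String> rows = wrap(cells[i], widths[i]);
            wrapped.add(rows);
            lineCount = Math.max(lineCount, rows.size());
        }
        List<String> lines = new ArrayList<>();
        for (int line = 0; line < lineCount; line++) {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < cells.length; i++) {
                List<String> rows = wrapped.get(i);
                String row = line < rows.size() ? rows.get(line) : ""; // blank once a cell runs out
                if (i > 0) {
                    sb.append('|');
                }
                sb.append(pad(row, widths[i]));
            }
            lines.add(sb.toString());
        }
        return lines;
    }

    public static void main(String[] args) {
        String[] cells = {"AB12", "啊发发达", "x"};
        int[] widths = {3, 5, 2};
        for (String line : layout(cells, widths)) {
            System.out.println("[" + line + "]");
        }
    }
}
```

With these sample values the second column is 5 bytes wide, so only two Chinese characters fit per row and the fifth byte is filled with a space. The output is two lines, `[AB1|啊发 |x ]` and `[2  |发达 |  ]`, and the "|" separators line up as long as the display font draws a Chinese character twice as wide as an ASCII one.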
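In GBK a Chinese character (or a full-width punctuation mark) takes two bytes, while English letters, digits, and similar characters take one. So once a cut position has been chosen, check whether it lands inside a two-byte character, and if it does, move the cut back so that the character is never split in half, which would produce garbled text.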
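The byte widths and the garbling can be checked directly. The sketch below is illustrative only: GbkWidthDemo, gbkLength, and naiveCut are made-up names, and the exact character shown for the broken half depends on how the JDK decoder replaces malformed input.

```java
import java.nio.charset.Charset;

public class GbkWidthDemo {
    private static final Charset GBK = Charset.forName("GBK");

    /** Number of bytes the text occupies once encoded as GBK. */
    public static int gbkLength(String text) {
        return text.getBytes(GBK).length;
    }

    /** Decodes the first byteCount bytes of the GBK encoding, even if that cuts a character. */
    public static String naiveCut(String text, int byteCount) {
        return new String(text.getBytes(GBK), 0, byteCount, GBK);
    }

    public static void main(String[] args) {
        System.out.println(gbkLength("A"));    // one byte
        System.out.println(gbkLength("啊"));   // two bytes
        System.out.println(gbkLength("A1啊")); // 1 + 1 + 2 bytes

        // Cutting after 3 bytes keeps only the first half of the Chinese character,
        // so the decoded text is neither "A1" nor "A1啊".
        String cut = naiveCut("A1啊", 3);
        System.out.println(cut.equals("A1啊")); // false
        System.out.println(cut.equals("A1"));   // false: a stray replacement character is appended
    }
}
```

Because the broken half decodes to something other than the original character, the safe approach is to decide the cut position on the byte array first and only then build the string.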