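In Java, the usual way to tell whether a string contains Chinese is to look at Unicode values, because the CJK Unified Ideographs block occupies 0x4E00-0x9FFF (the often-quoted upper bound 0x9FBB covers only the characters assigned in older Unicode versions). A plain range check is not very accurate, though: Chinese punctuation marks lie outside that range, so the check gives the wrong answer for them. It is also inefficient as commonly written. For example, calling str.substring(i, i + 1).matches("[\\u4e00-\\u9fbb]+") in a loop creates a new substring and compiles and runs a regular expression for every character of the string, which becomes very slow on long input, and it still misjudges punctuation.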
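To make the punctuation problem concrete, here is a minimal sketch of the range-based check described above. The names `RangeCheckDemo` and `containsHanByRange` are made up for this illustration, the pattern is compiled once instead of per character, and it assumes the full block range U+4E00 to U+9FFF. The test strings use Unicode escapes so the file compiles under any source encoding.

```java
import java.util.regex.Pattern;

final class RangeCheckDemo {
    // Compiled once and reused; matches one char in U+4E00..U+9FFF.
    private static final Pattern HAN_RANGE = Pattern.compile("[\\u4e00-\\u9fff]");

    /** Returns true if any char of s falls inside the ideograph range. */
    static boolean containsHanByRange(String s) {
        return HAN_RANGE.matcher(s).find();
    }

    public static void main(String[] args) {
        // Two ideographs (U+7BE6, U+76CE): inside the range.
        System.out.println(containsHanByRange("i am isjd \u7be6\u76ce.")); // true
        // Ideographic full stop (U+3002): outside the range, so it is missed.
        System.out.println(containsHanByRange("i am isjd \u3002"));        // false
        // ASCII only.
        System.out.println(containsHanByRange("i am isjd df."));           // false
    }
}
```

The second call shows the weakness: a string whose only Chinese character is a punctuation mark is reported as containing no Chinese.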
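Here is an efficient and more accurate way to do the check; it is the method public static boolean isCHinese(char c) in the code below. The class compiles, and you can run it to see the result.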
package com.songdan.test;
public class test01 {
    /**
     * Unicode blocks treated as Chinese by isCHinese(char):
     *
     * Character.UnicodeBlock.CJK_UNIFIED_IDEOGRAPHS: 4E00-9FFF, CJK Unified Ideographs
     * Character.UnicodeBlock.CJK_COMPATIBILITY_IDEOGRAPHS: F900-FAFF, CJK Compatibility Ideographs
     * Character.UnicodeBlock.CJK_UNIFIED_IDEOGRAPHS_EXTENSION_A: 3400-4DBF, CJK Unified Ideographs Extension A
     * Character.UnicodeBlock.GENERAL_PUNCTUATION: 2000-206F, General Punctuation
     * Character.UnicodeBlock.CJK_SYMBOLS_AND_PUNCTUATION: 3000-303F, CJK Symbols and Punctuation
     * Character.UnicodeBlock.HALFWIDTH_AND_FULLWIDTH_FORMS: FF00-FFEF, Halfwidth and Fullwidth Forms
     */
    public static boolean isCHinese(char c) {
        Character.UnicodeBlock ub = Character.UnicodeBlock.of(c);
        return ub == Character.UnicodeBlock.CJK_UNIFIED_IDEOGRAPHS
                || ub == Character.UnicodeBlock.CJK_COMPATIBILITY_IDEOGRAPHS
                || ub == Character.UnicodeBlock.CJK_UNIFIED_IDEOGRAPHS_EXTENSION_A
                // Covers the Chinese “ quotation mark. Note that this block is also used in Western text.
                || ub == Character.UnicodeBlock.GENERAL_PUNCTUATION
                // Covers the Chinese 。 full stop.
                || ub == Character.UnicodeBlock.CJK_SYMBOLS_AND_PUNCTUATION
                // Covers the Chinese , full-width comma.
                || ub == Character.UnicodeBlock.HALFWIDTH_AND_FULLWIDTH_FORMS;
    }
    /** Returns true if the string contains at least one character accepted by isCHinese(char). */
    public static boolean isCHinese(String str) {
        for (char c : str.toCharArray()) {
            if (isCHinese(c)) {
                return true;
            }
        }
        return false;
    }
    public static void main(String[] args) {
        String string1 = "i am isjd df.";
        String string2 = "i am isjd 篦盎.";
        String string3 = "i am isjd 。";
        String string4 = "i am isjd “";
        System.out.println(string1 + " " + isCHinese(string1));
        System.out.println(string2 + " " + isCHinese(string2));
        System.out.println(string3 + " " + isCHinese(string3));
        System.out.println(string4 + " " + isCHinese(string4));
    }
}
Result: the first string prints false, and the other three print true.
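One limitation of the char-based method: a character outside the Basic Multilingual Plane, such as one in CJK Unified Ideographs Extension B, reaches it as two surrogate halves, and neither half belongs to any of the blocks it checks. The following is a minimal sketch of a code-point based variant, not part of the original class. The names `CodePointCheckDemo`, `isIdeograph` and `containsIdeograph` are hypothetical. It assumes Java 8 or later for `String.codePoints()`, and it checks the ideograph blocks only; the punctuation blocks could be added the same way.

```java
final class CodePointCheckDemo {
    /** Returns true if the code point lies in one of the ideograph blocks listed here. */
    static boolean isIdeograph(int codePoint) {
        Character.UnicodeBlock ub = Character.UnicodeBlock.of(codePoint);
        return ub == Character.UnicodeBlock.CJK_UNIFIED_IDEOGRAPHS
                || ub == Character.UnicodeBlock.CJK_COMPATIBILITY_IDEOGRAPHS
                || ub == Character.UnicodeBlock.CJK_UNIFIED_IDEOGRAPHS_EXTENSION_A
                // Extension B lives outside the BMP, so it needs an int code point.
                || ub == Character.UnicodeBlock.CJK_UNIFIED_IDEOGRAPHS_EXTENSION_B;
    }

    /** Walks the string by code point instead of by char. */
    static boolean containsIdeograph(String s) {
        return s.codePoints().anyMatch(CodePointCheckDemo::isIdeograph);
    }

    public static void main(String[] args) {
        // U+20000 is the first code point of Extension B (stored as a surrogate pair).
        String extB = new String(Character.toChars(0x20000));
        System.out.println(containsIdeograph("abc" + extB));     // true
        System.out.println(containsIdeograph("i am isjd df."));  // false
    }
}
```

Iterating with `codePoints()` keeps each surrogate pair together, so `Character.UnicodeBlock.of(int)` sees the real character rather than its halves.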