Getting the length of a string

Latest recommended article published 2021-12-19 18:37:56

bicashy

Tags: java

Article link: https://blog.csdn.net/bicashy/article/details/84467938


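The usual way to get a string's length is str.length(). However, .length gives the length in characters, not in bytes: a Chinese character and an English letter each count as one character (strictly speaking, both Java's String.length() and JavaScript's .length count UTF-16 code units, so a character outside the Basic Multilingual Plane, such as an emoji, counts as 2). How many bytes a Chinese character occupies depends on the encoding: 2 bytes in GB2312 and 3 bytes in UTF-8. The byte length therefore has to be computed for a specific encoding.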
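A small Java sketch of the difference between character count and byte count. The class name `LengthDemo` and the sample string are illustrative. GB2312 is an extended charset that a JVM is not required to provide, so the sketch checks `Charset.isSupported` and returns -1 when it is missing.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class LengthDemo {
    // Byte length of s in the named charset, or -1 if this JVM lacks the charset
    static int byteLength(String s, String charsetName) {
        if (!Charset.isSupported(charsetName)) {
            return -1;
        }
        return s.getBytes(Charset.forName(charsetName)).length;
    }

    public static void main(String[] args) {
        // "ab" followed by two Chinese characters, U+4E2D and U+6587
        String s = "ab\u4e2d\u6587";
        // length() counts UTF-16 code units, one per character here
        System.out.println("length()  = " + s.length());
        // UTF-8: 1 + 1 + 3 + 3 bytes
        System.out.println("UTF-8     = " + s.getBytes(StandardCharsets.UTF_8).length);
        // GB2312: 1 + 1 + 2 + 2 bytes, if the charset is available
        System.out.println("GB2312    = " + byteLength(s, "GB2312"));
    }
}
```

With a JDK that includes GB2312, the three lines should report 4, 8 and 6.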

In JavaScript:

For GB2312, where every non-ASCII character is counted as 2 bytes:
function getByteForGB(s)
{
     return s.replace(/[^\u0000-\u007f]/g, "\u0061\u0061").length;
} 
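The same GB2312 estimate, sketched in Java for comparison. Like the JavaScript function, it assumes every non-ASCII character is a 2-byte GB2312 character, which only holds for text that GB2312 can actually encode; the names `GbLength` and `gbByteLength` are hypothetical.

```java
class GbLength {
    // Approximate GB2312 byte length: ASCII (U+0000..U+007F) = 1 byte,
    // every other UTF-16 code unit = 2 bytes. Like the JavaScript regex,
    // this works on UTF-16 code units, not on code points.
    static int gbByteLength(String s) {
        int bytes = 0;
        for (int i = 0; i < s.length(); i++) {
            bytes += (s.charAt(i) <= 0x7f) ? 1 : 2;
        }
        return bytes;
    }

    public static void main(String[] args) {
        // "abc" plus two Chinese characters (U+4E2D, U+6587): 3 + 2 * 2 = 7
        System.out.println(gbByteLength("abc\u4e2d\u6587"));
    }
}
```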

For UTF-8:
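function getByteForUTF(s)
{
     // surrogate pair (code point U+10000..U+10FFFF, e.g. emoji): 4 bytes
     var a = s.replace(/[\ud800-\udbff][\udc00-\udfff]/g, "\u0061\u0061\u0061\u0061");
     // U+0080..U+07FF: 2 bytes
     var b = a.replace(/[\u0080-\u07ff]/g, "\u0061\u0061");
     // U+0800..U+FFFF, e.g. most Chinese characters: 3 bytes
     var c = b.replace(/[\u0800-\uffff]/g, "\u0061\u0061\u0061");
     // ASCII (U+0000..U+007F) already counts as 1 byte per character
     return c.length;
}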
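The UTF-8 ranges above can also be applied in Java by walking code points instead of doing regex replacements, which counts a character outside the BMP (stored as a surrogate pair) once, as 4 bytes. This is a sketch; `Utf8Length` and `utf8Length` are hypothetical names.

```java
import java.nio.charset.StandardCharsets;

class Utf8Length {
    // UTF-8 byte length computed from code point ranges
    static int utf8Length(String s) {
        int bytes = 0;
        int i = 0;
        while (i < s.length()) {
            int cp = s.codePointAt(i);  // a surrogate pair yields one code point
            if (cp <= 0x7f) {
                bytes += 1;             // U+0000..U+007F
            } else if (cp <= 0x7ff) {
                bytes += 2;             // U+0080..U+07FF
            } else if (cp <= 0xffff) {
                bytes += 3;             // U+0800..U+FFFF, e.g. most Chinese characters
            } else {
                bytes += 4;             // U+10000..U+10FFFF, e.g. emoji
            }
            i += Character.charCount(cp);
        }
        return bytes;
    }

    public static void main(String[] args) {
        // 'a' (1) + U+00E9 (2) + U+4E2D (3) + U+1F600 as a surrogate pair (4)
        String s = "a\u00e9\u4e2d\ud83d\ude00";
        System.out.println(utf8Length(s));
        System.out.println(s.getBytes(StandardCharsets.UTF_8).length);
    }
}
```

For well-formed strings both printed values should be 10. The loop differs from `getBytes` only for unpaired surrogates, which it counts as 3 bytes while `getBytes` typically substitutes a 1-byte `?`.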

For back-end processing in Java, for example with UTF-8 encoding:

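import java.util.regex.Matcher;
import java.util.regex.Pattern;

    /**
     * Counts the number of bytes the input occupies in UTF-8.
     * @param strValue the string to measure
     * @return the UTF-8 byte length of strValue
     */
    public static int getStrLength(String strValue)
    {
        int length = 0;
        int count = 0;

        // -------------------------- 1 byte: U+0000..U+007F
        String regex = "[\\u0000-\\u007f]";
        count = getCountByRegex(regex, strValue);
        length += count;
        // -------------------------- 2 bytes: U+0080..U+07FF
        regex = "[\\u0080-\\u07ff]";
        count = getCountByRegex(regex, strValue);
        length += count * 2;
        // -------------------------- 3 bytes: U+0800..U+FFFF
        // In UTF-8 most Chinese characters take 3 bytes. The surrogate block
        // U+D800..U+DFFF is excluded so the two halves of a surrogate pair
        // are not counted here.
        regex = "[\\u0800-\\ud7ff\\ue000-\\uffff]";
        count = getCountByRegex(regex, strValue);
        length += count * 3;
        // -------------------------- 4 bytes: U+10000..U+10FFFF (e.g. emoji)
        regex = "[\\x{10000}-\\x{10ffff}]";
        count = getCountByRegex(regex, strValue);
        length += count * 4;

        return length;
    }

    /**
     * Counts the characters matched by a regular expression.
     * @param regex the regular expression
     * @param strValue the string to search
     * @return the number of matches
     */
    public static int getCountByRegex(String regex, String strValue)
    {
        int count = 0;
        Pattern p = Pattern.compile(regex);
        Matcher m = p.matcher(strValue);
        while (m.find())
        {
            count++;
        }

        return count;
    }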
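On Java 9 or later, the counting loop in `getCountByRegex` could be replaced by `Matcher.results()`, which returns a stream of matches. This sketch (the class name `RegexCount` is hypothetical) applies it to the four UTF-8 ranges and prints the result next to `String.getBytes(StandardCharsets.UTF_8).length`, the built-in way to obtain the same number without regular expressions.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;

class RegexCount {
    // Number of matches of regex in s (Matcher.results() needs Java 9+)
    static int count(String regex, String s) {
        return (int) Pattern.compile(regex).matcher(s).results().count();
    }

    // UTF-8 byte length from per-range match counts
    static int utf8Length(String s) {
        return count("[\\u0000-\\u007f]", s)
                + count("[\\u0080-\\u07ff]", s) * 2
                + count("[\\u0800-\\ud7ff\\ue000-\\uffff]", s) * 3
                + count("[\\x{10000}-\\x{10ffff}]", s) * 4;
    }

    public static void main(String[] args) {
        // 'a', U+00E9, U+4E2D and U+1F600 (surrogate pair)
        String s = "a\u00e9\u4e2d\ud83d\ude00";
        System.out.println(utf8Length(s));                              // regex-based count
        System.out.println(s.getBytes(StandardCharsets.UTF_8).length);  // built-in
    }
}
```

The three-byte class leaves out U+D800..U+DFFF on purpose: depending on the JDK, a BMP-only character class may be tested against the individual halves of a surrogate pair, and excluding that block keeps an emoji from being counted twice.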