About charset operations

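I came across this in s60Webkit. An earlier project of mine needed the same feature (working out which encoding a text file uses), and writing it myself from whatever material I could find was time-consuming and not guaranteed to be correct.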

/* Check whether this string is valid UTF-8 */

bool validateUtf8(const char* chs, int len, int& validMultiByteChars)
{
    // test if this really is utf8
    const char* b = chs;
    const char* e = b+len;
    const char* c = b;
    int seqRem=0;
    bool isUtf8 = true;
    bool wrongUtf8 = false;
    validMultiByteChars = 0;
    while (c<e)
        {
        if( seqRem == 0 ) 
            b = c;
        // test validity of a multibyte sequence
        if (seqRem>0) {
            // a byte in sequence must match 10xxxxxx
            if ((*c & 0xc0) == 0x80)
                seqRem--;
            else {
                //Sometimes the characters contain malformed UTF-8,
                //for example (0xd8 0x73). So tolerate at least one wrongly encoded byte.
                //This has been found on a real site. It also means that the byte here is an
                //ASCII character, and ASCII can be handled as UTF-8 as well.
                wrongUtf8 = true;
            }
            if (seqRem==0)
                validMultiByteChars++;
        }
        // single byte in ascii range 0xxxxxxx
        else if ((*c & 0x80) == 0x00)
            { } // do nothing
        // start of a two byte sequence 110xxxxx 10xxxxxx
        else if ((*c & 0xe0) == 0xc0)
            seqRem=1;
        // start of a three byte sequence 1110xxxx 10xxxxxx 10xxxxxx
        else if ((*c & 0xf0) == 0xe0)
            seqRem=2;
        // start of a four byte sequence 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        else if ((*c & 0xf8) == 0xf0)
            seqRem=3;
        else if (c-b>3) {
            isUtf8 = false;
            break;
        }
        if ( wrongUtf8 ){
             if( seqRem == 1 ){
                 seqRem--; 
                 wrongUtf8 = false;
             }
             else {
                 isUtf8 = false;
                 break; 
             }
         } //end if ( wrongUtf8 )
        c++;
    } //end while
    return isUtf8;
}
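
The function above is deliberately lenient: by its own comment it tolerates a pair such as 0xd8 0x73, and as far as I can tell the `c-b>3` branch can never fire, because `b` is reset to `c` whenever `seqRem` is 0, so stray bytes are accepted and a sequence cut off at the end of the buffer still returns true. For cases where strict checking is wanted, here is a minimal sketch of a stricter validator. It is not from s60Webkit; the name `isStrictUtf8` and its signature are my own assumptions for illustration.

```cpp
#include <cstddef>

// Hypothetical helper (not part of s60Webkit): strict UTF-8 validation.
// Rejects stray continuation bytes, truncated sequences, overlong forms,
// UTF-16 surrogates and code points above U+10FFFF.
bool isStrictUtf8(const char* chs, std::size_t len)
{
    const unsigned char* p = reinterpret_cast<const unsigned char*>(chs);
    const unsigned char* end = p + len;
    while (p < end) {
        unsigned char lead = *p++;
        int trail = 0;           // continuation bytes still expected
        unsigned long cp = 0;    // code point being decoded
        unsigned long minCp = 0; // smallest code point legal for this length

        if (lead < 0x80) {
            continue;                           // 0xxxxxxx: plain ASCII
        } else if ((lead & 0xe0) == 0xc0) {     // 110xxxxx
            trail = 1; cp = lead & 0x1f; minCp = 0x80;
        } else if ((lead & 0xf0) == 0xe0) {     // 1110xxxx
            trail = 2; cp = lead & 0x0f; minCp = 0x800;
        } else if ((lead & 0xf8) == 0xf0) {     // 11110xxx
            trail = 3; cp = lead & 0x07; minCp = 0x10000;
        } else {
            return false;                       // stray 10xxxxxx or 0xf8..0xff
        }

        if (end - p < trail)
            return false;                       // sequence cut off by end of buffer

        for (int i = 0; i < trail; ++i) {
            unsigned char c = *p++;
            if ((c & 0xc0) != 0x80)
                return false;                   // not a 10xxxxxx continuation byte
            cp = (cp << 6) | (c & 0x3f);
        }

        if (cp < minCp)                  return false; // overlong encoding
        if (cp > 0x10ffff)               return false; // beyond Unicode range
        if (cp >= 0xd800 && cp <= 0xdfff) return false; // surrogate half
    }
    return true;
}
```

With this sketch, `isStrictUtf8("\xe4\xb8\xad", 3)` accepts the three-byte encoding of 中, while `isStrictUtf8("\xd8\x73", 2)` rejects the malformed pair that the lenient version lets through. Which behaviour is right depends on the job: for guessing the encoding of real-world pages the tolerant check may be the better fit, and the strict one suits validating data you control.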


 

 

 

