【leetcode】393. UTF-8 Validation【M】


A character in UTF-8 can be from 1 to 4 bytes long, subject to the following rules:

  1. For a 1-byte character, the first bit is a 0, followed by its Unicode code.
  2. For an n-byte character, the first n bits are all ones and the (n+1)-th bit is 0, followed by n-1 bytes whose most significant 2 bits are 10.
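
The two rules above can be read as a classification of each byte by its leading bits. A minimal sketch, assuming hypothetical helper names `expected_length` and `is_continuation` (they are not part of the problem statement or of the solution further down):

```python
def expected_length(byte):
    """Return the character length (1-4) announced by a lead byte.

    Returns 0 if the byte cannot start a character, e.g. a 10xxxxxx
    continuation byte or a byte with five or more leading ones.
    """
    byte &= 0xFF              # only the low 8 bits carry data
    if byte >> 7 == 0b0:      # 0xxxxxxx
        return 1
    if byte >> 5 == 0b110:    # 110xxxxx
        return 2
    if byte >> 4 == 0b1110:   # 1110xxxx
        return 3
    if byte >> 3 == 0b11110:  # 11110xxx
        return 4
    return 0


def is_continuation(byte):
    """True if the byte has the form 10xxxxxx."""
    return (byte & 0xFF) >> 6 == 0b10
```

For instance, `expected_length(197)` is 2 and `is_continuation(130)` is true, which matches the first two bytes of Example 1.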

This is how the UTF-8 encoding would work:

   Char. number range  |        UTF-8 octet sequence
      (hexadecimal)    |              (binary)
   --------------------+---------------------------------------------
   0000 0000-0000 007F | 0xxxxxxx
   0000 0080-0000 07FF | 110xxxxx 10xxxxxx
   0000 0800-0000 FFFF | 1110xxxx 10xxxxxx 10xxxxxx
   0001 0000-0010 FFFF | 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
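
One way to check the table is to let Python's built-in encoder produce the bytes and print them in binary. This is only an illustrative sketch: it assumes Python 3 (where `chr` accepts any code point), and the helper name `utf8_bits` and the sample code points are made up for the example.

```python
def utf8_bits(code_point):
    """Encode one code point as UTF-8 and return its octets as bit strings."""
    encoded = chr(code_point).encode('utf-8')
    return ' '.join(format(b, '08b') for b in encoded)


# One sample code point from each row of the table.
for cp in (0x41, 0x142, 0x20AC, 0x1F600):
    print('U+%04X -> %s' % (cp, utf8_bits(cp)))
```

The code point U+0142 encodes to `11000101 10000010`, the same two octets that open Example 1.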

Given an array of integers representing the data, return whether it is a valid UTF-8 encoding.

Note:
The input is an array of integers. Only the least significant 8 bits of each integer are used to store the data. This means each integer represents only 1 byte of data.
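
In code, the note amounts to masking each integer with `0xFF` before inspecting its bits. A small sketch, with `to_octets` as a hypothetical helper name:

```python
def to_octets(data):
    """Keep only the low 8 bits of each integer and render them as bit strings."""
    return [format(value & 0xFF, '08b') for value in data]
```

With this helper, `to_octets([453])` gives the same octet as `to_octets([197])`, because 453 = 256 + 197 and the ninth bit is discarded.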

Example 1:

data = [197, 130, 1], which represents the octet sequence: 11000101 10000010 00000001.

Return true.
It is a valid UTF-8 encoding for a 2-byte character followed by a 1-byte character.

Example 2:

data = [235, 140, 4], which represents the octet sequence: 11101011 10001100 00000100.

Return false.
The first 3 bits are all ones and the 4th bit is 0, which means it is a 3-byte character.
The next byte is a continuation byte which starts with 10 and that's correct.
But the second continuation byte does not start with 10, so it is invalid.
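
The reasoning in both examples can be reproduced by labelling each byte as it is read. The sketch below is illustrative only: the function name `trace` and the role labels are made up here, and it stops at the first invalid byte without reporting input that ends in the middle of a character.

```python
def trace(data):
    """Label each byte as 'lead-N', 'continuation' or 'invalid'."""
    steps = []
    pending = 0  # continuation bytes still expected
    for value in data:
        bits = format(value & 0xFF, '08b')
        ones = len(bits) - len(bits.lstrip('1'))  # number of leading 1 bits
        if pending:
            # Inside a multi-byte character: must look like 10xxxxxx.
            role = 'continuation' if ones == 1 else 'invalid'
            pending -= 1
        elif ones == 0:
            role = 'lead-1'
        elif 2 <= ones <= 4:
            role = 'lead-%d' % ones
            pending = ones - 1
        else:
            role = 'invalid'
        steps.append((bits, role))
        if role == 'invalid':
            break
    return steps


for example in ([197, 130, 1], [235, 140, 4]):
    for bits, role in trace(example):
        print(bits, role)
    print()
```

For Example 2 the last step is labelled `invalid`, because `00000100` arrives while one continuation byte is still expected.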



class Solution(object):
    # Count the leading '1' bits at the start of the bit string.
    def countNumOfOne(self,s):
        res = 0
        for i in s:
            if i != '1':
                break
            res += 1
        return res

    # Left-pad the bit string with zeros to 8 characters.
    def resize(self,s):
        s = '0' * (8-len(s)) + s
        return s

    def validUtf8(self, data):
        # l is the number of continuation bytes still expected.
        l = 0

        for byte in data:
            # Only the least significant 8 bits of each integer carry data.
            i = self.resize(bin(byte & 0xFF)[2:])
            temp_len = self.countNumOfOne(i)

            if l >= 1:
                # l >= 1 means more continuation bytes must follow, so the
                # current byte has to start with exactly "10".
                if temp_len != 1:
                    return False
                l -= 1
            else:
                # l == 0 means the previous character is complete. A new
                # character cannot start with "10", and it cannot be longer
                # than 4 bytes.
                if temp_len == 1 or temp_len > 4:
                    return False
                # No leading ones: 1-byte character. n leading ones: n - 1
                # continuation bytes follow.
                l = temp_len - 1 if temp_len > 0 else 0

        # The data is valid only if the last character is complete.
        return l == 0
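
For comparison, the same check can be written with shifts instead of bit strings. This is an alternative sketch, not the solution above; the function name `valid_utf8` is an assumption for the example. Like the problem statement, it checks only the bit patterns and does not reject overlong encodings or surrogate code points.

```python
def valid_utf8(data):
    """Return True if data follows the lead/continuation byte patterns of UTF-8."""
    pending = 0  # continuation bytes still expected
    for value in data:
        byte = value & 0xFF  # only the low 8 bits carry data
        if pending:
            if byte >> 6 != 0b10:
                return False
            pending -= 1
        elif byte >> 7 == 0:
            pending = 0      # 0xxxxxxx: 1-byte character
        elif byte >> 5 == 0b110:
            pending = 1      # 110xxxxx
        elif byte >> 4 == 0b1110:
            pending = 2      # 1110xxxx
        elif byte >> 3 == 0b11110:
            pending = 3      # 11110xxx
        else:
            return False     # stray 10xxxxxx, or five or more leading ones
    return pending == 0      # input must not end mid-character


print(valid_utf8([197, 130, 1]))   # Example 1
print(valid_utf8([235, 140, 4]))   # Example 2
```

The final `pending == 0` check covers input that ends in the middle of a multi-byte character, such as `[240]`.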



