google probuf解析原理分析

上一篇转载的文章分析了PB的编码方式,解析的很详细。

下面来分析下PB解码的细节:

来看这个基本函数:

bool MessageLite::ParseFromArray(const void* data, int size) {
  return InlineParseFromArray(data, size, this);
}
里面会继续调用:

bool InlineParseFromArray(const void* data, int size, MessageLite* message) {
  io::CodedInputStream input(reinterpret_cast<const uint8*>(data), size);
  return InlineParseFromCodedStream(&input, message) &&
         input.ConsumedEntireMessage();
}

数据和长度 转换为CodedInputStream,我们看下CodedInputStream的构造函数:

inline CodedInputStream::CodedInputStream(const uint8* buffer, int size)
  : input_(NULL),
    buffer_(buffer),
    buffer_end_(buffer + size),
    total_bytes_read_(size),
    overflow_bytes_(0),
    last_tag_(0),
    legitimate_message_end_(false),
    aliasing_enabled_(false),
    current_limit_(size),
    buffer_size_after_limit_(0),
    total_bytes_limit_(kDefaultTotalBytesLimit),
    total_bytes_warning_threshold_(kDefaultTotalBytesWarningThreshold),
    recursion_depth_(0),
    recursion_limit_(default_recursion_limit_),
    extension_pool_(NULL),
    extension_factory_(NULL) {
  // Note that setting current_limit_ == size is important to prevent some
  // code paths from trying to access input_ and segfaulting.
}

里面比较重要的字段是buffer_,buffer_end_,total_bytes_read_,legitimate_message_end_;解析中会记住解析到哪个字节了,没有解析完就一直解析,如果一个流里包含了多个消息,相应的字段都会被后面的覆盖。

之后会解析CodeStream的东东:

bool InlineParseFromCodedStream(io::CodedInputStream* input,
                                MessageLite* message) {
  message->Clear();
  return InlineMergeFromCodedStream(input, message);
}

先把message清理,clear是个虚函数,由具体的消息实现:

然后跑到

bool InlineMergeFromCodedStream(io::CodedInputStream* input,
                                MessageLite* message) {
  if (!message->MergePartialFromCodedStream(input)) return false;
  if (!message->IsInitialized()) {
    GOOGLE_LOG(ERROR) << InitializationErrorMessage("parse", *message);
    return false;
  }
  return true;
}

这里的MergePartialFromCodedStream也是虚函数。

来看一个实际的:

bool Person_PhoneNumber::MergePartialFromCodedStream(
    ::google::protobuf::io::CodedInputStream* input) {
#define DO_(EXPRESSION) if (!(EXPRESSION)) return false
  ::google::protobuf::uint32 tag;
  while ((tag = input->ReadTag()) != 0) {
    switch (::google::protobuf::internal::WireFormatLite::GetTagFieldNumber(tag)) {
      // required string number = 1;
      case 1: {
        if (::google::protobuf::internal::WireFormatLite::GetTagWireType(tag) ==
            ::google::protobuf::internal::WireFormatLite::WIRETYPE_LENGTH_DELIMITED) {
          DO_(::google::protobuf::internal::WireFormatLite::ReadString(
                input, this->mutable_number()));
          ::google::protobuf::internal::WireFormat::VerifyUTF8String(
            this->number().data(), this->number().length(),
            ::google::protobuf::internal::WireFormat::PARSE);
        } else {
          goto handle_uninterpreted;
        }
        if (input->ExpectTag(16)) goto parse_type;
        break;
      }

      // optional .Person.PhoneType type = 2 [default = HOME];
      case 2: {
        if (::google::protobuf::internal::WireFormatLite::GetTagWireType(tag) ==
            ::google::protobuf::internal::WireFormatLite::WIRETYPE_VARINT) {
         parse_type:
          int value;
          DO_((::google::protobuf::internal::WireFormatLite::ReadPrimitive<
                   int, ::google::protobuf::internal::WireFormatLite::TYPE_ENUM>(
                 input, &value)));
          if (::Person_PhoneType_IsValid(value)) {
            set_type(static_cast< ::Person_PhoneType >(value));
          } else {
            mutable_unknown_fields()->AddVarint(2, value);
          }
        } else {
          goto handle_uninterpreted;
        }
        if (input->ExpectAtEnd()) return true;
        break;
      }

      default: {
      handle_uninterpreted:
        if (::google::protobuf::internal::WireFormatLite::GetTagWireType(tag) ==
            ::google::protobuf::internal::WireFormatLite::WIRETYPE_END_GROUP) {
          return true;
        }
        DO_(::google::protobuf::internal::WireFormat::SkipField(
              input, tag, mutable_unknown_fields()));
        break;
      }
    }
  }
  return true;
#undef DO_
}

如果理解了编码方式,解码就是倒过来。

wiretype为

TypeMeaningUsed For
0 Varint int32, int64, uint32, uint64, sint32, sint64, bool, enum
1 64-bit fixed64, sfixed64, double
2 Length-delimi string, bytes, embedded messages, packed repeated fields
3 Start group Groups (deprecated)
4 End group Groups (deprecated)
5 32-bit fixed32, sfixed32, float
比如我这里写的测试代码:

	Person_PhoneNumber test1;
	test1.set_number("11");

那么第一个字节编码是0000 1010,右偏移3位GetTagFieldNumber=1;这个FieldNum就是我们的proto文件自己设置的字段编码;

然后计算wiretype

static_cast<WireType>(tag & kTagTypeMask);

kTagTypeMask为111

wiretype就等于

就是010 == WIRETYPE_LENGTH_DELIMITED,因为string可可变长度的,所以是len+value格式

bool WireFormatLite::ReadString(io::CodedInputStream* input,
                                string* value) {
  // String is for UTF-8 text only
  uint32 length;
  if (!input->ReadVarint32(&length)) return false;
  if (!input->InternalReadStringInline(value, length)) return false;
  return true;
}

上面函数就是这个流程。

这里面有个std::string的小细节:

inline char* string_as_array(string* str) {
  // DO NOT USE const_cast<char*>(str->data())! See the unittest for why.
  return str->empty() ? NULL : &*str->begin();
}

注释里写明了,不能用str->data(),现在我还不怎 清楚.


  • 2
    点赞
  • 0
    收藏
    觉得还不错? 一键收藏
  • 0
    评论
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值