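The article I reposted last time analyzed how PB (Protocol Buffers) encodes data, in great detail.
Now let's look at the details of PB decoding.
Start with this basic function: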
bool MessageLite::ParseFromArray(const void* data, int size) {
  return InlineParseFromArray(data, size, this);
}
It in turn calls:
bool InlineParseFromArray(const void* data, int size, MessageLite* message) {
  io::CodedInputStream input(reinterpret_cast<const uint8*>(data), size);
  return InlineParseFromCodedStream(&input, message) &&
         input.ConsumedEntireMessage();
}
The data and its length are wrapped in a CodedInputStream. Here is the CodedInputStream constructor:
inline CodedInputStream::CodedInputStream(const uint8* buffer, int size)
  : input_(NULL),
    buffer_(buffer),
    buffer_end_(buffer + size),
    total_bytes_read_(size),
    overflow_bytes_(0),
    last_tag_(0),
    legitimate_message_end_(false),
    aliasing_enabled_(false),
    current_limit_(size),
    buffer_size_after_limit_(0),
    total_bytes_limit_(kDefaultTotalBytesLimit),
    total_bytes_warning_threshold_(kDefaultTotalBytesWarningThreshold),
    recursion_depth_(0),
    recursion_limit_(default_recursion_limit_),
    extension_pool_(NULL),
    extension_factory_(NULL) {
  // Note that setting current_limit_ == size is important to prevent some
  // code paths from trying to access input_ and segfaulting.
}
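The most important fields here are buffer_, buffer_end_, total_bytes_read_ and legitimate_message_end_. The stream remembers which byte parsing has reached and keeps parsing until the input is used up. If the buffer holds several serialized messages concatenated together, they are all merged into the one message: singular fields end up with the value from the later message, while repeated fields are appended.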
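To make the roles of buffer_, buffer_end_ and legitimate_message_end_ concrete, here is a minimal sketch of a cursor over an in-memory buffer. ArrayCursor and its Skip() method are hypothetical; this is not the real CodedInputStream, which additionally tracks limits, recursion depth and refills from an underlying input_ stream. It shows the key idea: running out of bytes exactly at a tag boundary makes ReadTag() return 0 and counts as a legitimate end, which is what ConsumedEntireMessage() reports.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical minimal cursor; it mirrors only buffer_, buffer_end_ and
// legitimate_message_end_ from CodedInputStream.
struct ArrayCursor {
  const uint8_t* buffer_;      // next unread byte
  const uint8_t* buffer_end_;  // one past the last byte
  bool legitimate_message_end_ = false;

  ArrayCursor(const uint8_t* data, int size)
      : buffer_(data), buffer_end_(data + size) {}

  // Base-128 varint: 7 payload bits per byte, high bit set means "more".
  bool ReadVarint32(uint32_t* value) {
    uint32_t result = 0;
    for (int shift = 0; shift < 35; shift += 7) {
      if (buffer_ == buffer_end_) return false;  // truncated input
      uint8_t byte = *buffer_++;
      result |= static_cast<uint32_t>(byte & 0x7F) << shift;
      if ((byte & 0x80) == 0) {
        *value = result;
        return true;
      }
    }
    return false;  // longer than 5 bytes: malformed
  }

  // Returns 0 when the input is exhausted; stopping at a tag boundary
  // is recorded as a legitimate end of message.
  uint32_t ReadTag() {
    if (buffer_ == buffer_end_) {
      legitimate_message_end_ = true;
      return 0;
    }
    uint32_t tag = 0;
    return ReadVarint32(&tag) ? tag : 0;
  }

  // Skips `count` bytes, failing if they run past buffer_end_.
  bool Skip(uint32_t count) {
    if (static_cast<uint32_t>(buffer_end_ - buffer_) < count) return false;
    buffer_ += count;
    return true;
  }

  bool ConsumedEntireMessage() const { return legitimate_message_end_; }
};
```

With the bytes 0A 02 31 31, the cursor reads tag 10, length 2, skips the two bytes, and the next ReadTag() returns 0 with ConsumedEntireMessage() true; a truncated buffer fails in Skip() instead.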
Next, the message is parsed from the CodedInputStream:
bool InlineParseFromCodedStream(io::CodedInputStream* input,
                                MessageLite* message) {
  message->Clear();
  return InlineMergeFromCodedStream(input, message);
}
First the message is cleared. Clear() is a virtual function implemented by each concrete message class.
Then execution reaches:
bool InlineMergeFromCodedStream(io::CodedInputStream* input,
                                MessageLite* message) {
  if (!message->MergePartialFromCodedStream(input)) return false;
  if (!message->IsInitialized()) {
    GOOGLE_LOG(ERROR) << InitializationErrorMessage("parse", *message);
    return false;
  }
  return true;
}
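The split into Clear() followed by a merge is what gives parsing its merge semantics: parsing a buffer that holds two concatenated messages of the same type behaves like parsing the first and merging the second into it. The sketch below illustrates this on a hypothetical hand-written ToyPhone struct (field 1 a required string, field 3 a repeated varint, both invented for illustration). It is not protoc output, and unlike the generated code it rejects unknown fields instead of skipping them.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Reads a base-128 varint from [p, end), advancing p.
static bool ReadVarint(const uint8_t*& p, const uint8_t* end, uint32_t* out) {
  uint32_t result = 0;
  for (int shift = 0; shift < 35; shift += 7) {
    if (p == end) return false;
    uint8_t b = *p++;
    result |= static_cast<uint32_t>(b & 0x7F) << shift;
    if ((b & 0x80) == 0) {
      *out = result;
      return true;
    }
  }
  return false;
}

// Hypothetical message, for illustration only.
struct ToyPhone {
  bool has_number = false;
  std::string number;           // field 1, length-delimited, required
  std::vector<uint32_t> codes;  // field 3, varint, repeated

  void Clear() {
    has_number = false;
    number.clear();
    codes.clear();
  }
  bool IsInitialized() const { return has_number; }

  // Singular fields are overwritten, repeated fields are appended.
  bool MergeFrom(const uint8_t* p, const uint8_t* end) {
    while (p != end) {
      uint32_t tag = 0, v = 0;
      if (!ReadVarint(p, end, &tag)) return false;
      if (tag == ((1u << 3) | 2)) {  // field 1, wire type 2
        if (!ReadVarint(p, end, &v)) return false;
        if (static_cast<uint32_t>(end - p) < v) return false;
        number.assign(reinterpret_cast<const char*>(p), v);
        has_number = true;
        p += v;
      } else if (tag == ((3u << 3) | 0)) {  // field 3, wire type 0
        if (!ReadVarint(p, end, &v)) return false;
        codes.push_back(v);
      } else {
        return false;  // toy: unknown fields are not skipped
      }
    }
    return true;
  }

  // Same shape as InlineParseFromCodedStream + InlineMergeFromCodedStream:
  // clear, merge, then check required fields.
  bool ParseFrom(const uint8_t* p, const uint8_t* end) {
    Clear();
    return MergeFrom(p, end) && IsInitialized();
  }
};
```

Parsing "11" followed by "22" leaves number == "22" but keeps both codes, and a buffer that never sets field 1 fails the IsInitialized() check just as a missing required field does in protobuf.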
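MergePartialFromCodedStream is also a virtual function, implemented in the code protoc generates for each message type; IsInitialized() then checks that every required field was set, and the parse fails if one is missing.
Here is a real generated implementation: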
bool Person_PhoneNumber::MergePartialFromCodedStream(
    ::google::protobuf::io::CodedInputStream* input) {
#define DO_(EXPRESSION) if (!(EXPRESSION)) return false
  ::google::protobuf::uint32 tag;
  while ((tag = input->ReadTag()) != 0) {
    switch (::google::protobuf::internal::WireFormatLite::GetTagFieldNumber(tag)) {
      // required string number = 1;
      case 1: {
        if (::google::protobuf::internal::WireFormatLite::GetTagWireType(tag) ==
            ::google::protobuf::internal::WireFormatLite::WIRETYPE_LENGTH_DELIMITED) {
          DO_(::google::protobuf::internal::WireFormatLite::ReadString(
                input, this->mutable_number()));
          ::google::protobuf::internal::WireFormat::VerifyUTF8String(
            this->number().data(), this->number().length(),
            ::google::protobuf::internal::WireFormat::PARSE);
        } else {
          goto handle_uninterpreted;
        }
        if (input->ExpectTag(16)) goto parse_type;
        break;
      }
      // optional .Person.PhoneType type = 2 [default = HOME];
      case 2: {
        if (::google::protobuf::internal::WireFormatLite::GetTagWireType(tag) ==
            ::google::protobuf::internal::WireFormatLite::WIRETYPE_VARINT) {
         parse_type:
          int value;
          DO_((::google::protobuf::internal::WireFormatLite::ReadPrimitive<
                   int, ::google::protobuf::internal::WireFormatLite::TYPE_ENUM>(
                 input, &value)));
          if (::Person_PhoneType_IsValid(value)) {
            set_type(static_cast< ::Person_PhoneType >(value));
          } else {
            mutable_unknown_fields()->AddVarint(2, value);
          }
        } else {
          goto handle_uninterpreted;
        }
        if (input->ExpectAtEnd()) return true;
        break;
      }
      default: {
      handle_uninterpreted:
        if (::google::protobuf::internal::WireFormatLite::GetTagWireType(tag) ==
            ::google::protobuf::internal::WireFormatLite::WIRETYPE_END_GROUP) {
          return true;
        }
        DO_(::google::protobuf::internal::WireFormat::SkipField(
              input, tag, mutable_unknown_fields()));
        break;
      }
    }
  }
  return true;
#undef DO_
}
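Two details in the generated code are easy to miss. After reading field 1 it calls input->ExpectTag(16): 16 is the tag of field 2 with wire type 0 (varint), since (2 << 3) | 0 = 16. If the next byte is exactly that tag, it is consumed and the code jumps straight to parse_type, skipping the ReadTag()/switch round trip in the common case where fields arrive in field-number order. ExpectAtEnd() likewise lets the last field return immediately. Below is a sketch of both, assuming only one-byte tags (values below 128) and a hypothetical Cursor struct; the real implementations also cope with multi-byte tags and buffer refills.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical cursor; only what ExpectTag/ExpectAtEnd need.
struct Cursor {
  const uint8_t* buffer_;
  const uint8_t* buffer_end_;
  bool legitimate_message_end_ = false;

  Cursor(const uint8_t* data, int size)
      : buffer_(data), buffer_end_(data + size) {}

  // Consumes the next byte only if it equals the expected one-byte tag.
  bool ExpectTag(uint32_t expected) {
    assert(expected < 128);  // this sketch handles one-byte tags only
    if (buffer_ < buffer_end_ && *buffer_ == expected) {
      ++buffer_;
      return true;
    }
    return false;
  }

  // True when nothing is left; marks the end as legitimate.
  bool ExpectAtEnd() {
    if (buffer_ == buffer_end_) {
      legitimate_message_end_ = true;
      return true;
    }
    return false;
  }
};
```

The guess costs one byte comparison when it misses, and the code then falls back to the normal ReadTag() loop, so the shortcut never changes the result.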
Once you understand the encoding, decoding is simply the reverse.
The wire types are:
Type | Meaning | Used For |
---|---|---|
0 | Varint | int32, int64, uint32, uint64, sint32, sint64, bool, enum |
1 | 64-bit | fixed64, sfixed64, double |
2 | Length-delimited | string, bytes, embedded messages, packed repeated fields |
3 | Start group | Groups (deprecated) |
4 | End group | Groups (deprecated) |
5 | 32-bit | fixed32, sfixed32, float |
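The table is also what lets a decoder skip fields it does not recognize (the handle_uninterpreted branch that calls WireFormat::SkipField): the wire type alone says how long the value is. A minimal sketch, assuming a plain pointer pair instead of a CodedInputStream; SkipValue and ReadVarint64 are hypothetical helpers, the enum only mirrors WireFormatLite's names, unknown fields are discarded rather than stored in the unknown-field set as protobuf does, and groups (types 3 and 4) are rejected for brevity.

```cpp
#include <cassert>
#include <cstdint>

enum WireType : uint32_t {
  WIRETYPE_VARINT = 0,
  WIRETYPE_FIXED64 = 1,
  WIRETYPE_LENGTH_DELIMITED = 2,
  WIRETYPE_START_GROUP = 3,
  WIRETYPE_END_GROUP = 4,
  WIRETYPE_FIXED32 = 5,
};

// Reads a varint of up to 10 bytes from [p, end), advancing p.
static bool ReadVarint64(const uint8_t*& p, const uint8_t* end, uint64_t* out) {
  uint64_t result = 0;
  for (int shift = 0; shift < 70; shift += 7) {
    if (p == end) return false;
    uint8_t b = *p++;
    result |= static_cast<uint64_t>(b & 0x7F) << shift;
    if ((b & 0x80) == 0) {
      *out = result;
      return true;
    }
  }
  return false;
}

// Advances p past the value of a field whose tag was already read.
static bool SkipValue(uint32_t tag, const uint8_t*& p, const uint8_t* end) {
  uint64_t n = 0;
  switch (static_cast<WireType>(tag & 0x7)) {
    case WIRETYPE_VARINT:
      return ReadVarint64(p, end, &n);  // the varint is the whole value
    case WIRETYPE_FIXED64:
      n = 8;
      break;
    case WIRETYPE_LENGTH_DELIMITED:
      if (!ReadVarint64(p, end, &n)) return false;  // length prefix
      break;
    case WIRETYPE_FIXED32:
      n = 4;
      break;
    default:  // groups and the unused wire types 6 and 7
      return false;
  }
  if (static_cast<uint64_t>(end - p) < n) return false;  // truncated
  p += n;
  return true;
}
```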
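Take this example:
Person_PhoneNumber test1;
test1.set_number("11");
The first byte of the encoding is then 0000 1010 (0x0A). Shifting it right by 3 bits gives GetTagFieldNumber = 1, which is the field number we assigned to the field in our .proto file.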
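Next, the wire type is computed:
static_cast<WireType>(tag & kTagTypeMask);
kTagTypeMask is binary 111, so the wire type is
0000 1010 & 0000 0111 = 010 = 2, i.e. WIRETYPE_LENGTH_DELIMITED. Because a string has variable length, it is stored in length + value form, which ReadString reads back: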
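The same arithmetic, written out. MakeTag, GetTagFieldNumber and GetTagWireType below are small stand-ins that follow the rule used by WireFormatLite (tag = field_number << 3 | wire_type, with kTagTypeBits = 3); they are not the library's own definitions. EncodeShortStringField is a hypothetical helper that produces the full encoding of test1 from the example: the tag 0x0A, the varint length 2, and the two ASCII bytes of "11".

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

constexpr int kTagTypeBits = 3;
constexpr uint32_t kTagTypeMask = (1u << kTagTypeBits) - 1;  // binary 111

constexpr uint32_t MakeTag(uint32_t field_number, uint32_t wire_type) {
  return (field_number << kTagTypeBits) | wire_type;
}
constexpr uint32_t GetTagFieldNumber(uint32_t tag) { return tag >> kTagTypeBits; }
constexpr uint32_t GetTagWireType(uint32_t tag) { return tag & kTagTypeMask; }

// Encodes one length-delimited field (wire type 2). Restricted to cases
// where both the tag and the length fit in a single varint byte.
std::vector<uint8_t> EncodeShortStringField(uint32_t field_number,
                                            const std::string& s) {
  assert(field_number < 16 && s.size() < 128);
  std::vector<uint8_t> out;
  out.push_back(static_cast<uint8_t>(MakeTag(field_number, 2)));
  out.push_back(static_cast<uint8_t>(s.size()));
  out.insert(out.end(), s.begin(), s.end());
  return out;
}
```

MakeTag(2, 0) is 16, the constant the generated code passes to ExpectTag for the type field.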
bool WireFormatLite::ReadString(io::CodedInputStream* input,
                                string* value) {
  // String is for UTF-8 text only
  uint32 length;
  if (!input->ReadVarint32(&length)) return false;
  if (!input->InternalReadStringInline(value, length)) return false;
  return true;
}
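The function above follows exactly this flow: it first reads the length as a varint, then reads that many bytes into the string.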
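The same two steps on a raw buffer, as a sketch: ReadVarint32 and ReadString below are hypothetical free functions over a pointer pair, not the CodedInputStream members. Like the real code, the sketch refuses a length that runs past the end of the input; unlike it, it does not handle input split across several buffers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Reads a base-128 varint from [p, end), advancing p.
bool ReadVarint32(const uint8_t*& p, const uint8_t* end, uint32_t* value) {
  uint32_t result = 0;
  for (int shift = 0; shift < 35; shift += 7) {
    if (p == end) return false;
    uint8_t b = *p++;
    result |= static_cast<uint32_t>(b & 0x7F) << shift;
    if ((b & 0x80) == 0) {
      *value = result;
      return true;
    }
  }
  return false;
}

// Length-prefixed read: varint length, then that many raw bytes.
bool ReadString(const uint8_t*& p, const uint8_t* end, std::string* value) {
  uint32_t length = 0;
  if (!ReadVarint32(p, end, &length)) return false;
  if (static_cast<size_t>(end - p) < length) return false;  // truncated
  value->assign(reinterpret_cast<const char*>(p), length);
  p += length;
  return true;
}
```

For a 300-byte string the length prefix itself takes two bytes (AC 02), which is why the length is read as a varint rather than a fixed-size integer.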
There is a small std::string detail involved here:
inline char* string_as_array(string* str) {
  // DO NOT USE const_cast<char*>(str->data())! See the unittest for why.
  return str->empty() ? NULL : &*str->begin();
}
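The comment says explicitly not to use const_cast<char*>(str->data()) and points to a unit test for the reason; I'm not yet entirely clear on it. A likely explanation: before C++17, data() only returns a const char*, so writing through it is undefined behavior, and in copy-on-write std::string implementations (such as libstdc++ before its C++11 ABI) the buffer may be shared with other strings, so such a write could silently change them as well. Calling the non-const begin() makes the string take a private copy first. The empty() check avoids dereferencing begin() of an empty string.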
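A sketch of how such a helper is typically used: resize the string to the target length, then copy raw bytes into its storage through a mutable pointer. WritableData and CopyIntoString are hypothetical names modeled on string_as_array. Since C++11 std::string storage is guaranteed contiguous, so a pointer obtained from the non-const begin() (or, from C++17, the non-const data()) is a legitimate place to write.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Same idea as string_as_array: a mutable pointer, or nullptr when empty
// (dereferencing begin() of an empty string would be undefined). The
// non-const begin() also forces a private copy in copy-on-write strings.
inline char* WritableData(std::string* str) {
  return str->empty() ? nullptr : &*str->begin();
}

// Hypothetical helper: replace *str with `size` bytes copied from `src`.
inline void CopyIntoString(std::string* str, const void* src, size_t size) {
  str->resize(size);  // make room for exactly `size` bytes
  if (size > 0) std::memcpy(WritableData(str), src, size);
}
```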