Google Protocol Buffers: Introduction and Summary

Latest recommended article published 2024-06-28 15:00:46

逍遥子(｡ˇε ˇ｡）

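A brief introduction to protobuf and a summary of its key points, excerpted from slides I prepared earlier, in the hope of saving protobuf beginners some time getting started. This is a simple demo.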

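Introduction to Protobuf
Protobuf is short for Google Protocol Buffers

http://code.google.com/p/protobuf
A structured data storage format (comparable to XML and JSON)
Used for communication protocols, data storage, and similar purposes
Efficient serialization and deserialization
Language-neutral, platform-neutral, and highly extensible
Official support for three languages: C++, Java, and Python
.proto files
Definition and usage
Message definition file user_def.proto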

package user;
message UserInfo {
    required int64 id = 1;
    optional string name = 2;
    repeated bytes nick_name = 3;
}
Compile the .proto file to generate the parser code

protoc --cpp_out=. user_def.proto   // user_def.pb.h user_def.pb.cc
protoc --java_out=. user_def.proto  // user/UserDef.java
Field IDs
optional string name = 2;

Each ID must be unique within its message.
After serialization, IDs 1 to 15 take one byte and IDs 16 to 2047 take two bytes.
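The byte counts above follow from how the key is encoded: the field ID is shifted left by three bits, the wire type goes in the low three bits, and the result is written as a base-128 varint. The sketch below is only an illustration of that arithmetic; VarintSize and TagSize are hypothetical helper names, not part of the protobuf API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Number of bytes a base-128 varint needs for `value`
// (7 payload bits per byte).
std::size_t VarintSize(std::uint64_t value) {
    std::size_t bytes = 1;
    while (value >= 0x80) {  // more than 7 bits still to write
        value >>= 7;
        ++bytes;
    }
    return bytes;
}

// Size of the key written before a field's value:
// (field ID << 3) | wire type, encoded as a varint.
// TagSize is a hypothetical helper for illustration only.
std::size_t TagSize(std::uint32_t field_id, std::uint32_t wire_type) {
    return VarintSize((static_cast<std::uint64_t>(field_id) << 3) | wire_type);
}
```

Because three of the first byte's seven payload bits hold the wire type, only four bits remain for the ID, which is why the one-byte range stops at 15.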
Field types
https://developers.google.com/protocol-buffers/docs/proto#scalar
string vs. bytes

.proto type   C++ type      Java type    Notes
string        std::string   String       Must be UTF-8 or ASCII text
bytes         std::string   ByteString   Arbitrary sequence of bytes
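Since both types map to std::string in C++, nothing in the type system stops binary data from ending up in a string field. One way to decide between the two is to check whether the data is well-formed text. The function below is a simplified sketch under that assumption: LooksLikeUtf8 is a hypothetical name, and it checks only lead and continuation byte structure, not overlong encodings or surrogate code points.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Simplified structural UTF-8 check (hypothetical helper, not protobuf API).
// Verifies lead bytes and continuation bytes only; it does NOT reject
// overlong encodings or surrogate code points.
bool LooksLikeUtf8(const std::string& s) {
    std::size_t i = 0;
    while (i < s.size()) {
        const unsigned char lead = static_cast<unsigned char>(s[i]);
        std::size_t extra = 0;  // number of continuation bytes expected
        if (lead < 0x80) {
            extra = 0;          // plain ASCII
        } else if (lead >= 0xC2 && lead <= 0xDF) {
            extra = 1;
        } else if (lead >= 0xE0 && lead <= 0xEF) {
            extra = 2;
        } else if (lead >= 0xF0 && lead <= 0xF4) {
            extra = 3;
        } else {
            return false;       // invalid lead byte
        }
        if (s.size() - i - 1 < extra) return false;  // truncated sequence
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(s[i + k]);
            if ((c & 0xC0) != 0x80) return false;    // not a continuation byte
        }
        i += extra + 1;
    }
    return true;
}
```

Data that fails a check like this belongs in a bytes field rather than a string field.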
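Writing recommendations
Assign IDs 1 to 15 to frequently used message fields (especially repeated fields) wherever possible.
Use optional fields as much as possible (ideally for every field).
Naming conventions
Name .proto files with underscore_separated_names.
Name messages with CamelCaseNames.
Name fields with underscore_separated_names.
Compatibility recommendations
Never change a field's ID.
Never add or remove a required field.
https://developers.google.com/protocol-buffers/docs/proto#updating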
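These rules follow from the fact that the ID is the only thing that identifies a field on the wire. The sketch below imitates, in a very reduced form, a reader built against an older schema that only knows field 1 (id): it skips any field ID it does not recognise, so new optional fields do no harm, whereas a changed ID would make the old field invisible. ReadVarint and ReadIdSkippingUnknown are hypothetical helpers written for this illustration; real code should use the generated classes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Read a base-128 varint starting at *pos; returns false on truncated input.
bool ReadVarint(const std::string& data, std::size_t* pos, std::uint64_t* value) {
    *value = 0;
    for (int shift = 0; shift < 64 && *pos < data.size(); shift += 7) {
        const std::uint8_t byte = static_cast<std::uint8_t>(data[(*pos)++]);
        *value |= static_cast<std::uint64_t>(byte & 0x7F) << shift;
        if ((byte & 0x80) == 0) return true;  // last byte of the varint
    }
    return false;
}

// Hypothetical "old" reader that only knows field 1 (id, varint).
// Any other field ID is skipped according to its wire type.
bool ReadIdSkippingUnknown(const std::string& data, std::int64_t* id) {
    std::size_t pos = 0;
    bool found = false;
    while (pos < data.size()) {
        std::uint64_t key = 0;
        if (!ReadVarint(data, &pos, &key)) return false;
        const std::uint64_t field_id = key >> 3;
        const unsigned wire_type = static_cast<unsigned>(key & 0x7);
        std::uint64_t value = 0;
        switch (wire_type) {
            case 0:  // varint
                if (!ReadVarint(data, &pos, &value)) return false;
                if (field_id == 1) {
                    *id = static_cast<std::int64_t>(value);
                    found = true;
                }
                break;
            case 1:  // 64-bit fixed-width value
                if (data.size() - pos < 8) return false;
                pos += 8;
                break;
            case 2:  // length-delimited (string, bytes, nested message)
                if (!ReadVarint(data, &pos, &value)) return false;
                if (value > data.size() - pos) return false;
                pos += static_cast<std::size_t>(value);
                break;
            case 5:  // 32-bit fixed-width value
                if (data.size() - pos < 4) return false;
                pos += 4;
                break;
            default:  // wire types this sketch does not handle
                return false;
        }
    }
    return found;  // false if the required id field never appeared
}
```

The final return also hints at why required fields are risky: a reader that insists on a field has no way to accept a message from a writer that no longer sends it.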
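Serialized protobuf messages
A sequence of key-value pairs, where each key carries the message field's ID.
Known fields (those defined in the .proto file) are written in ID order.
Unknown fields:
C++ and Java: written after the known fields, in no guaranteed order.
Python: unknown fields are not preserved.
Unset optional fields are not included.
Fixed-width values are stored in little-endian byte order.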
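To make the layout concrete, the sketch below hand-encodes the id and name fields of the UserInfo message defined earlier. It is an illustration of the wire format only, under the assumption that id uses wire type 0 (varint) and name uses wire type 2 (length-delimited). AppendVarint, AppendKey, AppendFixed32, and EncodeUserInfo are hypothetical helpers; in real code the generated class does this work.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Append `value` as a base-128 varint, least significant 7-bit group first.
void AppendVarint(std::string* out, std::uint64_t value) {
    while (value >= 0x80) {
        out->push_back(static_cast<char>((value & 0x7F) | 0x80));
        value >>= 7;
    }
    out->push_back(static_cast<char>(value));
}

// Append a key: (field ID << 3) | wire type.
void AppendKey(std::string* out, std::uint32_t field_id, std::uint32_t wire_type) {
    AppendVarint(out, (static_cast<std::uint64_t>(field_id) << 3) | wire_type);
}

// Append a 32-bit fixed-width value in little-endian byte order.
void AppendFixed32(std::string* out, std::uint32_t value) {
    for (int i = 0; i < 4; ++i) {
        out->push_back(static_cast<char>((value >> (8 * i)) & 0xFF));
    }
}

// Hand-encode UserInfo{id, name}. A null `name` models an unset optional
// field, which is simply left out of the output.
std::string EncodeUserInfo(std::int64_t id, const std::string* name) {
    std::string out;
    AppendKey(&out, 1, 0);  // field 1, wire type 0 (varint)
    AppendVarint(&out, static_cast<std::uint64_t>(id));
    if (name != nullptr) {
        AppendKey(&out, 2, 2);  // field 2, wire type 2 (length-delimited)
        AppendVarint(&out, name->size());
        out += *name;
    }
    return out;
}
```

With id set to 100 and name unset, the whole message is the two bytes 08 64: one byte for the key and one for the value.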
Reflection
Reflection is an important feature of protobuf. The main classes involved are:

Message
MessageFactory
Reflection
Descriptor
FieldDescriptor
DescriptorPool
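Creating a message by name
Below is a C++ function that creates a protobuf message from its message name (including the package name). Note that the returned message must be deleted once you are done with it.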

#include <string>
#include <google/protobuf/descriptor.h>
#include <google/protobuf/message.h>

using namespace google::protobuf;

Message* createMessage(const std::string& typeName) {
    Message* message = nullptr;
    // Look up the message's descriptor
    const Descriptor* descriptor =
        DescriptorPool::generated_pool()->FindMessageTypeByName(typeName);
    if (descriptor != nullptr) {
        // Get the default message (prototype)
        const Message* prototype =
            MessageFactory::generated_factory()->GetPrototype(descriptor);
        if (prototype != nullptr) {
            // Create a mutable message
            message = prototype->New();
        }
    }
    return message;
}
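Modifying a message
A field's value can be modified by looking the field up by name. Using user.UserInfo from above as the example, the code below sets the id field of a new UserInfo message to 100.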

int main() {
    // Create a new UserInfo message with the function above
    Message* msg = createMessage("user.UserInfo");
    if (nullptr == msg) {
        // Creation failed: the message name may be wrong, or the compiled
        // message parser may not have been linked into the main program.
        return -1;
    }

    // Get the message's descriptor
    const Descriptor* descriptor = msg->GetDescriptor();
    // Get the message's reflection interface, used to read and modify field values
    const Reflection* reflection = msg->GetReflection();

    // Look up the field descriptor by field name
    const FieldDescriptor* idField = descriptor->FindFieldByName("id");
    // Set id to 100
    if (nullptr != idField) {
        reflection->SetInt64(msg, idField, 100);
    }

    // ... other operations

    // Finally, delete the message
    delete msg;

    return 0;
}
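Reading a message from a string or stream
After creating an empty message with createMessage, the most common next step is to call the Message's ParseFromString or ParseFromIstream method to read a serialized message from a string or a stream.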

Message* msg = createMessage("user.UserInfo");
if (nullptr != msg) {
    if (!msg->ParseFromString("... serialized message string ... ")) {
        // Parsing failed
        ...
    }
    // Release the message once it is no longer needed
    delete msg;
}

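Protobuf strengths
Highly extensible
Backward and forward compatible
Can import already-defined messages
Supports nested messages
Efficient https://code.google.com/p/thrift-protobuf-compare/wiki/Benchmarking
Well suited to large volumes of small data (a single Message no larger than 1 MB)
Protobuf weaknesses
No built-in Set, Map, or similar container types.
Not suited to cases where a single Message exceeds 1 MB; see Large Data Sets.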
Further reading
.proto guide https://developers.google.com/protocol-buffers/docs/proto
.proto style guide https://developers.google.com/protocol-buffers/docs/style
Serialization encoding https://developers.google.com/protocol-buffers/docs/encoding
Tutorials https://developers.google.com/protocol-buffers/docs/tutorials
API reference https://developers.google.com/protocol-buffers/docs/reference/overview
Protobuf benchmarking https://code.google.com/p/thrift-protobuf-compare/wiki/Benchmarking
References
Protobuf documentation
Protobuf usage and principles