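I. What Is PB
Protocol Buffers (PB) is an open-source data interchange format developed by Google. It is particularly well suited to exchanging objects and data structures over RPC. Comparable technologies include XML, JSON, and Thrift.
II. Why Use PB
Compared with XML, its main counterpart, PB's chief advantages are smaller data size (3 to 10 times smaller than XML) and faster parsing (20 to 100 times faster than XML); it is also simpler to use. Its main drawback is poor readability: PB produces binary data, which is far harder to read than XML's plain text, and editing it requires code, whereas XML can be edited directly.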
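III. Installation
1. Download PB
Download the latest version of PB from https://code.google.com/p/protobuf/downloads/list. Downloading the source code and compiling it yourself is recommended (binary downloads are also provided). The steps below use a Windows build as the example; the Linux build is analogous. (The Windows download package is protobuf-2.5.0.zip.)
2. Compile
The vsprojects folder contains the solution file protobuf.sln. The solution was created with VS2008; for an older version of VS, it can be downgraded with convert2008to2005.sh, a Linux shell script provided in the same directory. Open the solution in VS and build it. The build produces one executable and three .lib libraries: protoc.exe, libprotobuf.lib, libprotobuf-lite.lib, and libprotoc.lib. protoc.exe compiles .proto files; the three libraries are linked when compiling serialization/deserialization code (covered later).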
IV. Usage
1. Create an object description file (.proto file)
To use PB, you first write an object description file. See the following example:
person.proto
The above is a simple object description file, which reads somewhat like pseudocode. Each field has a modifier, a type, a name, and an ID. There are three modifiers: required, optional, and repeated, meaning that the field is mandatory, optional, or repeatable, respectively. The official documentation describes them as follows: Each field must be annotated with one of the following modifiers:
· required: a value for the field must be provided, otherwise the message will be considered "uninitialized". If libprotobuf is compiled in debug mode, serializing an uninitialized message will cause an assertion failure. In optimized builds, the check is skipped and the message will be written anyway. However, parsing an uninitialized message will always fail (by returning false from the parse method). Other than this, a required field behaves exactly like an optional field.
· optional: the field may or may not be set. If an optional field value isn't set, a default value is used. For simple types, you can specify your own default value, as we've done for the phone number type in the example. Otherwise, a system default is used: zero for numeric types, the empty string for strings, false for bools. For embedded messages, the default value is always the "default instance" or "prototype" of the message, which has none of its fields set. Calling the accessor to get the value of an optional (or required) field which has not been explicitly set always returns that field's default value.
· repeated: the field may be repeated any number of times (including zero). The order of the repeated values will be preserved in the protocol buffer. Think of repeated fields as dynamically sized arrays.
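In summary: a required field must be given a value, otherwise parsing returns false; an optional field that has not been set takes a default value; a repeated field represents an array. Note: the official Google PB documentation points out that, based on current usage inside Google, a good practice is to use only optional and repeated and to avoid required, which gives the best forward compatibility.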
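The three modifiers can be pictured with a small hand-written model. This is only a sketch: PersonModel is a hypothetical stand-in written for illustration, not code produced by protoc, and the field names (name, email, phone) are assumptions. The accessor names imitate the style of the generated C++ classes (set_x, has_x, add_x, x_size, IsInitialized).

```cpp
#include <string>
#include <vector>

// Hypothetical hand-written model of a message with one required,
// one optional, and one repeated field. Not generated by protoc.
class PersonModel {
 public:
  PersonModel() : has_name_(false), has_email_(false) {}

  // required string name: must be set before the message counts as initialized.
  void set_name(const std::string& v) { name_ = v; has_name_ = true; }
  bool has_name() const { return has_name_; }
  const std::string& name() const { return name_; }

  // optional string email: reading it while unset yields the default (empty string).
  void set_email(const std::string& v) { email_ = v; has_email_ = true; }
  bool has_email() const { return has_email_; }
  const std::string& email() const { return email_; }

  // repeated string phone: behaves like a dynamically sized array, order preserved.
  void add_phone(const std::string& v) { phones_.push_back(v); }
  int phone_size() const { return static_cast<int>(phones_.size()); }
  const std::string& phone(int index) const { return phones_[index]; }

  // A message is "initialized" only when every required field has been set.
  bool IsInitialized() const { return has_name_; }

 private:
  std::string name_;
  std::string email_;
  std::vector<std::string> phones_;
  bool has_name_;
  bool has_email_;
};
```

A freshly constructed PersonModel reports IsInitialized() as false until set_name is called, while email() can be read at any time and simply returns the empty default.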
The field types are listed below (the most commonly used are double, int64, int32, string, and bytes):
.proto Type | Notes | C++ Type | Java Type | Python Type[2] |
double | | double | double | float |
float | | float | float | float |
int32 | Uses variable-length encoding. Inefficient for encoding negative numbers – if your field is likely to have negative values, use sint32 instead. | int32 | int | int |
int64 | Uses variable-length encoding. Inefficient for encoding negative numbers – if your field is likely to have negative values, use sint64 instead. | int64 | long | int/long[3] |
uint32 | Uses variable-length encoding. | uint32 | int[1] | int/long[3] |
uint64 | Uses variable-length encoding. | uint64 | long[1] | int/long[3] |
sint32 | Uses variable-length encoding. Signed int value. These more efficiently encode negative numbers than regular int32s. | int32 | int | int |
sint64 | Uses variable-length encoding. Signed int value. These more efficiently encode negative numbers than regular int64s. | int64 | long | int/long[3] |
fixed32 | Always four bytes. More efficient than uint32 if values are often greater than 2^28. | uint32 | int[1] | int |
fixed64 | Always eight bytes. More efficient than uint64 if values are often greater than 2^56. | uint64 | long[1] | int/long[3] |
sfixed32 | Always four bytes. | int32 | int | int |
sfixed64 | Always eight bytes. | int64 | long | int/long[3] |
bool | | bool | boolean | boolean |
string | A string must always contain UTF-8 encoded or 7-bit ASCII text. | string | String | str/unicode[4] |
bytes | May contain any arbitrary sequence of bytes. | string | ByteString | str |
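The size notes in the table follow from the variable-length (varint) encoding: each byte carries 7 bits of payload. The sketch below re-implements that encoding by hand so the byte counts can be checked; the function names are made up for this illustration and are not part of the PB library API.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Append `value` as a base-128 varint: 7 payload bits per byte,
// with the high bit set on every byte except the last.
inline void AppendVarint(uint64_t value, std::string* out) {
  while (value >= 0x80) {
    out->push_back(static_cast<char>((value & 0x7F) | 0x80));
    value >>= 7;
  }
  out->push_back(static_cast<char>(value));
}

// Number of bytes the varint encoding of `value` occupies.
inline std::size_t VarintSize(uint64_t value) {
  std::string buffer;
  AppendVarint(value, &buffer);
  return buffer.size();
}

// An int32 value is sign-extended to 64 bits before encoding,
// so every negative number takes the maximum of 10 bytes.
inline std::size_t Int32FieldSize(int32_t v) {
  return VarintSize(static_cast<uint64_t>(static_cast<int64_t>(v)));
}

// ZigZag maps signed to unsigned so small magnitudes stay small:
// 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ...
// (relies on arithmetic right shift of negative values, as common compilers provide)
inline uint32_t ZigZagEncode32(int32_t v) {
  return (static_cast<uint32_t>(v) << 1) ^ static_cast<uint32_t>(v >> 31);
}

// A sint32 value is ZigZag-encoded first, then written as a varint.
inline std::size_t Sint32FieldSize(int32_t v) {
  return VarintSize(ZigZagEncode32(v));
}

// fixed32 always occupies four bytes regardless of the value.
const std::size_t kFixed32Size = 4;
```

With these helpers, -1 costs 10 bytes as an int32 but a single byte as a sint32, and a uint32 value of 2^28 needs a 5-byte varint, which is why fixed32 (always 4 bytes) wins for large values.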
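The main purpose of field IDs is compatibility between schema versions: in the binary encoding, each field is identified by its ID rather than by its name. When a .proto file is extended, giving every new field an ID that has not been used before keeps old and new code able to read each other's data.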
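To see why unique IDs make this work, here is a hand-rolled sketch of the wire format, limited to the varint and length-delimited wire types. Every field is preceded by a key computed as (field ID << 3) | wire type, so a reader can skip IDs it does not recognize. EncodeNewMessage and ReadFieldOne are hypothetical helpers, and the message layout (a varint field 1 plus a string field 3) is an assumption for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

enum WireType { kVarint = 0, kLengthDelimited = 2 };

inline void AppendVarint(uint64_t value, std::string* out) {
  while (value >= 0x80) {
    out->push_back(static_cast<char>((value & 0x7F) | 0x80));
    value >>= 7;
  }
  out->push_back(static_cast<char>(value));
}

// Each field starts with a key: (field ID << 3) | wire type.
inline void AppendKey(uint32_t field_id, WireType type, std::string* out) {
  assert(field_id >= 1);  // field IDs start at 1
  AppendVarint((static_cast<uint64_t>(field_id) << 3) | type, out);
}

// Reads one varint starting at *pos; returns false on truncated input.
inline bool ReadVarint(const std::string& in, std::size_t* pos, uint64_t* value) {
  *value = 0;
  for (int shift = 0; shift < 64 && *pos < in.size(); shift += 7) {
    uint8_t byte = static_cast<uint8_t>(in[(*pos)++]);
    *value |= static_cast<uint64_t>(byte & 0x7F) << shift;
    if ((byte & 0x80) == 0) return true;
  }
  return false;
}

// Hypothetical "new" writer: field 1 (varint) plus a later-added field 3 (string).
inline std::string EncodeNewMessage(uint64_t id, const std::string& email) {
  std::string out;
  AppendKey(1, kVarint, &out);
  AppendVarint(id, &out);
  AppendKey(3, kLengthDelimited, &out);
  AppendVarint(email.size(), &out);
  out += email;
  return out;
}

// Hypothetical "old" reader: knows only field 1 and skips every other ID.
inline bool ReadFieldOne(const std::string& in, uint64_t* field_one) {
  std::size_t pos = 0;
  bool found = false;
  while (pos < in.size()) {
    uint64_t key = 0, payload = 0;
    if (!ReadVarint(in, &pos, &key)) return false;
    uint32_t field_id = static_cast<uint32_t>(key >> 3);
    uint32_t type = static_cast<uint32_t>(key & 0x7);
    if (!ReadVarint(in, &pos, &payload)) return false;  // the value, or a length
    if (type == kLengthDelimited) {
      if (payload > in.size() - pos) return false;
      pos += static_cast<std::size_t>(payload);  // skip the bytes of the field
    } else if (type != kVarint) {
      return false;  // other wire types are outside this sketch
    }
    if (field_id == 1 && type == kVarint) {
      *field_one = payload;
      found = true;
    }
  }
  return found;
}
```

Calling ReadFieldOne on the output of EncodeNewMessage(150, "a@b.c") still recovers 150: the old reader steps over field 3 because its key announces both the ID and how many bytes to skip. Reusing an ID for a different field would break exactly this mechanism.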
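2. Compile the object description file into code
Compile the .proto file with the command line "protoc -I=$SRC_DIR --cpp_out=$DST_DIR $SRC_DIR/person.proto". Note: $SRC_DIR is the directory containing the .proto file, and $DST_DIR is the directory where the generated .cc and .h files are written; "." stands for the current directory. When compilation finishes, two files are generated under $DST_DIR: "person.pb.h" and "person.pb.cc".
3. Serialize and deserialize objects with PB
Let the code speak:
The code above demonstrates serializing and deserializing an object in PB using a file and a stream as the carrier (configure the library and header paths yourself). As you can see, carrying objects with PB is fairly easy, and the code is simple to write.
The following uses PB to carry a more complex, nested object: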
(1) addressbook.proto
(2) Compile it into addressbook.pb.h and addressbook.pb.cc as described above
(3) Writing A Message
(4) Reading A Message
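Note how enum types and nested structures are used in the examples above.
The above covers only the basic usage of PB; PB also has many advanced uses: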
Advanced Usage
Protocol buffers have uses that go beyond simple accessors and serialization. Be sure to explore the C++ API reference to see what else you can do with them.
One key feature provided by protocol message classes is reflection. You can iterate over the fields of a message and manipulate their values without writing your code against any specific message type. One very useful way to use reflection is for converting protocol messages to and from other encodings, such as XML or JSON. A more advanced use of reflection might be to find differences between two messages of the same type, or to develop a sort of "regular expressions for protocol messages" in which you can write expressions that match certain message contents. If you use your imagination, it's possible to apply Protocol Buffers to a much wider range of problems than you might initially expect!
Reflection is provided by the Message::Reflection interface.