Protocol Buffers 3.0

最新推荐文章于 2023-07-08 19:03:24 发布

那块代码没问题

最新推荐文章于 2023-07-08 19:03:24 发布

阅读量442

点赞数

分类专栏： Protocol Buffers 文章标签： Protocol Buffers

本文链接：https://blog.csdn.net/h291850336/article/details/91878139

版权

Protocol Buffers 专栏收录该内容

2 篇文章 0 订阅

订阅专栏

定义：

一种结构化数据的数据存储格式。(类似于xml, json)

作用：

通过将结构化的数据 进行串行化(序列化)，从而实现 数据存储/rpc数据交换 的功能

序列化：将数据结构或对象转换成二进制的过程

饭序列化：将在序列化过程中所生成的二进制串转换成数据结构或对象的过程。

特点：

相对于xml,json,protocol buffer有如下特点：

应用场景：

传输数据量大 & 网络环境不稳定的数据存储、rpc数据交换的需求场景。

序列化原理解析：

序列化本质：对数据进行编码 + 存储

1. 序列化速度快：

编码/解码方式简单(只需简单的数学运算= 位移等)

采用pb自身的框架代码和编译器共同完成

2. 序列化后体积小(数据压缩效果好)

采用了独特的编码方式，如Varint, Zigzag等

采用T - L - V 的数据存储方式,

即 Tag - Length - Value，标识 - 长度 - 字段值存储方式,减少了分隔符的使用，数据存储得紧凑

protocol buffer语言来组织你的protocol buffer数据，包括.proto文件的语法规则以及如何通过.proto文件来生成数据访问类代码。

Defining A Message Type（定义一个消息类型）

syntax = "proto3";

message SearchRequest {
  string query = 1;
  int32 page_number = 2;
  int32 result_per_page = 3;
}

语法说明(syntax)前只能是空行或者注释
每个字段由字段限制、字段类型、字段名和编号四部分组成

Specifying Field Types（指定字段类型）

在上面的例子中，该消息定义了三个字段，两个int32类型和一个string类型的字段

Assigning Tags（赋予编号）

消息中的每一个字段都有一个独一无二的数值类型的编号。1到15使用一个字节编码，16到2047使用2个字节编码，所以应该将编号1到15留给频繁使用的字段。
可以指定的最小的编号为1，最大为2^{29}-1或536，870，911。但是不能使用19000到19999之间的值，这些值是预留给protocol buffer的。

Specifying Field Rules（指定字段限制）

required:必须赋值的字段
optional:可有可无的字段
repeated:可重复字段(变长字段)

Adding More Message Types（添加更多消息类型）

一个.proto文件可以定义多个消息类型：

message SearchRequest {
  string query = 1;
  int32 page_number = 2;
  int32 result_per_page = 3;
}

message SearchResponse {
 ...
}

Adding Comments（添加注释）

.proto文件也使用C/C++风格的注释语法//

message SearchRequest {
  string query = 1;
  int32 page_number = 2;  // Which page number do we want?
  int32 result_per_page = 3;  // Number of results to return per page.
}

Reserved Fields（预留字段）

如果消息的字段被移除或注释掉，但是使用者可能重复使用字段编码，就有可能导致例如数据损坏、隐私漏洞等问题。一种避免此类问题的方法就是指明这些删除的字段是保留的。如果有用户使用这些字段的编号，protocol buffer编译器会发出告警。

message Foo {
  reserved 2, 15, 9 to 11;
  reserved "foo", "bar";
}

What‘s Generated From Your .proto？（编译`.proto`文件）

对于C++，每一个.proto文件经过编译之后都会对应的生成一个.h和一个.cc文件。

Scalar Value Types（类型对照表）

.proto Type	Notes	C++ Type
double	double	double
float	float	float
int32	Uses variable-length encoding. Inefficient for encoding negative numbers – if your field is likely to have negative values, use sint32 instead.	int32
int64	Uses variable-length encoding. Inefficient for encoding negative numbers – if your field is likely to have negative values, use sint64 instead.	int64
uint32	Uses variable-length encoding.	uint32
uint64	Uses variable-length encoding.	uint64
sint32	Uses variable-length encoding. Signed int value. These more efficiently encode negative numbers than regular int32s.	int32
sint64	Uses variable-length encoding. Signed int value. These more efficiently encode negative numbers than regular int64s.	int64
fixed32	Always four bytes. More efficient than uint32 if values are often greater than 2^28	uint32
fixed64	Always eight bytes. More efficient than uint64 if values are often greater than 2^56	uint64
sfixed32	Always four bytes.	int32
sfixed64	Always eight bytes.	int64
bool	bool	boolean
string	A string must always contain UTF-8 encoded or 7-bit ASCII text.	string
bytes	May contain any arbitrary sequence of bytes.	string

Default Values（缺省值）

如果没有指定默认值,则会使用系统默认值，对于string默认值为空字符串，对于bool默认值为false，对于数值类型默认值为0，对于enum默认值为定义中的第一个元素，对于repeated默认值为空。

Enumerations（枚举）

message SearchRequest {
  string query = 1;
  int32 page_number = 2;
  int32 result_per_page = 3;
  enum Corpus {
    UNIVERSAL = 0;
    WEB = 1;
    IMAGES = 2;
    LOCAL = 3;
    NEWS = 4;
    PRODUCTS = 5;
    VIDEO = 6;
  }
  Corpus corpus = 4;
}

通过设置可选参数allow_alias为true，就可以在枚举结构中使用别名(两个值元素值相同)

enum EnumAllowingAlias {
  option allow_alias = true;
  UNKNOWN = 0;
  STARTED = 1;
  RUNNING = 1;
}
enum EnumNotAllowingAlias {
  UNKNOWN = 0;
  STARTED = 1;
  // RUNNING = 1;  // Uncommenting this line will cause a compile error inside Google and a warning message outside.
}

由于枚举值采用varint编码，所以为了提高效率，不建议枚举值取负数。这些枚举值可以在其他消息定义中重复使用。

Using Other Message Types（使用其他消息类型）

可以使用一个消息的定义作为另一个消息的字段类型。

message SearchResponse {
  repeated Result results = 1;
}

message Result {
  string url = 1;
  string title = 2;
  repeated string snippets = 3;
}

Importing Definitions（导入定义）

就像C++的头文件一样，你还可以导入其他的.proto文件

import "myproject/other_protos.proto";

如果想要移动一个.proto文件，但是又不想修改项目中import部分的代码，可以在文件原先位置留一个空.proto文件，然后使用import public导入文件移动后的新位置：

// new.proto
// All definitions are moved here

// old.proto
// This is the proto that all clients are importing.
import public "new.proto";
import "other.proto";

// client.proto
import "old.proto";
// You use definitions from old.proto and new.proto, but not other.proto

Nested Types（嵌套类型）

在protocol中可以定义如下的嵌套类型

message SearchResponse {
  message Result {
    string url = 1;
    string title = 2;
    repeated string snippets = 3;
  }
  repeated Result results = 1;
}

如果在另外一个消息中需要使用Result定义，则可以通过Parent.Type来使用。

message SomeOtherMessage {
  SearchResponse.Result result = 1;
}

protocol支持更深层次的嵌套和分组嵌套，但是为了结构清晰起见，不建议使用过深层次的嵌套。

message Outer {                  // Level 0
  message MiddleAA {  // Level 1
    message Inner {   // Level 2
      int64 ival = 1;
      bool  booly = 2;
    }
  }
  message MiddleBB {  // Level 1
    message Inner {   // Level 2
      int32 ival = 1;
      bool  booly = 2;
    }
  }

Updating A Message Type（更新一个数据类型）

在实际的开发中会存在这样一种应用场景，既消息格式因为某些需求的变化而不得不进行必要的升级，但是有些使用原有消息格式的应用程序暂时又不能被立刻升级，这便要求我们在升级消息格式时要遵守一定的规则，从而可以保证基于新老消息格式的新老程序同时运行。规则如下：

不要修改已经存在字段的标签号。
任何新添加的字段必须是optional和repeated限定符，否则无法保证新老程序在互相传递消息时的消息兼容性。
在原有的消息中，不能移除已经存在的required字段，optional和repeated类型的字段可以被移除，但是他们之前使用的标签号必须被保留，不能被新的字段重用。
int32、uint32、int64、uint64和bool等类型之间是兼容的，sint32和sint64是兼容的，string和bytes是兼容的，fixed32和sfixed32，以及fixed64和sfixed64之间是兼容的，这意味着如果想修改原有字段的类型时，为了保证兼容性，只能将其修改为与其原有类型兼容的类型，否则就将打破新老消息格式的兼容性。
optional和repeated限定符也是相互兼容的。

Any（任意消息类型）

Any类型是一种不需要在.proto文件中定义就可以直接使用的消息类型，使用前import google/protobuf/any.proto文件即可。

import  "google/protobuf/any.proto";

message ErrorStatus  {
  string message =  1;
  repeated google.protobuf.Any details =  2;
}

C++使用PackFrom()和UnpackTo()方法来打包和解包Any类型消息。

// Storing an arbitrary message type in Any.
NetworkErrorDetails details =  ...;
ErrorStatus status;
status.add_details()->PackFrom(details);

// Reading an arbitrary message from Any.
ErrorStatus status =  ...;
for  (const  Any& detail : status.details())  {  if  (detail.Is<NetworkErrorDetails>())  {    NetworkErrorDetails network_error;
    detail.UnpackTo(&network_error);    ... processing network_error ...  }
}

Oneof（其中一个字段类型）

有点类似C++中的联合，就是消息中的多个字段类型在同一时刻只有一个字段会被使用，使用case()或WhichOneof()方法来检测哪个字段被使用了。

Using Oneof（使用Oneof）

message SampleMessage  {
  oneof test_oneof {
    string name =  4;
    SubMessage sub_message =  9;
  }
}

你可以添加除repeated外任意类型的字段到Oneof定义中

Oneof Features（Oneof特性）

oneof字段只有最后被设置的字段才有效，即后面的set操作会覆盖前面的set操作

SampleMessage message;
message.set_name("name");
CHECK(message.has_name());
message.mutable_sub_message();   // Will clear name field.
CHECK(!message.has_name());

oneof不可以是repeated的
反射API可以作用于oneof字段

如果使用C++要防止内存泄露，即后面的set操作会覆盖之前的set操作，导致前面设置的字段对象发生析构，要注意字段对象的指针操作

SampleMessage message;
SubMessage* sub_message = message.mutable_sub_message();
message.set_name("name");      // Will delete sub_message
sub_message->set_...            // Crashes her

如果使用C++的Swap()方法交换两条oneof消息，两条消息都不会保存之前的字段

SampleMessage msg1;
msg1.set_name("name");
SampleMessage msg2;
msg2.mutable_sub_message();
msg1.swap(&msg2);
CHECK(msg1.has_sub_message());
CHECK(msg2.has_name());

Backwards-compatibility issues（向后兼容）

添加或删除oneof字段的时候要注意，如果检测到oneof字段的返回值是None/NOT_SET，这意味着oneof没有被设置或者设置了一个不同版本的oneof的字段，但是没有办法能够区分这两种情况，因为没有办法确认一个未知的字段是否是一个oneof的成员。

Tag Reuse Issues（编号复用问题）

删除或添加字段到oneof：在消息序列化或解析后会丢失一些信息，一些字段将被清空
删除一个字段然后重新添加：在消息序列化或解析后会清除当前设置的oneof字段
分割或合并字段：同普通的删除字段操作

Maps（表映射）

protocol buffers提供了简介的语法来实现map类型：

map<key_type, value_type> map_field = N;

key_type可以是除浮点指针或bytes外的其他基本类型，value_type可以是任意类型

map<string,  Project> projects =  3;

Map的字段不可以是重复的(repeated)
线性顺序和map值的的迭代顺序是未定义的，所以不能期待map的元素是有序的
maps可以通过key来排序，数值类型的key通过比较数值进行排序
线性解析或者合并的时候，如果出现重复的key值，最后一个key将被使用。从文本格式来解析map，如果出现重复key值则解析失败。

Backwards compatibility（向后兼容）

map语法下面的表达方式在线性上是等价的，所以即使protocol buffers没有实现maps数据结构也不会影响数据的处理：

message MapFieldEntry  {
  key_type key =  1;
  value_type value =  2;
}
repeated MapFieldEntry map_field = N;

包

类似C++的命名空间，用来防止名称冲突

package foo.bar;
message Open { ... }

你可以使用包说明符来定义你的消息字段：

message Foo {
  ...
  foo.bar.Open open = 1;
  ...
}

定义服务

如果想在RPC系统中使用消息类型，就需要在.proto文件中定义RPC服务接口，然后使用编译器生成对应语言的存根。

service SearchService {
  rpc Search (SearchRequest) returns (SearchResponse);
}

JSON映射

Proto3支持JSON格式的编码。编码后的JSON数据的如果没有值或值为空，解析时protocol buffer将会使用默认值，在对JSON编码时可以节省空间。

proto3	JSON	JSON example	Notes
message	object	{"fBar": v, "g": null, …}	Generates JSON objects. Message field names are mapped to lowerCamelCase and become JSON object keys. `null` is accepted and treated as the default value of the corresponding field type.
enum	string	"FOO_BAR"	The name of the enum value as specified in proto is used.
map< K,V>	object	{"k": v, …}	All keys are converted to strings.
repeated V	array	[v, …]	`null` is accepted as the empty list [].
bool	true, false	true, false
string	string	"Hello World!"
bytes	base64 string	"YWJjMTIzIT8kKiYoKSctPUB+"
int32, fixed32, uint32	number	1, -10, 0	JSON value will be a decimal number. Either numbers or strings are accepted.
int64, fixed64, uint64	string	"1", "-10"	JSON value will be a decimal string. Either numbers or strings are accepted.
float, double	number	1.1, -10.0, 0, "NaN", "Infinity"	JSON value will be a number or one of the special string values "NaN", "Infinity", and "-Infinity". Either numbers or strings are accepted. Exponent notation is also accepted.
Any	object	{"@type": "url", "f": v, … }	If the Any contains a value that has a special JSON mapping, it will be converted as follows: `{"@type": xxx,<wbr style="box-sizing: inherit;"> "value": yyy}`. Otherwise, the value will be converted into a JSON object, and the `"@type"` field will be inserted to indicate the actual data type.
Timestamp	string	"1972-01-01T10:00:20.021Z"	Uses RFC 3339, where generated output will always be Z-normalized and uses 0, 3, 6 or 9 fractional digits.
Duration	string	"1.000340012s", "1s"	Generated output always contains 0, 3, 6, or 9 fractional digits, depending on required precision. Accepted are any fractional digits (also none) as long as they fit into nano-seconds precision.
Struct	object	{ … }	Any JSON object. See struct.proto.
Wrapper types	various types	2, "2", "foo", true, "true", null, 0, …	Wrappers use the same representation in JSON as the wrapped primitive type, except that `null` is allowed and preserved during data conversion and transfer.
FieldMask	string	"f.fooBar,h"	See fieldmask.proto.
ListValue	array	[foo, bar, …]
Value	value		Any JSON value
NullValue	null		JSON null

选项

Protocol Buffer允许我们在.proto文件中定义一些常用的选项，这样可以指示Protocol Buffer编译器帮助我们生成更为匹配的目标语言代码。Protocol Buffer内置的选项被分为以下三个级别：

文件级别，这样的选项将影响当前文件中定义的所有消息和枚举。
消息级别，这样的选项仅影响某个消息及其包含的所有字段。
字段级别，这样的选项仅仅响应与其相关的字段。

下面将给出一些常用的Protocol Buffer选项。

optimize_for(文件选项)：可以设置的值有SPEED、CODE_SIZE 或 LITE_RUNTIME，不同的选项会以下述方式影响C++代码的生成（option optimize_for = CODE_SIZE;）。

SPEED (default): protocol buffer编译器将会生成序列化,语法分析和其他高效操作消息类型的方式.这也是最高的优化选项.确定是生成的代码比较大.
CODE_SIZE: protocol buffer编译器将会生成最小的类,确定是比SPEED运行要慢
LITE_RUNTIME: protocol buffer编译器将会生成只依赖"lite" runtime library (libprotobuf-lite instead of libprotobuf)的类. lite运行时库比整个库更小但是删除了例如descriptors 和 reflection等特性. 这个选项通常用于手机平台的优化.

cc_enable_arenas(文件选项)：生成的C++代码启用arena allocation内存管理
deprecated(文件选项)：

那块代码没问题

关注

0
点赞
踩
0

收藏

觉得还不错? 一键收藏
0
评论
Protocol Buffers 3.0

定义：一种结构化数据的数据存储格式。(类似于xml, json)作用：通过将结构化的数据进行串行化(序列化)，从而实现数据存储/rpc数据交换的功能序列化：将数据结构或对象转换成二进制的过程饭序列化：将在序列化过程中所生成的二进制串转换成数据结构或对象的过程。特点：相对于xml,json,protocol buffer有如下特点：...
复制链接

扫一扫