Techniques
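Note: this is my own translation. It may be inaccurate or contain errors, but it should be broadly understandable, and I hope it is helpful. (If you repost it, please credit the source: this article comes from learnhard's blog: http://www.codelast.com/ & http://blog.csdn.net/learnhard/)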
This page describes some commonly used design patterns for dealing with Protocol Buffers. You can also send design and usage questions to the Protocol Buffers discussion group.
- Streaming Multiple Messages
- Large Data Sets
- Union Types
- Self-describing Messages
Streaming Multiple Messages
If you want to write multiple messages to a single file or stream, it is up to you to keep track of where one message ends and the next begins. The Protocol Buffer wire format is not self-delimiting, so protocol buffer parsers cannot determine where a message ends on their own. The easiest way to solve this problem is to write the size of each message before you write the message itself. When you read the messages back in, you read the size, then read that many bytes into a separate buffer, then parse from that buffer. (If you want to avoid copying bytes to a separate buffer, check out the CodedInputStream class, available in both C++ and Java, which can be told to limit reads to a certain number of bytes.)
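The size-prefix approach described above can be sketched in Python using only the standard library. This is a minimal sketch under stated assumptions, not part of the Protocol Buffers library: the helper names (write_varint, read_varint, write_delimited, read_delimited) are hypothetical, each payload is assumed to be a message you have already serialized to bytes, and the stream is assumed to be a buffered binary stream whose read(n) returns n bytes unless the data runs out. The size is written as a base-128 varint, which is the same general idea as the length prefix used by Java's writeDelimitedTo, though this code makes no claim of byte-for-byte compatibility with any particular library.

```python
import io


def write_varint(stream, value):
    """Write a non-negative integer as a base-128 varint."""
    if value < 0:
        raise ValueError("size must be non-negative")
    while True:
        low_bits = value & 0x7F
        value >>= 7
        if value:
            # More bytes follow, so set the continuation bit.
            stream.write(bytes([low_bits | 0x80]))
        else:
            stream.write(bytes([low_bits]))
            return


def read_varint(stream):
    """Read a varint. Return None if the stream ends before any byte is read."""
    result = 0
    shift = 0
    while True:
        chunk = stream.read(1)
        if not chunk:
            if shift == 0:
                return None  # clean end of stream between messages
            raise EOFError("stream ended inside a size prefix")
        byte = chunk[0]
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result
        shift += 7


def write_delimited(stream, payload):
    """Write the payload's size, then the payload itself."""
    write_varint(stream, len(payload))
    stream.write(payload)


def read_delimited(stream):
    """Yield each payload in the order it was written."""
    while True:
        size = read_varint(stream)
        if size is None:
            return
        payload = stream.read(size)
        if len(payload) != size:
            raise EOFError("stream ended inside a message")
        yield payload
```

With generated message classes, you would pass the serialized bytes of each message to write_delimited and parse each payload yielded by read_delimited. Because read_delimited is a generator, only one message needs to be held in memory at a time.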
Large Data Sets
Protocol Buffers are not designed to handle large messages. As a general rule of thumb, if you are dealing in messages larger than a megabyte each, it may be time to consider an alternate strategy.
That said, Protocol Buffers are great for handling individual messages within a large data set. Usually, large data sets are really just a collection of small pieces, where each small piece may be a structured piece of data. Even though Protocol Buffers cannot handle the entire set at once, using Protocol Buffers to encode each piece greatly simplifies your problem: now all you need is to handle a set of byte strings rather than a set of structures.
Protocol Buffers do not include any built-in support for large data sets because different situations call for different solutions. Sometimes a simple list of records will do, while other times you may want something more like a database. Each solution should be developed as a separate library, so that only those who need it pay the costs.
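As one illustration of a point between "a simple list of records" and "something more like a database", the sketch below stores each encoded piece as a length-prefixed record and builds an in-memory list of offsets so that any record can be read without scanning the rest. Everything here is an assumption made for illustration: the 4-byte big-endian length prefix, the function names, and the idea of an offset index are not part of Protocol Buffers, and each payload is assumed to be an already-serialized message.

```python
import io
import struct

# Assumed on-disk format: a 4-byte big-endian length, then the record bytes.
_PREFIX = struct.Struct(">I")


def append_record(stream, payload):
    """Append one record and return the offset at which it starts."""
    stream.seek(0, io.SEEK_END)
    offset = stream.tell()
    stream.write(_PREFIX.pack(len(payload)))
    stream.write(payload)
    return offset


def build_index(stream):
    """Scan the stream once and return the starting offset of every record.

    Record bodies are skipped rather than read, so a truncated final
    record is not detected here; read_record reports it instead.
    """
    offsets = []
    stream.seek(0)
    while True:
        offset = stream.tell()
        header = stream.read(_PREFIX.size)
        if not header:
            return offsets
        if len(header) != _PREFIX.size:
            raise EOFError("truncated length prefix")
        (size,) = _PREFIX.unpack(header)
        offsets.append(offset)
        stream.seek(size, io.SEEK_CUR)


def read_record(stream, offset):
    """Read the single record that starts at the given offset."""
    stream.seek(offset)
    header = stream.read(_PREFIX.size)
    if len(header) != _PREFIX.size:
        raise EOFError("truncated length prefix")
    (size,) = _PREFIX.unpack(header)
    payload = stream.read(size)
    if len(payload) != size:
        raise EOFError("truncated record")
    return payload
```

The design choice is that the storage layer only ever sees byte strings. Decoding a record into a structured message is left to whoever calls read_record.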
Union Types
You may sometimes want to send a message that could be one of several different types. However, protocol buffer parsers cannot necessarily determine the type of a message based on the contents alone. So how do you make sure that the recipient application knows how to decode your message? One solution is to create a wrapper message that has one optional field for each possible message type.
For example, if you have message types Foo, Bar, and Baz, you can combine them with a type like:
message OneMessage {
  // One of the following will be filled in.
  optional Foo foo = 1;
  optional Bar bar = 2;
  optional Baz baz = 3;
}
You may also want to have an enum field that identifies which message is filled in, so that you can switch on it:
message OneMessage {
  enum Type { FOO = 1; BAR = 2; BAZ = 3; }
  // Identifies which field is filled in.
  required Type type = 1;
  // One of the following will be filled in.
  optional Foo foo = 2;
  optional Bar bar = 3;
  optional Baz baz = 4;
}
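On the receiving side, switching on the type field might look like the following sketch. It deliberately avoids the generated code so that it runs without the protobuf package: the OneMessage dataclass and the Type enum are plain-Python stand-ins that mirror the definition above, the payloads are plain strings standing in for Foo, Bar, and Baz messages, and unwrap and handle are hypothetical helper names. With real generated classes you would test field presence with the message's own presence check rather than comparing against None.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Type(Enum):
    FOO = 1
    BAR = 2
    BAZ = 3


@dataclass
class OneMessage:
    """Plain-Python stand-in for the generated OneMessage class."""
    type: Type
    foo: Optional[str] = None
    bar: Optional[str] = None
    baz: Optional[str] = None


# Which field each type tag is expected to have filled in.
_FIELD_FOR_TYPE = {Type.FOO: "foo", Type.BAR: "bar", Type.BAZ: "baz"}

# One handler per type tag; this dict plays the role of a switch statement.
_HANDLERS = {
    Type.FOO: lambda payload: f"handling Foo: {payload}",
    Type.BAR: lambda payload: f"handling Bar: {payload}",
    Type.BAZ: lambda payload: f"handling Baz: {payload}",
}


def unwrap(message):
    """Return the payload named by message.type.

    Raises ValueError unless exactly the field matching the type tag
    is filled in, since the schema itself cannot enforce that.
    """
    expected = _FIELD_FOR_TYPE[message.type]
    filled = [name for name in _FIELD_FOR_TYPE.values()
              if getattr(message, name) is not None]
    if filled != [expected]:
        raise ValueError(
            f"type is {message.type.name} but filled fields are {filled}")
    return getattr(message, expected)


def handle(message):
    """Dispatch on the type tag after validating the wrapper."""
    return _HANDLERS[message.type](unwrap(message))
```

The validation step is worth keeping in some form: nothing in the wrapper definition stops a sender from setting type to FOO while filling in bar, so the receiver has to check that the two agree.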
If you have a very large number of possible types, listing every one of them in your container type may be unwieldy. Instead, you should consider using extensions:
message OneMessage {
  extensions 100 to max;
}
// Elsewhere…
extend OneMessage {
  optional Foo foo_ext = 100;
  optional Bar bar_ext = 101;
  optional Baz baz_ext = 102;
}
Note that you can use the ListFields reflection method (in C++, Java, and Python) to get a list of all fields present in the message, including extensions. You might use this as part of a scheme for registering handlers for diverse message types.
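A handler-registration scheme of that kind could be sketched as follows. The dispatch function relies only on the shape of the Python ListFields() result, a list of (field descriptor, value) pairs, and on the descriptor's name attribute. The registry keyed by field name, the dispatch function itself, and the FakeField and FakeMessage classes are all hypothetical: the fakes exist only so the sketch runs without the protobuf package installed, and a real message would be passed in their place.

```python
from collections import namedtuple


def dispatch(message, handlers):
    """Call the registered handler for every field present in message.

    message: any object whose ListFields() returns (descriptor, value)
        pairs, where each descriptor has a name attribute.
    handlers: dict mapping a field name to a one-argument callable.
    Fields without a registered handler are skipped.
    """
    results = []
    for field, value in message.ListFields():
        handler = handlers.get(field.name)
        if handler is not None:
            results.append(handler(value))
    return results


# Stand-ins used only to make this sketch runnable on its own.
FakeField = namedtuple("FakeField", "name")


class FakeMessage:
    """Mimics just enough of a message to exercise dispatch()."""

    def __init__(self, **present_fields):
        self._present = present_fields

    def ListFields(self):
        return [(FakeField(name), value)
                for name, value in self._present.items()]
```

For example, with handlers registered for foo_ext and bar_ext only, a message that carries bar_ext and baz_ext triggers just the bar_ext handler. If two extensions defined in different scopes could share a short name, keying the registry on the descriptor's fully qualified name would be the safer choice.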
Self-describing Messages
Protocol Buffers do not contain descriptions of their own types. Thus, given only a raw message without the corresponding .proto file defining its type, it is difficult to extract any useful data.
However, the contents of a .proto file can themselves be represented using protocol buffers. The file src/google/protobuf/descriptor.proto in the source code package defines the message types involved. protoc can output a FileDescriptorSet, which represents a set of .proto files, using the --descriptor_set_out option. With this, you could define a self-describing protocol message like so:
// FileDescriptorSet is defined in descriptor.proto, in package google.protobuf.
import "google/protobuf/descriptor.proto";
message SelfDescribingMessage {
  // Set of .proto files which define the type.
  required google.protobuf.FileDescriptorSet proto_files = 1;
  // Name of the message type. Must be defined by one of the files in
  // proto_files.
  required string type_name = 2;
  // The message data.
  required bytes message_data = 3;
}
By using classes like DynamicMessage (available in C++ and Java), you can then write tools which can manipulate SelfDescribingMessages.
All that said, this functionality is not included in the Protocol Buffer library because we have never had a use for it inside Google.