Developer Guide

Welcome to the developer documentation for protocol buffers – a language-neutral, platform-neutral, extensible way of serializing structured data for use in communications protocols, data storage, and more.

This documentation is aimed at Java, C++, or Python developers who want to use protocol buffers in their applications. This overview introduces protocol buffers and tells you what you need to do to get started – you can then go on to follow the tutorials or delve deeper into protocol buffer encoding. API reference documentation is also provided for all three languages, as well as language and style guides for writing .proto files.

What are protocol buffers?

Protocol buffers are a flexible, efficient, automated mechanism for serializing structured data – think XML, but smaller, faster, and simpler. You define how you want your data to be structured once, then you can use special generated source code to easily write and read your structured data to and from a variety of data streams and using a variety of languages. You can even update your data structure without breaking deployed programs that are compiled against the "old" format.

How do they work?

You specify how you want the information you're serializing to be structured by defining protocol buffer message types in .proto files. Each protocol buffer message is a small logical record of information, containing a series of name-value pairs. Here's a very basic example of a .proto file that defines a message containing information about a person:

syntax = "proto2";

message Person {
    required string name = 1;
    required int32 id = 2;
    optional string email = 3;
    enum PhoneType {
        MOBILE = 0;
        HOME = 1;
        WORK = 2;
    }
    message PhoneNumber {
        required string number = 1;
        optional PhoneType type = 2 [default = HOME];
    }
    repeated PhoneNumber phone = 4;
}

As you can see, the message format is simple – each message type has one or more uniquely numbered fields, and each field has a name and a value type, where value types can be numbers (integer or floating-point), booleans, strings, raw bytes, or even (as in the example above) other protocol buffer message types, allowing you to structure your data hierarchically. You can specify optional fields, required fields, and repeated fields. You can find more information about writing .proto files in the Protocol Buffer Language Guide.

Once you've defined your messages, you run the protocol buffer compiler for your application's language on your .proto file to generate data access classes. These provide simple accessors for each field (like name() and set_name()) as well as methods to serialize/parse the whole structure to/from raw bytes – so, for instance, if your chosen language is C++, running the compiler on the above example will generate a class called Person. You can then use this class in your application to populate, serialize, and retrieve Person protocol buffer messages. You might then write some code like this:

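#include <fstream>
#include "person.pb.h"  // generated by protoc, assuming the .proto above is saved as person.proto

int main() {
  Person person;
  person.set_name("John Doe");
  person.set_id(1234);
  person.set_email("jdoe@example.com");
  std::fstream output("myfile", std::ios::out | std::ios::binary);
  return person.SerializeToOstream(&output) ? 0 : 1;  // SerializeToOstream returns false on failure
}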

Then, later on, you could read your message back in:

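#include <fstream>
#include <iostream>
#include "person.pb.h"  // generated by protoc, assuming the .proto above is saved as person.proto
int main() {
  std::fstream input("myfile", std::ios::in | std::ios::binary);
  Person person;
  if (!person.ParseFromIstream(&input)) return 1;  // fails on I/O error, malformed data, or missing required fields
  std::cout << "Name: " << person.name() << std::endl;
  std::cout << "E-mail: " << person.email() << std::endl;
}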

You can add new fields to your message formats without breaking backwards-compatibility; old binaries simply ignore the new field when parsing. So if you have a communications protocol that uses protocol buffers as its data format, you can extend your protocol without having to worry about breaking existing code.

You'll find a complete reference for using generated protocol buffer code in the API Reference section, and you can find out more about how protocol buffer messages are encoded in Protocol Buffer Encoding.

Why not just use XML?

Protocol buffers have many advantages over XML for serializing structured data. Protocol buffers:

  • are simpler
  • are 3 to 10 times smaller
  • are 20 to 100 times faster
  • are less ambiguous
  • generate data access classes that are easier to use programmatically

For example, let's say you want to model a person with a name and an email. In XML, you need to do:

<person> <name>John Doe</name> <email>jdoe@example.com</email> </person>

while the corresponding protocol buffer message (in protocol buffer text format) is:

# Textual representation of a protocol buffer. 
# This is *not* the binary format used on the wire. 
person { name: "John Doe" email: "jdoe@example.com" }

When this message is encoded to the protocol buffer binary format (the text format above is just a convenient human-readable representation for debugging and editing), it would probably be 28 bytes long and take around 100-200 nanoseconds to parse. The XML version is at least 69 bytes if you remove whitespace, and would take around 5,000-10,000 nanoseconds to parse.

Also, manipulating a protocol buffer is much easier:

cout << "Name: " << person.name() << endl;   
cout << "E-mail: " << person.email() << endl;

Whereas with XML you would have to do something like:

cout << "Name: "<< person.getElementsByTagName("name")->item(0)->innerText() << endl;
cout << "E-mail: "<< person.getElementsByTagName("email")->item(0)->innerText() << endl;

However, protocol buffers are not always a better solution than XML – for instance, protocol buffers would not be a good way to model a text-based document with markup (e.g. HTML), since you cannot easily interleave structure with text. In addition, XML is human-readable and human-editable; protocol buffers, at least in their native format, are not. XML is also – to some extent – self-describing. A protocol buffer is only meaningful if you have the message definition (the .proto file).

Sounds like the solution for me! How do I get started?

Download the package – this contains the complete source code for the Java, Python, and C++ protocol buffer compilers, as well as the classes you need for I/O and testing. To build and install your compiler, follow the instructions in the README.


Once you're all set, try following the tutorial for your chosen language – this will step you through creating a simple application that uses protocol buffers.


Introducing proto3

Our most recent version 3 release introduces a new language version, Protocol Buffers language version 3 (aka proto3), as well as some new features in our existing language version (aka proto2). Proto3 simplifies the protocol buffer language, both for ease of use and to make it available in a wider range of programming languages: our current release lets you generate protocol buffer code in Java, C++, Python, Java Lite, Ruby, JavaScript, Objective-C, and C#. In addition, you can generate proto3 code for Go using the latest Go protoc plugin, available from the golang/protobuf GitHub repository. More languages are in the pipeline.

Note that the two language version APIs are not completely compatible. To avoid inconvenience to existing users, we will continue to support the previous language version in new protocol buffers releases.

You can see the major differences from the current default version in the release notes and learn about proto3 syntax in the Proto3 Language Guide. Full documentation for proto3 is coming soon!

(If the names proto2 and proto3 seem a little confusing, it's because when we originally open-sourced protocol buffers it was actually Google's second version of the language – also known as proto2. This is also why our open source version number started from v2.0.0).

A bit of history

Protocol buffers were initially developed at Google to deal with an index server request/response protocol. Prior to protocol buffers, there was a format for requests and responses that used hand marshalling/unmarshalling of requests and responses, and that supported a number of versions of the protocol. This resulted in some very ugly code, like:

if (version == 3) {
  ...
} else if (version > 4) {
  if (version == 5) {
    ...
  }
  ...
}

Explicitly formatted protocols also complicated the rollout of new protocol versions, because developers had to make sure that all servers between the originator of the request and the actual server handling the request understood the new protocol before they could flip a switch to start using the new protocol.

Protocol buffers were designed to solve many of these problems:

  • New fields could be easily introduced, and intermediate servers that didn't need to inspect the data could simply parse it and pass through the data without needing to know about all the fields.
  • Formats were more self-describing, and could be dealt with from a variety of languages (C++, Java, etc.)

However, users still needed to hand-write their own parsing code.

As the system evolved, it acquired a number of other features and uses:

  • Automatically-generated serialization and deserialization code avoided the need for hand parsing.
  • In addition to being used for short-lived RPC (Remote Procedure Call) requests, people started to use protocol buffers as a handy self-describing format for storing data persistently (for example, in Bigtable).
  • Server RPC interfaces started to be declared as part of protocol files, with the protocol compiler generating stub classes that users could override with actual implementations of the server's interface.

Protocol buffers are now Google's lingua franca for data – at time of writing, there are 306,747 different message types defined in the Google code tree across 348,952 .proto files. They're used both in RPC systems and for persistent storage of data in a variety of storage systems.
