Usage of Protocol Buffer

1         Introduction

Protocol buffers are a language-neutral, platform-neutral, extensible way of serializing structured data for use in communications protocols, data storage, and more.

Protocol buffers are a flexible, efficient, automated mechanism for serializing structured data: think XML, but smaller, faster, and simpler. However, protocol buffers store data in a binary format that is not human-readable, whereas XML is. XML is also a W3C standard, while protocol buffers are not an international standard.

Note: If you don't need the data to be human-readable, protocol buffers are a good choice. The biggest advantage is that you don't have to write and test the parser code yourself.

NOTE: To get started with protocol buffers quickly, you only need to read sections 3.1-3.3 of the Language Guide and section 4, C++ Tutorials.

2         What do you need to do to use protocol buffers?

1)      Download protocol buffers and install them.

2)      Write .proto files that define your structured data.

3)      Compile the .proto files into your language: C++, Java, or Python.

4)      Use the generated classes in your code.

3         Language Guide (Writing .proto files)

This guide describes how to use the protocol buffer language to structure your protocol buffer data, including .proto file syntax and how to generate data access classes from your .proto files.

3.1      Defining A Message Type

In protocol buffers, a message type is like a class or structure. It is the basic building block of a .proto file. You can define a message as follows:

message MessageName {
       field definition 1;   // field 1
       field definition 2;   // field 2
       ……
}

 

Message field format:

field_rule field_type field_name = unique_numbered_tag [default = value];   // comment

 

Field Rule defines how protocol buffers treat the field.

    * required : a well-formed message must have exactly one of this field.

    * optional : a well-formed message can have zero or one of this field (but not more than one).

    * repeated : this field can be repeated any number of times (including zero) in a well-formed message. The order of the repeated values will be preserved.

 

Note: Required is forever. Once a field is marked required, you can no longer change it safely, because old readers will reject messages that lack it. Use required sparingly, and prefer optional and repeated unless you are sure the field will never need to change.

 

Field Type defines the type of the field. Field types are language-neutral and platform-neutral, and each one is mapped to the corresponding native type when you compile the .proto file.

.proto Type | Notes | C++ Type | Java Type
----------- | ----- | -------- | ---------
double | | double | double
float | | float | float
int32 | Uses variable-length encoding. Inefficient for encoding negative numbers; if your field is likely to have negative values, use sint32 instead. | int32 | int
int64 | Uses variable-length encoding. Inefficient for encoding negative numbers; if your field is likely to have negative values, use sint64 instead. | int64 | long
uint32 | Uses variable-length encoding. | uint32 | int[1]
uint64 | Uses variable-length encoding. | uint64 | long[1]
sint32 | Uses variable-length encoding. Signed int value. Encodes negative numbers more efficiently than a regular int32. | int32 | int
sint64 | Uses variable-length encoding. Signed int value. Encodes negative numbers more efficiently than a regular int64. | int64 | long
fixed32 | Always four bytes. More efficient than uint32 if values are often greater than 2^28. | uint32 | int[1]
fixed64 | Always eight bytes. More efficient than uint64 if values are often greater than 2^56. | uint64 | long[1]
sfixed32 | Always four bytes. | int32 | int
sfixed64 | Always eight bytes. | int64 | long
bool | | bool | boolean
string | A string must always contain UTF-8 encoded or 7-bit ASCII text. | string | String
bytes | May contain any arbitrary sequence of bytes. | string | ByteString

[1] In Java, unsigned 32-bit and 64-bit integers are represented using their signed counterparts, with the top bit stored in the sign bit.

 

Unique Numbered Tag: Each field in the message definition has a unique numbered tag. These tags are used to identify your fields in the message binary format, and should not be changed once your message type is in use.

 

Note:

1)      Tags with values in the range 1 through 15 take one byte to encode. Tags in the range 16 through 2047 take two bytes. So you should reserve the tags 1 through 15 for very frequently occurring message elements, and leave some of them free for frequently occurring elements that might be added in the future.

2)      Numbers 19000 through 19999 are reserved for the Protocol Buffers implementation; do not use them.

 

Protocol buffers also support enums, nested messages, packages, imports, and services (RPC, Remote Procedure Call). The following example illustrates several of them.

3.2      Example

package EXAM;        // define a package, like a namespace

import "myproject/other_protos.proto";   // import another .proto file, like #include

message SearchRequest {              // define a message
  required string query = 1;         // define message fields
  optional int32 page_number = 2;
  optional int32 result_per_page = 3 [default = 10];

  enum Corpus {                      // define an enum
    UNIVERSAL = 0;
    WEB = 1;
    IMAGES = 2;
    LOCAL = 3;
    NEWS = 4;
    PRODUCTS = 5;
    VIDEO = 6;
  }
  optional Corpus corpus = 4 [default = UNIVERSAL];   // enum field with a default value

  message Result {                   // nested message
    required string url = 1;
    optional string title = 2;
    repeated string snippets = 3;
  }
  repeated Result result = 5;
}

message other {           // several messages can be defined in the same file
       required string url = 1;
       …..
}

========================================================

3.3      Generating Your Classes

To generate the Java, Python, or C++ code you need to work with the message types defined in a .proto file, run the protocol buffer compiler protoc on the .proto file. Example:

protoc --proto_path=IMPORT_PATH --cpp_out=DST_DIR --java_out=DST_DIR --python_out=DST_DIR path/to/file.proto

This command compiles the .proto file into C++, Java, and Python code.

Run "protoc --help" for details.

3.4      Extensions

Extensions let you declare that a range of field numbers in a message is available for third-party extensions. Other people can then declare new fields for your message type with those numeric tags in their own .proto files without having to edit the original file. Let's look at an example:

message Foo {

  // ...

  extensions 100 to 199;

}

 

Other users can now add new fields to Foo in their own .proto files that import your .proto, using tags within your specified range, for example:

extend Foo {

  optional int32 bar = 126;

}

 

However, the way you access extension fields in your application code is slightly different to accessing regular fields – your generated data access code has special accessors for working with extensions. So, for example, here's how you set the value of bar in C++:

 

Foo foo;

foo.SetExtension(bar, 15);

3.5      Rules for Updating A Message Type

* Don't change the numeric tags for any existing fields.

* Any new fields that you add should be optional or repeated.

* Non-required fields can be removed, as long as the tag number is not used again in your updated message type.

* A non-required field can be converted to an extension and vice versa, as long as the type and number stay the same.

* int32, uint32, int64, uint64, and bool are all compatible. This means you can change a field from one of these types to another without breaking forwards- or backwards-compatibility.

* sint32 and sint64 are compatible with each other but are not compatible with the other integer types.

* string and bytes are compatible as long as the bytes are valid UTF-8.

* Embedded messages are compatible with bytes if the bytes contain an encoded version of the message.

* fixed32 is compatible with sfixed32, and fixed64 with sfixed64.

 

3.6      FAQ

* Can I change or extend the generated code?

Don't do that! Protocol buffer classes are basically dumb data holders (like structs in C++). If you want to add richer behavior to a generated class, the best way to do this is to wrap the generated protocol buffer class in an application-specific class. You should never add behavior to the generated classes by inheriting from them. This will break internal mechanisms and is not good object-oriented practice anyway.

* How do I write multiple messages into one file (buffer)?

If you want to write multiple messages to a single file or stream, it is up to you to keep track of where one message ends and the next begins. In other words, the protocol buffer format is not self-delimiting, so you have to record the boundaries yourself, for example by writing the size of each serialized message before the message itself.

* Can protocol buffers handle large data sets?

Protocol Buffers are not designed to handle large messages. As a general rule of thumb, if you are dealing in messages larger than a megabyte each, it may be time to consider an alternate strategy.

* How do I optimize the generated code?

1) Set the optimize_for option in the .proto file, for example "option optimize_for = LITE_RUNTIME;" or "option optimize_for = CODE_SIZE;". 2) Reuse message objects when possible.

 

4         C++ Tutorials

Common API

Standard Message Methods

Each message class also contains a number of other methods that let you check or manipulate the entire message, including:

 

* bool IsInitialized() const;: checks if all the required fields have been set.

* string DebugString() const;: returns a human-readable representation of the message, particularly useful for debugging.

* void CopyFrom(const Person& from);: overwrites the message with the given message's values.

* void Clear();: clears all the elements back to the empty state.

 

Parsing and Serialization

Each protocol buffer class has methods for writing and reading messages of your chosen type using the protocol buffer binary format. These include:

 

* bool SerializeToString(string* output) const;: serializes the message and stores the bytes in the given string. Note that the bytes are binary, not text; we only use the string class as a convenient container.

* bool ParseFromString(const string& data);: parses a message from the given string.

* bool SerializeToOstream(ostream* output) const;: writes the message to the given C++ ostream.

* bool ParseFromIstream(istream* input);: parses a message from the given C++ istream.

Example

The example code is included in the source code package, under the "examples" directory. Study the example!!!

Refer to http://code.google.com/intl/zh-CN/apis/protocolbuffers/docs/cpptutorial.html

 

NOTE: For each field, accessor methods such as field() and set_field() are generated in C++. The accessor names are always lowercase, regardless of whether the field is written as Field or field in the .proto file.

 

 

 
