https://developers.google.com/protocol-buffers/docs/cpptutorial
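0. Introduction
The content is drawn from the reference above.
protobuf is used to serialize and deserialize data. For example, you may want to store a struct in a file (serialization) and read it back later (deserialization). The usual approaches are:
The most primitive approach is to write the in-memory data to a file byte for byte and read it straight back into memory when needed. The drawback is that the serializing and deserializing sides must agree on memory layout, endianness, and so on, which is clearly inconvenient across languages.
Encode to a string. Represent the data as a string and provide encode and parse methods.
Use XML. XML is human-readable and widely supported across languages, which makes it a good way to share data with other programs. The drawback is that it wastes space and is slow to encode and decode.
Google's protobuf can be seen as a protocol: different languages use the same protocol to serialize and deserialize data. Its advantages are compactness and speed. Its drawback, reportedly, is poor readability? I have not experienced that myself, so don't take my word for it. >_<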
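The byte-order problem of the raw-binary approach can be seen with a few lines of Python. This is only a minimal sketch: the record layout (one int32 plus one float64) and the helper name pack_record are made up for illustration and are not part of the original tutorial.

```python
import struct


def pack_record(record_id: int, score: float, byte_order: str) -> bytes:
    """Dump an (int32, float64) record; byte_order is '<' (little) or '>' (big)."""
    return struct.pack(byte_order + "id", record_id, score)


little = pack_record(1, 2.5, "<")
big = pack_record(1, 2.5, ">")

# The same values produce different bytes depending on the byte order.
print(little.hex())
print(big.hex())

# Reading little-endian bytes as if they were big-endian does not fail;
# it silently yields a wrong id.
wrong_id, _ = struct.unpack(">id", little)
print(wrong_id)
```

A format such as protobuf avoids this by fixing the encoding in the protocol itself, so the reader does not need to know anything about the writer's machine.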
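To use it, you only need to provide a description file (.proto) and then run the protoc tool, which generates the serialization/deserialization code automatically.
Installation: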
$ ./autogen.sh
$ ./configure
$ make
$ make install
Installing Python support:
$ cd ./python
$ python setup.py build
$ python setup.py test
$ sudo python setup.py install
1. Writing the proto file
syntax = "proto2";
package tutorial;
message Person {
  required string name = 1;
  required int32 id = 2;
  optional string email = 3;
  enum PhoneType {
    MOBILE = 0;
    HOME = 1;
    WORK = 2;
  }
  message PhoneNumber {
    required string number = 1;
    optional PhoneType type = 2 [default = HOME];
  }
  repeated PhoneNumber phone = 4;
}
message AddressBook {
  repeated Person person = 1;
}
2. Compiling the proto file
2.1 Generating C++ classes
$ protoc -I=$SRC_DIR --cpp_out=$DST_DIR $SRC_DIR/addressbook.proto
A successful compile generates two files:
- addressbook.pb.h, the header which declares your generated classes.
- addressbook.pb.cc, which contains the implementation of your classes.
2.2 Generating Python classes
$ protoc -I=$SRC_DIR --python_out=$DST_DIR $SRC_DIR/addressbook.proto
3. Usage
3.1 Writing data in C++
write.cc
#include <iostream>
#include <fstream>
#include <string>
#include "addressbook.pb.h"
using namespace std;
// This function fills in a Person message based on user input.
void PromptForAddress(tutorial::Person* person) {
  cout << "Enter person ID number: ";
  int id;
  cin >> id;
  person->set_id(id);
  cin.ignore(256, '\n');
  cout << "Enter name: ";
  getline(cin, *person->mutable_name());
  cout << "Enter email address (blank for none): ";
  string email;
  getline(cin, email);
  if (!email.empty()) {
    person->set_email(email);
  }
  while (true) {
    cout << "Enter a phone number (or leave blank to finish): ";
    string number;
    getline(cin, number);
    if (number.empty()) {
      break;
    }
    tutorial::Person::PhoneNumber* phone_number = person->add_phone();
    phone_number->set_number(number);
    cout << "Is this a mobile, home, or work phone? ";
    string type;
    getline(cin, type);
    if (type == "mobile") {
      phone_number->set_type(tutorial::Person::MOBILE);
    } else if (type == "home") {
      phone_number->set_type(tutorial::Person::HOME);
    } else if (type == "work") {
      phone_number->set_type(tutorial::Person::WORK);
    } else {
      cout << "Unknown phone type. Using default." << endl;
    }
  }
}
// Main function: Reads the entire address book from a file,
// adds one person based on user input, then writes it back out to the same
// file.
int main(int argc, char* argv[]) {
  // Verify that the version of the library that we linked against is
  // compatible with the version of the headers we compiled against.
  GOOGLE_PROTOBUF_VERIFY_VERSION;
  if (argc != 2) {
    cerr << "Usage: " << argv[0] << " ADDRESS_BOOK_FILE" << endl;
    return -1;
  }
  tutorial::AddressBook address_book;
  {
    // Read the existing address book.
    fstream input(argv[1], ios::in | ios::binary);
    if (!input) {
      cout << argv[1] << ": File not found. Creating a new file." << endl;
    } else if (!address_book.ParseFromIstream(&input)) {
      cerr << "Failed to parse address book." << endl;
      return -1;
    }
  }
  // Add an address.
  PromptForAddress(address_book.add_person());
  {
    // Write the new address book back to disk.
    fstream output(argv[1], ios::out | ios::trunc | ios::binary);
    if (!address_book.SerializeToOstream(&output)) {
      cerr << "Failed to write address book." << endl;
      return -1;
    }
  }
  // Optional: Delete all global objects allocated by libprotobuf.
  google::protobuf::ShutdownProtobufLibrary();
  return 0;
}
Compile & run:
$ pkg-config --cflags --libs protobuf
-pthread -I/usr/local/include -pthread -L/usr/local/lib -lprotobuf -lpthread
$ g++ write.cc addressbook.pb.cc -o write -I/usr/local/include -pthread -L/usr/local/lib -lprotobuf -lpthread
$ ./write book
3.2 Reading data in Python
read.py
#! /usr/bin/python
# print_function lets the same script run under Python 2.6+ and Python 3.
from __future__ import print_function
import addressbook_pb2
import sys
# Iterates through all people in the AddressBook and prints info about them.
def ListPeople(address_book):
    for person in address_book.person:
        print("Person ID:", person.id)
        print("  Name:", person.name)
        if person.HasField('email'):
            print("  E-mail address:", person.email)
        for phone_number in person.phone:
            if phone_number.type == addressbook_pb2.Person.MOBILE:
                print("  Mobile phone #: ", end="")
            elif phone_number.type == addressbook_pb2.Person.HOME:
                print("  Home phone #: ", end="")
            elif phone_number.type == addressbook_pb2.Person.WORK:
                print("  Work phone #: ", end="")
            print(phone_number.number)
# Main procedure: Reads the entire address book from a file and prints all
# the information inside.
if len(sys.argv) != 2:
    print("Usage:", sys.argv[0], "ADDRESS_BOOK_FILE")
    sys.exit(-1)
address_book = addressbook_pb2.AddressBook()
# Read the existing address book.
with open(sys.argv[1], "rb") as f:
    address_book.ParseFromString(f.read())
ListPeople(address_book)
$ ./read.py book
...
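4. Impressions
"Any problem can be solved by adding a layer of indirection." I don't remember the exact wording, but that is the gist. There are many examples of solving a problem by introducing an intermediate layer, such as the TCP/IP protocol stack. In my view, Google's Protobuf also introduces an intermediate layer: the description file mentioned above. From it you can generate the access code for other programming languages, rather than having programmers write the same code over and over.
5. Source code analysis
protobuf can not only serialize and deserialize messages (across languages, at that), it can also generate the skeleton of an RPC framework. The official introduction is here: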
https://developers.google.com/protocol-buffers/docs/proto#services
Other people have also written good summaries; see:
http://www.codedump.info/?p=169
http://codemacro.com/2014/08/31/protobuf-rpc/
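Google also provides gRPC, an RPC framework built on protobuf. It is also worth mentioning that Baidu's open-source brpc uses protobuf as well; so far only the C++ version has been open-sourced, and its documentation is thorough and worth studying.
5.1 Class diagram
Note: the diagram comes from the muduo book, and the content below is also summarized from muduo; this is a record of my own process of understanding it.
5.2 MessageLite
MessageLite represents a lightweight protocol message. It defines a set of interfaces (that is, pure virtual and virtual functions), such as: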
- New(): MessageLite*
- GetTypeName(): string
- ByteSize(): int
- ParseFrom*(): bool
- SerializeTo*(): bool
The comment on MessageLite in the source is attached below:
Interface to light weight protocol messages.
This interface is implemented by all protocol message objects. Non-lite
messages additionally implement the Message interface, which is a
subclass of MessageLite. Use MessageLite instead when you only need
the subset of features which it supports -- namely, nothing that uses
descriptors or reflection. You can instruct the protocol compiler
to generate classes which implement only MessageLite, not the full
Message interface, by adding the following line to the .proto file:
option optimize_for = LITE_RUNTIME;
This is particularly useful on resource-constrained systems where
the full protocol buffers runtime library is too big.
Note that on non-constrained systems (e.g. servers) when you need
to link in lots of protocol definitions, a better way to reduce
total code footprint is to use optimize_for = CODE_SIZE. This
will make the generated code smaller while still supporting all the
same features (at the expense of speed). optimize_for = LITE_RUNTIME
is best when you only have a small number of message types linked
into your binary, in which case the size of the protocol buffers
runtime itself is the biggest problem.
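When you do not need the descriptor and reflection features described below, you can add option optimize_for = LITE_RUNTIME to the .proto file to tell the protobuf compiler to generate classes that implement MessageLite rather than the full Message interface.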
5.3 Message
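Message is a subclass of MessageLite that adds support for descriptors and reflection. So what do descriptor and reflection mean? Let's start with the example given in the source, located at /src/google/protobuf/message.h
For a message defined as follows:
message Foo {
  optional string text = 1;
  repeated int32 numbers = 2;
}
the protobuf compiler generates a class Foo, which you can use to serialize and deserialize: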
string data;  // Holds the serialized result.
{
  // Create a message foo and serialize it into data.
  Foo foo;
  foo.set_text("Hello World!");
  foo.add_numbers(1);
  foo.add_numbers(5);
  foo.add_numbers(42);
  foo.SerializeToString(&data);
}
{
  // Deserialize data.
  Foo foo;
  foo.ParseFromString(data);
  assert(foo.text() == "Hello World!");
  assert(foo.numbers_size() == 3);
  assert(foo.numbers(0) == 1);
  assert(foo.numbers(1) == 5);
  assert(foo.numbers(2) == 42);
}
The approach above should be familiar by now. There is another way to check the contents, using descriptor and reflection:
{
  // Same as the last block, but do it dynamically via the Message
  // reflection interface.
  Message* foo = new Foo;
  const Descriptor* descriptor = foo->GetDescriptor();
  // Get the descriptors for the fields we're interested in and verify
  // their types.
  const FieldDescriptor* text_field = descriptor->FindFieldByName("text");
  assert(text_field != NULL);
  assert(text_field->type() == FieldDescriptor::TYPE_STRING);
  assert(text_field->label() == FieldDescriptor::LABEL_OPTIONAL);
  const FieldDescriptor* numbers_field = descriptor->
                                         FindFieldByName("numbers");
  assert(numbers_field != NULL);
  assert(numbers_field->type() == FieldDescriptor::TYPE_INT32);
  assert(numbers_field->label() == FieldDescriptor::LABEL_REPEATED);
  // Parse the message.
  foo->ParseFromString(data);
  // Use the reflection interface to examine the contents.
  const Reflection* reflection = foo->GetReflection();
  assert(reflection->GetString(*foo, text_field) == "Hello World!");
  assert(reflection->FieldSize(*foo, numbers_field) == 3);
  assert(reflection->GetRepeatedInt32(*foo, numbers_field, 0) == 1);
  assert(reflection->GetRepeatedInt32(*foo, numbers_field, 1) == 5);
  assert(reflection->GetRepeatedInt32(*foo, numbers_field, 2) == 42);
  delete foo;
}
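From a message's Descriptor you can look up a field's FieldDescriptor by field name and then verify the field's type and label.
Then, through the message's Reflection, you can use the FieldDescriptor to get the field's contents.
By now you should have a first impression of descriptor and reflection. Descriptor is the class that describes a Message type: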
Describes a type of protocol message, or a particular group within a
message. To obtain the Descriptor for a given message object, call
Message::GetDescriptor(). Generated message classes also have a
static method called descriptor() which returns the type's descriptor.
Use DescriptorPool to construct your own descriptors.
Reflection: lets you access and modify a message's fields dynamically:
This interface contains methods that can be used to dynamically access
and modify the fields of a protocol message. Their semantics are
similar to the accessors the protocol compiler generates.
To get the Reflection for a given Message, call Message::GetReflection().
...
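muduo summarizes protobuf's reflection as follows:
descriptor and reflection give protobuf messages reflection capability: a concrete Message object can be created from its type name.
The Protobuf Message class uses the Prototype pattern. Message defines a virtual New() function that returns a new instance whose type is the same as the object's actual type. In other words, given a Message* pointer, you can create an object of the same concrete Message type without knowing what that type is.
Every concrete Message type has a default instance, which can be obtained through ConcreteMessage::default_instance() or through MessageFactory::GetPrototype(const Descriptor*).
With protobuf's reflection, the server side of your own RPC framework can easily build the right message from the type name sent by the client:
- The server uses typeName to call DescriptorPool::generated_pool()->FindMessageTypeByName(typeName) and obtain the message's descriptor.
- Next, the server calls MessageFactory::generated_factory()->GetPrototype(descriptor) to obtain the default instance of the concrete message.
- Finally, it calls New() on the default instance to create the concrete object (prototype pattern).
The code below, from muduo/example/protobuf/codec.h, shows how the server creates the concrete message from the typeName sent by the client: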
google::protobuf::Message* ProtobufCodec::createMessage(const std::string& typeName)
{
  google::protobuf::Message* message = NULL;
  const google::protobuf::Descriptor* descriptor =
    google::protobuf::DescriptorPool::generated_pool()->FindMessageTypeByName(typeName);
  if (descriptor)
  {
    const google::protobuf::Message* prototype =
      google::protobuf::MessageFactory::generated_factory()->GetPrototype(descriptor);
    if (prototype)
    {
      message = prototype->New();
    }
  }
  return message;
}
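The same lookup (type name, then prototype, then a fresh object) can be sketched without protobuf at all. The snippet below is a plain-Python analogy, not the protobuf API: the Message base class, the Query and Answer types, and the PROTOTYPES table are hypothetical stand-ins for google::protobuf::Message, generated message classes, and the DescriptorPool/MessageFactory pair.

```python
class Message:
    """Simplified stand-in for a protobuf Message (hypothetical)."""

    type_name = ""

    def new(self):
        # Prototype pattern: build a fresh object of the same concrete type
        # without the caller knowing what that type is.
        return type(self)()


class Query(Message):
    type_name = "demo.Query"


class Answer(Message):
    type_name = "demo.Answer"


# Plays the role of DescriptorPool + MessageFactory:
# type name -> default instance (the prototype).
PROTOTYPES = {cls.type_name: cls() for cls in (Query, Answer)}


def create_message(type_name):
    """Return a new message for type_name, or None if the type is unknown."""
    prototype = PROTOTYPES.get(type_name)
    if prototype is None:
        return None
    return prototype.new()


message = create_message("demo.Query")
print(type(message).__name__)
```

As in createMessage above, an unknown type name yields a null result instead of an exception, so the caller decides how to handle it.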
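RPC usually involves both message passing and message dispatching. The code above only covers message passing, so how should message dispatching be done? muduo offers one approach: use a map to associate each message type with its handler function: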
typedef boost::shared_ptr<google::protobuf::Message> MessagePtr;
class ProtobufDispatcherLite : boost::noncopyable
{
 public:
  typedef boost::function<void (const muduo::net::TcpConnectionPtr&,
                                const MessagePtr&,
                                muduo::Timestamp)> ProtobufMessageCallback;
  // ProtobufDispatcher()
  //   : defaultCallback_(discardProtobufMessage)
  // {
  // }
  explicit ProtobufDispatcherLite(const ProtobufMessageCallback& defaultCb)
    : defaultCallback_(defaultCb)
  {
  }
  void onProtobufMessage(const muduo::net::TcpConnectionPtr& conn,
                         const MessagePtr& message,
                         muduo::Timestamp receiveTime) const
  {
    CallbackMap::const_iterator it = callbacks_.find(message->GetDescriptor());
    if (it != callbacks_.end())
    {
      it->second(conn, message, receiveTime);
    }
    else
    {
      defaultCallback_(conn, message, receiveTime);
    }
  }
  void registerMessageCallback(const google::protobuf::Descriptor* desc,
                               const ProtobufMessageCallback& callback)
  {
    callbacks_[desc] = callback;
  }
 private:
  // static void discardProtobufMessage(const muduo::net::TcpConnectionPtr&,
  //                                    const MessagePtr&,
  //                                    muduo::Timestamp);
  typedef std::map<const google::protobuf::Descriptor*, ProtobufMessageCallback> CallbackMap;
  CallbackMap callbacks_;
  ProtobufMessageCallback defaultCallback_;
};
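ProtobufDispatcherLite here is the message dispatching class. At initialization the server adds each message's Descriptor and callback to the map. When the server receives a message from a client (containing typeName and data, the protobuf-serialized payload), it calls createMessage(typeName) above to get a concrete message object m, deserializes with m->ParseFromArray(data, size), and then dispatches the message through ProtobufDispatcherLite's onProtobufMessage.
This way the RPC developer no longer has to dispatch by hand on the message type (a chain of if checks). Adding a new message type only requires:
- calling ProtobufDispatcherLite's registerMessageCallback to register a handler for the message
- implementing the handler
It is really cool!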
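The register-then-dispatch idea is small enough to sketch in plain Python. This is an illustration of the pattern only, not muduo's code: the Dispatcher class and the Query, Answer and Unknown message types are hypothetical, and the Python class object is used as the map key where muduo uses the message's Descriptor pointer.

```python
class Query:
    """Hypothetical message type."""


class Answer:
    """Hypothetical message type."""


class Unknown:
    """Hypothetical message type with no registered handler."""


class Dispatcher:
    """Maps a message type to its handler, with a fallback for other types."""

    def __init__(self, default_callback):
        self._callbacks = {}
        self._default_callback = default_callback

    def register_message_callback(self, message_type, callback):
        self._callbacks[message_type] = callback

    def on_message(self, conn, message, receive_time):
        # One map lookup replaces an if/elif chain over message types.
        callback = self._callbacks.get(type(message), self._default_callback)
        callback(conn, message, receive_time)


handled = []
dispatcher = Dispatcher(lambda conn, msg, t: handled.append("default"))
dispatcher.register_message_callback(
    Query, lambda conn, msg, t: handled.append("query"))
dispatcher.register_message_callback(
    Answer, lambda conn, msg, t: handled.append("answer"))

for message in (Query(), Answer(), Unknown()):
    dispatcher.on_message(conn=None, message=message, receive_time=0.0)

print(handled)  # ['query', 'answer', 'default']
```

Supporting a new message type touches only the registration call and the handler itself; on_message never changes.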
6. protobuf service
http://codemacro.com/2014/08/31/protobuf-rpc/
http://www.codedump.info/?p=169
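service MyService {
  rpc Echo(EchoReqMsg) returns (EchoRespMsg);
}
protobuf does not provide an RPC implementation, but it can generate the RPC definitions.
The first reference above uses protobuf's option mechanism to tag each service and method with an identifier (a 16-bit number), so a service/method pair is uniquely identified by a 32-bit int; a map from that key to the concrete service is then all that is needed. muduo's protobuf RPC instead puts the service and method names (strings) directly into the RPC message. After receiving an RPC message, the server uses the service name and method name together with protobuf's reflection to locate the concrete service implementation automatically. muduo's implementation is here: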
https://github.com/chenshuo/muduo/tree/master/examples/protobuf/rpc
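The numeric-identifier scheme can be sketched as follows. Treat it as an assumption-laden illustration: putting the service id in the high 16 bits, and the names make_key, split_key, register and call, are choices made here for the example, not details taken from the referenced article.

```python
def make_key(service_id, method_id):
    """Combine two 16-bit ids into one 32-bit lookup key."""
    if not (0 <= service_id <= 0xFFFF and 0 <= method_id <= 0xFFFF):
        raise ValueError("service_id and method_id must each fit in 16 bits")
    # Assumed layout: service id in the high half, method id in the low half.
    return (service_id << 16) | method_id


def split_key(key):
    """Recover (service_id, method_id) from a 32-bit key."""
    return key >> 16, key & 0xFFFF


HANDLERS = {}  # 32-bit key -> handler function


def register(service_id, method_id, handler):
    HANDLERS[make_key(service_id, method_id)] = handler


def call(key, request):
    return HANDLERS[key](request)


# Hypothetical ids: service 1, method 2 is an echo handler.
register(1, 2, lambda request: request)

key = make_key(1, 2)
print(hex(key))            # 0x10002
print(call(key, "hello"))  # hello
```

Compared with sending names as strings, the numeric key is compact on the wire, at the cost of having to assign and maintain the ids in the .proto options.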
muduo's server keeps a map from service name to Service*: std::map<std::string, google::protobuf::Service*>
After receiving a service name, the server looks up the corresponding Service* in this map and gets the service's Descriptor (that familiar flavor again). It finds the method's descriptor by method name, asks the service for the request prototype and response prototype via the method descriptor, and finally calls service->CallMethod(method, NULL /* controller */, request, response, done). CallMethod ends up invoking the concrete handler.
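The name-based lookup can likewise be sketched in plain Python. This is an analogy rather than muduo's implementation: EchoService, SERVICES and call_method are hypothetical names, and getattr stands in for finding the method descriptor by name and then going through CallMethod.

```python
class EchoService:
    """Hypothetical service; its public methods play the role of rpc methods."""

    def Echo(self, request):
        return {"text": request["text"]}


# Service name -> service object, like muduo's map of name to Service*.
SERVICES = {"EchoService": EchoService()}


def call_method(service_name, method_name, request):
    """Find the service by name, then the method by name, then invoke it."""
    service = SERVICES.get(service_name)
    if service is None:
        raise LookupError("unknown service: " + service_name)
    # Refuse private attributes so only intended rpc methods are reachable.
    method = None
    if not method_name.startswith("_"):
        method = getattr(service, method_name, None)
    if not callable(method):
        raise LookupError("unknown method: " + method_name)
    return method(request)


response = call_method("EchoService", "Echo", {"text": "hi"})
print(response)  # {'text': 'hi'}
```

Because the names travel inside the RPC message, adding a service only requires registering it in the map; no numeric ids need to be assigned.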
For a more detailed analysis of muduo's RPC implementation, see here: