Application-Layer Protocol Design: ProtoBuf

kaka的卡

Category: Linux server advanced frameworks. Tags: c++

Article link: https://blog.csdn.net/kakaka666/article/details/129540083

I. What are Protocol Buffers?

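Protocol Buffers are a language-neutral, platform-neutral, extensible format for serializing structured data, usable for communication protocols, data storage and more.

Protocol Buffers are flexible and efficient at serializing data. Compared with XML they are smaller, faster and simpler. Once you have defined the structure of your data, the Protocol Buffers code generator produces the code that reads and writes it. You can even update the data structure without breaking deployed programs built against the old format. You describe the structure once with Protobuf, then read and write your structured data easily from a variety of languages and data streams.

Protocol Buffers are well suited to data storage and to serving as an RPC data exchange format.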

II. Why were Protocol Buffers invented?

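Protocol Buffers have since gained several more features:

Automatically generated serialization and deserialization code removes the need for hand-written parsing. (The official code generator covers essentially every mainstream language and platform.)
Beyond RPC (remote procedure call) requests, people began to use Protocol Buffers as a handy self-describing format for persisting data (for example, in Bigtable). Self-describing here means that each stored record is described by its .proto definition.
Server RPC interfaces can be declared as part of the protocol file, and the protocol compiler then generates base classes that users override with the actual implementation of the server interface.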

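Protocol Buffers are now Google's lingua franca for data. At the time of writing, 48,162 different message types were defined in the Google code tree, across 12,183 .proto files. They are used both in RPC systems and for persisting data in a variety of storage systems.

Protocol Buffers were originally created to solve compatibility between old and new (lower and higher version) protocols on the server side, and the name, "protocol buffers", fits that purpose. Only later did they grow into a general format for transferring data.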

III. Building and installing protobuf

https://github.com/protocolbuffers/protobuf

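Google's open-source protocol standard plus tooling.

Install the tools, then use them to generate C++ code from the .proto files you write.

1. Unpack

tar zxvf protobuf-cpp-3.8.0.tar.gz

2. Build (make takes a while)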

cd protobuf-3.8.0/

./configure

make

sudo make install

sudo ldconfig

Check the installed version:

protoc --version

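IV. Syntax in detail

Defining a message type

Start with a very simple example. Suppose you want to define a "search request" message format in which each request carries a query string, the page of results you are interested in, and the number of results per page. The .proto file that defines this message type looks like this: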

syntax = "proto3";

message SearchRequest {
    string query = 1;
    int32 page_number = 2;
    int32 result_per_page = 3;
}

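The first line of the file declares that you are using proto3 syntax. If you omit it, the compiler assumes proto2. This syntax line must be the first non-empty, non-comment line of the file.

The SearchRequest message has three fields, one for each piece of data the message carries. Every field has a name and a type. The number after the = sign is the field number, which identifies the field in the binary encoding and must be unique within the message.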
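To make the role of the numbers after `=` concrete, here is a minimal sketch of how a field number and a wire type combine into the key that precedes each field on the wire. It assumes the standard protobuf wire types (0 for varint, 2 for length-delimited). The names `WireType`, `makeKey` and the `k...Key` constants are illustrative only and are not part of the protobuf API.

```cpp
#include <cstdint>

// Wire types from the protobuf encoding (only the two this example needs).
enum WireType : uint32_t {
    kVarint = 0,           // int32, int64, uint32, uint64, bool, enum, ...
    kLengthDelimited = 2,  // string, bytes, embedded messages
};

// Hypothetical helper: key = (field_number << 3) | wire_type.
constexpr uint32_t makeKey(uint32_t fieldNumber, WireType type)
{
    return (fieldNumber << 3) | static_cast<uint32_t>(type);
}

// Keys for the three SearchRequest fields defined above.
constexpr uint32_t kQueryKey = makeKey(1, kLengthDelimited);   // string query = 1
constexpr uint32_t kPageNumberKey = makeKey(2, kVarint);       // int32 page_number = 2
constexpr uint32_t kResultPerPageKey = makeKey(3, kVarint);    // int32 result_per_page = 3
```

The three keys come out as 0x0a, 0x10 and 0x18. The key is itself written as a varint, so field numbers 1 through 15 fit in a single byte.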

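Specifying field types

In the example above all fields are scalar types: two integers (page_number and result_per_page) and one string (query). You can also give fields composite types, including enumerations and other message types.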

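Scalar value types

A scalar message field can have one of the following types. The table lists the type as written in the .proto file and the corresponding type in the generated C++ class:

.proto type    C++ type    Notes
double         double
float          float
int32          int32       Variable-length encoding; inefficient for negative numbers
int64          int64       Variable-length encoding; inefficient for negative numbers
uint32         uint32      Variable-length encoding
uint64         uint64      Variable-length encoding
sint32         int32       Variable-length encoding; encodes negative numbers efficiently
sint64         int64       Variable-length encoding; encodes negative numbers efficiently
fixed32        uint32      Always 4 bytes
fixed64        uint64      Always 8 bytes
sfixed32       int32       Always 4 bytes
sfixed64       int64       Always 8 bytes
bool           bool
string         string      UTF-8 or 7-bit ASCII text
bytes          string      Arbitrary byte sequence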

V. Using protobuf

Writing the .proto file

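syntax = "proto3";                  // proto2 and proto3 differ; protoc assumes proto2 if this line is missing
package IM.Login;                   // becomes the C++ namespace IM::Login
import "IM.BaseDefine.proto";       // pulls in definitions from another .proto file
option optimize_for = LITE_RUNTIME; // generate code against the smaller lite runtime

// Test messages, prefixed with T
message TInt32 {
    int32 int1 = 1;
}

message TString {
    string str1 = 1;
}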

Generating the .pb.cc and .pb.h files

#!/bin/sh
# Directory containing the .proto files
SRC_DIR=./
# Directory that receives the generated .h and .cc files
DST_DIR=../

# C++
protoc -I=$SRC_DIR --cpp_out=$DST_DIR $SRC_DIR/*.proto

Using the generated .pb.cc and .pb.h files

#include <cstdint>
#include <cstdio>
#include <iostream>
#include <string>
#include <sys/time.h>

#include "IM.BaseDefine.pb.h"
#include "IM.Login.pb.h"

// Current wall-clock time in milliseconds (kept from the original; not called in this demo).
static uint64_t getNowTime()
{
    struct timeval tval;
    gettimeofday(&tval, NULL);
    return tval.tv_sec * 1000ULL + tval.tv_usec / 1000;
}

// Print len bytes as two-digit hex values, followed by a blank line.
void printHex(const uint8_t *data, uint32_t len)
{
    for (uint32_t i = 0; i < len; i++)
    {
        printf("%02x ", data[i]);
    }
    printf("\n\n");
}

// Serialize msg into a std::string-backed buffer and hex-dump the bytes.
// ByteSize() matches protobuf 3.8.0; later releases deprecate it in favor of ByteSizeLong().
template <typename Message>
static void serializeAndDump(const Message &msg)
{
    std::string strPb;
    int size = msg.ByteSize();                      // size after serialization
    strPb.resize(size);
    // &strPb[0] is writable; writing through a cast c_str() pointer is not allowed.
    uint8_t *szData = reinterpret_cast<uint8_t *>(&strPb[0]);
    msg.SerializeToArray(szData, size);             // copy out the serialized bytes
    printHex(szData, size);
}

void TInt()
{
    IM::Login::TInt32 int1;
    std::cout << "null int1Size = " << int1.ByteSize() << std::endl;

    int1.set_int1(0x12);
    std::cout << "0x12 int1Size = " << int1.ByteSize() << std::endl;
    serializeAndDump(int1);

    int1.set_int1(-11);
    std::cout << "-11 int1Size = " << int1.ByteSize() << std::endl;
    serializeAndDump(int1);

    int1.set_int1(0x7f);
    // The original listing labeled this case "0xff" by mistake, so the recorded output shows "0xff" here.
    std::cout << "0x7f int1Size = " << int1.ByteSize() << std::endl;
    serializeAndDump(int1);

    int1.set_int1(0xff);
    std::cout << "0xff int1Size = " << int1.ByteSize() << std::endl;
    serializeAndDump(int1);

    int1.set_int1(0x1234);
    std::cout << "0x1234 int1Size = " << int1.ByteSize() << std::endl;
    serializeAndDump(int1);

    int1.set_int1(0x123456);
    std::cout << "0x123456 int1Size = " << int1.ByteSize() << std::endl;
    serializeAndDump(int1);
}

void TString(void)
{
    IM::Login::TString str1;
    std::cout << "null str1Size = " << str1.ByteSize() << std::endl;

    str1.set_str1("1");
    std::cout << "1 str1Size = " << str1.ByteSize() << std::endl;
    serializeAndDump(str1);

    str1.set_str1("1234");
    std::cout << "1234 str1Size = " << str1.ByteSize() << std::endl;
    serializeAndDump(str1);

    str1.set_str1("老师");                          // "teacher": two characters, 6 bytes in UTF-8
    std::cout << "老师 str1Size = " << str1.ByteSize() << std::endl;
    serializeAndDump(str1);
}

int main(void)
{
    TInt();
    TString();
    return 0;
}

Output

null int1Size = 0
0x12 int1Size = 2
08 12

-11 int1Size = 11
08 f5 ff ff ff ff ff ff ff ff 01

0xff int1Size = 2        (this is the 0x7f case; the label string in the run that produced this output read 0xff)
08 7f

0xff int1Size = 3
08 ff 01

0x1234 int1Size = 3
08 b4 24

0x123456 int1Size = 4
08 d6 e8 48

null str1Size = 0
1 str1Size = 3
0a 01 31

1234 str1Size = 6
0a 04 31 32 33 34

老师 str1Size = 8
0a 06 e8 80 81 e5 b8 88
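The bytes in this output can be reproduced by hand, which helps when reading a dump. The following is a minimal sketch, assuming the standard protobuf wire format: a key byte, then either a varint (for int32, sign-extended to 64 bits) or a varint length followed by the raw bytes (for string). The function names `appendVarint`, `encodeInt32Field` and `encodeStringField` are hypothetical and this is not the protobuf library's implementation.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Append v as a base-128 varint: 7 payload bits per byte, lowest group first,
// with the high bit set on every byte except the last.
void appendVarint(std::vector<uint8_t> &out, uint64_t v)
{
    while (v >= 0x80) {
        out.push_back(static_cast<uint8_t>((v & 0x7f) | 0x80));
        v >>= 7;
    }
    out.push_back(static_cast<uint8_t>(v));
}

// int32 field: key with wire type 0, then the value sign-extended to 64 bits
// and written as a varint. Sign extension is why -11 needs 10 value bytes.
std::vector<uint8_t> encodeInt32Field(uint32_t fieldNumber, int32_t value)
{
    std::vector<uint8_t> out;
    appendVarint(out, (static_cast<uint64_t>(fieldNumber) << 3) | 0);
    appendVarint(out, static_cast<uint64_t>(static_cast<int64_t>(value)));
    return out;
}

// string field: key with wire type 2, varint byte length, then the raw bytes.
std::vector<uint8_t> encodeStringField(uint32_t fieldNumber, const std::string &value)
{
    std::vector<uint8_t> out;
    appendVarint(out, (static_cast<uint64_t>(fieldNumber) << 3) | 2);
    appendVarint(out, value.size());
    out.insert(out.end(), value.begin(), value.end());
    return out;
}
```

For example, `encodeInt32Field(1, 0x1234)` yields `08 b4 24` and `encodeStringField(1, "1234")` yields `0a 04 31 32 33 34`, matching the dump. The sketch always writes the field. The real proto3 serializer omits a field that holds its default value, which is why the empty messages report a size of 0.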

Summary

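Varint encoding makes Protocol Buffers binary data very compact, and leaving unset fields out of the output shrinks it further (the empty messages in the output above serialize to 0 bytes). For the same data, protobuf therefore consumes less network traffic than a text format. It does not compress to the limit, though: float and double values are not compressed and always occupy 4 and 8 bytes.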
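The size effects described here can be checked with a few lines of arithmetic. This is a sketch under stated assumptions: it models only the value bytes (the one-byte key is excluded), and it uses the ZigZag mapping of the `sint32` scalar type, which the demo in this article does not use. All function names are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>

// Number of bytes a value occupies as a base-128 varint (7 payload bits per byte).
constexpr std::size_t varintSize(uint64_t v)
{
    return v < 0x80 ? 1 : 1 + varintSize(v >> 7);
}

// int32 values are sign-extended to 64 bits before varint encoding.
constexpr std::size_t int32WireSize(int32_t v)
{
    return varintSize(static_cast<uint64_t>(static_cast<int64_t>(v)));
}

// ZigZag mapping used by sint32: 0, -1, 1, -2, ... become 0, 1, 2, 3, ...
// Relies on arithmetic right shift of negative values, as mainstream compilers provide.
constexpr uint32_t zigZag32(int32_t v)
{
    return (static_cast<uint32_t>(v) << 1) ^ static_cast<uint32_t>(v >> 31);
}

constexpr std::size_t sint32WireSize(int32_t v)
{
    return varintSize(zigZag32(v));
}

// float and double are written as raw fixed-width bytes, never as varints.
constexpr std::size_t kFloatWireSize = 4;
constexpr std::size_t kDoubleWireSize = 8;
```

With these helpers, `int32WireSize(-11)` is 10, which plus the key byte gives the 11 bytes seen in the output, while `sint32WireSize(-11)` is 1. Small positive values such as 0x12 take a single byte either way.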

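Protocol Buffers also drop the punctuation that JSON and XML need, such as {, } and :, which trims the size a little more. With varint encoding on top of that, the data ends up smaller still after gzip compression.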

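Protocol Buffers encode each field as Tag-Value, or as Tag-Length-Value for variable-length data such as strings. No separators are needed between fields, so the stored data is more compact.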
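A small decoder shows how a reader walks such a stream without separators. This is a minimal sketch, assuming the standard wire format and handling only wire type 0 (Tag-Value, varint) and wire type 2 (Tag-Length-Value). The `Field` struct and the functions `readVarint` and `decodeFields` are hypothetical names, not protobuf API.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

struct Field {
    uint32_t number;    // field number taken from the tag
    uint32_t wireType;  // 0 = varint (Tag-Value), 2 = length-delimited (Tag-Length-Value)
    uint64_t varint;    // value when wireType == 0
    std::string bytes;  // payload when wireType == 2
};

// Read one varint starting at pos; returns false on truncated or overlong input.
bool readVarint(const std::vector<uint8_t> &buf, std::size_t &pos, uint64_t &out)
{
    out = 0;
    for (int shift = 0; shift < 64 && pos < buf.size(); shift += 7) {
        uint8_t b = buf[pos++];
        out |= static_cast<uint64_t>(b & 0x7f) << shift;
        if ((b & 0x80) == 0) return true;  // high bit clear: last byte of the varint
    }
    return false;
}

// Decode a whole buffer into fields; returns false on malformed or unsupported input.
bool decodeFields(const std::vector<uint8_t> &buf, std::vector<Field> &fields)
{
    std::size_t pos = 0;
    while (pos < buf.size()) {
        uint64_t tag = 0;
        if (!readVarint(buf, pos, tag)) return false;
        Field f{static_cast<uint32_t>(tag >> 3), static_cast<uint32_t>(tag & 7), 0, {}};
        if (f.wireType == 0) {
            if (!readVarint(buf, pos, f.varint)) return false;
        } else if (f.wireType == 2) {
            uint64_t len = 0;
            if (!readVarint(buf, pos, len) || len > buf.size() - pos) return false;
            f.bytes.assign(buf.begin() + pos, buf.begin() + pos + len);
            pos += len;
        } else {
            return false;  // other wire types are left out of this sketch
        }
        fields.push_back(f);
    }
    return true;
}
```

Fed the bytes `0a 04 31 32 33 34`, the decoder reports field 1 with wire type 2 and the payload "1234". The length prefix alone tells the reader where the field ends.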

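Another core value of Protocol Buffers is the tooling: a compiler that automatically generates the get/set and serialization code. This simplifies interaction across languages and makes encoding and decoding work far more productive.

Protocol Buffers data is not self-describing on the wire. Without the .proto file that describes the data, the binary stream cannot be fully interpreted, because field names and exact types are not part of it. This is both an advantage, since it gives the data a degree of obscurity (which is not real encryption), and a drawback, since the data is very hard to read. Protocol Buffers are therefore a good fit for RPC calls and data exchange between internal services.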

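Protocol Buffers are backward compatible: after the data structure is updated, programs built against the old version keep working, because the parser skips newly added fields it does not recognize. This is the problem Protocol Buffers were originally created to solve.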
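Skipping works because the wire type in each tag says how many bytes the value occupies, even when the field number is unknown. Below is a minimal sketch of an "old" reader that knows only field 1 (the int32 of the TInt32 message from this article) and skips everything else. It assumes the standard wire types 0, 1, 2 and 5. The function names are hypothetical, and the real parser generated by protoc is more involved.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Read one varint starting at pos; returns false on truncated or overlong input.
static bool readVarint(const std::vector<uint8_t> &buf, std::size_t &pos, uint64_t &out)
{
    out = 0;
    for (int shift = 0; shift < 64 && pos < buf.size(); shift += 7) {
        uint8_t b = buf[pos++];
        out |= static_cast<uint64_t>(b & 0x7f) << shift;
        if ((b & 0x80) == 0) return true;
    }
    return false;
}

// Advance pos by n bytes if that many remain in the buffer.
static bool skipBytes(const std::vector<uint8_t> &buf, std::size_t &pos, uint64_t n)
{
    if (n > buf.size() - pos) return false;
    pos += static_cast<std::size_t>(n);
    return true;
}

// Skip the value of an unrecognized field using only its wire type.
static bool skipValue(const std::vector<uint8_t> &buf, std::size_t &pos, uint32_t wireType)
{
    uint64_t tmp = 0;
    switch (wireType) {
    case 0: return readVarint(buf, pos, tmp);                              // varint
    case 1: return skipBytes(buf, pos, 8);                                 // fixed 64-bit
    case 2: return readVarint(buf, pos, tmp) && skipBytes(buf, pos, tmp);  // length-delimited
    case 5: return skipBytes(buf, pos, 4);                                 // fixed 32-bit
    default: return false;                                                 // not handled in this sketch
    }
}

// "Old" reader: understands only field 1 (int32 int1) and ignores newer fields.
bool parseOldTInt32(const std::vector<uint8_t> &buf, int32_t &int1)
{
    int1 = 0;  // proto3 default when the field is absent
    std::size_t pos = 0;
    while (pos < buf.size()) {
        uint64_t tag = 0;
        if (!readVarint(buf, pos, tag)) return false;
        uint32_t number = static_cast<uint32_t>(tag >> 3);
        uint32_t wireType = static_cast<uint32_t>(tag & 7);
        if (number == 1 && wireType == 0) {
            uint64_t v = 0;
            if (!readVarint(buf, pos, v)) return false;
            int1 = static_cast<int32_t>(static_cast<uint32_t>(v));  // keep the low 32 bits
        } else if (!skipValue(buf, pos, wireType)) {
            return false;
        }
    }
    return true;
}
```

Given `08 12 12 02 68 69`, where a newer writer appended a string as field 2, the old reader still recovers 0x12 for field 1 and steps over the six remaining bytes it does not understand.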
