Wangle Source Code Analysis: Codec Handlers

Latest recommended article published on 2020-08-17 15:07:47

weixin_34344403


Original link: https://my.oschina.net/fileoptions/blog/882041


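Preface

Encoding and decoding are protocol-specific. Without codec Handlers, dealing with TCP sticky packets and split packets (several frames arriving in one read, or one frame spread across several reads) quickly becomes complicated. Apart from public protocols such as HTTP, most network protocols are private, so codecs vary widely. Still, the most common framing rules come down to a few kinds: fixed length; variable length with the length field at the start of the frame; variable length with the length field at a non-fixed position; delimiter-based; and text-string-based. Fortunately, Wangle ships with Handlers for these common cases. Let's look at how each one is implemented.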

FixedLengthFrameDecoder

This is the decoding Handler for fixed-length frames. Suppose the following four packets have arrived in the IOBufs:

  +---+----+------+----+
  | A | BC | DEFG | HI |
  +---+----+------+----+

Suppose the protocol specifies that a complete frame is 3 bytes long. FixedLengthFrameDecoder then decodes the data above into the following frames:

  +-----+-----+-----+
  | ABC | DEF | GHI |
  +-----+-----+-----+

The implementation of FixedLengthFrameDecoder is simple:

class FixedLengthFrameDecoder : public ByteToByteDecoder {
 public:
  explicit FixedLengthFrameDecoder(size_t length) : length_(length) {}

  bool decode(Context*,
              folly::IOBufQueue& q,
              std::unique_ptr<folly::IOBuf>& result,
              size_t& needed) override {
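    // Check how many bytes have been received so far
    if (q.chainLength() < length_) {
      // Not enough data yet: report how many more bytes are needed
      needed = length_ - q.chainLength();
      // Decoding fails for now
      return false;
    }
    // Otherwise, split off one frame of the protocol's fixed length
    result = q.split(length_);
    // Decoding succeeded
    return true;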
  }

 private:
  // Frame length defined by the protocol, injected through the constructor
  size_t length_;
};
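To see the splitting behaviour without building folly or Wangle, here is a minimal standalone sketch. It is a model, not Wangle code: `std::string` stands in for `folly::IOBufQueue`, and the names `decodeFixed` and `feedChunks` are hypothetical helpers introduced only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Simplified stand-in for FixedLengthFrameDecoder::decode: `q` plays the role
// of the IOBufQueue, `result` receives one frame, and `needed` reports how
// many bytes are still missing when no full frame is available.
bool decodeFixed(std::string& q, std::size_t length,
                 std::string& result, std::size_t& needed) {
  assert(length > 0);
  if (q.size() < length) {
    needed = length - q.size();
    return false;
  }
  result = q.substr(0, length);  // analogous to q.split(length_)
  q.erase(0, length);
  return true;
}

// Feeds the chunks one at a time, as the network might deliver them, and
// collects every complete frame that becomes available.
std::vector<std::string> feedChunks(const std::vector<std::string>& chunks,
                                    std::size_t length) {
  std::vector<std::string> frames;
  std::string q;
  for (const auto& chunk : chunks) {
    q += chunk;
    std::string frame;
    std::size_t needed = 0;
    while (decodeFixed(q, length, frame, needed)) {
      frames.push_back(frame);
    }
  }
  return frames;
}
```

Calling `feedChunks({"A", "BC", "DEFG", "HI"}, 3)` reproduces the diagram above: the four packets come out as the frames ABC, DEF, and GHI.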

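ByteToByteDecoder is a general-purpose decoding Handler interface. Judging from the decode signature above, it corresponds to ByteToMessageDecoder<M> with M = std::unique_ptr<folly::IOBuf>, and ByteToMessageDecoder<M> is in essence a Handler of type InboundHandler<folly::IOBufQueue&, M>. What makes it special is the pure virtual decode method it declares, which each concrete decoding strategy implements. The underlying ByteToMessageDecoder template is defined as follows: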

template <typename M>
class ByteToMessageDecoder : public InboundHandler<folly::IOBufQueue&, M> {
 public:
  typedef typename InboundHandler<folly::IOBufQueue&, M>::Context Context;

  /**
   * Decode bytes from buf into result.
   *
   * @return bool - Return true if decoding is successful, false if buf
   *                has insufficient bytes.
   */
  virtual bool decode(Context* ctx, folly::IOBufQueue& buf, M& result, size_t&) = 0;

  void read(Context* ctx, folly::IOBufQueue& q) override {
    bool success = true;
    do {
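      // The decoded message (the Handler's Rout type)
      M result;
      // Number of additional bytes the decoder still needs
      size_t needed = 0;
      // Try to decode one message
      success = decode(ctx, q, result, needed);
      if (success) {
        // The event propagates down the pipeline only when decoding succeeds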
        ctx->fireRead(std::move(result));
      }
    } while (success);
  }
};
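The do/while loop in read is what lets a single read event produce several messages, or none at all. The following standalone sketch models only that loop. It assumes `std::string` for both the byte queue and the message type, and `DecodeFn`, `readLoop`, and `decodePair` are hypothetical names that do not exist in Wangle.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical decoder signature mirroring decode(ctx, buf, result, needed),
// with std::string standing in for the byte queue and for the message type M.
using DecodeFn =
    std::function<bool(std::string& buf, std::string& result, std::size_t& needed)>;

// Mirrors the do/while in read(): keep decoding until the decoder reports
// that the buffer no longer holds a complete message.
std::vector<std::string> readLoop(std::string& buf, const DecodeFn& decode) {
  assert(decode && "decoder must be callable");
  std::vector<std::string> fired;  // stands in for ctx->fireRead(...)
  bool success = true;
  do {
    std::string result;
    std::size_t needed = 0;
    success = decode(buf, result, needed);
    if (success) {
      fired.push_back(std::move(result));
    }
  } while (success);
  return fired;
}

// Example decoder for the sketch: every message is exactly two bytes long.
bool decodePair(std::string& buf, std::string& result, std::size_t& needed) {
  if (buf.size() < 2) {
    needed = 2 - buf.size();
    return false;
  }
  result = buf.substr(0, 2);
  buf.erase(0, 2);
  return true;
}
```

With a buffer holding "abcde", `readLoop(buf, decodePair)` fires "ab" and "cd" and leaves "e" in the buffer for the next read event.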

LengthFieldPrepender

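Unlike FixedLengthFrameDecoder, which cuts frames of a fixed length, LengthFieldPrepender is an encoder for variable-length frames: it prepends a length field to each outgoing message, so the frame length is given by the length field at the start of the frame. Suppose we write a message with the content "HELLO, WORLD" (12 bytes) using a 2-byte length field. The encoded message is: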

  +--------+----------------+
  | 0x000C | "HELLO, WORLD" |
  +--------+----------------+

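If the lengthIncludesLengthField flag is enabled, that is, the encoded length also counts the bytes occupied by the length field itself, the value becomes 12 + 2 = 14 (0x0E) and the message is encoded as: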

  +--------+----------------+
  | 0x000E | "HELLO, WORLD" |
  +--------+----------------+

The implementation of LengthFieldPrepender is just as simple:

class LengthFieldPrepender : public OutboundBytesToBytesHandler {
 public:
  explicit LengthFieldPrepender(int lengthFieldLength = 4,
                                int lengthAdjustment = 0,
                                bool lengthIncludesLengthField = false,
                                bool networkByteOrder = true);

  folly::Future<folly::Unit> write(Context* ctx,std::unique_ptr<folly::IOBuf> buf){
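  // The length value is the size of the data being written plus the adjustment
  int length = lengthAdjustment_ + buf->computeChainDataLength();
  // If the length value is meant to include the length field itself
  if (lengthIncludesLengthField_) {
    // add the number of bytes the length field occupies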
    length += lengthFieldLength_;
  }

  if (length < 0) {
    throw std::runtime_error("Length field < 0");
  }
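  // Buffer that will hold the length field
  auto len = IOBuf::create(lengthFieldLength_);
  // Extend the buffer's data range by lengthFieldLength_ bytes so they can be written
  len->append(lengthFieldLength_);
  // Cursor positioned at the start of the length buffer, for byte-wise writes
  folly::io::RWPrivateCursor c(len.get());

  // Dispatch on the width of the length field (1, 2, 4, or 8 bytes)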
  switch (lengthFieldLength_) {
    case 1: {
      if (length >= 256) {
        throw std::runtime_error("length does not fit byte");
      }
      // If network byte order is requested
      if (networkByteOrder_) {
        // write big-endian
        c.writeBE((uint8_t)length);
      } else {
        // write little-endian
        c.writeLE((uint8_t)length);
      }
      break;
    }
    case 2: {
      if (length >= 65536) {
        throw std::runtime_error("length does not fit byte");
      }
      if (networkByteOrder_) {
        c.writeBE((uint16_t)length);
      } else {
        c.writeLE((uint16_t)length);
      }
      break;
    }
    case 4: {
      if (networkByteOrder_) {
        c.writeBE((uint32_t)length);
      } else {
        c.writeLE((uint32_t)length);
      }
      break;
    }
    case 8: {
      if (networkByteOrder_) {
        c.writeBE((uint64_t)length);
      } else {
        c.writeLE((uint64_t)length);
      }
      break;
    }
    default: {
      throw std::runtime_error("Invalid lengthFieldLength");
    }
  }

  // Chain buf after len, so the length field precedes the payload
  len->prependChain(std::move(buf));
  return ctx->fireWrite(std::move(len));
}


 private:

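  int lengthFieldLength_; // Width of the length field, in bytes
  int lengthAdjustment_; // Length adjustment value
  bool lengthIncludesLengthField_; // Whether the frame length counts the length field itself
  bool networkByteOrder_; // Whether to write in network (big-endian) byte order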
};

The parent class, OutboundBytesToBytesHandler, is just a simple alias:

typedef OutboundHandler<std::unique_ptr<folly::IOBuf>> OutboundBytesToBytesHandler;
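The encoding rule can be tried out in isolation with the sketch below. It is a simplified model under stated assumptions, not Wangle's implementation: the frame is a `std::vector<std::uint8_t>` rather than an IOBuf chain, `prependLength` is a hypothetical function name, and the range check is applied uniformly to the 1-, 2-, and 4-byte widths, whereas the listing above checks only the 1- and 2-byte cases.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Builds one frame: a length field of `lengthFieldLength` bytes followed by
// the payload. Parameter names follow the constructor shown above.
std::vector<std::uint8_t> prependLength(const std::string& payload,
                                        int lengthFieldLength = 4,
                                        int lengthAdjustment = 0,
                                        bool lengthIncludesLengthField = false,
                                        bool networkByteOrder = true) {
  if (lengthFieldLength != 1 && lengthFieldLength != 2 &&
      lengthFieldLength != 4 && lengthFieldLength != 8) {
    throw std::runtime_error("Invalid lengthFieldLength");
  }
  std::int64_t length =
      lengthAdjustment + static_cast<std::int64_t>(payload.size());
  if (lengthIncludesLengthField) {
    length += lengthFieldLength;
  }
  if (length < 0) {
    throw std::runtime_error("Length field < 0");
  }
  // The value must be representable in the chosen field width.
  if (lengthFieldLength < 8 &&
      length >= (std::int64_t{1} << (8 * lengthFieldLength))) {
    throw std::runtime_error("length does not fit in the length field");
  }

  std::vector<std::uint8_t> frame(static_cast<std::size_t>(lengthFieldLength));
  const std::uint64_t value = static_cast<std::uint64_t>(length);
  for (int i = 0; i < lengthFieldLength; ++i) {
    // Byte i of the value, counted from the least significant end.
    const std::uint8_t byte = static_cast<std::uint8_t>(value >> (8 * i));
    // Big-endian puts the least significant byte last, little-endian first.
    const int pos = networkByteOrder ? lengthFieldLength - 1 - i : i;
    frame[static_cast<std::size_t>(pos)] = byte;
  }
  frame.insert(frame.end(), payload.begin(), payload.end());
  assert(frame.size() ==
         static_cast<std::size_t>(lengthFieldLength) + payload.size());
  return frame;
}
```

For example, `prependLength("HELLO, WORLD", 2)` starts with the bytes 0x00 0x0C, and passing `true` for `lengthIncludesLengthField` changes them to 0x00 0x0E, matching the two diagrams above.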

LengthFieldBasedFrameDecoder

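LengthFieldBasedFrameDecoder is a very powerful, general-purpose decoding Handler. It dynamically splits the bytes in the IOBufs into frames for any length-based protocol, and it is especially well suited to protocols whose header carries a length field (whether that is the body length or the total length).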

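Suppose the length field sits at offset 0, occupies 2 bytes, and no bytes are stripped. With a length field value of 12 (0x0C) describing the content "HELLO, WORLD", the simplest configuration of LengthFieldBasedFrameDecoder is the following (these values are chosen for illustration; the constructor shown later defaults to a 4-byte length field and strips 4 bytes):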

  lengthFieldOffset   = 0
  lengthFieldLength   = 2
  lengthAdjustment    = 0
  initialBytesToStrip = 0 (= do not strip header)

The data before and after decoding then looks like this:

 * BEFORE DECODE (14 bytes)         AFTER DECODE (14 bytes)
 * +--------+----------------+      +--------+----------------+
 * | Length | Actual Content |----->| Length | Actual Content |
 * | 0x000C | "HELLO, WORLD" |      | 0x000C | "HELLO, WORLD" |
 * +--------+----------------+      +--------+----------------+

If the configuration is changed to strip two bytes:

 lengthFieldOffset   = 0
 lengthFieldLength   = 2
 lengthAdjustment    = 0
 initialBytesToStrip = 2 (= the length of the Length field)

the data before and after decoding looks like this (the length field has been stripped):

 * BEFORE DECODE (14 bytes)         AFTER DECODE (12 bytes)
 * +--------+----------------+      +----------------+
 * | Length | Actual Content |----->| Actual Content |
 * | 0x000C | "HELLO, WORLD" |      | "HELLO, WORLD" |
 * +--------+----------------+      +----------------+

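In most cases the value of the length field covers only the body of the message, as in the two examples above. In some protocols, however, the length field gives the length of the whole message (header + body). In that case you can specify a length adjustment, lengthAdjustment. The header here consists of the length field alone, which is 2 bytes long, so lengthAdjustment is set to -2. The configuration is: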

  lengthFieldOffset   =  0
  lengthFieldLength   =  2
  lengthAdjustment    = -2 (= the length of the Length field)
  initialBytesToStrip =  0

The result is:

 * BEFORE DECODE (14 bytes)         AFTER DECODE (14 bytes)
 * +--------+----------------+      +--------+----------------+
 * | Length | Actual Content |----->| Length | Actual Content |
 * | 0x000E | "HELLO, WORLD" |      | 0x000E | "HELLO, WORLD" |
 * +--------+----------------+      +--------+----------------+

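Now suppose the first two bytes of the protocol are no longer the length field but a Header value, followed by the length field, which is now 3 bytes long at offset 2. The configuration is: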

  lengthFieldOffset   = 2 (= the length of Header 1)
  lengthFieldLength   = 3
  lengthAdjustment    = 0
  initialBytesToStrip = 0

The decoding result is:

 * BEFORE DECODE (17 bytes)                      AFTER DECODE (17 bytes)
 * +----------+----------+----------------+      +----------+----------+----------------+
 * | Header 1 |  Length  | Actual Content |----->| Header 1 |  Length  | Actual Content |
 * |  0xCAFE  | 0x00000C | "HELLO, WORLD" |      |  0xCAFE  | 0x00000C | "HELLO, WORLD" |
 * +----------+----------+----------------+      +----------+----------+----------------+

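Now suppose the Header value from the previous example sits between the length field and the body. lengthAdjustment must then be set to 2 so that the Header's two bytes are included in the frame. The configuration is: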

  lengthFieldOffset   = 0
  lengthFieldLength   = 3
  lengthAdjustment    = 2 (= the length of Header 1)
  initialBytesToStrip = 0

The result is:

 * BEFORE DECODE (17 bytes)                      AFTER DECODE (17 bytes)
 * +----------+----------+----------------+      +----------+----------+----------------+
 * |  Length  | Header 1 | Actual Content |----->|  Length  | Header 1 | Actual Content |
 * | 0x00000C |  0xCAFE  | "HELLO, WORLD" |      | 0x00000C |  0xCAFE  | "HELLO, WORLD" |
 * +----------+----------+----------------+      +----------+----------+----------------+

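Here is a combined example. Suppose there is a one-byte HDR1 field before the length field and a one-byte HDR2 field after it, and the length field itself occupies two bytes. The configuration is: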

  lengthFieldOffset   = 1 (= the length of HDR1)
  lengthFieldLength   = 2
  lengthAdjustment    = 1 (= the length of HDR2)
  initialBytesToStrip = 3 (= the length of HDR1 + LEN)

The decoding result is:

 * BEFORE DECODE (16 bytes)                       AFTER DECODE (13 bytes)
 * +------+--------+------+----------------+      +------+----------------+
 * | HDR1 | Length | HDR2 | Actual Content |----->| HDR2 | Actual Content |
 * | 0xCA | 0x000C | 0xFE | "HELLO, WORLD" |      | 0xFE | "HELLO, WORLD" |
 * +------+--------+------+----------------+      +------+----------------+

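Finally, modify the previous example slightly so that the length field now represents the length of the whole frame. The configuration becomes: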

  lengthFieldOffset   =  1
  lengthFieldLength   =  2
  lengthAdjustment    = -3 (= the length of HDR1 + LEN, negative)
  initialBytesToStrip =  3

The decoding result is:

 * BEFORE DECODE (16 bytes)                       AFTER DECODE (13 bytes)
 * +------+--------+------+----------------+      +------+----------------+
 * | HDR1 | Length | HDR2 | Actual Content |----->| HDR2 | Actual Content |
 * | 0xCA | 0x0010 | 0xFE | "HELLO, WORLD" |      | 0xFE | "HELLO, WORLD" |
 * +------+--------+------+----------------+      +------+----------------+
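The byte counts in all of the diagrams above follow from one formula: frame length = length field value + lengthAdjustment + (lengthFieldOffset + lengthFieldLength), and the bytes handed downstream are the frame length minus initialBytesToStrip. The sketch below checks that arithmetic. `FrameConfig` and `frameSizes` are hypothetical names used only for this illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// The four configuration values used throughout the examples.
struct FrameConfig {
  std::uint32_t lengthFieldOffset;
  std::uint32_t lengthFieldLength;
  std::int32_t lengthAdjustment;
  std::uint32_t initialBytesToStrip;
};

// Returns {bytes consumed from the input, bytes passed to the next handler}
// for one frame whose length field holds `lengthFieldValue`.
std::pair<std::int64_t, std::int64_t> frameSizes(const FrameConfig& c,
                                                 std::uint64_t lengthFieldValue) {
  const std::int64_t lengthFieldEndOffset =
      std::int64_t{c.lengthFieldOffset} + c.lengthFieldLength;
  const std::int64_t frameLength = static_cast<std::int64_t>(lengthFieldValue) +
                                   c.lengthAdjustment + lengthFieldEndOffset;
  // A valid frame at least covers everything up to the end of the length
  // field, and cannot be shorter than the bytes to strip.
  assert(frameLength >= lengthFieldEndOffset);
  assert(c.initialBytesToStrip <= frameLength);
  return {frameLength, frameLength - c.initialBytesToStrip};
}
```

For the combined example, `frameSizes({1, 2, 1, 3}, 12)` gives 16 bytes consumed and 13 bytes emitted. For the whole-frame variant, `frameSizes({1, 2, -3, 3}, 16)` gives the same pair, which is why both diagrams show 16 bytes before decoding and 13 after.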

With that many examples, the usage of LengthFieldBasedFrameDecoder should be clear. Now let's look at its implementation:

class LengthFieldBasedFrameDecoder : public ByteToByteDecoder {
 public:
  explicit LengthFieldBasedFrameDecoder(uint32_t lengthFieldLength = 4,
                                        uint32_t maxFrameLength = UINT_MAX,
                                        uint32_t lengthFieldOffset = 0,
                                        int32_t lengthAdjustment = 0,
                                        uint32_t initialBytesToStrip = 4,
                                        bool networkByteOrder = true);

  bool decode(Context* ctx,folly::IOBufQueue& buf,std::unique_ptr<folly::IOBuf>& result,size_t&) override {
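  // If the bytes read so far are not enough to contain the length field, decoding fails for now
  if (buf.chainLength() < lengthFieldEndOffset_) {
    return false;
  }

  // Decode the raw (unadjusted) value of the length field
  uint64_t frameLength = getUnadjustedFrameLength(
    buf, lengthFieldOffset_, lengthFieldLength_, networkByteOrder_);

  // Turn it into the total frame length: add the adjustment and the bytes up to the end of the length field
  frameLength += lengthAdjustment_ + lengthFieldEndOffset_;

  // If the frame length is smaller than the end offset of the length field, the frame is invalid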
  if (frameLength < lengthFieldEndOffset_) {
    buf.trimStart(lengthFieldEndOffset_);
    ctx->fireReadException(folly::make_exception_wrapper<std::runtime_error>(
                             "Frame too small"));
    return false;
  }

  // If the frame length exceeds the maximum allowed length, discard it and report an error
  if (frameLength > maxFrameLength_) {
    buf.trimStart(frameLength);
    ctx->fireReadException(folly::make_exception_wrapper<std::runtime_error>(
                             "Frame larger than " +
                             folly::to<std::string>(maxFrameLength_)));
    return false;
  }

  // If fewer bytes have been read than the frame length
  if (buf.chainLength() < frameLength) {
    // wait for more data: decoding fails for now
    return false;
  }

  // If the number of leading bytes to strip is larger than the frame itself
  if (initialBytesToStrip_ > frameLength) {
    buf.trimStart(frameLength);
    ctx->fireReadException(folly::make_exception_wrapper<std::runtime_error>(
                             "InitialBytesToSkip larger than frame"));
    return false;
  }
  // Drop the leading bytes
  buf.trimStart(initialBytesToStrip_);
  // Compute the length of what remains of the frame
  int actualFrameLength = frameLength - initialBytesToStrip_;
  result = buf.split(actualFrameLength);
  return true;
}

 private:

  uint64_t getUnadjustedFrameLength(
    folly::IOBufQueue& buf, int offset, int length, bool networkByteOrder);

    
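  uint32_t lengthFieldLength_; // Width of the length field, in bytes
  uint32_t maxFrameLength_; // Maximum frame length
  uint32_t lengthFieldOffset_; // Offset of the length field within the frame, in bytes
  int32_t lengthAdjustment_; // Length adjustment value
  uint32_t initialBytesToStrip_; // Number of leading bytes to strip
  bool networkByteOrder_; // Whether the length field is in network byte order

  uint32_t lengthFieldEndOffset_; // Offset of the end of the length field: lengthFieldOffset + lengthFieldLength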
};

The implementation of getUnadjustedFrameLength is very simple, so it is not covered here.
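For readers who want a concrete picture of what such a helper has to do, here is a minimal sketch. It is not Wangle's code: it reads from a flat `std::vector<std::uint8_t>` instead of an IOBufQueue, the name `readLengthField` is hypothetical, and it accepts any width from 1 to 8 bytes, which the real helper may or may not do.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Reads an unsigned integer of `length` bytes starting at `offset`,
// interpreting it as big-endian (network byte order) or little-endian.
std::uint64_t readLengthField(const std::vector<std::uint8_t>& buf,
                              std::size_t offset, std::size_t length,
                              bool networkByteOrder) {
  assert(length >= 1 && length <= 8);
  assert(offset + length <= buf.size());
  std::uint64_t value = 0;
  for (std::size_t i = 0; i < length; ++i) {
    // Always consume the most significant byte first: for big-endian that is
    // the first byte of the field, for little-endian the last one.
    const std::size_t pos =
        networkByteOrder ? offset + i : offset + length - 1 - i;
    value = (value << 8) | buf[pos];
  }
  return value;
}
```

With the bytes 0xCA 0x00 0x0C 0xFE from the combined example, reading 2 bytes at offset 1 in network byte order yields 12. The same bytes read as little-endian yield 0x0C00, which shows why the byte-order flag has to match what the peer's encoder wrote.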

LineBasedFrameDecoder

As its name suggests, LineBasedFrameDecoder decodes line by line: a line ends wherever "\n" or "\r\n" is encountered. Because the principle is so simple, the code is not shown here.
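As a rough illustration of the idea, line-based framing can be sketched as follows. This is a simplified model, not the Wangle class: it works on a `std::string` buffer, `extractLines` and its `stripDelimiter` parameter are names chosen for this sketch, and it ignores any maximum line length or other options the real decoder may offer.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Removes every complete line from `buf` and returns them. A line ends at
// "\n" or "\r\n". An incomplete trailing line stays in `buf` until more
// data arrives.
std::vector<std::string> extractLines(std::string& buf,
                                      bool stripDelimiter = true) {
  std::vector<std::string> lines;
  std::size_t pos = 0;
  while ((pos = buf.find('\n')) != std::string::npos) {
    std::string line = buf.substr(0, pos + 1);  // includes the '\n'
    buf.erase(0, pos + 1);
    if (stripDelimiter) {
      line.pop_back();  // drop '\n'
      if (!line.empty() && line.back() == '\r') {
        line.pop_back();  // drop the '\r' of a "\r\n" terminator
      }
    }
    lines.push_back(std::move(line));
  }
  assert(buf.find('\n') == std::string::npos);  // only a partial line is left
  return lines;
}
```

Given a buffer containing "foo\r\nbar\nba", the call returns the lines foo and bar and leaves "ba" in the buffer, so a later chunk such as "z\n" completes the line baz.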

StringCodec

StringCodec converts between IOBufs and std::string in both directions. Here is its implementation:

class StringCodec : public Handler<std::unique_ptr<folly::IOBuf>, std::string,
                                   std::string, std::unique_ptr<folly::IOBuf>> {
 public:
  typedef typename Handler<
   std::unique_ptr<folly::IOBuf>, std::string,
   std::string, std::unique_ptr<folly::IOBuf>>::Context Context;

  void read(Context* ctx, std::unique_ptr<folly::IOBuf> buf) override {
    if (buf) {
      // Coalesce the IOBuf chain into a single contiguous IOBuf
      buf->coalesce();
      std::string data((const char*)buf->data(), buf->length());
      ctx->fireRead(data);
    }
  }

  folly::Future<folly::Unit> write(Context* ctx, std::string msg) override {
    // Build an IOBuf directly from the string (the bytes are copied)
    auto buf = folly::IOBuf::copyBuffer(msg.data(), msg.length());
    return ctx->fireWrite(std::move(buf));
  }
};