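Preface
Encoding and decoding are protocol-specific. Without codec Handlers, dealing with TCP sticky packets and split packets (several messages merged into one read, or one message spread over several reads) quickly becomes complicated. Apart from public protocols such as HTTP, most network protocols are private, so their codecs vary widely. Still, the commonly used framing rules come down to a handful: fixed length; variable length with the length field at the start of the frame; variable length with the length field at a configurable position; delimiter-based; and text-string-based. Fortunately, the Wangle framework ships Handlers for these common cases. The sections below walk through how each one is implemented.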
FixedLengthFrameDecoder
This is the fixed-length decoding Handler. Suppose the following four packets have been received into the IOBufs:
+---+----+------+----+
| A | BC | DEFG | HI |
+---+----+------+----+
Suppose the protocol defines a complete frame as 3 bytes long. FixedLengthFrameDecoder then decodes the data above into the following frames:
+-----+-----+-----+
| ABC | DEF | GHI |
+-----+-----+-----+
The implementation of FixedLengthFrameDecoder is simple:
class FixedLengthFrameDecoder : public ByteToByteDecoder {
 public:
  explicit FixedLengthFrameDecoder(size_t length) : length_(length) {}
  bool decode(Context*,
              folly::IOBufQueue& q,
              std::unique_ptr<folly::IOBuf>& result,
              size_t& needed) override {
    // Check how many bytes have been received so far
    if (q.chainLength() < length_) {
      // Not enough bytes yet: report how many are still missing
      needed = length_ - q.chainLength();
      // This decode attempt fails
      return false;
    }
    // Otherwise cut one frame of the protocol-defined length off the queue
    result = q.split(length_);
    // Decoding succeeded
    return true;
  }
 private:
  // Frame length defined by the protocol, injected through the constructor
  size_t length_;
};
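To make the A / BC / DEFG / HI example concrete without pulling in folly, here is a minimal standalone sketch of the same idea. It is not Wangle code: the class name FixedLengthFramer and its methods feed and needed are hypothetical, and a std::string stands in for folly::IOBufQueue.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the IOBufQueue + FixedLengthFrameDecoder pair:
// bytes accumulate in pending_, and complete frames are cut off the front.
class FixedLengthFramer {
 public:
  explicit FixedLengthFramer(std::size_t length) : length_(length) {
    assert(length_ > 0 && "frame length must be positive");
  }

  // Append one received chunk and return every frame that is now complete.
  std::vector<std::string> feed(const std::string& chunk) {
    pending_ += chunk;
    std::vector<std::string> frames;
    while (pending_.size() >= length_) {
      frames.push_back(pending_.substr(0, length_));
      pending_.erase(0, length_);
    }
    return frames;
  }

  // Bytes still missing before the next frame is complete
  // (the counterpart of the needed out-parameter of decode()).
  std::size_t needed() const { return length_ - pending_.size(); }

 private:
  std::size_t length_;
  std::string pending_;
};
```

With a frame length of 3, feeding "A", "BC", "DEFG" and "HI" in that order yields the frames "ABC", "DEF" and "GHI", one per call from the second call onward.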
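ByteToByteDecoder is the generic decoding Handler interface. As the base class and the decode signature above suggest, it is ByteToMessageDecoder with the message type M fixed to std::unique_ptr<folly::IOBuf>, so it is essentially a Handler of type InboundHandler<folly::IOBufQueue&, M>. What sets it apart is the pure virtual decode method it declares, which each concrete decoding strategy implements. ByteToMessageDecoder is defined as follows:
template <typename M>
class ByteToMessageDecoder : public InboundHandler<folly::IOBufQueue&, M> {
 public:
  typedef typename InboundHandler<folly::IOBufQueue&, M>::Context Context;
  /**
   * Decode bytes from buf into result.
   *
   * @return bool - Return true if decoding is successful, false if buf
   *                has insufficient bytes.
   */
  virtual bool decode(Context* ctx, folly::IOBufQueue& buf, M& result, size_t&) = 0;
  void read(Context* ctx, folly::IOBufQueue& q) override {
    bool success = true;
    do {
      // M is the Rout type of this Handler
      M result;
      // Number of bytes still needed to finish decoding
      size_t needed = 0;
      // Try to decode one message
      success = decode(ctx, q, result, needed);
      if (success) {
        // The event propagates down the pipeline only after a successful decode
        ctx->fireRead(std::move(result));
      }
    } while (success);
  }
};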
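The important detail in read is the loop: one call may deliver several messages, because decoding repeats until the strategy reports that it needs more bytes. A minimal standalone sketch of that loop follows. The names DecodeFn, drain and decodeThreeBytes are hypothetical, a std::string replaces folly::IOBufQueue, and a returned vector replaces ctx->fireRead.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical decode callback: consume bytes from the front of buf, store one
// message in result and return true; return false (leaving buf untouched)
// when more bytes are needed.
using DecodeFn = std::function<bool(std::string& buf, std::string& result)>;

// Mirrors the do/while loop in read(): keep decoding until decode fails,
// handing each decoded message to the next stage (here: a vector).
std::vector<std::string> drain(std::string& buf, const DecodeFn& decode) {
  assert(decode && "a decode strategy must be supplied");
  std::vector<std::string> delivered;
  bool success = true;
  do {
    std::string result;
    success = decode(buf, result);
    if (success) {
      delivered.push_back(std::move(result));  // stands in for ctx->fireRead
    }
  } while (success);
  return delivered;
}

// Example strategy: fixed 3-byte frames.
bool decodeThreeBytes(std::string& buf, std::string& result) {
  if (buf.size() < 3) {
    return false;
  }
  result = buf.substr(0, 3);
  buf.erase(0, 3);
  return true;
}
```

Calling drain on a buffer holding "ABCDEFGH" with decodeThreeBytes delivers "ABC" and "DEF" in a single pass and leaves "GH" in the buffer for the next read.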
LengthFieldPrepender
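Unlike FixedLengthFrameDecoder, LengthFieldPrepender works with frames whose length is not fixed: it is an encoder that writes the frame length into a length field at the very start of the frame. Suppose we want to write a message whose content is "HELLO, WORLD", encoded with a 2-byte length field. The encoded message is: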
+--------+----------------+
| 0x000C | "HELLO, WORLD" |
+--------+----------------+
If the lengthIncludesLengthField flag is enabled, meaning the total length also counts the bytes occupied by the length field itself, the encoding becomes:
+--------+----------------+
| 0x000E | "HELLO, WORLD" |
+--------+----------------+
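The two encodings above can be reproduced with a few lines of standalone C++. This is only a sketch under stated assumptions: the helper prependLength16 is hypothetical, it hard-codes a 2-byte big-endian length field, and it works on std::string rather than folly::IOBuf.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: prefix payload with a 2-byte big-endian length field.
// includeLengthField plays the role of lengthIncludesLengthField.
std::string prependLength16(const std::string& payload, bool includeLengthField) {
  const std::size_t kFieldSize = 2;
  const std::size_t length =
      payload.size() + (includeLengthField ? kFieldSize : 0);
  if (length > 0xFFFF) {
    throw std::runtime_error("length does not fit in 2 bytes");
  }
  std::string frame;
  frame.push_back(static_cast<char>((length >> 8) & 0xFF));  // high byte first
  frame.push_back(static_cast<char>(length & 0xFF));         // then low byte
  frame += payload;
  assert(frame.size() == kFieldSize + payload.size());
  return frame;
}
```

For the 12-byte payload "HELLO, WORLD", the first two bytes of the result are 0x00 0x0C with the flag off and 0x00 0x0E with the flag on; the frame is 14 bytes long in both cases.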
The implementation of LengthFieldPrepender is also quite simple:
class LengthFieldPrepender : public OutboundBytesToBytesHandler {
 public:
  explicit LengthFieldPrepender(int lengthFieldLength = 4,
                                int lengthAdjustment = 0,
                                bool lengthIncludesLengthField = false,
                                bool networkByteOrder = true);
  folly::Future<folly::Unit> write(Context* ctx, std::unique_ptr<folly::IOBuf> buf) override {
    // Total length = length of the data being written plus the adjustment
    int length = lengthAdjustment_ + buf->computeChainDataLength();
    // If the protocol length includes the length field itself
    if (lengthIncludesLengthField_) {
      // then add the number of bytes occupied by the length field
      length += lengthFieldLength_;
    }
    if (length < 0) {
      throw std::runtime_error("Length field < 0");
    }
    // Buffer that will hold the length field
    auto len = IOBuf::create(lengthFieldLength_);
    // Mark lengthFieldLength_ bytes of it as valid data
    len->append(lengthFieldLength_);
    // Cursor positioned at the start of len, for byte-wise writes
    folly::io::RWPrivateCursor c(len.get());
    // Dispatch on the size of the length field: 1, 2, 4 or 8 bytes
    switch (lengthFieldLength_) {
      case 1: {
        if (length >= 256) {
          throw std::runtime_error("length does not fit byte");
        }
        // Network byte order requested
        if (networkByteOrder_) {
          // Write big-endian
          c.writeBE((uint8_t)length);
        } else {
          // Write little-endian
          c.writeLE((uint8_t)length);
        }
        break;
      }
      case 2: {
        if (length >= 65536) {
          throw std::runtime_error("length does not fit byte");
        }
        if (networkByteOrder_) {
          c.writeBE((uint16_t)length);
        } else {
          c.writeLE((uint16_t)length);
        }
        break;
      }
      case 4: {
        if (networkByteOrder_) {
          c.writeBE((uint32_t)length);
        } else {
          c.writeLE((uint32_t)length);
        }
        break;
      }
      case 8: {
        if (networkByteOrder_) {
          c.writeBE((uint64_t)length);
        } else {
          c.writeLE((uint64_t)length);
        }
        break;
      }
      default: {
        throw std::runtime_error("Invalid lengthFieldLength");
      }
    }
    // Link buf to the end of the chain headed by len (length field first, payload after)
    len->prependChain(std::move(buf));
    return ctx->fireWrite(std::move(len));
  }
 private:
  int lengthFieldLength_;           // size of the length field, in bytes
  int lengthAdjustment_;            // length adjustment value
  bool lengthIncludesLengthField_;  // whether the frame length includes the length field itself
  bool networkByteOrder_;           // whether to write in network byte order
};
Here the parent class OutboundBytesToBytesHandler is just a simple alias:
typedef OutboundHandler<std::unique_ptr<folly::IOBuf>> OutboundBytesToBytesHandler;
LengthFieldBasedFrameDecoder
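LengthFieldBasedFrameDecoder is a powerful, general-purpose decoding Handler. It dynamically splits the bytes in the IOBufs into frames for any length-based protocol, which makes it a good fit for protocols whose header carries a length field, whether that field holds the body length or the total length.
Suppose the length field sits at offset 0, occupies 2 bytes, and no bytes are stripped. The length field holds 12 (0x0C) and the message content is "HELLO, WORLD". The LengthFieldBasedFrameDecoder configuration for this case is: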
lengthFieldOffset = 0
lengthFieldLength = 2
lengthAdjustment = 0
initialBytesToStrip = 0 (= do not strip header)
The data before and after decoding then looks like this:
BEFORE DECODE (14 bytes)         AFTER DECODE (14 bytes)
+--------+----------------+      +--------+----------------+
| Length | Actual Content |----->| Length | Actual Content |
| 0x000C | "HELLO, WORLD" |      | 0x000C | "HELLO, WORLD" |
+--------+----------------+      +--------+----------------+
If the configuration instead strips two bytes:
lengthFieldOffset = 0
lengthFieldLength = 2
lengthAdjustment = 0
initialBytesToStrip = 2 (= the length of the Length field)
then the data before and after decoding looks like this (the length field has been stripped):
BEFORE DECODE (14 bytes)         AFTER DECODE (12 bytes)
+--------+----------------+      +----------------+
| Length | Actual Content |----->| Actual Content |
| 0x000C | "HELLO, WORLD" |      | "HELLO, WORLD" |
+--------+----------------+      +----------------+
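In most cases the value in the length field covers only the body of the protocol, as in the two examples above. In some protocols, however, the length field represents the length of the whole message (header + body). For that case you can specify a length adjustment, lengthAdjustment. Here the header consists of the length field alone, which is 2 bytes long, so lengthAdjustment is set to -2. The configuration is: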
lengthFieldOffset = 0
lengthFieldLength = 2
lengthAdjustment = -2 (= the length of the Length field)
initialBytesToStrip = 0
The result is:
BEFORE DECODE (14 bytes)         AFTER DECODE (14 bytes)
+--------+----------------+      +--------+----------------+
| Length | Actual Content |----->| Length | Actual Content |
| 0x000E | "HELLO, WORLD" |      | 0x000E | "HELLO, WORLD" |
+--------+----------------+      +--------+----------------+
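Now suppose the first two bytes of the protocol are no longer the length field but a Header value, followed by the length field, which is now 3 bytes long at offset 2. The configuration is: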
lengthFieldOffset = 2 (= the length of Header 1)
lengthFieldLength = 3
lengthAdjustment = 0
initialBytesToStrip = 0
The decoding result is:
BEFORE DECODE (17 bytes)                      AFTER DECODE (17 bytes)
+----------+----------+----------------+      +----------+----------+----------------+
| Header 1 |  Length  | Actual Content |----->| Header 1 |  Length  | Actual Content |
|  0xCAFE  | 0x00000C | "HELLO, WORLD" |      |  0xCAFE  | 0x00000C | "HELLO, WORLD" |
+----------+----------+----------------+      +----------+----------+----------------+
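Now suppose the Header from the previous example sits between the length field and the body. lengthAdjustment must then be set to 2 so that the two Header bytes are counted as part of the frame. The configuration is: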
lengthFieldOffset = 0
lengthFieldLength = 3
lengthAdjustment = 2 (= the length of Header 1)
initialBytesToStrip = 0
The result is:
BEFORE DECODE (17 bytes)                      AFTER DECODE (17 bytes)
+----------+----------+----------------+      +----------+----------+----------------+
|  Length  | Header 1 | Actual Content |----->|  Length  | Header 1 | Actual Content |
| 0x00000C |  0xCAFE  | "HELLO, WORLD" |      | 0x00000C |  0xCAFE  | "HELLO, WORLD" |
+----------+----------+----------------+      +----------+----------+----------------+
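Now for a combined example. Suppose a 1-byte HDR1 field precedes the length field, a 1-byte HDR2 field follows it, and the length field itself occupies two bytes. The configuration is: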
lengthFieldOffset = 1 (= the length of HDR1)
lengthFieldLength = 2
lengthAdjustment = 1 (= the length of HDR2)
initialBytesToStrip = 3 (= the length of HDR1 + LEN)
The decoding result is:
BEFORE DECODE (16 bytes)                       AFTER DECODE (13 bytes)
+------+--------+------+----------------+      +------+----------------+
| HDR1 | Length | HDR2 | Actual Content |----->| HDR2 | Actual Content |
| 0xCA | 0x000C | 0xFE | "HELLO, WORLD" |      | 0xFE | "HELLO, WORLD" |
+------+--------+------+----------------+      +------+----------------+
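Finally, modify the previous example slightly so that the length field represents the length of the whole frame. The configuration becomes: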
lengthFieldOffset = 1
lengthFieldLength = 2
lengthAdjustment = -3 (= the length of HDR1 + LEN, negative)
initialBytesToStrip = 3
The decoding result is:
BEFORE DECODE (16 bytes)                       AFTER DECODE (13 bytes)
+------+--------+------+----------------+      +------+----------------+
| HDR1 | Length | HDR2 | Actual Content |----->| HDR2 | Actual Content |
| 0xCA | 0x0010 | 0xFE | "HELLO, WORLD" |      | 0xFE | "HELLO, WORLD" |
+------+--------+------+----------------+      +------+----------------+
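All of the byte counts in these examples follow from one formula: the frame length is the value read from the length field, plus lengthAdjustment, plus the end offset of the length field (lengthFieldOffset + lengthFieldLength); the delivered length is that minus initialBytesToStrip. The sketch below checks the arithmetic only. The types FrameConfig and FrameSize and the function frameSize are hypothetical names, not part of Wangle.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical value type bundling the four parameters used in the examples.
struct FrameConfig {
  std::uint32_t lengthFieldOffset;
  std::uint32_t lengthFieldLength;
  std::int32_t lengthAdjustment;
  std::uint32_t initialBytesToStrip;
};

struct FrameSize {
  std::int64_t consumed;   // bytes taken from the input for one frame
  std::int64_t delivered;  // bytes passed downstream after stripping
};

// rawLength is the value read from the length field.
FrameSize frameSize(const FrameConfig& c, std::uint32_t rawLength) {
  assert(c.lengthFieldLength > 0);
  const std::int64_t lengthFieldEndOffset =
      static_cast<std::int64_t>(c.lengthFieldOffset) + c.lengthFieldLength;
  const std::int64_t frameLength = static_cast<std::int64_t>(rawLength) +
                                   c.lengthAdjustment + lengthFieldEndOffset;
  return {frameLength, frameLength - c.initialBytesToStrip};
}
```

For the last example, the configuration {1, 2, -3, 3} with a raw length of 16 gives 16 - 3 + 3 = 16 bytes consumed and 16 - 3 = 13 bytes delivered, matching the diagram.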
With that many examples, the usage of LengthFieldBasedFrameDecoder should be clear. Here is its implementation:
class LengthFieldBasedFrameDecoder : public ByteToByteDecoder {
 public:
  explicit LengthFieldBasedFrameDecoder(uint32_t lengthFieldLength = 4,
                                        uint32_t maxFrameLength = UINT_MAX,
                                        uint32_t lengthFieldOffset = 0,
                                        int32_t lengthAdjustment = 0,
                                        uint32_t initialBytesToStrip = 4,
                                        bool networkByteOrder = true);
  bool decode(Context* ctx, folly::IOBufQueue& buf, std::unique_ptr<folly::IOBuf>& result, size_t&) override {
    // Not enough bytes yet to read the length field: decoding fails for now
    if (buf.chainLength() < lengthFieldEndOffset_) {
      return false;
    }
    // Read the unadjusted frame length from the length field
    uint64_t frameLength = getUnadjustedFrameLength(
        buf, lengthFieldOffset_, lengthFieldLength_, networkByteOrder_);
    // Apply the adjustment and add the bytes up to the end of the length field
    frameLength += lengthAdjustment_ + lengthFieldEndOffset_;
    // The frame length is smaller than the end offset of the length field
    if (frameLength < lengthFieldEndOffset_) {
      buf.trimStart(lengthFieldEndOffset_);
      ctx->fireReadException(folly::make_exception_wrapper<std::runtime_error>(
          "Frame too small"));
      return false;
    }
    // The frame length exceeds the configured maximum
    if (frameLength > maxFrameLength_) {
      buf.trimStart(frameLength);
      ctx->fireReadException(folly::make_exception_wrapper<std::runtime_error>(
          "Frame larger than " +
          folly::to<std::string>(maxFrameLength_)));
      return false;
    }
    // Fewer bytes have been received than the frame length
    if (buf.chainLength() < frameLength) {
      // Decoding fails for now; wait for more data
      return false;
    }
    // The number of leading bytes to strip exceeds the frame length
    if (initialBytesToStrip_ > frameLength) {
      buf.trimStart(frameLength);
      ctx->fireReadException(folly::make_exception_wrapper<std::runtime_error>(
          "InitialBytesToSkip larger than frame"));
      return false;
    }
    // Drop the leading bytes
    buf.trimStart(initialBytesToStrip_);
    // Compute the length that is actually delivered
    int actualFrameLength = frameLength - initialBytesToStrip_;
    result = buf.split(actualFrameLength);
    return true;
  }
 private:
  uint64_t getUnadjustedFrameLength(
      folly::IOBufQueue& buf, int offset, int length, bool networkByteOrder);
  uint32_t lengthFieldLength_;     // number of bytes occupied by the length field
  uint32_t maxFrameLength_;        // maximum frame length
  uint32_t lengthFieldOffset_;     // byte offset of the length field within the frame
  int32_t lengthAdjustment_;       // length adjustment value
  uint32_t initialBytesToStrip_;   // number of leading bytes to strip
  bool networkByteOrder_;          // whether the length field is in network byte order
  uint32_t lengthFieldEndOffset_;  // offset where the length field ends: lengthFieldOffset + lengthFieldLength
};
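The implementation of getUnadjustedFrameLength is straightforward: judging by its signature, it reads the length field of the given size at the given offset in the requested byte order. It is not reproduced here.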
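As an illustration of what reading such a length field involves, here is a minimal standalone sketch. It is not the Wangle implementation: the function readLengthField is hypothetical, it reads from a std::string instead of a folly::IOBufQueue, and it accepts any field size from 1 to 8 bytes, which the library itself may not.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical stand-in for a length-field read: interpret `length` bytes
// starting at `offset` as an unsigned integer in the requested byte order.
std::uint64_t readLengthField(const std::string& buf, std::size_t offset,
                              std::size_t length, bool networkByteOrder) {
  assert(length >= 1 && length <= 8);
  assert(offset + length <= buf.size());
  std::uint64_t value = 0;
  for (std::size_t i = 0; i < length; ++i) {
    // Visit bytes from most significant to least significant:
    // front to back for big-endian, back to front for little-endian.
    const std::size_t index =
        networkByteOrder ? offset + i : offset + length - 1 - i;
    value = (value << 8) | static_cast<unsigned char>(buf[index]);
  }
  return value;
}
```

Applied to the frame 0xCAFE, 0x00000C, "HELLO, WORLD" from the earlier example, with offset 2, length 3 and network byte order, this returns 12.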
LineBasedFrameDecoder
As the name suggests, LineBasedFrameDecoder decodes line by line: a line ends wherever "\n" or "\r\n" appears. The principle is simple, so the code is not reproduced here.
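For readers who would still like to see the idea in code, here is a minimal standalone sketch of line-based framing. It is not the Wangle implementation: the function splitLines is hypothetical, it works on a std::string, it always strips the line terminator, and it enforces no maximum line length.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical line framer: remove every complete line from the front of
// `pending` and return the lines without their "\n" or "\r\n" terminator.
// An incomplete trailing line stays in `pending` for the next call.
std::vector<std::string> splitLines(std::string& pending) {
  std::vector<std::string> lines;
  std::string::size_type eol;
  while ((eol = pending.find('\n')) != std::string::npos) {
    std::string line = pending.substr(0, eol);
    if (!line.empty() && line.back() == '\r') {
      line.pop_back();  // treat "\r\n" like "\n"
    }
    lines.push_back(std::move(line));
    pending.erase(0, eol + 1);
  }
  assert(pending.find('\n') == std::string::npos);
  return lines;
}
```

Given the buffer "foo\r\nbar\nba", one call returns "foo" and "bar" and leaves "ba" in the buffer until its terminator arrives.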
StringCodec
StringCodec converts between IOBufs and std::string. Here is its implementation:
class StringCodec : public Handler<std::unique_ptr<folly::IOBuf>, std::string,
                                   std::string, std::unique_ptr<folly::IOBuf>> {
 public:
  typedef typename Handler<
      std::unique_ptr<folly::IOBuf>, std::string,
      std::string, std::unique_ptr<folly::IOBuf>>::Context Context;
  void read(Context* ctx, std::unique_ptr<folly::IOBuf> buf) override {
    if (buf) {
      // Merge the IOBuf chain into a single contiguous IOBuf
      buf->coalesce();
      std::string data((const char*)buf->data(), buf->length());
      ctx->fireRead(data);
    }
  }
  folly::Future<folly::Unit> write(Context* ctx, std::string msg) override {
    // Build an IOBuf directly by copying the string's bytes
    auto buf = folly::IOBuf::copyBuffer(msg.data(), msg.length());
    return ctx->fireWrite(std::move(buf));
  }
};
Other articles in this series
Wangle Source Code Analysis: EventBaseHandler, AsyncSocketHandler
Wangle Source Code Analysis: Pipeline, Handler, Context