Netty and the TCP Sticky-Packet and Packet-Splitting Problem

shark-chili

Last modified: 2023-12-02 10:51:40


First published: 2022-03-24 09:34:40

Article link: https://blog.csdn.net/shark_chili3007/article/details/103705375

Introduction

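When Netty is used for network I/O, interview questions like the following tend to come up:

What are TCP sticky packets and packet splitting? Why does UDP not have this problem?
What causes sticky packets and packet splitting?
How does Netty solve TCP sticky packets and packet splitting?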

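The TCP sticky-packet and packet-splitting problem

Reproducing the problem

Before explaining the problem, let's look at an example that shows how sticky packets and split packets arise. The two listings below are the server configuration and its business handler; once a client connects, the server prints whatever data the client sends: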

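import io.netty.bootstrap.ServerBootstrap;
import io.netty.channel.ChannelInitializer;
import io.netty.channel.nio.NioEventLoopGroup;
import io.netty.channel.socket.nio.NioServerSocketChannel;
import io.netty.channel.socket.nio.NioSocketChannel;

public class NettyServer {
    public static void main(String[] args) {
        // Starting a Netty server requires a thread model, an I/O model and the business logic

        // The bootstrap class drives the server startup
        ServerBootstrap serverBootstrap = new ServerBootstrap();

        // The following two objects can be thought of as two thread groups

        // Listens on the port and accepts new connections
        NioEventLoopGroup bossGroup = new NioEventLoopGroup(1);
        // Handles the reads and writes of every accepted connection
        NioEventLoopGroup workerGroup = new NioEventLoopGroup(1);

        // Configure the thread groups and select the NIO model
        serverBootstrap.group(bossGroup, workerGroup)
                // I/O model: NioServerSocketChannel here; EpollServerSocketChannel is recommended on Linux servers
                .channel(NioServerSocketChannel.class)
                // Defines how data is read and written on each connection, i.e. the business logic
                .childHandler(new ChannelInitializer<NioSocketChannel>() {
                    @Override
                    protected void initChannel(NioSocketChannel nioSocketChannel) throws Exception {
                        nioSocketChannel.pipeline()
                                .addLast(new FirstServerHandler());
                    }
                });

        bind(serverBootstrap, 8888);
    }

    /**
     * Binds the server to the given port.
     */
    private static void bind(ServerBootstrap serverBootstrap, int port) {
        serverBootstrap.bind(port);
    }
}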

Core code of the server-side business handler:

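// Assumption: DateUtil.now() is taken to be Hutool's cn.hutool.core.date.DateUtil
import cn.hutool.core.date.DateUtil;
import io.netty.buffer.ByteBuf;
import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.ChannelInboundHandlerAdapter;

import java.nio.charset.StandardCharsets;

public class FirstServerHandler extends ChannelInboundHandlerAdapter {

    /**
     * Called back whenever data from the client has been read
     */
    @Override
    public void channelRead(ChannelHandlerContext ctx, Object msg) {
        ByteBuf byteBuf = (ByteBuf) msg;
        try {
            // The printed prefix means "server read data"
            System.out.println(DateUtil.now() + ": 服务端读到数据 -> " + byteBuf.toString(StandardCharsets.UTF_8));
        } finally {
            // This is the last handler in the pipeline, so release the buffer to avoid a leak
            byteBuf.release();
        }
    }
}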

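Next come the client's business handler and configuration class. The handler is very simple: once the connection is established it sends 1000 messages in a row, each containing hello Netty Server!: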

import io.netty.buffer.ByteBuf;
import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.ChannelInboundHandlerAdapter;

import java.nio.charset.StandardCharsets;

public class FirstClientHandler extends ChannelInboundHandlerAdapter {

    /**
     * Called back once the client has connected to the server
     */
    @Override
    public void channelActive(ChannelHandlerContext ctx) throws Exception {
        for (int i = 0; i < 1000; i++) {
            // Build the message
            ByteBuf byteBuf = getByteBuf(ctx);
            // Write the message to the server
            ctx.channel().writeAndFlush(byteBuf);
        }
    }

    private ByteBuf getByteBuf(ChannelHandlerContext ctx) {
        byte[] bytes = "hello Netty Server!".getBytes(StandardCharsets.UTF_8);

        ByteBuf buffer = ctx.alloc().buffer();
        buffer.writeBytes(bytes);

        return buffer;
    }
}

The configuration class is likewise boilerplate:

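import io.netty.bootstrap.Bootstrap;
import io.netty.channel.Channel;
import io.netty.channel.ChannelInitializer;
import io.netty.channel.nio.NioEventLoopGroup;
import io.netty.channel.socket.nio.NioSocketChannel;

public class NettyClient {

    public static void main(String[] args) throws InterruptedException {
        // A Netty client likewise needs a thread model, an I/O model and the business logic

        // Drives the client startup
        Bootstrap bootstrap = new Bootstrap();
        // The client's thread model
        NioEventLoopGroup workerGroup = new NioEventLoopGroup();

        // Set the thread group
        bootstrap.group(workerGroup)
                // Select the NIO model
                .channel(NioSocketChannel.class)
                // I/O handling logic
                .handler(new ChannelInitializer<Channel>() {
                    @Override
                    protected void initChannel(Channel channel) throws Exception {
                        channel.pipeline().addLast(new FirstClientHandler());
                    }
                });

        // Establish the connection
        connect(bootstrap, "127.0.0.1", 8888);
    }

    /**
     * Starts connecting to the given host and port and returns the channel.
     * No retry logic is implemented here.
     */
    private static Channel connect(Bootstrap bootstrap, String host, int port) {
        return bootstrap.connect(host, port).channel();
    }
}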

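After starting the server and the client we get the output below. Many hello Netty Server! messages are glued together into one read, which is a sticky packet, and the last message of the first read is cut off and continues in the next read, which is a split packet.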

2023-08-29 09:09:24: 服务端读到数据 -> hello Netty Server!hello Netty Server!hello Netty Server!hello Netty Server!hello Netty Server!hello Netty Server!hello Netty Server!hello Netty Server!hello Netty Server!hello Netty Server!hello Netty Server!hello Netty Server!hello Netty Server!hello Netty Server!hello Netty Server!hello Netty Server!hello Netty Server!hello Netty Server!hello Netty Server!hello Netty Server!hello Netty Server!hello Netty Server!hello Netty Server!hello Netty Server!hello Netty Server!hello Netty Server!hello Netty Server!hello Netty Server!hello Netty Server!hello Netty Server!hello Netty Server!hello Netty Server!hello Netty Server!hello Netty Server!hello Netty Server!hello Netty Server!hello Netty Server!hello Netty Server!hello Netty Server!hello Netty Server!hello Netty Server!hello Netty Server!hello Netty Server!hello Netty Server!hello Netty Server!hello Netty Server!hello Netty Server!hello Netty Server!hello Netty Server!hello Netty Server!hello Netty Server!hello Netty Server!hello Netty Server!hello Netty Serve
2023-08-29 09:09:24: 服务端读到数据 -> r!hello Netty Server!hello Netty Server!hello Netty Ser

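Root-cause analysis

In TCP programming, the messages exchanged between server and client follow a fixed format, which we usually call a protocol; common application-layer examples are HTTP and FTP.
The sticky packets in the example above arise because server and client never agreed on such a protocol. TCP is a connection-oriented, stream-oriented protocol: it carries a sequence of bytes with no message boundaries, so what the application regards as one complete packet may be split into several segments or merged with its neighbours on the way. The receiver then cannot tell where one message ends and the next begins, and sticky packets and split packets appear.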
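
To make the "no message boundaries" point concrete, here is a minimal plain-JDK sketch. It does not use a socket at all: it simply writes three messages back to back into a byte array and then reads them in chunks of an arbitrary size. The class name StreamBoundaryDemo and the chunk size of 25 are illustrative assumptions, not part of the original example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class StreamBoundaryDemo {

    /** Writes the messages back to back, the way consecutive writes end up in one byte stream. */
    static byte[] toStream(List<String> messages) {
        ByteArrayOutputStream stream = new ByteArrayOutputStream();
        for (String message : messages) {
            byte[] bytes = message.getBytes(StandardCharsets.UTF_8);
            stream.write(bytes, 0, bytes.length);
        }
        return stream.toByteArray();
    }

    /** Reads the stream in chunks of at most chunkSize bytes, as a receiver's read calls might. */
    static List<String> readInChunks(byte[] stream, int chunkSize) {
        List<String> reads = new ArrayList<>();
        ByteArrayInputStream in = new ByteArrayInputStream(stream);
        byte[] chunk = new byte[chunkSize];
        int n;
        while ((n = in.read(chunk, 0, chunkSize)) > 0) {
            reads.add(new String(chunk, 0, n, StandardCharsets.UTF_8));
        }
        return reads;
    }

    public static void main(String[] args) {
        byte[] stream = toStream(List.of(
                "hello Netty Server!", "hello Netty Server!", "hello Netty Server!"));
        // Each read mixes the tail of one message with the head of the next.
        readInChunks(stream, 25).forEach(read -> System.out.println("read -> " + read));
    }
}
```

The reader sees 25, 25 and 7 bytes, none of which lines up with the 19-byte messages that were written. A real TCP receiver is in the same position, except that the chunk sizes are decided by the network and the kernel rather than by a constant.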

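There are roughly three reasons why TCP data gets split or merged:

socket buffers and the sliding window
the Nagle algorithm
MSS

Let's start with the combined effect of socket buffers and the sliding window. TCP is a full-duplex, stream-oriented protocol, and it has to make sure the sender does not overrun the receiver. It does so with a sliding window: the receiver advertises how much data it can currently accept, the sender may only transmit data that falls inside that window, and the receiver only accepts data inside it. The sender's window slides forward only after the receiver acknowledges the data with an ACK, and the advertised window grows again as the receiving application drains its buffer.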

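During this exchange, the data sent and received is held in socket buffers. Because TCP is stream-oriented, these two buffers have no way of knowing whether a given run of bytes belongs to the same packet.
Socket buffers come in two kinds, the send buffer (SO_SNDBUF) and the receive buffer (SO_RCVBUF). Everything a socket sends is first copied into the send buffer and from there handed to the kernel protocol stack for transmission; likewise, received data is copied by the kernel into the receive buffer, where it waits for the application to read it.

Together, socket buffers and the sliding window produce the following two anomalies:

The sender reaches the limit of the sliding window and stops sending. The receiver's socket buffer hands the bytes it already has to the application layer, and because the packet is incomplete, the receiver sees a split packet.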

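The sender delivers several packets into the receiver's buffer, but the receiving application does not read them in time. When it finally reads, the packet boundaries are unknown, so the buffered bytes are passed up in one go, producing a sticky packet.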

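Next, the Nagle algorithm. Every segment carries a 20-byte TCP header and a 20-byte IP header, and the sender also has to wait for the receiver's ACK. Sending 1 byte of useful data can therefore cost 40 bytes of headers, which is clearly wasteful. To make better use of bandwidth, the operating system applies the Nagle algorithm: while previously sent data is still unacknowledged, small pieces of data (shorter than the SMSS) are held back and accumulated, and they are sent together as one segment once the outstanding ACK arrives or a full-sized segment has built up. This uses bandwidth more efficiently and avoids congesting the network with large numbers of tiny packets.
Obviously, when several small packets are merged before sending, the receiver cannot tell where their boundaries are, so sticky or split packets become likely.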
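
The overhead argument is easy to check with a little arithmetic. The sketch below assumes minimum-size headers (20 bytes of TCP plus 20 bytes of IPv4, no options) and ignores link-layer framing; the class and method names are made up for illustration.

```java
final class HeaderOverhead {
    static final int TCP_HEADER_BYTES = 20; // minimum TCP header, no options (assumption)
    static final int IP_HEADER_BYTES = 20;  // minimum IPv4 header, no options (assumption)

    /** Fraction of the bytes on the wire that is payload when payloadBytes travel in one segment. */
    static double payloadRatio(int payloadBytes) {
        int onWire = payloadBytes + TCP_HEADER_BYTES + IP_HEADER_BYTES;
        return (double) payloadBytes / onWire;
    }

    public static void main(String[] args) {
        System.out.printf("1 byte in its own segment:    %.3f%n", payloadRatio(1));
        System.out.printf("19 bytes in their own segment: %.3f%n", payloadRatio(19));
        System.out.printf("50 x 19 bytes in one segment:  %.3f%n", payloadRatio(50 * 19));
    }
}
```

A single byte sent alone is 1/41 payload, whereas fifty 19-byte messages coalesced into one segment are 950/990 payload. That gain is why coalescing is attractive, and also why the receiver ends up with many messages in a single read. In Netty the algorithm can be switched off per connection with ChannelOption.TCP_NODELAY, but doing so does not restore message boundaries; framing is still needed.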

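Finally there is the MSS, short for Maximum Segment Size: the largest amount of data a single TCP segment can carry. If the data exceeds the MSS it is split into several smaller segments, and because these arrive piecemeal, sticky packets and split packets can again occur.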
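
As a rough illustration of MSS-driven splitting, the following sketch computes the segment sizes for a payload. The MSS of 1460 bytes is an assumption (a 1500-byte Ethernet MTU minus 40 bytes of minimum headers); the value actually negotiated on a connection may differ, and the names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class MssSegmentation {

    /** Splits payloadLength bytes into segment sizes of at most mss bytes each. */
    static List<Integer> segmentSizes(int payloadLength, int mss) {
        if (mss <= 0) {
            throw new IllegalArgumentException("mss must be positive");
        }
        List<Integer> sizes = new ArrayList<>();
        for (int remaining = payloadLength; remaining > 0; remaining -= mss) {
            sizes.add(Math.min(remaining, mss));
        }
        return sizes;
    }

    public static void main(String[] args) {
        // A 4000-byte message does not fit into one 1460-byte segment.
        System.out.println(segmentSizes(4000, 1460));
        // A 19-byte message fits easily, so the MSS alone would not split it.
        System.out.println(segmentSizes(19, 1460));
    }
}
```

With these numbers the 4000-byte message travels as three segments of 1460, 1460 and 1080 bytes, so a receiver that reads after the first segment arrives sees only part of the message.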

We can verify this with a Wireshark capture, filtering on the server port with the following expression:

ip.src==127.0.0.1 and ip.dst==127.0.0.1 and tcp.port==8888

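With the client and server running, the MSS negotiated by the two sides turns out to be far larger than the data sent each time, so splitting caused by the MSS can be ruled out first.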

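Looking at each segment the client sends, neither its size nor its content is missing anything, and the kernel buffers have plenty of space. The cause is therefore clear: because TCP is a stream-oriented transport, the receiver gets too many or too few bytes when it reads from the kernel buffer, which results in sticky or split packets.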

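Solutions

All of the problems above come from TCP being a stream-oriented protocol: the receiver cannot cut the stream back into the packets that were originally sent. Taking the data above, hello Netty Server!, as an example, we could delimit packets in any of these ways:

If every message ends with "!", the receiver checks whether the bytes received so far contain "!" and only then assembles them into a packet and passes it on.
The message above is 19 bytes long, so we can fix the message length at 19 bytes and assemble a packet every time 19 bytes have been received.
Define a custom protocol that the sender must follow when building packets, for example one with two fields, length and data: length records the length of the data (19 for the message above) and data holds the content.

Let's start with the delimiter-based approach. Every message ends with an exclamation mark, so we can split the stream on that special character.

The code is shown below. We use DelimiterBasedFrameDecoder to split on the delimiter; its parameters are, in order:

The maximum frame length.
Whether to strip the delimiter when decoding.
The delimiter.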

ByteBuf delimiter = Unpooled.copiedBuffer("!".getBytes(StandardCharsets.UTF_8));

nioSocketChannel.pipeline()
        // maxFrameLength, stripDelimiter = false (keep the "!"), delimiter
        .addLast(new DelimiterBasedFrameDecoder(Integer.MAX_VALUE, false, delimiter))
        .addLast(new FirstServerHandler());

After restarting, the problem is solved:

2023-08-29 09:19:44: 服务端读到数据 -> hello Netty Server!
2023-08-29 09:19:44: 服务端读到数据 -> hello Netty Server!
2023-08-29 09:19:44: 服务端读到数据 -> hello Netty Server!
2023-08-29 09:19:44: 服务端读到数据 -> hello Netty Server!
2023-08-29 09:19:44: 服务端读到数据 -> hello Netty Server!
2023-08-29 09:19:44: 服务端读到数据 -> hello Netty Server!
2023-08-29 09:19:44: 服务端读到数据 -> hello Netty Server!
2023-08-29 09:19:44: 服务端读到数据 -> hello Netty Server!
2023-08-29 09:19:44: 服务端读到数据 -> hello Netty Server!
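
To show the idea behind delimiter-based framing without any Netty dependency, here is a minimal plain-JDK sketch. It is not how DelimiterBasedFrameDecoder is implemented: the class DelimiterFramer is hypothetical, it only handles a single-byte delimiter, and it has no maximum frame length.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class DelimiterFramer {
    private final byte delimiter;
    // Bytes received so far that do not yet end with the delimiter
    private final ByteArrayOutputStream pending = new ByteArrayOutputStream();

    DelimiterFramer(byte delimiter) {
        this.delimiter = delimiter;
    }

    /** Feeds one received chunk and returns every frame it completes, delimiter included. */
    List<String> feed(byte[] chunk) {
        List<String> frames = new ArrayList<>();
        for (byte b : chunk) {
            pending.write(b);
            if (b == delimiter) {
                frames.add(new String(pending.toByteArray(), StandardCharsets.UTF_8));
                pending.reset();
            }
        }
        return frames;
    }

    public static void main(String[] args) {
        DelimiterFramer framer = new DelimiterFramer((byte) '!');
        // The first read ends in the middle of the second message, as in the log earlier.
        System.out.println(framer.feed(
                "hello Netty Server!hello Netty Serve".getBytes(StandardCharsets.UTF_8)));
        // The second read completes it and starts a third message.
        System.out.println(framer.feed("r!hello".getBytes(StandardCharsets.UTF_8)));
    }
}
```

Each call returns only whole messages; the incomplete tail stays in the pending buffer until the delimiter arrives. The obvious limitation is that the payload itself must never contain the delimiter.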

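Similarly, we can split packets by length:

As noted above, every message we send is 19 bytes long, so the second approach is to configure a length-based decoder in the server's pipeline that cuts the stream every 19 bytes, so that each packet is read and parsed correctly.
We therefore add a FixedLengthFrameDecoder to the pipeline with its length set to 19.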

nioSocketChannel.pipeline()
        .addLast(new FixedLengthFrameDecoder(19))
        .addLast(new FirstServerHandler());
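
The fixed-length idea can be sketched in a few lines of plain JDK code. The class FixedLengthFramer below is a hypothetical illustration of the technique, not Netty's FixedLengthFrameDecoder.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class FixedLengthFramer {
    private final byte[] frame; // the frame currently being filled
    private int filled;         // number of bytes of the current frame received so far

    FixedLengthFramer(int frameLength) {
        if (frameLength <= 0) {
            throw new IllegalArgumentException("frameLength must be positive");
        }
        this.frame = new byte[frameLength];
    }

    /** Feeds one received chunk and returns every frame that reached the fixed length. */
    List<String> feed(byte[] chunk) {
        List<String> frames = new ArrayList<>();
        for (byte b : chunk) {
            frame[filled++] = b;
            if (filled == frame.length) {
                frames.add(new String(frame, StandardCharsets.UTF_8));
                filled = 0;
            }
        }
        return frames;
    }

    public static void main(String[] args) {
        FixedLengthFramer framer = new FixedLengthFramer(19);
        // 25 bytes: one whole message plus the first 6 bytes of the next
        System.out.println(framer.feed("hello Netty Server!hello ".getBytes(StandardCharsets.UTF_8)));
        // The remaining 13 bytes complete the second message
        System.out.println(framer.feed("Netty Server!".getBytes(StandardCharsets.UTF_8)));
    }
}
```

Note that the length is counted in bytes, not characters. The scheme works here because every message is exactly 19 bytes of ASCII; messages of varying length would have to be padded or framed differently.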

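The last approach, and the one the author recommends, is a custom protocol. In practice neither a fixed length nor a delimiter can always be guaranteed, so we agree with the client to prepend the packet length to the data, for example as a 4-byte field.

The client code that writes data after connecting then becomes: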

    private ByteBuf getByteBuf(ChannelHandlerContext ctx) {
        byte[] bytes = "hello Netty Server!".getBytes(StandardCharsets.UTF_8);

        ByteBuf buffer = ctx.alloc().buffer();
        // 4 bytes that state the length of the data
        buffer.writeInt(bytes.length);
        // the data itself
        buffer.writeBytes(bytes);

        return buffer;
    }
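
For readers who want to see the resulting bytes, here is a plain-JDK sketch of the same encoding using java.nio.ByteBuffer in place of Netty's ByteBuf. The class name LengthPrefixEncoder is made up for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class LengthPrefixEncoder {

    /** Returns a 4-byte big-endian length followed by the UTF-8 bytes of the message. */
    static byte[] encode(String message) {
        byte[] data = message.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buffer = ByteBuffer.allocate(Integer.BYTES + data.length);
        // ByteBuffer is big-endian by default, the same byte order ByteBuf.writeInt uses
        buffer.putInt(data.length);
        buffer.put(data);
        return buffer.array();
    }

    public static void main(String[] args) {
        byte[] packet = encode("hello Netty Server!");
        System.out.println("total bytes: " + packet.length);
        System.out.println("length field: " + Arrays.toString(Arrays.copyOfRange(packet, 0, 4)));
    }
}
```

For hello Netty Server! the packet is 23 bytes long: the four bytes 0, 0, 0, 19 followed by the 19 bytes of text.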

Each packet therefore consists of a 4-byte length field (holding 19 here) followed by the 19 bytes of data.

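On the server we switch to LengthFieldBasedFrameDecoder. The code is shown below; its parameters are, in order:

Maximum frame length. We set it to Integer.MAX_VALUE, which effectively means no limit.
Offset of the length field. We set it to 0, i.e. the length field starts at the very beginning of the packet.
Size of the length field in bytes. We set it to 4.
Compensation value added to the value of the length field. This one is interesting: suppose we want the data, which is 10 bytes, but the length field is 2 bytes wide and records 12 (the length of the whole packet). Setting this parameter to -2 makes the decoder take exactly 10 bytes of data after the length field. In our packets the recorded length is already the data length, so we set it to 0 and no adjustment is needed.
Number of bytes to skip at the start of the packet when reading. We set it to 4 to skip the length field and keep only the data.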

nioSocketChannel.pipeline()
        .addLast(new LengthFieldBasedFrameDecoder(Integer.MAX_VALUE, 0, 4, 0, 4))
        .addLast(new FirstServerHandler());
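
The receiving side of the length-prefix protocol can also be sketched with the JDK alone. The class LengthPrefixFramer below is a hypothetical, simplified stand-in: it hard-codes a 4-byte length field at offset 0, strips it from the result, and has no maximum frame length, so it corresponds only loosely to the decoder configuration above.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class LengthPrefixFramer {
    // Bytes received but not yet consumed, kept in read mode
    private ByteBuffer pending = ByteBuffer.allocate(0);

    /** Builds one packet: 4-byte big-endian length followed by the UTF-8 data. */
    static byte[] encode(String message) {
        byte[] data = message.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(Integer.BYTES + data.length).putInt(data.length).put(data).array();
    }

    /** Appends a received chunk and returns the data of every complete packet now available. */
    List<String> feed(byte[] chunk) {
        // Cumulate: leftover bytes from earlier chunks followed by the new chunk
        ByteBuffer merged = ByteBuffer.allocate(pending.remaining() + chunk.length);
        merged.put(pending).put(chunk).flip();
        pending = merged;

        List<String> frames = new ArrayList<>();
        while (pending.remaining() >= Integer.BYTES) {
            int length = pending.getInt(pending.position()); // peek without consuming
            if (length < 0) {
                throw new IllegalStateException("negative length field: " + length);
            }
            if (pending.remaining() - Integer.BYTES < length) {
                break; // packet incomplete, wait for more bytes
            }
            pending.position(pending.position() + Integer.BYTES); // skip the length field
            byte[] data = new byte[length];
            pending.get(data);
            frames.add(new String(data, StandardCharsets.UTF_8));
        }
        return frames;
    }

    public static void main(String[] args) {
        byte[] one = encode("hello Netty Server!");
        byte[] stream = new byte[one.length * 2];
        System.arraycopy(one, 0, stream, 0, one.length);
        System.arraycopy(one, 0, stream, one.length, one.length);

        LengthPrefixFramer framer = new LengthPrefixFramer();
        // First read: only 10 bytes, so no complete packet yet
        System.out.println(framer.feed(Arrays.copyOfRange(stream, 0, 10)));
        // Second read: the rest of the first packet plus the whole second packet
        System.out.println(framer.feed(Arrays.copyOfRange(stream, 10, stream.length)));
    }
}
```

However the 46 bytes are cut up on the way, the framer only ever hands out whole messages, because the length field tells it exactly how many bytes to wait for.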

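Netty decoders in detail

The class hierarchy of the decoders above shows that they all share one parent, the abstract class ByteToMessageDecoder, which factors out the core decoding logic.

ByteToMessageDecoder is itself a ChannelInboundHandler, so when a message arrives in the pipeline it reaches ByteToMessageDecoder's channelRead method. Inside channelRead a method named callDecode is invoked, and this is the heart of how sticky and split packets are resolved.
While the input is readable, it calls decode to decode the message. decode is an abstract method that each of the decoders above implements with its own logic.
Once decode succeeds it stores the complete packet in the out list, so when ByteToMessageDecoder finds data in out it knows a complete packet has been assembled. It then enters the if branch and calls fireChannelRead to pass the complete message on to the business handlers that follow. This is also why the decoder must be added to the pipeline ahead of the business handlers: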

protected void callDecode(ChannelHandlerContext ctx, ByteBuf in, List<Object> out) {
        try {
            while (in.isReadable()) {
                int outSize = out.size();

                if (outSize > 0) {
                    fireChannelRead(ctx, out, outSize);
                    out.clear();

                    // Check if this handler was removed before continuing with decoding.
                    // If it was removed, it is not safe to continue to operate on the buffer.
                    //
                    // See:
                    // - https://github.com/netty/netty/issues/4635
                    if (ctx.isRemoved()) {
                        break;
                    }
                    outSize = 0;
                }

                int oldInputLength = in.readableBytes();
                decode(ctx, in, out);

                // Check if this handler was removed before continuing the loop.
                // If it was removed, it is not safe to continue to operate on the buffer.
                //
                // See https://github.com/netty/netty/issues/1664
                if (ctx.isRemoved()) {
                    break;
                }

                if (outSize == out.size()) {
                    if (oldInputLength == in.readableBytes()) {
                        break;
                    } else {
                        continue;
                    }
                }

                if (oldInputLength == in.readableBytes()) {
                    throw new DecoderException(
                            StringUtil.simpleClassName(getClass()) +
                            ".decode() did not read anything but decoded a message.");
                }

                if (isSingleDecode()) {
                    break;
                }
            }
        } catch (DecoderException e) {
            throw e;
        } catch (Throwable cause) {
            throw new DecoderException(cause);
        }
    }
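
Stripped of the Netty specifics, the loop boils down to "keep calling decode until it stops making progress". The following plain-JDK sketch is a simplification under several assumptions: the Decoder interface and INT_DECODER are hypothetical, a returned list stands in for fireChannelRead, messages are delivered right after each decode call rather than at the start of the next iteration, and the handler-removal and single-decode checks are left out.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

final class DecodeLoopSketch {

    /** Hypothetical stand-in for the abstract decode(ctx, in, out) method. */
    interface Decoder {
        void decode(ByteBuffer in, List<Object> out);
    }

    /** Example decoder: every 4 readable bytes become one Integer message. */
    static final Decoder INT_DECODER = (in, out) -> {
        if (in.remaining() >= Integer.BYTES) {
            out.add(in.getInt());
        }
    };

    /** Calls the decoder until the input is exhausted or the decoder makes no progress. */
    static List<Object> callDecode(Decoder decoder, ByteBuffer in) {
        List<Object> delivered = new ArrayList<>(); // stands in for fireChannelRead
        List<Object> out = new ArrayList<>();
        while (in.hasRemaining()) {
            int oldInputLength = in.remaining();
            decoder.decode(in, out);
            if (out.isEmpty()) {
                if (oldInputLength == in.remaining()) {
                    break;    // nothing read, nothing produced: wait for more bytes
                }
                continue;     // bytes were consumed without producing a message yet
            }
            if (oldInputLength == in.remaining()) {
                throw new IllegalStateException("decode() did not read anything but decoded a message");
            }
            delivered.addAll(out);
            out.clear();
        }
        return delivered;
    }

    public static void main(String[] args) {
        ByteBuffer in = ByteBuffer.allocate(10);
        in.putInt(7).putInt(9).put((byte) 1).put((byte) 2);
        in.flip();
        System.out.println(callDecode(INT_DECODER, in));        // the two complete messages
        System.out.println("left over: " + in.remaining());     // bytes kept for the next read
    }
}
```

The two leftover bytes are the interesting part: they are not an error, they are the beginning of a message whose remainder has not arrived yet, and they stay in the buffer until the next read adds to them.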

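Here we take the most commonly used decoder, LengthFieldBasedFrameDecoder, as an example. As mentioned above, our packet format uses the first 4 bytes as an int recording the data length, followed by a variable-length data field, which is why we configured the decoder with the arguments (Integer.MAX_VALUE, 0, 4, 0, 4). The meaning of each parameter is:

maxFrameLength: the maximum frame length. If the frame length computed from the length field exceeds this value, the frame is discarded.
lengthFieldOffset: the offset of the length field. In our packet the length field comes first, so no offset is needed and it is set to 0.
lengthFieldLength: the size of the length field. In our packet the length is recorded as an int, so it is set to 4 bytes.
lengthAdjustment: a correction for cases where the value in the length field differs from the number of bytes that follow it. Our length field records exactly the length of data, so 0 is correct. If instead the length field counted the whole packet including its own 4 bytes, the adjustment would be -4; if 4 extra header bytes that the length field does not count followed it, the adjustment would be 4.
initialBytesToStrip: with the parameters above the decoder can obtain a complete packet; initialBytesToStrip is the number of bytes to skip from the start of that packet before handing it on. We only want the data field, so we set it to 4 to skip the length field.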

new LengthFieldBasedFrameDecoder(Integer.MAX_VALUE,0,4,0,4)

 public LengthFieldBasedFrameDecoder(
            int maxFrameLength,
            int lengthFieldOffset, int lengthFieldLength,
            int lengthAdjustment, int initialBytesToStrip) {
        this(
                maxFrameLength,
                lengthFieldOffset, lengthFieldLength, lengthAdjustment,
                initialBytesToStrip, true);
    }
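
The interplay of these parameters is plain arithmetic, and a few worked cases make it easier to follow. The helper below mirrors the statement frameLength += lengthAdjustment + lengthFieldEndOffset from the decoder source; the class FrameLengthMath and its method names are hypothetical.

```java
final class FrameLengthMath {

    /** Total frame length in bytes, from the start of the packet to the end of the data. */
    static long frameLength(long lengthFieldValue, int lengthFieldOffset,
                            int lengthFieldLength, int lengthAdjustment) {
        int lengthFieldEndOffset = lengthFieldOffset + lengthFieldLength;
        return lengthFieldValue + lengthAdjustment + lengthFieldEndOffset;
    }

    /** Number of bytes handed to the next handler after stripping the leading bytes. */
    static long deliveredLength(long frameLength, int initialBytesToStrip) {
        return frameLength - initialBytesToStrip;
    }

    public static void main(String[] args) {
        // This article's protocol: 4-byte field holding the data length (19), field stripped
        long ours = frameLength(19, 0, 4, 0);
        System.out.println(ours + " -> " + deliveredLength(ours, 4));

        // Length field holds the whole packet length (4 + 19 = 23): adjust by -4
        long whole = frameLength(23, 0, 4, -4);
        System.out.println(whole + " -> " + deliveredLength(whole, 4));

        // 2-byte field holding 12 for the whole packet, 10 bytes of data: adjust by -2
        long small = frameLength(12, 0, 2, -2);
        System.out.println(small + " -> " + deliveredLength(small, 2));
    }
}
```

In the first two cases the frame is 23 bytes and 19 bytes are delivered; in the third the frame is 12 bytes and 10 bytes are delivered. The adjustment only changes how the length value is interpreted, while initialBytesToStrip only changes what is passed on.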

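LengthFieldBasedFrameDecoder's decode method calls an internal overloaded decode method to assemble the message. If a complete message was assembled it is added to the out list, which is why the outer loop decides whether a message is complete by checking whether out contains data.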

@Override
    protected final void decode(ChannelHandlerContext ctx, ByteBuf in, List<Object> out) throws Exception {
        Object decoded = decode(ctx, in);
        if (decoded != null) {
            out.add(decoded);
        }
    }

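The core is this inner decode method. When decoding, it first checks whether it is still in the middle of discarding an over-long frame. If so, it takes the smaller of the readable bytes and the bytes still to be discarded and skips that many bytes of the invalid frame.

It then checks whether the readable bytes reach the end of the length field, lengthFieldEndOffset, which is computed as lengthFieldOffset + lengthFieldLength. If they do not, the packet is incomplete and null is returned; the check is repeated when more data arrives.
Next it adds lengthFieldOffset to the current reader index to locate the length field and reads the length value from it. Knowing the length, it can pinpoint one complete packet, which is what avoids sticky and split packets.

Once the length value has been validated, it adds the correction lengthAdjustment and lengthFieldEndOffset (the number of bytes up to and including the length field) to obtain the length of the whole frame.
After further validity checks it skips initialBytesToStrip bytes, extracts the data actually wanted and returns it. The complete code follows for reference: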

protected Object decode(ChannelHandlerContext ctx, ByteBuf in) throws Exception {
        if (discardingTooLongFrame) {
            long bytesToDiscard = this.bytesToDiscard;
            int localBytesToDiscard = (int) Math.min(bytesToDiscard, in.readableBytes());
            in.skipBytes(localBytesToDiscard);
            bytesToDiscard -= localBytesToDiscard;
            this.bytesToDiscard = bytesToDiscard;

            failIfNecessary(false);
        }

        if (in.readableBytes() < lengthFieldEndOffset) {
            return null;
        }

        int actualLengthFieldOffset = in.readerIndex() + lengthFieldOffset;
        long frameLength = getUnadjustedFrameLength(in, actualLengthFieldOffset, lengthFieldLength, byteOrder);

        if (frameLength < 0) {
            in.skipBytes(lengthFieldEndOffset);
            throw new CorruptedFrameException(
                    "negative pre-adjustment length field: " + frameLength);
        }

        frameLength += lengthAdjustment + lengthFieldEndOffset;

        if (frameLength < lengthFieldEndOffset) {
            in.skipBytes(lengthFieldEndOffset);
            throw new CorruptedFrameException(
                    "Adjusted frame length (" + frameLength + ") is less " +
                    "than lengthFieldEndOffset: " + lengthFieldEndOffset);
        }

        if (frameLength > maxFrameLength) {
            long discard = frameLength - in.readableBytes();
            tooLongFrameLength = frameLength;

            if (discard < 0) {
                // buffer contains more bytes then the frameLength so we can discard all now
                in.skipBytes((int) frameLength);
            } else {
                // Enter the discard mode and discard everything received so far.
                discardingTooLongFrame = true;
                bytesToDiscard = discard;
                in.skipBytes(in.readableBytes());
            }
            failIfNecessary(true);
            return null;
        }

        // never overflows because it's less than maxFrameLength
        int frameLengthInt = (int) frameLength;
        if (in.readableBytes() < frameLengthInt) {
            return null;
        }

        if (initialBytesToStrip > frameLengthInt) {
            in.skipBytes(frameLengthInt);
            throw new CorruptedFrameException(
                    "Adjusted frame length (" + frameLength + ") is less " +
                    "than initialBytesToStrip: " + initialBytesToStrip);
        }
        in.skipBytes(initialBytesToStrip);

        // extract frame
        int readerIndex = in.readerIndex();
        int actualFrameLength = frameLengthInt - initialBytesToStrip;
        ByteBuf frame = extractFrame(ctx, in, readerIndex, actualFrameLength);
        in.readerIndex(readerIndex + actualFrameLength);
        return frame;
    }