Java Network Programming: Causes of Sticky and Split Packets and How to Solve Them

胡玉洋　

Last modified: 2022-08-05 09:54:48


First published: 2022-08-05 08:30:00

Article link: https://blog.csdn.net/huyuyang6688/article/details/126096736


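In network programming on top of TCP, you will inevitably run into the sticky packet and split packet problems.

What are sticky packets and split packets?

Let's start with an example. It reuses the code from the "IO multiplexing mode" section of the previous article, "Java Network Programming: Using NIO's Blocking IO, Non-Blocking IO, and IO Multiplexing Modes":
Server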

import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;
import java.util.Objects;
import java.util.Set;

import lombok.extern.slf4j.Slf4j;

@Slf4j
public class NIOServer {

    public static void main(String[] args) throws Exception {
        ServerSocketChannel serverSocketChannel = ServerSocketChannel.open();
        serverSocketChannel.configureBlocking(false);
        serverSocketChannel.bind(new InetSocketAddress("127.0.0.1", 8080), 50);
        Selector selector = Selector.open();
        // Register the listening channel for accept events.
        serverSocketChannel.register(selector, SelectionKey.OP_ACCEPT);
        while (true) {
            // Blocks until at least one registered channel is ready.
            int count = selector.select();
            log.info("select event count:" + count);
            Set<SelectionKey> selectionKeys = selector.selectedKeys();
            Iterator<SelectionKey> iterator = selectionKeys.iterator();
            while (iterator.hasNext()) {
                SelectionKey selectionKey = iterator.next();
                if (selectionKey.isAcceptable()) {
                    handleAccept(selectionKey);
                } else if (selectionKey.isReadable()) {
                    handleRead(selectionKey);
                }
                // Handled keys must be removed, or they are processed again on the next select.
                iterator.remove();
            }
        }
    }

    private static void handleAccept(SelectionKey selectionKey) throws IOException {
        ServerSocketChannel serverSocketChannel = (ServerSocketChannel) selectionKey.channel();
        SocketChannel socketChannel = serverSocketChannel.accept();
        if (Objects.nonNull(socketChannel)) {
            log.info("receive connection from client. client:{}", socketChannel.getRemoteAddress());
            socketChannel.configureBlocking(false);
            // Register the new connection for read events on the same selector.
            Selector selector = selectionKey.selector();
            socketChannel.register(selector, SelectionKey.OP_READ);
        }
    }

    private static void handleRead(SelectionKey selectionKey) throws IOException {
        SocketChannel socketChannel = (SocketChannel) selectionKey.channel();
        // Deliberately tiny 4-byte buffer, used to provoke the problem discussed in this article.
        ByteBuffer readBuffer = ByteBuffer.allocate(4);
        int length = socketChannel.read(readBuffer);
        if (length > 0) {
            // Decodes each chunk on its own, even if it ends in the middle of a character.
            log.info("receive message from client. client:{} message:{}", socketChannel.getRemoteAddress(),
                    new String(readBuffer.array(), 0, length, StandardCharsets.UTF_8));
        } else if (length == -1) {
            // -1 means the client has closed the connection.
            socketChannel.close();
        }
    }

}

Client

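import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SocketChannel;
import java.nio.charset.StandardCharsets;

import lombok.SneakyThrows;
import lombok.extern.slf4j.Slf4j;

@Slf4j
public class NIOClient {

    @SneakyThrows
    public static void main(String[] args) {
        SocketChannel socketChannel = SocketChannel.open();
        try {
            socketChannel.connect(new InetSocketAddress("127.0.0.1", 8080));
            // Send the two characters with two separate writes, 3 UTF-8 bytes each.
            ByteBuffer byteBuffer1 = ByteBuffer.wrap("你".getBytes(StandardCharsets.UTF_8));
            socketChannel.write(byteBuffer1);
            ByteBuffer byteBuffer2 = ByteBuffer.wrap("好".getBytes(StandardCharsets.UTF_8));
            socketChannel.write(byteBuffer2);
            log.info("client send finished");
        } catch (Exception e) {
            e.printStackTrace();
        } finally {
            socketChannel.close();
        }
    }

}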

Start the server in Run mode, then run the client. The server receives and prints the following:

receive message from client. client:/127.0.0.1:63618 message:你�
receive message from client. client:/127.0.0.1:63618 message:��

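Strange. The client sends Chinese characters, but both sides use UTF-8 for encoding and decoding, so why is the output garbled? And the first character, "你", was decoded correctly by the server without any garbling.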

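To analyze this, start the server and then the client in Debug mode.
When the client reaches log.info("client send finished");, you can see that "你" is encoded in UTF-8 as the byte array [-28, -67, -96] and "好" as [-27, -91, -67]. The client has sent 3 bytes to the server twice, one write after the other.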

The server's read buffer is 4 bytes, so it needs two reads to receive these 6 bytes. The first read returns the first 4 bytes, [-28, -67, -96, -27].

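When those first 4 bytes are decoded as UTF-8, the first 3 bytes are complete and decode to "你". The 4th byte is only the first part of the UTF-8 encoding of "好", so it is incomplete, decoding fails, and a replacement (garbled) character is displayed.
In the same way, the 2 bytes returned by the second read are also incomplete. They cannot be decoded either, so 2 replacement characters are displayed.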
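The decoding behavior described above can be reproduced without any networking. The following is a minimal sketch that decodes the two chunks from the debugging session on their own; the class name SplitDecodeDemo and the helper decode are made up for this illustration and are not part of the original code.

```java
import java.nio.charset.StandardCharsets;

final class SplitDecodeDemo {

    /** Decodes one chunk on its own, the same way handleRead does on the server. */
    static String decode(byte[] chunk) {
        return new String(chunk, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // First read: the 3 bytes of the first character plus the first byte of the second one.
        String first = decode(new byte[]{-28, -67, -96, -27});
        // Second read: the remaining 2 bytes of the second character.
        String second = decode(new byte[]{-91, -67});
        // All 6 bytes decoded together.
        String whole = decode(new byte[]{-28, -67, -96, -27, -91, -67});

        // U+FFFD is the replacement character that shows up as garbled output.
        System.out.println(first);   // one complete character followed by U+FFFD
        System.out.println(second);  // two U+FFFD characters
        System.out.println(whole);   // both characters, decoded correctly
    }
}
```

The bytes are never corrupted. The garbling comes only from decoding each chunk separately instead of decoding the complete message.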

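Someone might ask: why not just set the buffer size to 3, 6, or 9 when reading?
The code above is only a simple example that simulates the situation. Suppose the buffer size is set to 6 and the client sends "你a好" (UTF-8 bytes [-28, -67, -96, 97, -27, -91, -67]). The output can still be garbled, because the byte array is still cut in two, [-28, -67, -96, 97, -27, -91] and [-67], and the server cannot decode either piece correctly after receiving them separately.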
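To check the buffer-size argument, here is a small sketch that cuts a message into buffer-sized chunks and decodes each chunk separately. FixedBufferDemo and decodeInChunks are hypothetical names for this illustration, and the slicing only simulates a receiver whose reads always fill the buffer; a real socket read may return fewer bytes.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class FixedBufferDemo {

    /** Decodes data chunk by chunk, as a receiver with a fixed-size buffer would (simulated). */
    static List<String> decodeInChunks(byte[] data, int bufferSize) {
        List<String> decoded = new ArrayList<>();
        for (int start = 0; start < data.length; start += bufferSize) {
            int end = Math.min(start + bufferSize, data.length);
            byte[] chunk = Arrays.copyOfRange(data, start, end);
            decoded.add(new String(chunk, StandardCharsets.UTF_8));
        }
        return decoded;
    }

    public static void main(String[] args) {
        // U+4F60 and U+597D are the two characters from the example, 3 UTF-8 bytes each.
        byte[] sixBytes = "\u4F60\u597D".getBytes(StandardCharsets.UTF_8);
        // The same text with an ASCII 'a' in the middle: 7 UTF-8 bytes.
        byte[] sevenBytes = "\u4F60a\u597D".getBytes(StandardCharsets.UTF_8);

        // A 6-byte buffer happens to fit the first message exactly.
        System.out.println(decodeInChunks(sixBytes, 6));
        // The second message is cut into 6 + 1 bytes, so both chunks contain U+FFFD.
        System.out.println(decodeInChunks(sevenBytes, 6));
    }
}
```

A buffer size that happens to work for one message says nothing about the next one, so tuning the buffer size cannot be a general fix.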

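This is what is usually called a split packet (some people call it a half packet). Its counterpart is the sticky packet. When data is exchanged over TCP, the TCP layer knows nothing about the meaning of the upper-layer application data (for example, the data placed in the ByteBuffer in this article's example, or an HTTP message). Depending on the actual conditions (for example, the TCP buffers, or the NIO receive buffer, the ByteBuffer, defined in this article), it may split or merge the data.
When the client sends one long message, the sending side may break it into several smaller packets, or the receiving side may receive it as several smaller pieces. In short: one message sent by the client may be split into several pieces by the time the server receives it.
When the client sends several short messages, the sending side may concatenate them into one larger packet, and the receiving side may also receive them merged into one larger piece. In short: several messages sent by the client may be merged into one piece by the time the server receives them.
A sticky packet also showed up in the earlier article "TCP Protocol Study Notes and Packet Analysis": the client sent three strings of length 20, 30, and 40 to the server one after another, but the tcpdump capture showed that the client sent only a single TCP segment with length=90.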
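The 20/30/40 scenario can be approximated with plain blocking sockets on the loopback interface. The sketch below is not the code from that earlier article; StickyPacketDemo and sendAndReceive are names invented here, and the assumption is that loopback sockets are available on the machine. The client performs three writes, and the receiver records how many bytes each read returns.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.ArrayList;
import java.util.List;

final class StickyPacketDemo {

    /** Sends 20, 30 and 40 bytes with three writes and returns the size of every read on the receiver. */
    static List<Integer> sendAndReceive() {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // Port 0 lets the operating system pick a free port.
        try (ServerSocket server = new ServerSocket(0, 1, loopback)) {
            try (Socket client = new Socket(loopback, server.getLocalPort())) {
                OutputStream out = client.getOutputStream();
                for (int size : new int[]{20, 30, 40}) {
                    out.write(new byte[size]); // one write per application message
                    out.flush();
                }
            } // closing the client signals end of stream to the receiver

            List<Integer> readSizes = new ArrayList<>();
            try (Socket accepted = server.accept()) {
                InputStream in = accepted.getInputStream();
                byte[] buffer = new byte[1024];
                int n;
                while ((n = in.read(buffer)) != -1) {
                    readSizes.add(n);
                }
            }
            return readSizes;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // The total is always 90; how it is divided among reads is up to the TCP stack.
        System.out.println("bytes per read: " + sendAndReceive());
    }
}
```

Only the total of 90 bytes is guaranteed. The receiver may see one read of 90 bytes, three reads of 20, 30, and 40, or some other division, and nothing in the stream marks where one message ends and the next begins.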