【Netty源码分析总结】-CSDN博客

本文链接：https://blog.csdn.net/weixin_47068446/article/details/148396524

文章目录

Netty介绍
EventLoop开启扫描事件源码
Netty客户端和服务端源码分析：
Netty的注意点

Netty介绍

业务执行流程：
Netty 是一个高性能、异步事件驱动的网络应用框架，主要用于快速开发可靠、可维护的服务端和客户端网络程序，特别适合处理高并发、大吞吐量的网络通信场景。
在这里插入图片描述
Netty源码分析流程：

EventLoop开启扫描事件源码

NioEventLoopGroup继承的类：
在这里插入图片描述
NioEventLoop中启动Run()方法，扫描注册事件：
SingleThreadEventExecutor->execute()->startThread(),一般是非EventLoop得IO线程，才会执行startThread()方法

    public void execute(Runnable task) {
        if (task == null) {
            throw new NullPointerException("task");
        }

        boolean inEventLoop = inEventLoop();
        addTask(task);
        if (!inEventLoop) {
            startThread();
            if (isShutdown()) {
                boolean reject = false;
                try {
                    if (removeTask(task)) {
                        reject = true;
                    }
                } catch (UnsupportedOperationException e) {
                    // The task queue does not support removal so the best thing we can do is to just move on and
                    // hope we will be able to pick-up the task before its completely terminated.
                    // In worst case we will log on termination.
                }
                if (reject) {
                    reject();
                }
            }
        }

        if (!addTaskWakesUp && wakesUpForTask(task)) {
            wakeup(inEventLoop);
        }
    }
    private void startThread() {
      if (state == ST_NOT_STARTED) {
          if (STATE_UPDATER.compareAndSet(this, ST_NOT_STARTED, ST_STARTED)) {
              boolean success = false;
              try {
                  doStartThread();
                  success = true;
              } finally {
                  if (!success) {
                      STATE_UPDATER.compareAndSet(this, ST_STARTED, ST_NOT_STARTED);
                  }
              }
          }
      }
  }

注意点：为什么必须是非EventLoop线程才可以执行startThread()方法：
1.保证线程只启动一次
如果当前调用线程已经是该 NioEventLoop 绑定的线程（inEventLoop() 返回 true），说明线程已经启动且正在执行 run()，这时不需要重复启动。
2.避免死锁和线程安全问题
在 EventLoop 线程内调用 startThread()，如果这个方法还要启动线程，反而可能引发死锁或状态异常。因为线程已经在跑了。
3.延迟启动线程
Netty 设计中，线程的启动是延迟的，只有真正需要处理任务时才启动，避免资源浪费。

Netty是如何避免Nio的空轮训的：
1.Selector 重建机制当连续空轮询超过阈值时，自动重建 selector；
2.任务感知机制如果有任务可执行，即使 select 返回 0 也不会判定为空轮询；
3.wakeup 优化避免不必要的唤醒导致 select 提前返回；
4.版本兼容建议避免使用易触发 epoll 空轮询 bug 的 JDK 版本；
5.用户可调参数SELECTOR_AUTO_REBUILD_THRESHOLD 可配置，适应不同业务场景；

2.启动扫描NioEventLoop->run()->processSelectedKeys()

protected void run() {
        for (;;) {
            try {
                try {
                    switch (selectStrategy.calculateStrategy(selectNowSupplier, hasTasks())) {
                    case SelectStrategy.CONTINUE:
                        continue;

                    case SelectStrategy.BUSY_WAIT:
                        // fall-through to SELECT since the busy-wait is not supported with NIO

                    case SelectStrategy.SELECT:
                        select(wakenUp.getAndSet(false));
                        if (wakenUp.get()) {
                            selector.wakeup();
                        }
                        // fall through
                    default:
                    }
                } catch (IOException e) {
                    // If we receive an IOException here its because the Selector is messed up. Let's rebuild
                    // the selector and retry. https://github.com/netty/netty/issues/8566
                    rebuildSelector0();
                    handleLoopException(e);
                    continue;
                }

                cancelledKeys = 0;
                needsToSelectAgain = false;
                final int ioRatio = this.ioRatio;
                if (ioRatio == 100) {
                    try {
                        processSelectedKeys();
                    } finally {
                        // Ensure we always run tasks.
                        runAllTasks();
                    }
                } else {
                    final long ioStartTime = System.nanoTime();
                    try {
                        processSelectedKeys();
                    } finally {
                        // Ensure we always run tasks.
                        final long ioTime = System.nanoTime() - ioStartTime;
                        runAllTasks(ioTime * (100 - ioRatio) / ioRatio);
                    }
                }
            } catch (Throwable t) {
                handleLoopException(t);
            }
            // Always handle shutdown even if the loop processing threw an exception.
            try {
                if (isShuttingDown()) {
                    closeAll();
                    if (confirmShutdown()) {
                        return;
                    }
                }
            } catch (Throwable t) {
                handleLoopException(t);
            }
        }
    }

注意点：为什么要设置一个IORatio比例：
IORatio比例，默认50%，EventLoop中默认有两种任务，一种是IO任务，另一种是taskQueue中的任务。如果IO时间比例过高以及过多，就会导致taskQueue中的任务饿死，如果IO时间比例过低，大量的时间都在执行普通任务，如果普通任务过多，也会导致IO一直等待，导致IO饿死。
3.执行轮训的核心策略processSelectedKeys()->processSelectedKeysOptimized()：

    private void processSelectedKeys() {
        if (selectedKeys != null) {
            //不用JDK的selector.selectedKeys(), 性能更好（1%-2%），垃圾回收更少
            processSelectedKeysOptimized();
        } else {
            //JDK的selector.selectedKeys()
            processSelectedKeysPlain(selector.selectedKeys());
        }
    }
    private void processSelectedKeysOptimized() {
      for (int i = 0; i < selectedKeys.size; ++i) {
          final SelectionKey k = selectedKeys.keys[i];
          // See https://github.com/netty/netty/issues/2363
          selectedKeys.keys[i] = null;//清空数组元素，在通道关闭后对其进行GC
          //AbstractNioChannel中的doRegister()方法中
          //selectionKey = javaChannel().register(eventLoop().unwrappedSelector(), 0, this);
          final Object a = k.attachment();
          //NioSocketChannel
          if (a instanceof AbstractNioChannel) {
              processSelectedKey(k, (AbstractNioChannel) a);
          } else {
              @SuppressWarnings("unchecked")
              NioTask<SelectableChannel> task = (NioTask<SelectableChannel>) a;
              processSelectedKey(k, task);
          }

          if (needsToSelectAgain) {
              // null out entries in the array to allow to have it GC'ed once the Channel close
              // See https://github.com/netty/netty/issues/2363
              selectedKeys.reset(i + 1);

              selectAgain();
              i = -1;
          }
      }
  }

注意点：在这里Netty相比较于JDK默认的Selector做了优化，优化原因：
JDK的实现：每次 select 后都会得到一个新的 Set，这个 selectedKeys：
是一个 HashSet；
迭代性能不高；
清理时会涉及迭代删除；
会产生大量临时对象，增加 GC 压力。
Netty的实现：SelectedSelectionKeySet对象：
用一个固定大小的数组（SelectionKey[]）替代 HashSet；
无需装箱、拆箱；
避免 iterator 和 entrySet；
手动维护数组下标，实现高效 add/clear；
使用反射替换掉 SelectorImpl.selectedKeys 的默认实现！
4.NioEventLoop->ProcessSelectKey()扫描时间OP_CONNECT、OP_ACCEPT、OP_READ、OP_WRITE

private void processSelectedKey(SelectionKey k, AbstractNioChannel ch) {
        final AbstractNioChannel.NioUnsafe unsafe = ch.unsafe();
        if (!k.isValid()) {
            final EventLoop eventLoop;
            try {
                eventLoop = ch.eventLoop();
            } catch (Throwable ignored) {
                // If the channel implementation throws an exception because there is no event loop, we ignore this
                // because we are only trying to determine if ch is registered to this event loop and thus has authority
                // to close ch.
                return;
            }
            // Only close ch if ch is still registered to this EventLoop. ch could have deregistered from the event loop
            // and thus the SelectionKey could be cancelled as part of the deregistration process, but the channel is
            // still healthy and should not be closed.
            // See https://github.com/netty/netty/issues/5125
            if (eventLoop != this || eventLoop == null) {
                return;
            }
            // close the channel if the key is not valid anymore
            unsafe.close(unsafe.voidPromise());
            return;
        }

        try {
            int readyOps = k.readyOps();
            // We first need to call finishConnect() before try to trigger a read(...) or write(...) as otherwise
            // the NIO JDK channel implementation may throw a NotYetConnectedException.
            if ((readyOps & SelectionKey.OP_CONNECT) != 0) {
                // remove OP_CONNECT as otherwise Selector.select(..) will always return without blocking
                // See https://github.com/netty/netty/issues/924
                int ops = k.interestOps();
                ops &= ~SelectionKey.OP_CONNECT;
                k.interestOps(ops);

                unsafe.finishConnect();
            }

            // Process OP_WRITE first as we may be able to write some queued buffers and so free memory.
            if ((readyOps & SelectionKey.OP_WRITE) != 0) {
                // Call forceFlush which will also take care of clear the OP_WRITE once there is nothing left to write
                ch.unsafe().forceFlush();
            }

            // Also check for readOps of 0 to workaround possible JDK bug which may otherwise lead
            // to a spin loop
            //处理读请求（断开连接）或接入连接
            if ((readyOps & (SelectionKey.OP_READ | SelectionKey.OP_ACCEPT)) != 0 || readyOps == 0) {
                unsafe.read();
            }
        } catch (CancelledKeyException ignored) {
            unsafe.close(unsafe.voidPromise());
        }
    }

5.重点分析读OP_WRITE事件
AbstractChannel->flush0()->NioSocketChannel->doWrite()

protected void doWrite(ChannelOutboundBuffer in) throws Exception {
        SocketChannel ch = javaChannel();
        //有数据要写，且能写入，这最多尝试16次
        int writeSpinCount = config().getWriteSpinCount();
        do {
            if (in.isEmpty()) {
                // All written so clear OP_WRITE
                //数据都写完了,不用也不需要写16次
                clearOpWrite();
                // Directly return here so incompleteWrite(...) is not called.
                return;
            }

            // Ensure the pending writes are made of ByteBufs only.
            int maxBytesPerGatheringWrite = ((NioSocketChannelConfig) config).getMaxBytesPerGatheringWrite();
            //最多返回1024个数据，总的size尽量不超过maxBytesPerGatheringWrite
            ByteBuffer[] nioBuffers = in.nioBuffers(1024, maxBytesPerGatheringWrite);
            int nioBufferCnt = in.nioBufferCount();

            // Always us nioBuffers() to workaround data-corruption.
            // See https://github.com/netty/netty/issues/2761
            switch (nioBufferCnt) {//是否是单个数据，还是批量数据
                case 0:
                    // We have something else beside ByteBuffers to write so fallback to normal writes.
                    writeSpinCount -= doWrite0(in);
                    break;
                case 1: {//单个数据
                    // Only one ByteBuf so use non-gathering write
                    // Zero length buffers are not added to nioBuffers by ChannelOutboundBuffer, so there is no need
                    // to check if the total size of all the buffers is non-zero.
                    ByteBuffer buffer = nioBuffers[0];
                    int attemptedBytes = buffer.remaining();
                    final int localWrittenBytes = ch.write(buffer);
                    if (localWrittenBytes <= 0) {
                        incompleteWrite(true);
                        return;
                    }
                    adjustMaxBytesPerGatheringWrite(attemptedBytes, localWrittenBytes, maxBytesPerGatheringWrite);
                    //从ChannelOutboundBuffer中移除已经写出的数据
                    in.removeBytes(localWrittenBytes);
                    --writeSpinCount;
                    break;
                }
                default: {//批量数据
                    // Zero length buffers are not added to nioBuffers by ChannelOutboundBuffer, so there is no need
                    // to check if the total size of all the buffers is non-zero.
                    // We limit the max amount to int above so cast is safe
                    long attemptedBytes = in.nioBufferSize();
                    final long localWrittenBytes = ch.write(nioBuffers, 0, nioBufferCnt);
                    if (localWrittenBytes <= 0) {
                        //缓存区满了，写不进去了，注册写事件。
                        incompleteWrite(true);
                        return;
                    }
                    // Casting to int is safe because we limit the total amount of data in the nioBuffers to int above.
                    adjustMaxBytesPerGatheringWrite((int) attemptedBytes, (int) localWrittenBytes,
                            maxBytesPerGatheringWrite);
                    in.removeBytes(localWrittenBytes);
                    --writeSpinCount;
                    break;
                }
            }
        } while (writeSpinCount > 0);

        //写了16次数据，还是没有写完，直接schedule一个新的flush task出来。而不是注册写事件。
        incompleteWrite(writeSpinCount < 0);
    }

注意点：1.Netty 处理OP_Write事件，我们为什么要限制16次
在 Netty 中，处理 OP_WRITE（写就绪）事件时，我们限制最多连续写 16 次（默认），这是为了在高性能和公平性之间取得平衡。
Netty 默认只写 16 次，即使没写完也会暂停：
每次尝试 write()，如果没写完，就再试；
最多尝试 16 次，不管是否完成，都会“让出”线程；
下一次 OP_WRITE 事件或 flush() 调用时继续写。
6.分析OP_ACCEPT操作
AbstractNioMessageChannel->read()
注意核心方法：doReadMessage()

 public void read() {
            assert eventLoop().inEventLoop();
            final ChannelConfig config = config();
            final ChannelPipeline pipeline = pipeline();
            final RecvByteBufAllocator.Handle allocHandle = unsafe().recvBufAllocHandle();
            allocHandle.reset(config);

            boolean closed = false;
            Throwable exception = null;
            try {
                try {
                    do {
                    	//核心方法，为新链接创建一个NioSocketChannel通道
                        int localRead = doReadMessages(readBuf);
                        if (localRead == 0) {
                            break;
                        }
                        if (localRead < 0) {
                            closed = true;
                            break;
                        }

                        allocHandle.incMessagesRead(localRead);
                    } while (allocHandle.continueReading());
                } catch (Throwable t) {
                    exception = t;
                }
				
                int size = readBuf.size();
                for (int i = 0; i < size; i ++) {
                    readPending = false;
                    pipeline.fireChannelRead(readBuf.get(i));
                }
                readBuf.clear();
                allocHandle.readComplete();
                pipeline.fireChannelReadComplete();

                if (exception != null) {
                    closed = closeOnReadError(exception);

                    pipeline.fireExceptionCaught(exception);
                }

                if (closed) {
                    inputShutdown = true;
                    if (isOpen()) {
                        close(voidPromise());
                    }
                }
            } finally {
                // Check if there is a readPending which was not processed yet.
                // This could be for two reasons:
                // * The user called Channel.read() or ChannelHandlerContext.read() in channelRead(...) method
                // * The user called Channel.read() or ChannelHandlerContext.read() in channelReadComplete(...) method
                //
                // See https://github.com/netty/netty/issues/2254
                if (!readPending && !config.isAutoRead()) {
                    removeReadOp();
                }
            }
        }
    }

NioServerSocketChannel->doReadMessage()

    @Override
    protected int doReadMessages(List<Object> buf) throws Exception {
        //接受新连接创建SocketChannel
        SocketChannel ch = SocketUtils.accept(javaChannel());

        try {
            if (ch != null) {
            	//创建一个新的NioSocketChannel添加到Buffer中
                buf.add(new NioSocketChannel(this, ch));
                return 1;
            }
        } catch (Throwable t) {
            logger.warn("Failed to create a new channel from an accepted socket.", t);

            try {
                ch.close();
            } catch (Throwable t2) {
                logger.warn("Failed to close a socket.", t2);
            }
        }

        return 0;
    }

AbstractChannelHandlerContext->invokeChannelRead()

    private void invokeChannelRead(Object msg) {
        if (invokeHandler()) {
            try {
            	核心
                ((ChannelInboundHandler) handler()).channelRead(this, msg);
            } catch (Throwable t) {
                notifyHandlerException(t);
            }
        } else {
            fireChannelRead(msg);
        }
    }

ServerBootstrapAcceptor->channelRead()
这个方法就是将BoosGroup的OP_ACCEPT事件，注册到workGroup上的EventLoopGroup上。

        public void channelRead(ChannelHandlerContext ctx, Object msg) {
            final Channel child = (Channel) msg;
            // 为客户端连接设置 handler、options、attributes
            child.pipeline().addLast(childHandler);
            setChannelOptions(child, childOptions, logger);
            setAttributes(child, childAttrs);

            try {
                childGroup.register(child).addListener(new ChannelFutureListener() {
                    @Override
                    public void operationComplete(ChannelFuture future) throws Exception {
                        if (!future.isSuccess()) {
                            forceClose(child, future.cause());
                        }
                    }
                });
            } catch (Throwable t) {
                forceClose(child, t);
            }
        }

MultithreadEventLoopGroup.register()来选择数组中位置的EventLoop然后再进行注册：
SingleThreadEventLoop.register() 注册的核心方法。

    public ChannelFuture register(final ChannelPromise promise) {
        ObjectUtil.checkNotNull(promise, "promise");
        promise.channel().unsafe().register(this, promise);
        return promise;
    }

Netty客户端和服务端源码分析：

在这里插入图片描述

在 Netty 中，Bootstrap 和 ServerBootstrap 是客户端和服务端启动的核心辅助类，分别用于初始化、配置并启动客户端 Channel 和服务端 Channel。
ServerBootStrap源码分析：
1.AnsractBootStrap->initRegister()

final ChannelFuture initAndRegister() {
        Channel channel = null;
        try {
            channel = channelFactory.newChannel();//  1、创建一个ServerSocketChannel(NioServerSocketChannel)
            init(channel);//2、初始化ServerSocketChannel(NioServerSocketChannel)
        } catch (Throwable t) {
            if (channel != null) {
                // channel can be null if newChannel crashed (eg SocketException("too many open files"))
                channel.unsafe().closeForcibly();
                // as the Channel is not registered yet we need to force the usage of the GlobalEventExecutor
                return new DefaultChannelPromise(channel, GlobalEventExecutor.INSTANCE).setFailure(t);
            }
            // as the Channel is not registered yet we need to force the usage of the GlobalEventExecutor
            return new DefaultChannelPromise(new FailedChannel(), GlobalEventExecutor.INSTANCE).setFailure(t);
        }
        //3、Register:给ServerSocketChannel从Bossgroup中选择一个NioEventLoop
        //选择一个EventLoop注册channel
         ChannelFuture regFuture = config().group().register(channel);
        if (regFuture.cause() != null) {
            if (channel.isRegistered()) {
                channel.close();
            } else {
                channel.unsafe().closeForcibly();
            }
        }

        // If we are here and the promise is not failed, it's one of the following cases:
        // 1) If we attempted registration from the event loop, the registration has been completed at this point.
        //    i.e. It's safe to attempt bind() or connect() now because the channel has been registered.
        // 2) If we attempted registration from the other thread, the registration request has been successfully
        //    added to the event loop's task queue for later execution.
        //    i.e. It's safe to attempt bind() or connect() now:
        //         because bind() or connect() will be executed *after* the scheduled registration task is executed
        //         because register(), bind(), and connect() are all bound to the same thread.

        return regFuture;
    }

Netty的注意点

一、内存泄漏
原因：
使用了 ByteBuf 手动管理内存，未正确调用 release() 导致内存未释放。
长时间缓存 ChannelHandlerContext 或 Channel 等引用。
没有关闭 EventLoopGroup 或定时任务未取消。
解决方式：
使用 SimpleLeakAwareByteBuf 检查泄漏。
配合 -Dio.netty.leakDetection.level=PARANOID 启用泄漏检测。
确保手动分配的资源（如 ByteBufAllocator）释放。
二、业务阻塞导致 IO 线程卡死
原因：
在 Netty 的 EventLoop 中执行了阻塞操作（数据库访问、文件 IO 等）。
解决方式：
将耗时操作 offload 到业务线程池。
使用 DefaultEventExecutorGroup 将某些 Handler 放入独立线程执行。
三、Netty 中的“粘包/拆包问题”是 TCP 通信中常见的问题之一。
原因：
应用层写入数据过快多次 write() 被合并发送
接收缓冲区处理慢多个包积压
网络缓冲 TCP 层做了 Nagle 合并或乱序到达
消息本身无边界一段 byte[] 不包含长度/结束符，无法分辨。
解决方式：
使用 Netty 提供的 LengthFieldBasedFrameDecoder 等解码器。
协议设计要明确消息长度或使用特定分隔符
四、内存使用过高或 Full GC 频繁
原因：
分配过多堆外内存（DirectMemory），导致系统内存紧张。
使用堆内 ByteBuf 并频繁拷贝数据。
定时任务或回调未清理导致对象无法 GC。
解决方式：
控制 PooledByteBufAllocator 使用，避免大块内存泄漏。
定期监控堆外内存使用量（Netty direct memory）。
使用 jmap, jstack 配合排查对象泄漏。