Netty 5.0: SingleThreadEventLoop & NioEventLoop

Latest recommended article published 2022-03-07 21:18:38

tuohuangs

Article link: https://blog.csdn.net/lzlchangqi/article/details/41532189

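SingleThreadEventLoop extends SingleThreadEventExecutor, a standard thread-pool-style implementation that differs little from the thread pools in the JDK; its main job is to execute tasks. NioEventLoop extends SingleThreadEventLoop. As the Reactor thread of the NIO framework it has to handle network I/O read and write events, so it must aggregate a multiplexer (a Selector).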

References: 《Netty权威指南》 by 李林锋; http://xw-z1985.iteye.com/blog/1928244

The class relationships are shown in the following figure:

This article analyzes the thread run logic implemented by NioEventLoop:

@Override
    protected void run() {
        for (;;) {
            oldWakenUp = wakenUp.getAndSet(false);
            try {
                if (hasTasks()) {
                    selectNow();
                } else {
                    select();

                    // 'wakenUp.compareAndSet(false, true)' is always evaluated
                    // before calling 'selector.wakeup()' to reduce the wake-up
                    // overhead. (Selector.wakeup() is an expensive operation.)
                    //
                    // However, there is a race condition in this approach.
                    // The race condition is triggered when 'wakenUp' is set to
                    // true too early.
                    //
                    // 'wakenUp' is set to true too early if:
                    // 1) Selector is waken up between 'wakenUp.set(false)' and
                    //    'selector.select(...)'. (BAD)
                    // 2) Selector is waken up between 'selector.select(...)' and
                    //    'if (wakenUp.get()) { ... }'. (OK)
                    //
                    // In the first case, 'wakenUp' is set to true and the
                    // following 'selector.select(...)' will wake up immediately.
                    // Until 'wakenUp' is set to false again in the next round,
                    // 'wakenUp.compareAndSet(false, true)' will fail, and therefore
                    // any attempt to wake up the Selector will fail, too, causing
                    // the following 'selector.select(...)' call to block
                    // unnecessarily.
                    //
                    // To fix this problem, we wake up the selector again if wakenUp
                    // is true immediately after selector.select(...).
                    // It is inefficient in that it wakes up the selector for both
                    // the first case (BAD - wake-up required) and the second case
                    // (OK - no wake-up required).

                    if (wakenUp.get()) {
                        selector.wakeup();
                    }
                }

                cancelledKeys = 0;

                final long ioStartTime = System.nanoTime();
                needsToSelectAgain = false;
                if (selectedKeys != null) {
                    processSelectedKeysOptimized(selectedKeys.flip());
                } else {
                    processSelectedKeysPlain(selector.selectedKeys());
                }
                final long ioTime = System.nanoTime() - ioStartTime;

                final int ioRatio = this.ioRatio;
                runAllTasks(ioTime * (100 - ioRatio) / ioRatio);

                if (isShuttingDown()) {
                    closeAll();
                    if (confirmShutdown()) {
                        break;
                    }
                }
            } catch (Throwable t) {
                logger.warn("Unexpected exception in the selector loop.", t);

                // Prevent possible consecutive immediate failures that lead to
                // excessive CPU consumption.
                try {
                    Thread.sleep(1000);
                } catch (InterruptedException e) {
                    // Ignore.
                }
            }
        }
    }

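1. All of the logic runs inside the for loop body. The loop exits only when the NioEventLoop receives a shutdown instruction, which is the usual way to implement a thread that processes NIO messages.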

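2. First, wakenUp is reset to false and the previous wake-up state is saved in the oldWakenUp variable. hasTasks() then checks whether the task queue currently holds tasks that have not been processed. If it does, selectNow() is called to perform one non-blocking select and see whether any Channel is ready. Its implementation is as follows: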

void selectNow() throws IOException {
        try {
            selector.selectNow();
        } finally {
            // restore wakup state if needed
            if (wakenUp.get()) {
                selector.wakeup();
            }
        }
    }

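Selector.selectNow() performs a non-blocking selection and returns immediately. It returns the number of keys whose ready-operation sets were updated (the ready keys themselves are available from selectedKeys()), or 0 if no channel is ready. Because selectNow() also clears the effect of any earlier wakeup() call, the method then checks wakenUp; if a wake-up was requested, it calls selector.wakeup() again so that the request is not lost.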

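About wakeup (if unfamiliar, see http://ifeve.com/selectors/#wakeUp): 1) A thread that is blocked in select() can be made to return even when no channel is ready. Another thread only has to call Selector.wakeup() on the same Selector object that the first thread is selecting on, and the blocked thread returns immediately. 2) If another thread calls wakeup() while no thread is blocked in select(), the next thread that calls select() "wakes up" immediately.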
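The two wakeup rules, and the reason selectNow() in NioEventLoop restores the wake-up afterwards, can be observed with the plain JDK Selector. The following is a minimal sketch rather than Netty code; the class and method names (WakeupDemo, selectAfterWakeup, selectAfterWakeupAndSelectNow) are made up for illustration, and no channel is registered, so only wakeup() or the timeout can end the select.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.Selector;

final class WakeupDemo {
    /** wakeup() before select(): returns how long select(timeoutMillis) took, in ms. */
    static long selectAfterWakeup(long timeoutMillis) {
        try (Selector selector = Selector.open()) {
            selector.wakeup();                  // no thread is selecting yet
            long start = System.nanoTime();
            selector.select(timeoutMillis);     // the pending wakeup makes this return at once
            return (System.nanoTime() - start) / 1_000_000L;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** wakeup(), then selectNow(), then select(): returns how long select took, in ms. */
    static long selectAfterWakeupAndSelectNow(long timeoutMillis) {
        try (Selector selector = Selector.open()) {
            selector.wakeup();
            selector.selectNow();               // clears the effect of the earlier wakeup()
            long start = System.nanoTime();
            selector.select(timeoutMillis);     // nothing pending, so this waits for the timeout
            return (System.nanoTime() - start) / 1_000_000L;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("select after wakeup(): "
                + selectAfterWakeup(3000) + " ms");
        System.out.println("select after wakeup() + selectNow(): "
                + selectAfterWakeupAndSelectNow(300) + " ms");
    }
}
```

The first call should return almost immediately, while the second should take roughly the full timeout. That lost wake-up is what the finally block in Netty's selectNow() guards against.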

3. Back in run(), consider the select() method, in which the Selector multiplexer does the polling.

private void select() throws IOException {
        Selector selector = this.selector;
        try {
            int selectCnt = 0;
            long currentTimeNanos = System.nanoTime();
            long selectDeadLineNanos = currentTimeNanos + delayNanos(currentTimeNanos);
            for (;;) {
                long timeoutMillis = (selectDeadLineNanos - currentTimeNanos + 500000L) / 1000000L;
                if (timeoutMillis <= 0) {
                    if (selectCnt == 0) {
                        selector.selectNow();
                        selectCnt = 1;
                    }
                    break;
                }

                int selectedKeys = selector.select(timeoutMillis);
                selectCnt ++;

                if (selectedKeys != 0 || oldWakenUp || wakenUp.get() || hasTasks()) {
                    // Selected something,
                    // waken up by user, or
                    // the task queue has a pending task.
                    break;
                }

                if (SELECTOR_AUTO_REBUILD_THRESHOLD > 0 &&
                        selectCnt >= SELECTOR_AUTO_REBUILD_THRESHOLD) {
                    // The selector returned prematurely many times in a row.
                    // Rebuild the selector to work around the problem.
                    logger.warn(
                            "Selector.select() returned prematurely {} times in a row; rebuilding selector.",
                            selectCnt);

                    rebuildSelector();
                    selector = this.selector;

                    // Select again to populate selectedKeys.
                    selector.selectNow();
                    selectCnt = 1;
                    break;
                }

                currentTimeNanos = System.nanoTime();
            }

            if (selectCnt > MIN_PREMATURE_SELECTOR_RETURNS) {
                if (logger.isDebugEnabled()) {
                    logger.debug("Selector.select() returned prematurely {} times in a row.", selectCnt - 1);
                }
            }
        } catch (CancelledKeyException e) {
            if (logger.isDebugEnabled()) {
                logger.debug(CancelledKeyException.class.getSimpleName() + " raised by a Selector - JDK bug?", e);
            }
            // Harmless exception - log anyway
        }
    }

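1) Take the current system time in nanoseconds and call delayNanos() to obtain the delay until the next scheduled task in the NioEventLoop fires; adding the two gives the select deadline.

2) Compute the time remaining before the next scheduled task fires, add an adjustment of 0.5 ms, and convert the result to milliseconds. If the result is less than or equal to 0, that is, the task is due or overdue, call selector.selectNow() to poll once (only when no select has been performed yet in this call), set selectCnt to 1, and exit the loop.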
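The effect of the 0.5 ms adjustment is easiest to see with a few numbers. The sketch below copies only the arithmetic from select(); the class name SelectTimeout and the sample values are illustrative assumptions, not part of Netty.

```java
final class SelectTimeout {
    /** Same expression as in select(): add 0.5 ms, then truncate to whole milliseconds. */
    static long timeoutMillis(long selectDeadLineNanos, long currentTimeNanos) {
        return (selectDeadLineNanos - currentTimeNanos + 500_000L) / 1_000_000L;
    }

    public static void main(String[] args) {
        long now = 0L;
        // Remaining time until the deadline, in nanoseconds (negative means overdue).
        long[] remainingNanos = {400_000L, 500_000L, 1_499_999L, 1_500_000L, -2_000_000L};
        for (long remaining : remainingNanos) {
            System.out.println(remaining + " ns -> "
                    + timeoutMillis(now + remaining, now) + " ms");
        }
    }
}
```

Adding half a millisecond before the integer division rounds the remaining time to the nearest millisecond instead of always rounding down. A task due in less than 0.5 ms therefore yields a timeout of 0, which sends the loop down the selectNow() path instead of into a blocking select.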

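3) Perform the select with the remaining timeout of the scheduled task as the argument. Each time a select completes, the select counter selectCnt is incremented by 1.

4) After the select completes, the result is checked. The loop exits if any of the following holds:

A: a Channel is ready, that is, selectedKeys is not 0, which means there are read or write events to process;

B: oldWakenUp is true;

C: the system or the user called wakeup and woke the multiplexer (wakenUp.get() is true);

D: the task queue has a pending task (hasTasks() returns true).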

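5) If this poll of the Selector returned nothing, and there was neither a wakeup nor a new task to process, it was an empty poll. It may have been caused by the JDK epoll bug, which makes the Selector spin on empty polls and keeps the I/O thread at 100% CPU. As of the then-latest JDK 7 release the bug had still not been completely fixed, so Netty has to work around it.

For the Selector stack trace of Bug-id = 6403933, see the figure on p. 433 of 《Netty权威指南》 by 李林锋.

The workaround strategy for this bug is as follows:

(1) Keep statistics on the Selector's select period;

(2) Increment a counter each time an empty select completes;

(3) If N consecutive empty polls occur within a given period (for example 100 ms), the JDK NIO epoll() spin bug has been triggered.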
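The counting part of this strategy can be sketched independently of any Selector. The class below is a simplified illustration, not Netty's code: EmptyPollCounter and its record method are hypothetical names, and rebuildThreshold only plays the role that SELECTOR_AUTO_REBUILD_THRESHOLD plays in select().

```java
final class EmptyPollCounter {
    private final int rebuildThreshold;  // hypothetical stand-in for SELECTOR_AUTO_REBUILD_THRESHOLD
    private int consecutiveEmpty;        // empty polls seen in a row

    EmptyPollCounter(int rebuildThreshold) {
        this.rebuildThreshold = rebuildThreshold;
    }

    /**
     * Records the outcome of one select() call.
     * Returns true when enough empty polls occurred in a row that the selector should be rebuilt.
     */
    boolean record(int selectedKeys, boolean wokenUpOrHasTasks) {
        if (selectedKeys != 0 || wokenUpOrHasTasks) {
            consecutiveEmpty = 0;        // a legitimate return breaks the streak
            return false;
        }
        consecutiveEmpty++;
        if (rebuildThreshold > 0 && consecutiveEmpty >= rebuildThreshold) {
            consecutiveEmpty = 0;        // start counting afresh after the rebuild
            return true;
        }
        return false;
    }

    int consecutiveEmpty() {
        return consecutiveEmpty;
    }

    public static void main(String[] args) {
        EmptyPollCounter counter = new EmptyPollCounter(3);
        for (int i = 1; i <= 3; i++) {
            System.out.println("empty poll " + i + " -> rebuild? " + counter.record(0, false));
        }
    }
}
```

A threshold of 0 or less disables the check, matching the `SELECTOR_AUTO_REBUILD_THRESHOLD > 0` guard in the original method.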

6) Once the Selector is detected to be spinning, the system is brought back to normal by rebuilding the Selector; see the rebuildSelector() method.
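The article does not list rebuildSelector(), so the following is only a sketch of the general idea using the JDK API: open a fresh Selector, move every valid registration over with its interest set and attachment, and close the old one. RebuildSketch, rebuild and demo are hypothetical names, and a Pipe stands in for a real socket channel.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;

final class RebuildSketch {
    /** Moves every valid registration to a fresh Selector and closes the old one. */
    static Selector rebuild(Selector oldSelector) {
        try {
            Selector newSelector = Selector.open();
            for (SelectionKey key : oldSelector.keys()) {
                if (!key.isValid()) {
                    continue;                       // already cancelled, nothing to migrate
                }
                int interestOps = key.interestOps();
                Object attachment = key.attachment();
                key.cancel();                       // drop the registration on the old selector
                key.channel().register(newSelector, interestOps, attachment);
            }
            oldSelector.close();
            return newSelector;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Registers one channel, rebuilds, and returns how many registrations survived intact. */
    static int demo() {
        try {
            Pipe pipe = Pipe.open();
            pipe.source().configureBlocking(false);
            Selector old = Selector.open();
            pipe.source().register(old, SelectionKey.OP_READ, "attachment-1");

            Selector fresh = rebuild(old);
            int migrated = 0;
            for (SelectionKey key : fresh.keys()) {
                if (key.interestOps() == SelectionKey.OP_READ
                        && "attachment-1".equals(key.attachment())) {
                    migrated++;
                }
            }
            fresh.close();
            pipe.source().close();
            pipe.sink().close();
            return migrated;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("registrations migrated: " + demo());
    }
}
```

Carrying the attachment across matters in Netty's case because the attachment is what links a SelectionKey back to its channel when the key is later processed.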

7) If the poll found SocketChannels in the ready state, their network I/O events must be processed.

See the run() method:

  final long ioStartTime = System.nanoTime();
                needsToSelectAgain = false;
                if (selectedKeys != null) {
                    processSelectedKeysOptimized(selectedKeys.flip());
                } else {
                    processSelectedKeysPlain(selector.selectedKeys());
                }
                final long ioTime = System.nanoTime() - ioStartTime;

When the selectedKeys optimization is not enabled (selectedKeys is null), execution takes the processSelectedKeysPlain branch. The implementation of processSelectedKeysPlain is as follows:

 private void processSelectedKeysPlain(Set<SelectionKey> selectedKeys) {
        // check if the set is empty and if so just return to not create garbage by
        // creating a new Iterator every time even if there is nothing to process.
        // See https://github.com/netty/netty/issues/597
        if (selectedKeys.isEmpty()) {
            return;
        }

        Iterator<SelectionKey> i = selectedKeys.iterator();
        for (;;) {
            final SelectionKey k = i.next();
            final Object a = k.attachment();
            i.remove();

            if (a instanceof AbstractNioChannel) {
                processSelectedKey(k, (AbstractNioChannel) a);
            } else {
                @SuppressWarnings("unchecked")
                NioTask<SelectableChannel> task = (NioTask<SelectableChannel>) a;
                processSelectedKey(k, task);
            }

            if (!i.hasNext()) {
                break;
            }

            if (needsToSelectAgain) {
                selectAgain();
                selectedKeys = selector.selectedKeys();

                // Create the iterator again to avoid ConcurrentModificationException
                if (selectedKeys.isEmpty()) {
                    break;
                } else {
                    i = selectedKeys.iterator();
                }
            }
        }
    }

The method first guards against an empty selected-key set and returns if there is nothing to process. It then obtains an iterator over the set and loops: for each SelectionKey it fetches the attachment that was supplied when the channel was registered, and removes the key from the selected-key set through the iterator. The removal is required because a Selector only ever adds keys to its selected-key set and never removes them itself; a key that is not removed stays in the set and would be processed again after the next select, even if its channel is no longer ready.
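Why i.remove() is needed can be checked with the JDK Selector alone. The sketch below is an illustration under simple assumptions: a Pipe replaces a socket, and SelectedKeyRemoval and demo are made-up names. It leaves the key in the selected-key set on purpose and shows that it is still there after the channel has been drained.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.util.Iterator;

final class SelectedKeyRemoval {
    /** Returns {selected-key count when nothing was removed, count after iterator removal}. */
    static int[] demo() {
        try (Selector selector = Selector.open()) {
            Pipe pipe = Pipe.open();
            pipe.source().configureBlocking(false);
            pipe.source().register(selector, SelectionKey.OP_READ);

            pipe.sink().write(ByteBuffer.wrap(new byte[] {1}));
            selector.select(1000);                        // the source is now readable
            pipe.source().read(ByteBuffer.allocate(16));  // drain it; it is no longer ready

            selector.selectNow();                         // nothing new is ready ...
            int stale = selector.selectedKeys().size();   // ... but the old key is still listed

            Iterator<SelectionKey> it = selector.selectedKeys().iterator();
            while (it.hasNext()) {
                it.next();
                it.remove();                              // what processSelectedKeysPlain does
            }
            int afterRemoval = selector.selectedKeys().size();

            pipe.sink().close();
            pipe.source().close();
            return new int[] {stale, afterRemoval};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] result = demo();
        System.out.println("stale keys without removal: " + result[0]);
        System.out.println("keys after iterator removal: " + result[1]);
    }
}
```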

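The attachment type is then checked. If it is an AbstractNioChannel, the channel is a NioServerSocketChannel or a NioSocketChannel and I/O read/write handling is required. If it is a NioTask, it is cast and passed to processSelectedKey. Netty itself does not implement the NioTask interface, so this branch is normally not executed unless the user registers such a task with the multiplexer.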

8) Next, the handling of I/O events. The code is as follows:

private static void processSelectedKey(SelectionKey k, AbstractNioChannel ch) {
        final NioUnsafe unsafe = ch.unsafe();
        // First obtain the inner Unsafe from the NioServerSocketChannel or NioSocketChannel,
        // then check whether the selection key is still valid. If it is not, call
        // Unsafe.close() to release the connection resources.
        if (!k.isValid()) {
            // close the channel if the key is not valid anymore
            unsafe.close(unsafe.voidPromise());
            return;
        }

        try {
            int readyOps = k.readyOps();
            // The key is valid, so go on to inspect the ready-operation bits.
            // Also check for readOps of 0 to workaround possible JDK bug which may otherwise lead
            // to a spin loop
            if ((readyOps & (SelectionKey.OP_READ | SelectionKey.OP_ACCEPT)) != 0 || readyOps == 0) {
                // Read or accept: call Unsafe.read(), whose implementation is polymorphic.
                unsafe.read();
                if (!ch.isOpen()) {
                    // Connection already closed - no need to handle write.
                    return;
                }
            }
            // Write-ready: flush to send the rest of a partially written message.
            if ((readyOps & SelectionKey.OP_WRITE) != 0) {
                // Call forceFlush which will also take care of clear the OP_WRITE once there is nothing left to write
                ch.unsafe().forceFlush();
            }
            // Connect-ready: check the result of the connection attempt.
            if ((readyOps & SelectionKey.OP_CONNECT) != 0) {
                // remove OP_CONNECT as otherwise Selector.select(..) will always return without blocking
                // See https://github.com/netty/netty/issues/924
                int ops = k.interestOps();
                ops &= ~SelectionKey.OP_CONNECT;
                k.interestOps(ops);

                unsafe.finishConnect();
            }
        } catch (CancelledKeyException e) {
            unsafe.close(unsafe.voidPromise());
        }
    }

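The Unsafe implementation is polymorphic here. For NioServerSocketChannel, the read operation accepts the client's TCP connection; for NioSocketChannel, the read operation reads bytes from the SocketChannel into a ByteBuffer.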
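The bit tests in processSelectedKey are ordinary mask operations on the SelectionKey constants. As a small illustration, with ReadyOpsBits and its method names being hypothetical helpers rather than Netty API, the three decisions can be written as:

```java
import java.nio.channels.SelectionKey;

final class ReadyOpsBits {
    /** Read path: taken for OP_READ, OP_ACCEPT, or the readyOps == 0 workaround case. */
    static boolean shouldRead(int readyOps) {
        return (readyOps & (SelectionKey.OP_READ | SelectionKey.OP_ACCEPT)) != 0 || readyOps == 0;
    }

    /** Write path: taken when the OP_WRITE bit is set. */
    static boolean shouldFlush(int readyOps) {
        return (readyOps & SelectionKey.OP_WRITE) != 0;
    }

    /** Clears OP_CONNECT from an interest set and leaves all other bits untouched. */
    static int withoutConnect(int interestOps) {
        return interestOps & ~SelectionKey.OP_CONNECT;
    }

    public static void main(String[] args) {
        int ready = SelectionKey.OP_READ | SelectionKey.OP_WRITE;
        System.out.println("read?  " + shouldRead(ready));
        System.out.println("flush? " + shouldFlush(ready));
        int interest = SelectionKey.OP_CONNECT | SelectionKey.OP_READ;
        System.out.println("interest without OP_CONNECT == OP_READ? "
                + (withoutConnect(interest) == SelectionKey.OP_READ));
    }
}
```

Because the checks are independent if statements rather than an if/else chain, a single key can trigger both the read path and the write path in one pass.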

9) After the I/O events have been processed, NioEventLoop must run the non-I/O system tasks and the scheduled tasks.

See the run() method:

 final int ioRatio = this.ioRatio;
     runAllTasks(ioTime * (100 - ioRatio) / ioRatio);

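Because NioEventLoop has to handle both I/O events and non-I/O tasks, Netty provides an I/O ratio that users can tune so that both get enough CPU time. If there is more I/O work than scheduled tasks and other tasks, the I/O ratio can be raised; otherwise it can be lowered. The default value is 50%.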
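A few sample values make the budget formula from run() concrete. The sketch below only restates the expression `ioTime * (100 - ioRatio) / ioRatio`; the class name IoRatioBudget and the sample numbers are illustrative.

```java
final class IoRatioBudget {
    /** Time allotted to non-I/O tasks, given the time just spent on I/O and the I/O ratio. */
    static long taskBudgetNanos(long ioTimeNanos, int ioRatio) {
        return ioTimeNanos * (100 - ioRatio) / ioRatio;
    }

    public static void main(String[] args) {
        long ioTime = 8_000_000L; // assume 8 ms were spent on I/O in this iteration
        for (int ratio : new int[] {20, 50, 80, 100}) {
            System.out.println("ioRatio=" + ratio + " -> task budget "
                    + taskBudgetNanos(ioTime, ratio) + " ns");
        }
    }
}
```

With the default ratio of 50 the tasks get as much time as the I/O just took; at 80 they get a quarter of it, and at 20 they get four times as much.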

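The time allotted to tasks is calculated from the time spent on I/O in the current iteration. Let us look at the implementation of runAllTasks in detail: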

 /**
     * Poll all tasks from the task queue and run them via {@link Runnable#run()} method.  This method stops running
     * the tasks in the task queue and returns if it ran longer than {@code timeoutNanos}.
     */
    protected boolean runAllTasks(long timeoutNanos) {
        fetchFromDelayedQueue();
        Runnable task = pollTask();
        if (task == null) {
            return false;
        }

        final long deadline = ScheduledFutureTask.nanoTime() + timeoutNanos;
        long runTasks = 0;
        long lastExecutionTime;
        for (;;) {
            try {
                task.run();
            } catch (Throwable t) {
                logger.warn("A task raised an exception.", t);
            }

            runTasks ++;

            // Check timeout every 64 tasks because nanoTime() is relatively expensive.
            // XXX: Hard-coded value - will make it configurable if it is really a problem.
            if ((runTasks & 0x3F) == 0) {
                lastExecutionTime = ScheduledFutureTask.nanoTime();
                if (lastExecutionTime >= deadline) {
                    break;
                }
            }

            task = pollTask();
            if (task == null) {
                lastExecutionTime = ScheduledFutureTask.nanoTime();
                break;
            }
        }

        this.lastExecutionTime = lastExecutionTime;
        return true;
    }

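First, fetchFromDelayedQueue() peeks at the head of the scheduled (delayed) task queue; if the queue is empty, the loop exits. Otherwise the task's deadline is compared with the current timestamp: if the scheduled task has reached or passed its deadline, it is removed from the delayed queue and added to the Task Queue for execution. If the task is not yet due, nothing more needs to be handled in this round and the loop simply exits. The code is as follows: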

private void fetchFromDelayedQueue() {
        long nanoTime = 0L;
        for (;;) {
            ScheduledFutureTask<?> delayedTask = delayedTaskQueue.peek();
            if (delayedTask == null) {
                break;
            }

            if (nanoTime == 0L) {
                nanoTime = ScheduledFutureTask.nanoTime();
            }

            if (delayedTask.deadlineNanos() <= nanoTime) {
                delayedTaskQueue.remove();
                taskQueue.add(delayedTask);
            } else {
                break;
            }
        }
    }
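The same move-when-due pattern can be tried out with standard collections. This is a simplified sketch under stated assumptions: Delayed is a hypothetical stand-in for ScheduledFutureTask, a PriorityQueue ordered by deadline replaces delayedTaskQueue, and the current time is passed in as a parameter so the result is deterministic.

```java
import java.util.ArrayDeque;
import java.util.Comparator;
import java.util.PriorityQueue;
import java.util.Queue;

final class DelayedQueueSketch {
    /** Hypothetical stand-in for ScheduledFutureTask: just a name and a deadline. */
    static final class Delayed {
        final String name;
        final long deadlineNanos;

        Delayed(String name, long deadlineNanos) {
            this.name = name;
            this.deadlineNanos = deadlineNanos;
        }
    }

    /** Moves every task whose deadline has been reached into taskQueue; returns how many moved. */
    static int fetchDue(PriorityQueue<Delayed> delayedQueue, Queue<Delayed> taskQueue, long nowNanos) {
        int moved = 0;
        for (;;) {
            Delayed head = delayedQueue.peek();
            if (head == null || head.deadlineNanos > nowNanos) {
                break;                 // queue empty, or the earliest task is not due yet
            }
            delayedQueue.remove();
            taskQueue.add(head);
            moved++;
        }
        return moved;
    }

    /** Builds a queue with deadlines 10, 20 and 30 and reports how many are due at nowNanos. */
    static int demo(long nowNanos) {
        PriorityQueue<Delayed> delayed =
                new PriorityQueue<>(Comparator.comparingLong((Delayed d) -> d.deadlineNanos));
        delayed.add(new Delayed("c", 30L));
        delayed.add(new Delayed("a", 10L));
        delayed.add(new Delayed("b", 20L));
        return fetchDue(delayed, new ArrayDeque<>(), nowNanos);
    }

    public static void main(String[] args) {
        System.out.println("due at t=20: " + demo(20L));
    }
}
```

Because the queue is ordered by deadline, the loop can stop at the first task that is not yet due: every task behind it is due even later.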

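Back in runAllTasks, if the task queue is not empty, the loop runs the tasks that were already in the Task Queue together with the due scheduled tasks that were moved over from the delayed queue.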

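Reading the system nanosecond clock is relatively expensive, so fetching the current time on every iteration to check the timeout would hurt performance. To avoid that, the check is made once every 64 tasks ((runTasks & 0x3F) == 0); if the current time has reached the deadline allotted to non-I/O work, the loop exits. This prevents a large number of non-I/O tasks from blocking I/O operations for a long time.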
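One consequence of checking the clock only every 64 tasks is that a batch always runs to the next multiple of 64 before the budget can stop it. The sketch below is a simplified imitation of the loop, not Netty's method: it uses System.nanoTime() directly instead of ScheduledFutureTask.nanoTime(), drops the exception handling, and the names TimedTaskRunner and demo are made up.

```java
import java.util.ArrayDeque;
import java.util.Queue;

final class TimedTaskRunner {
    /** Runs queued tasks until the queue is empty or the budget is spent; returns how many ran. */
    static int runAllTasks(Queue<Runnable> taskQueue, long timeoutNanos) {
        Runnable task = taskQueue.poll();
        if (task == null) {
            return 0;
        }
        final long deadline = System.nanoTime() + timeoutNanos;
        int runTasks = 0;
        for (;;) {
            task.run();
            runTasks++;
            // Sample the clock only when runTasks is a multiple of 64 (low six bits all zero).
            if ((runTasks & 0x3F) == 0 && System.nanoTime() - deadline >= 0) {
                break;
            }
            task = taskQueue.poll();
            if (task == null) {
                break;
            }
        }
        return runTasks;
    }

    /** Queues the given number of no-op tasks and runs them with the given budget. */
    static int demo(int queued, long timeoutNanos) {
        Queue<Runnable> queue = new ArrayDeque<>();
        for (int i = 0; i < queued; i++) {
            queue.add(() -> { });
        }
        return runAllTasks(queue, timeoutNanos);
    }

    public static void main(String[] args) {
        System.out.println("200 queued, zero budget: ran " + demo(200, 0L));
        System.out.println("10 queued, zero budget: ran " + demo(10, 0L));
    }
}
```

Even with a budget of zero, the first 64 tasks run before the deadline is looked at, and a queue shorter than 64 is always drained completely.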

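10) Finally, check whether the system has entered the graceful shutdown state.

If it is shutting down, closeAll() is called to release resources, and once confirmShutdown() returns true the NioEventLoop thread exits the loop and stops running. The code that closes the resources is as follows: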

private void closeAll() {
        selectAgain();
        Set<SelectionKey> keys = selector.keys();
        Collection<AbstractNioChannel> channels = new ArrayList<AbstractNioChannel>(keys.size());
        for (SelectionKey k: keys) {
            Object a = k.attachment();
            if (a instanceof AbstractNioChannel) {
                channels.add((AbstractNioChannel) a);
            } else {
                k.cancel();
                @SuppressWarnings("unchecked")
                NioTask<SelectableChannel> task = (NioTask<SelectableChannel>) a;
                invokeChannelUnregistered(task, k, null);
            }
        }

        for (AbstractNioChannel ch: channels) {
            ch.unsafe().close(ch.unsafe().voidPromise());
        }
    }

The method iterates over all Channels and calls Unsafe.close() on each of them to close every link, releasing the thread pool, ChannelPipeline, ChannelHandler and other resources.
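Closing a channel is also what makes its SelectionKey invalid, which ties closeAll() back to the `!k.isValid()` check in processSelectedKey. The following sketch shows that JDK behaviour with a Pipe in place of a socket; the class name CloseInvalidatesKey and the demo method are illustrative assumptions.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;

final class CloseInvalidatesKey {
    /** Returns {key valid before close, key valid after close, keys left after the next select}. */
    static int[] demo() {
        try (Selector selector = Selector.open()) {
            Pipe pipe = Pipe.open();
            pipe.source().configureBlocking(false);
            SelectionKey key = pipe.source().register(selector, SelectionKey.OP_READ);

            int validBefore = key.isValid() ? 1 : 0;
            pipe.source().close();                  // closing the channel cancels its keys
            int validAfter = key.isValid() ? 1 : 0;

            selector.selectNow();                   // cancelled keys are flushed during selection
            int keysLeft = selector.keys().size();

            pipe.sink().close();
            return new int[] {validBefore, validAfter, keysLeft};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] result = demo();
        System.out.println("valid before close: " + (result[0] == 1));
        System.out.println("valid after close:  " + (result[1] == 1));
        System.out.println("keys left after selectNow(): " + result[2]);
    }
}
```

The cancelled key only leaves the selector's key set during the next selection operation, which is one plausible reason closeAll() begins with selectAgain().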