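Let's start with a basic question: what is NioEventLoop?
NioEventLoop is one of Netty's core components. Like the NIO multiplexer (Selector) it wraps, each NioEventLoop instance may have to manage thousands of connections.
Class structure diagram of NioEventLoop:
NioEventLoop has the following five core functions:
• Opening and initializing the Selector.
• Registering the ServerSocketChannel with the Selector.
• Handling I/O events such as OP_ACCEPT, OP_CONNECT, OP_READ, and OP_WRITE.
• Running scheduled tasks.
• Working around the JDK empty-polling bug.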
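The first three functions map onto plain JDK NIO calls. As a rough illustration that uses only the JDK (this is not Netty's code), the sketch below opens a Selector, registers a non-blocking ServerSocketChannel for OP_ACCEPT, and polls once without blocking. The class name PlainNioSelectorSketch, the method name, and the loopback address with an ephemeral port are choices made for this example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.spi.SelectorProvider;

// Illustrative sketch only: plain JDK NIO, not Netty code.
final class PlainNioSelectorSketch {

    // Opens a selector, registers a server channel, polls once, and
    // returns the number of ready keys reported by selectNow().
    static int openRegisterAndPoll() {
        try (Selector selector = SelectorProvider.provider().openSelector();
             ServerSocketChannel server = ServerSocketChannel.open()) {
            // A channel must be non-blocking before it can be registered.
            server.configureBlocking(false);
            // Port 0 lets the OS pick a free port (assumption for the demo).
            server.bind(new InetSocketAddress("127.0.0.1", 0));
            SelectionKey key = server.register(selector, SelectionKey.OP_ACCEPT);
            if (!key.isValid()) {
                throw new IllegalStateException("registration failed");
            }
            // Non-blocking poll: no client has connected, so nothing is ready.
            return selector.selectNow();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("ready keys: " + openRegisterAndPoll());
    }
}
```

Netty performs the same kind of calls internally, but wraps them so that one event loop thread owns the Selector for its whole lifetime.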
Overall function diagram of NioEventLoop:
1. How does NioEventLoop open the Selector?
The constructor calls openSelector(), which opens the Selector:
NioEventLoop(NioEventLoopGroup parent, Executor executor, SelectorProvider selectorProvider,
SelectStrategy strategy, RejectedExecutionHandler rejectedExecutionHandler,
EventLoopTaskQueueFactory queueFactory) {
super(parent, executor, false, newTaskQueue(queueFactory), newTaskQueue(queueFactory),
rejectedExecutionHandler);
this.provider = ObjectUtil.checkNotNull(selectorProvider, "selectorProvider");
this.selectStrategy = ObjectUtil.checkNotNull(strategy, "selectStrategy");
// Open the Selector
final SelectorTuple selectorTuple = openSelector();
this.selector = selectorTuple.selector;
this.unwrappedSelector = selectorTuple.unwrappedSelector;
}
Now let's look at the implementation of openSelector():
private SelectorTuple openSelector() {
final Selector unwrappedSelector;
try {
// Create the Selector by calling into the JDK's NIO implementation
unwrappedSelector = provider.openSelector();
} catch (IOException e) {
throw new ChannelException("failed to open a new selector", e);
}
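// If the key-set optimization has been disabled (-Dio.netty.noKeySetOptimization=true), return the plain Selector.
// The flag defaults to false, so the optimization is attempted by default.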
if (DISABLE_KEY_SET_OPTIMIZATION) {
return new SelectorTuple(unwrappedSelector);
}
// Otherwise, load the sun.nio.ch.SelectorImpl class via reflection (no instance is created here)
Object maybeSelectorImplClass = AccessController.doPrivileged(new PrivilegedAction<Object>() {
@Override
public Object run() {
try {
return Class.forName(
"sun.nio.ch.SelectorImpl",
false,
PlatformDependent.getSystemClassLoader());
} catch (Throwable cause) {
return cause;
}
}
});
if (!(maybeSelectorImplClass instanceof Class) ||
// ensure the current selector implementation is what we can instrument.
!((Class<?>) maybeSelectorImplClass).isAssignableFrom(unwrappedSelector.getClass())) {
if (maybeSelectorImplClass instanceof Throwable) {
Throwable t = (Throwable) maybeSelectorImplClass;
logger.trace("failed to instrument a special java.util.Set into: {}", unwrappedSelector, t);
}
return new SelectorTuple(unwrappedSelector);
}
final Class<?> selectorImplClass = (Class<?>) maybeSelectorImplClass;
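// Use the optimized SelectedSelectionKeySet
// to replace the selectedKeys and publicSelectedKeys sets of the JDK's sun.nio.ch.SelectorImpl.
/**
* Why replace them, and what is the benefit?
* SelectedSelectionKeySet changes the data structure: it uses an array instead of a HashSet
* and overrides add() and iterator(), so adding and traversing keys is cheaper.
*/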
final SelectedSelectionKeySet selectedKeySet = new SelectedSelectionKeySet();
Object maybeException = AccessController.doPrivileged(new PrivilegedAction<Object>() {
@Override
public Object run() {
try {
Field selectedKeysField = selectorImplClass.getDeclaredField("selectedKeys");
Field publicSelectedKeysField = selectorImplClass.getDeclaredField("publicSelectedKeys");
if (PlatformDependent.javaVersion() >= 9 && PlatformDependent.hasUnsafe()) {
// Let us try to use sun.misc.Unsafe to replace the SelectionKeySet.
// This allows us to also do this in Java9+ without any extra flags.
long selectedKeysFieldOffset = PlatformDependent.objectFieldOffset(selectedKeysField);
long publicSelectedKeysFieldOffset =
PlatformDependent.objectFieldOffset(publicSelectedKeysField);
if (selectedKeysFieldOffset != -1 && publicSelectedKeysFieldOffset != -1) {
PlatformDependent.putObject(
unwrappedSelector, selectedKeysFieldOffset, selectedKeySet);
PlatformDependent.putObject(
unwrappedSelector, publicSelectedKeysFieldOffset, selectedKeySet);
return null;
}
// We could not retrieve the offset, lets try reflection as last-resort.
}
// Make the fields accessible
Throwable cause = ReflectionUtil.trySetAccessible(selectedKeysField, true);
if (cause != null) {
return cause;
}
cause = ReflectionUtil.trySetAccessible(publicSelectedKeysField, true);
if (cause != null) {
return cause;
}
// Use reflection to replace the selector's selectedKeys and publicSelectedKeys fields
// with the SelectedSelectionKeySet constructed by Netty
selectedKeysField.set(unwrappedSelector, selectedKeySet);
publicSelectedKeysField.set(unwrappedSelector, selectedKeySet);
return null;
} catch (NoSuchFieldException e) {
return e;
} catch (IllegalAccessException e) {
return e;
}
}
});
if (maybeException instanceof Exception) {
selectedKeys = null;
Exception e = (Exception) maybeException;
logger.trace("failed to instrument a special java.util.Set into: {}", unwrappedSelector, e);
return new SelectorTuple(unwrappedSelector);
}
// Keep selectedKeySet in the NioEventLoop field and return the selector pair (wrapped and unwrapped)
selectedKeys = selectedKeySet;
logger.trace("instrumented a special java.util.Set into: {}", unwrappedSelector);
return new SelectorTuple(unwrappedSelector,
new SelectedSelectionKeySetSelector(unwrappedSelector, selectedKeySet));
}
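To make the benefit of the array-backed set more concrete, here is a minimal sketch of the idea. It is not Netty's SelectedSelectionKeySet: the class name ArrayKeySetSketch, the initial capacity of 4, and the reset() helper are assumptions for illustration. Like the approach described above, it appends into an array in add() and iterates by index, so there is no hashing and no per-entry node allocation. The sketch does not check for duplicates and assumes the caller adds each element at most once per cycle.

```java
import java.util.AbstractSet;
import java.util.Arrays;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Illustrative sketch only: an append-only, array-backed "set".
final class ArrayKeySetSketch<E> extends AbstractSet<E> {
    private Object[] items = new Object[4]; // small initial capacity (assumption)
    private int size;

    @Override
    public boolean add(E e) {
        if (e == null) {
            return false;
        }
        if (size == items.length) {
            // Grow by doubling, so add() stays amortized O(1).
            items = Arrays.copyOf(items, size << 1);
        }
        items[size++] = e;
        return true;
    }

    @Override
    public int size() {
        return size;
    }

    @Override
    public Iterator<E> iterator() {
        return new Iterator<E>() {
            private int idx;

            @Override
            public boolean hasNext() {
                return idx < size;
            }

            @Override
            @SuppressWarnings("unchecked")
            public E next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return (E) items[idx++];
            }
        };
    }

    // Clears the set for the next cycle and drops references for the GC.
    void reset() {
        Arrays.fill(items, 0, size, null);
        size = 0;
    }

    public static void main(String[] args) {
        ArrayKeySetSketch<String> keys = new ArrayKeySetSketch<>();
        keys.add("key-1");
        keys.add("key-2");
        for (String k : keys) {
            System.out.println(k);
        }
        keys.reset();
    }
}
```

The trade-off is that this structure only suits a workload that fills the set, walks it once, and clears it, which is the pattern of a selector's ready-key set.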
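As the structure diagram at the beginning shows, NioEventLoop inherits from SingleThreadEventExecutor (through SingleThreadEventLoop). SingleThreadEventExecutor declares an abstract run() method for subclasses to implement: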
/**
* Run the tasks in the {@link #taskQueue}
*/
protected abstract void run();
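So let's look at NioEventLoop's most important method, run().
run() has three main parts: select(boolean oldWakenUp), which polls for ready Channels; processSelectedKeys, which handles the SelectionKeys that were found; and runAllTasks, which runs the queued tasks.
In older versions of Netty, the NIO handling logic was as follows:
Part one, select(boolean oldWakenUp): its main purpose is to poll for ready Channels. While polling, it calls the NIO Selector's selectNow() and select(timeoutMillis) methods. The two calls are clearly separated, and each is made under different conditions, as follows.
- (1) When a scheduled task is due and no poll has happened yet, selectNow() is called and returns immediately.
- (2) When a scheduled task is due and a poll has already happened (an empty poll or a blocking poll that timed out), the method returns directly; there is no need to call selectNow() again.
- (3) If the taskQueue has tasks and no other thread has triggered a wakeup since the EventLoop thread entered select(), selectNow() is called and returns immediately. The reason: if a thread triggered a wakeup before select(boolean oldWakenUp) ran, the tasks in the taskQueue must still be processed promptly rather than only after timeoutMillis has expired.
- (4) While select(timeoutMillis) is blocked, the thread is woken up normally in four cases: another thread performed a wakeup, a ready key was detected, an empty poll occurred, or the timeout elapsed. After waking up, the loop keeps polling only in the empty-poll case; in the other cases it exits the loop.
In the latest Netty 4 releases, however, I found that the logic differs from the above.
For example, the newer version no longer has a select(boolean oldWakenUp) method.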
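/**
* selectNow() checks whether any I/O events are ready. If so, it returns the number of ready keys;
* otherwise it returns 0. Note that selectNow() returns immediately and never blocks the current thread.
*/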
int selectNow() throws IOException {
return selector.selectNow();
}
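/**
* Polls the Selector, blocking at most until the given deadline.
*
* @param deadlineNanos deadline of the next scheduled task, or NONE if nothing is scheduled
* @return the number of ready keys (possibly 0)
* @throws IOException if the underlying select call fails
*/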
private int select(long deadlineNanos) throws IOException {
if (deadlineNanos == NONE) {
return selector.select();
}
// Timeout will only be 0 if deadline is within 5 microsecs
long timeoutMillis = deadlineToDelayNanos(deadlineNanos + 995000L) / 1000000L;
// If timeoutMillis <= 0, poll with the non-blocking selectNow();
// otherwise block in selector.select(timeoutMillis). Both return the number of ready keys.
return timeoutMillis <= 0 ? selector.selectNow() : selector.select(timeoutMillis);
}
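The arithmetic behind the timeout deserves a worked example. Adding 995000 ns before the integer division by 1000000 rounds the remaining time up to the next millisecond, except when less than 5 microseconds remain, in which case the result is 0 and the non-blocking selectNow() path is taken. The helper below is a simplified stand-in: it takes the remaining nanoseconds directly instead of an absolute deadline, and the class and method names are hypothetical.

```java
// Illustrative sketch only: mirrors the rounding used when converting
// a remaining delay in nanoseconds to a select timeout in milliseconds.
final class SelectTimeoutSketch {

    // remainingNanos: time left until the next scheduled task is due.
    static long timeoutMillis(long remainingNanos) {
        // Shift by 995 microseconds, clamp at zero for overdue deadlines,
        // then truncate to whole milliseconds.
        long shifted = Math.max(0L, remainingNanos + 995_000L);
        return shifted / 1_000_000L;
    }

    public static void main(String[] args) {
        long[] samples = {-2_000_000L, 4_999L, 5_000L, 1_000_000L, 1_005_000L};
        for (long remaining : samples) {
            System.out.println(remaining + " ns -> " + timeoutMillis(remaining) + " ms");
        }
    }
}
```

For the samples in main, the results are 0, 0, 1, 1, and 2 milliseconds respectively, so a task due in 1.005 ms makes the loop block for up to 2 ms, while a task due in under 5 microseconds is not waited for at all.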
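/**
* The logic in run() has three main steps:
* <p>
* select: poll for ready I/O events; this call may block
* processSelectedKeys: handle the ready I/O events of the registered Channels
* runAllTasks: run ordinary tasks and scheduled tasks
*/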
@Override
protected void run() {
// Iteration counter, incremented once per loop pass
int selectCnt = 0;
// Infinite event loop that drives the NIO Selector
for (; ; ) {
try {
int strategy;
try {
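/***
* Ask the select strategy what to do, based on whether tasks are pending (default strategy):
* with pending tasks, return the result of selector.selectNow();
* with no pending tasks, return SelectStrategy.SELECT.
*/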
strategy = selectStrategy.calculateStrategy(selectNowSupplier, hasTasks());
switch (strategy) {
case SelectStrategy.CONTINUE:
continue;
case SelectStrategy.BUSY_WAIT:
// fall-through to SELECT since the busy-wait is not supported with NIO
case SelectStrategy.SELECT:
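// Deadline of the next scheduled task
// (scheduled tasks are the ones submitted through the event loop's schedule(...) methods)
long curDeadlineNanos = nextScheduledTaskDeadlineNanos();
// No scheduled task is pending
if (curDeadlineNanos == -1L) {
curDeadlineNanos = NONE; // nothing on the calendar
}
// Record when this event loop will next wake up, in nanoseconds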
nextWakeupNanos.set(curDeadlineNanos);
try {
// Check again that the task queue is still empty
if (!hasTasks()) {
// May return 0,
// or the number of ready keys
strategy = select(curDeadlineNanos);
}
} finally {
// This update is just to help block unnecessary selector wakeups
// so use of lazySet is ok (no race condition)
nextWakeupNanos.lazySet(AWAKE);
}
// fall through
default:
}
} catch (IOException e) {
// If we receive an IOException here its because the Selector is messed up. Let's rebuild
// the selector and retry. https://github.com/netty/netty/issues/8566
rebuildSelector0();
selectCnt = 0;
handleLoopException(e);
continue;
}
// Incremented once per loop iteration
selectCnt++;
// Number of cancelled keys
cancelledKeys = 0;
// Whether another select is needed
needsToSelectAgain = false;
// I/O ratio, 50 by default
final int ioRatio = this.ioRatio;
// Whether any task was run
boolean ranTasks;
// When the I/O ratio is 100
if (ioRatio == 100) {
try {
// Some keys are ready
if (strategy > 0) {
// I/O work: process the selected keys
processSelectedKeys();
}
} finally {
// Ensure we always run tasks.
// Run all pending tasks
ranTasks = runAllTasks();
}
}
// ioRatio is not 100 and some keys are ready
else if (strategy > 0) {
// Start time of the I/O phase
final long ioStartTime = System.nanoTime();
try {
// I/O work: process the selected keys
processSelectedKeys();
} finally {
// Ensure we always run tasks.
final long ioTime = System.nanoTime() - ioStartTime;
// Run tasks within a time budget proportional to the I/O time; leftover tasks wait for the next iteration
ranTasks = runAllTasks(ioTime * (100 - ioRatio) / ioRatio);
}
} else {
ranTasks = runAllTasks(0); // This will run the minimum number of tasks
}
if (ranTasks || strategy > 0) {
// When selectCnt > 3 (MIN_PREMATURE_SELECTOR_RETURNS) and debug logging is enabled
if (selectCnt > MIN_PREMATURE_SELECTOR_RETURNS && logger.isDebugEnabled()) {
logger.debug("Selector.select() returned prematurely {} times in a row for Selector {}.",
selectCnt - 1, selector);
}
// Reset selectCnt to 0
selectCnt = 0;
} else if (unexpectedSelectorWakeup(selectCnt)) { // Unexpected wakeup (unusual case)
selectCnt = 0;
}
} catch (CancelledKeyException e) {
// Harmless exception - log anyway
if (logger.isDebugEnabled()) {
logger.debug(CancelledKeyException.class.getSimpleName() + " raised by a Selector {} - JDK bug?",
selector, e);
}
} catch (Error e) {
throw (Error) e;
} catch (Throwable t) {
handleLoopException(t);
} finally {
// Always handle shutdown even if the loop processing threw an exception.
try {
if (isShuttingDown()) {
closeAll();
if (confirmShutdown()) {
return;
}
}
} catch (Error e) {
throw (Error) e;
} catch (Throwable t) {
handleLoopException(t);
}
}
}
}
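The time budget passed to runAllTasks in the middle branch of run() is easier to follow with numbers. The sketch below isolates the expression ioTime * (100 - ioRatio) / ioRatio from the code above; the class and method names are hypothetical, and the argument check is an addition for the example, since the ioRatio == 100 case is handled by a separate branch in run().

```java
// Illustrative sketch only: computes the task time budget from the
// measured I/O time and the configured ioRatio, as in run() above.
final class IoRatioBudgetSketch {

    static long taskBudgetNanos(long ioTimeNanos, int ioRatio) {
        if (ioRatio <= 0 || ioRatio >= 100) {
            // 100 means "run all tasks without a budget" and is handled elsewhere.
            throw new IllegalArgumentException("ioRatio must be in 1..99: " + ioRatio);
        }
        return ioTimeNanos * (100 - ioRatio) / ioRatio;
    }

    public static void main(String[] args) {
        long ioTime = 8_000_000L; // 8 ms spent in processSelectedKeys (sample value)
        for (int ratio : new int[] {20, 50, 80}) {
            System.out.println("ioRatio=" + ratio + " -> task budget "
                    + taskBudgetNanos(ioTime, ratio) + " ns");
        }
    }
}
```

With 8 ms of I/O time, the default ioRatio of 50 gives tasks an equal 8 ms, a ratio of 80 gives them 2 ms, and a ratio of 20 gives them 32 ms. A higher ioRatio therefore favors I/O over queued tasks.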
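This leads to a common interview question: how does NioEventLoop work around the JDK empty-polling bug?
- First, a threshold is set during initialization; it defaults to 512: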
int selectorAutoRebuildThreshold = SystemPropertyUtil.getInt("io.netty.selectorAutoRebuildThreshold", 512);
if (selectorAutoRebuildThreshold < MIN_PREMATURE_SELECTOR_RETURNS) {
selectorAutoRebuildThreshold = 0;
}
SELECTOR_AUTO_REBUILD_THRESHOLD = selectorAutoRebuildThreshold;
if (logger.isDebugEnabled()) {
logger.debug("-Dio.netty.noKeySetOptimization: {}", DISABLE_KEY_SET_OPTIMIZATION);
logger.debug("-Dio.netty.selectorAutoRebuildThreshold: {}", SELECTOR_AUTO_REBUILD_THRESHOLD);
}
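- The run() loop keeps a selectCnt counter that is incremented on every iteration and reset to 0 whenever an iteration found ready keys or ran tasks.
- Finally, the method below checks selectCnt; once it reaches the threshold (>= 512 by default), the Selector is rebuilt: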
// returns true if selectCnt should be reset
private boolean unexpectedSelectorWakeup(int selectCnt) {
if (Thread.interrupted()) {
// Thread was interrupted so reset selected keys and break so we not run into a busy loop.
// As this is most likely a bug in the handler of the user or it's client library we will
// also log it.
//
// See https://github.com/netty/netty/issues/2426
if (logger.isDebugEnabled()) {
logger.debug("Selector.select() returned prematurely because " +
"Thread.currentThread().interrupt() was called. Use " +
"NioEventLoop.shutdownGracefully() to shutdown the NioEventLoop.");
}
return true;
}
if (SELECTOR_AUTO_REBUILD_THRESHOLD > 0 &&
selectCnt >= SELECTOR_AUTO_REBUILD_THRESHOLD) {
// The selector returned prematurely many times in a row.
// Rebuild the selector to work around the problem.
// This is the key step of the workaround.
logger.warn("Selector.select() returned prematurely {} times in a row; rebuilding Selector {}.",
selectCnt, selector);
rebuildSelector();
return true;
}
return false;
}
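Stripped of logging and interrupt handling, the workaround reduces to a counter of consecutive unproductive iterations. The sketch below models only that counting logic; it is not Netty's code. The class name, the onIteration method, and the rebuilds field (a stand-in for the call to rebuildSelector()) are assumptions for illustration.

```java
// Illustrative sketch only: models the selectCnt bookkeeping of run()
// and unexpectedSelectorWakeup() without any real Selector.
final class PrematureWakeupCounterSketch {
    private final int rebuildThreshold;
    private int selectCnt;
    private int rebuilds; // counts how often a rebuild would have happened

    PrematureWakeupCounterSketch(int rebuildThreshold) {
        this.rebuildThreshold = rebuildThreshold;
    }

    // didWork is true when the iteration found ready keys or ran tasks.
    void onIteration(boolean didWork) {
        selectCnt++;
        if (didWork) {
            // A productive iteration breaks the streak.
            selectCnt = 0;
        } else if (rebuildThreshold > 0 && selectCnt >= rebuildThreshold) {
            // Too many premature returns in a row: rebuild and start over.
            rebuilds++;
            selectCnt = 0;
        }
    }

    int selectCnt() {
        return selectCnt;
    }

    int rebuilds() {
        return rebuilds;
    }

    public static void main(String[] args) {
        PrematureWakeupCounterSketch counter = new PrematureWakeupCounterSketch(512);
        for (int i = 0; i < 512; i++) {
            counter.onIteration(false); // simulate an empty-polling streak
        }
        System.out.println("rebuilds after 512 idle iterations: " + counter.rebuilds());
    }
}
```

Because any productive iteration resets the counter, only an uninterrupted run of premature returns reaches the threshold, which is what distinguishes the empty-polling bug from an ordinary idle loop.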