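To improve performance, when a user-implemented ChannelHandler contains complex business logic, or logic that may block synchronously, a thread pool is usually needed to raise concurrency. There are two ways to add one: run the business logic of the ChannelHandler on a user-defined thread pool, or use Netty's EventExecutorGroup mechanism to execute the ChannelHandler in parallel.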
Reproducing the problem
The server uses Netty's built-in DefaultEventExecutorGroup to invoke the business handler in parallel. The relevant code:
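import io.netty.bootstrap.ServerBootstrap;
import io.netty.channel.ChannelFuture;
import io.netty.channel.ChannelInitializer;
import io.netty.channel.ChannelPipeline;
import io.netty.channel.EventLoopGroup;
import io.netty.channel.nio.NioEventLoopGroup;
import io.netty.channel.socket.SocketChannel;
import io.netty.channel.socket.nio.NioServerSocketChannel;
import io.netty.util.concurrent.DefaultEventExecutorGroup;
import io.netty.util.concurrent.EventExecutorGroup;

public class ConcurrentPerformanceServer {
    // Executor group with 100 threads, meant to run the business handler off the I/O threads
    static final EventExecutorGroup executor = new DefaultEventExecutorGroup(100);

    public static void main(String[] args) throws InterruptedException {
        EventLoopGroup bossGroup = new NioEventLoopGroup(1);
        EventLoopGroup workerGroup = new NioEventLoopGroup();
        try {
            ServerBootstrap b = new ServerBootstrap();
            b.group(bossGroup, workerGroup)
                .channel(NioServerSocketChannel.class)
                .childHandler(new ChannelInitializer<SocketChannel>() {
                    @Override
                    protected void initChannel(SocketChannel socketChannel) throws Exception {
                        ChannelPipeline p = socketChannel.pipeline();
                        // Bind the business handler to the executor group
                        p.addLast(executor, new ConcurrentPerformanceServerHandler());
                    }
                });
            ChannelFuture f = b.bind(8888).sync();
            f.channel().closeFuture().sync();
        } finally {
            bossGroup.shutdownGracefully();
            workerGroup.shutdownGracefully();
            // Also stop the business executor threads so the JVM can exit
            executor.shutdownGracefully();
        }
    }
}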
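At startup the server creates an EventExecutorGroup with 100 threads and binds it to the business handler. The intent is to isolate the I/O threads from the business-logic threads and, at the same time, to run the handler concurrently for better performance.
The business handler simulates a slow business operation with a random sleep, and a scheduled executor periodically reports the server's throughput. The relevant code: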
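import io.netty.buffer.ByteBuf;
import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.ChannelInboundHandlerAdapter;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class ConcurrentPerformanceServerHandler extends ChannelInboundHandlerAdapter {
    // Messages handled since the last report
    private final AtomicInteger counter = new AtomicInteger(0);
    static ScheduledExecutorService scheduledExecutorService = Executors.newSingleThreadScheduledExecutor();

    @Override
    public void channelActive(ChannelHandlerContext ctx) throws Exception {
        // Print and reset the counter once per second
        scheduledExecutorService.scheduleAtFixedRate(() -> {
            int qps = counter.getAndSet(0);
            System.out.println("The Server QPS is : " + qps);
        }, 0, 1000, TimeUnit.MILLISECONDS);
    }

    @Override
    public void channelRead(ChannelHandlerContext ctx, Object msg) throws Exception {
        ((ByteBuf) msg).release();
        counter.incrementAndGet();
        // Simulate slow business logic with a random sleep below 1000 ms
        TimeUnit.MILLISECONDS.sleep(ThreadLocalRandom.current().nextInt(1000));
    }
}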
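A single long-lived TCP connection is established between the client and the server, and the client load-tests the server at 100 QPS. The code: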
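import io.netty.buffer.ByteBuf;
import io.netty.buffer.Unpooled;
import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.ChannelInboundHandlerAdapter;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class ConcurrentPerformanceClientHandler extends ChannelInboundHandlerAdapter {
    static ScheduledExecutorService scheduledExecutorService = Executors.newSingleThreadScheduledExecutor();

    @Override
    public void channelActive(ChannelHandlerContext ctx) throws Exception {
        // Once per second, send 100 messages of 100 bytes each
        scheduledExecutorService.scheduleAtFixedRate(() -> {
            for (int i = 0; i < 100; i++) {
                ByteBuf firstMessage = Unpooled.buffer(100);
                for (int k = 0; k < firstMessage.capacity(); k++) {
                    firstMessage.writeByte((byte) i);
                }
                ctx.writeAndFlush(firstMessage);
            }
        }, 0, 1000, TimeUnit.MILLISECONDS);
    }
}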
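Test results:
Throughput is in the single digits. The simulated business work takes between 0 and 1000 ms per message (the handler sleeps for nextInt(1000) ms, about 500 ms on average), so the suspicion is that the business handler is not running concurrently but on a single thread. Inspecting the server's thread dump:
Only one of the 100 configured business threads is running. Because a single thread executes the handler that contains the complex business logic, performance is poor.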
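A rough capacity estimate is consistent with the single-thread reading. The sketch below is illustrative only: the class and method names are hypothetical, and it assumes a mean service time of about 500 ms (the average of a uniform sleep below 1000 ms) and an offered load of 100 requests per second, as in the test above.

```java
// Back-of-the-envelope throughput bound; not part of the original test code.
final class ThroughputEstimate {

    /**
     * Upper bound on requests handled per second: each worker thread can finish
     * at most 1000 / meanServiceMillis blocking requests per second, and the
     * server cannot handle more than the client offers.
     */
    static double estimateQps(int workerThreads, double meanServiceMillis, double offeredQps) {
        if (workerThreads <= 0 || meanServiceMillis <= 0) {
            throw new IllegalArgumentException("threads and service time must be positive");
        }
        double capacity = workerThreads * 1000.0 / meanServiceMillis;
        return Math.min(capacity, offeredQps);
    }

    public static void main(String[] args) {
        // One thread actually doing the work
        System.out.println("1 thread:    " + estimateQps(1, 500.0, 100.0));
        // All 100 configured threads doing the work
        System.out.println("100 threads: " + estimateQps(100, 500.0, 100.0));
    }
}
```

With one effective thread the bound is about 2 requests per second, which matches a single-digit reading; with 100 threads the offered load, not the thread count, would be the limit.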
Why the handler does not run in parallel
The answer is in the source. The code that binds a DefaultEventExecutorGroup to the business ChannelHandler is in the DefaultChannelPipeline class:
public final ChannelPipeline addLast(EventExecutorGroup group, String name, ChannelHandler handler) {
    AbstractChannelHandlerContext newCtx;
    synchronized(this) {
        checkMultiplicity(handler);
        newCtx = this.newContext(group, this.filterName(name, handler), handler);
        this.addLast0(newCtx);
        // remaining code omitted
}
Here newContext creates and returns a DefaultChannelHandlerContext. While doing so it calls childExecutor(group), which picks one EventExecutor from the EventExecutorGroup and binds it to the DefaultChannelHandlerContext. The relevant code:
private EventExecutor childExecutor(EventExecutorGroup group) {
    Map<EventExecutorGroup, EventExecutor> childExecutors = this.childExecutors;
    if (childExecutors == null) {
        childExecutors = this.childExecutors = new IdentityHashMap(4);
    }
    EventExecutor childExecutor = (EventExecutor)childExecutors.get(group);
    if (childExecutor == null) {
        childExecutor = group.next();
        childExecutors.put(group, childExecutor);
    }
    return childExecutor;
}
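group.next() selects one EventExecutor from the EventExecutorGroup, and the result is cached in the childExecutors map, so later lookups for the same group on this pipeline return the same executor. For any given TCP connection, the executor bound to the business ChannelHandler instance is therefore a single DefaultEventExecutor, and work is submitted through its execute method. DefaultEventExecutor extends SingleThreadEventExecutor, so execute simply places the Runnable on a task queue that one thread drains.
So no matter how many client threads load-test a given connection concurrently, on the server only one DefaultEventExecutor thread runs the business ChannelHandler for that connection, and the calls cannot run in parallel.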
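The pinning behaviour described above can be modelled with the JDK alone. The following is a simplified sketch, not Netty code: Group and Pipeline are hypothetical stand-ins for EventExecutorGroup and DefaultChannelPipeline, and the round-robin selection in next() is an assumption of this model.

```java
import java.util.IdentityHashMap;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

final class PinnedExecutorDemo {

    /** Stand-in for an executor group: single-threaded executors handed out round-robin. */
    static final class Group {
        private final ExecutorService[] children;
        private final AtomicInteger idx = new AtomicInteger();

        Group(int size) {
            children = new ExecutorService[size];
            for (int i = 0; i < size; i++) {
                children[i] = Executors.newSingleThreadExecutor();
            }
        }

        ExecutorService next() {
            return children[Math.floorMod(idx.getAndIncrement(), children.length)];
        }

        void shutdown() {
            for (ExecutorService e : children) {
                e.shutdown();
            }
        }
    }

    /** Stand-in for one connection's pipeline: caches the first executor picked per group. */
    static final class Pipeline {
        private final Map<Group, ExecutorService> childExecutors = new IdentityHashMap<>(4);

        synchronized ExecutorService childExecutor(Group group) {
            return childExecutors.computeIfAbsent(group, Group::next);
        }
    }

    /** Returns how many distinct worker threads ran the submitted tasks. */
    static int distinctWorkerThreads(int connections, int groupSize, int tasksPerConnection) {
        Group group = new Group(groupSize);
        Set<Thread> threads = ConcurrentHashMap.newKeySet();
        CountDownLatch done = new CountDownLatch(connections * tasksPerConnection);
        try {
            for (int c = 0; c < connections; c++) {
                Pipeline pipeline = new Pipeline(); // one pipeline per connection
                for (int t = 0; t < tasksPerConnection; t++) {
                    pipeline.childExecutor(group).execute(() -> {
                        threads.add(Thread.currentThread());
                        done.countDown();
                    });
                }
            }
            done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            group.shutdown();
        }
        return threads.size();
    }

    public static void main(String[] args) {
        System.out.println("1 connection, group of 100:  " + distinctWorkerThreads(1, 100, 50));
        System.out.println("10 connections, group of 10: " + distinctWorkerThreads(10, 10, 5));
    }
}
```

In this model a single connection only ever uses one worker thread regardless of the group size, while ten connections on a group of ten spread across ten threads. The degree of parallelism follows the number of connections, not the number of configured threads.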
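Optimization strategies
- If the total number of concurrent client connections is smaller than the number of threads the business logic needs, wrap each request message in a task and submit it to a back-end business thread pool. The ChannelHandler then does no complex business logic itself and does not need to be bound to an EventExecutorGroup.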
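import io.netty.buffer.ByteBuf;
import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.ChannelInboundHandlerAdapter;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class ConcurrentPerformanceServerHandler extends ChannelInboundHandlerAdapter {
    // Messages handled since the last report
    private final AtomicInteger counter = new AtomicInteger(0);
    static ScheduledExecutorService scheduledExecutorService = Executors.newSingleThreadScheduledExecutor();
    // Back-end business thread pool shared by all connections
    static ExecutorService executorService = Executors.newFixedThreadPool(100);

    @Override
    public void channelActive(ChannelHandlerContext ctx) throws Exception {
        // Print and reset the counter once per second
        scheduledExecutorService.scheduleAtFixedRate(() -> {
            int qps = counter.getAndSet(0);
            System.out.println("The Server QPS is : " + qps);
        }, 0, 1000, TimeUnit.MILLISECONDS);
    }

    @Override
    public void channelRead(ChannelHandlerContext ctx, Object msg) throws Exception {
        // The payload is not used by the task, so release it right away;
        // a task that needs the payload should release it when it finishes instead.
        ((ByteBuf) msg).release();
        // Hand the slow work to the business pool and return immediately
        executorService.execute(() -> {
            counter.incrementAndGet();
            try {
                TimeUnit.MILLISECONDS.sleep(ThreadLocalRandom.current().nextInt(1000));
            } catch (InterruptedException e) {
                // Restore the interrupt flag instead of swallowing it
                Thread.currentThread().interrupt();
            }
        });
    }
}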
Results:
The Server QPS is : 59
The Server QPS is : 55
The Server QPS is : 61
The Server QPS is : 68
The Server QPS is : 43
The Server QPS is : 78
QPS rises markedly, from single digits to roughly 40 to 80.
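The effect of handing blocking work to a pool can also be seen without Netty. The sketch below is a JDK-only illustration with hypothetical names and arbitrary numbers (20 tasks of 20 ms each); it compares one worker thread with a pool that has one thread per task.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class OffloadDemo {

    /** Runs `tasks` blocking jobs of `sleepMillis` each on `threads` threads; returns elapsed ms. */
    static long elapsedMillis(int threads, int tasks, long sleepMillis) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CountDownLatch done = new CountDownLatch(tasks);
        long start = System.nanoTime();
        try {
            for (int i = 0; i < tasks; i++) {
                pool.execute(() -> {
                    try {
                        // Stand-in for slow, blocking business logic
                        TimeUnit.MILLISECONDS.sleep(sleepMillis);
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    } finally {
                        done.countDown();
                    }
                });
            }
            done.await(); // wait until every task has finished
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    public static void main(String[] args) {
        System.out.println("1 thread:   " + elapsedMillis(1, 20, 20) + " ms");
        System.out.println("20 threads: " + elapsedMillis(20, 20, 20) + " ms");
    }
}
```

The single-thread run has to serialize every sleep, while the pooled run overlaps them, which is the same reason the pooled handler reports a higher QPS on one connection.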
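- If the total number of concurrent client connections is greater than or equal to the number of threads the business logic needs, bind an EventExecutorGroup to the business ChannelHandler and run the business logic inside the handler. For example, when the client opens 10 TCP connections, each sending 1 request per second, and the DefaultEventExecutorGroup from before is sized at 10, the overall QPS is also 10. The thread dump: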