A couple of years ago, I joined a project to implement an
extremely fast RPC framework on top of the most popular network
libraries on the JVM. It reached up to 173 Kqps with Grizzly, while
Netty 3 reached about 170 Kqps. After some investigation, we found
that the reason was that Grizzly had its own memory allocation
algorithm at that time.
A year ago, I tried Netty 4, expecting even better performance
since it had already added jemalloc-like memory allocation.
Unfortunately, the result was much worse than with Netty 3. I believe
the most important problem is that Netty 4 strictly limits how users
can use threads.
Because it is a generic RPC framework, we couldn't know whether a
concrete RPC method would block, so we assumed that blocking is the
norm. We used three thread pools with Netty 3: a boss pool, an I/O
pool for marshalling and unmarshalling, and a business pool for
application-layer processing. This performed well.
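
To make the three-pool layout concrete, here is a minimal sketch that uses
only plain java.util.concurrent executors as stand-ins for the boss, I/O,
and business pools. It is not the framework's code: the class name, the
pool sizes, the `invoke` method, and the trim/append "codec" are all
assumptions made for illustration.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class ThreePoolSketch {

    // Pushes one request through the three stages and returns the
    // "marshalled" response. Pool sizes are illustrative assumptions.
    public static String roundTrip(String wireRequest) {
        ExecutorService bossPool = Executors.newSingleThreadExecutor();
        ExecutorService ioPool = Executors.newFixedThreadPool(2);
        ExecutorService businessPool = Executors.newFixedThreadPool(8);
        try {
            return CompletableFuture
                    // boss pool: accepts the incoming request
                    .supplyAsync(() -> wireRequest, bossPool)
                    // I/O pool: unmarshalling (trim is only a stand-in)
                    .thenApplyAsync(String::trim, ioPool)
                    // business pool: application code that may block
                    .thenApplyAsync(ThreePoolSketch::invoke, businessPool)
                    // I/O pool: marshalling (appending a newline is a stand-in)
                    .thenApplyAsync(reply -> reply + "\n", ioPool)
                    .join();
        } finally {
            bossPool.shutdown();
            ioPool.shutdown();
            businessPool.shutdown();
        }
    }

    // Hypothetical RPC method; a real one could block on a database
    // or on another service.
    static String invoke(String request) {
        return "echo:" + request;
    }

    public static void main(String[] args) {
        System.out.print(roundTrip("  ping  "));
    }
}
```

The point of the layout is that a slow `invoke` only occupies a business
thread, so the I/O threads stay free to marshal and unmarshal other
requests.
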
However, in Netty 4, EventLoopGroup inherits from ExecutorService,
which means thread scheduling is handled by Netty's built-in
EventLoopGroup. NioEventLoopGroup dispatches runnable tasks to its
event loops in a round-robin fashion; if one task takes a long time,
it blocks everything queued behind it on the same event loop. We
can't benefit from more advanced thread-scheduling pools such as
ForkJoinPool. I think this is the major reason why Netty 4 is slower
in my case.
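
The blocking effect described above can be reproduced without Netty. The
sketch below models a single event loop as a single-threaded executor,
which is an assumption for illustration and not Netty's implementation.
It checks whether a quick task queued behind a blocking call gets to run
while that call is still in progress. All names and the 500 ms timeout are
hypothetical.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

final class EventLoopBlockingSketch {

    // Returns true if a quick task queued after the blocking call was
    // able to run while the blocking call was still in progress.
    public static boolean quickTaskRunsDuringBlockingCall(boolean offload) {
        // One thread stands in for one event loop (an assumption).
        ExecutorService eventLoop = Executors.newSingleThreadExecutor();
        ExecutorService businessPool = Executors.newFixedThreadPool(4);
        try {
            CountDownLatch quickTaskRan = new CountDownLatch(1);
            // Stand-in for a blocking RPC method: it waits up to 500 ms
            // and reports whether the quick task ran in the meantime.
            Callable<Boolean> blockingRpc =
                    () -> quickTaskRan.await(500, TimeUnit.MILLISECONDS);

            Future<Boolean> rpcResult;
            if (offload) {
                // The event loop only hands the call to the business pool.
                Callable<Future<Boolean>> handOff =
                        () -> businessPool.submit(blockingRpc);
                rpcResult = eventLoop.submit(handOff).get();
            } else {
                // The blocking call runs directly on the event loop.
                rpcResult = eventLoop.submit(blockingRpc);
            }
            // Quick task queued on the same event loop, behind the call.
            eventLoop.execute(quickTaskRan::countDown);
            return rpcResult.get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            eventLoop.shutdown();
            businessPool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("inline:    " + quickTaskRunsDuringBlockingCall(false));
        System.out.println("offloaded: " + quickTaskRunsDuringBlockingCall(true));
    }
}
```

With the call run inline, the quick task cannot start until the blocking
call gives up, so the method returns false. With the call handed to a
separate pool, the quick task runs right away and the method returns true,
unless the machine stalls for longer than the timeout.
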